Re: Why are my files in in my list - os module used with sys argv

2016-04-19 Thread Sayth Renshaw
On Tuesday, 19 April 2016 23:46:01 UTC+10, Peter Otten  wrote:
> Sayth Renshaw wrote:
> 
> > Thanks for the insight, after doing a little reading I found this post
> > which uses both argparse and glob and attempts to cover the windows and
> > bash expansion of wildcards,
> > http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html
> 
> I hope you read the comment section of that page carefully.
> On Linux your script's behaviour will be surprising.

Yes, I have gone your way now and am parsing the files. Where my data is going
will have to wait until after I sleep.

Thanks for the advice.

from pyquery import PyQuery as pq
import pandas as pd
import argparse
# from glob import glob


parser = argparse.ArgumentParser(description=None)


def GetArgs(parser):
    """Parser function using argparse"""
    # parser.add_argument('directory', help='directory use',
    #                     action='store', nargs='*')
    parser.add_argument("files", nargs="+")
    return parser.parse_args()

fileList = GetArgs(parser)
print(fileList.files)
# d = pq(filename='20160319RHIL0_edit.xml')
data = []
attrs = ('id', 'horse')


for items in fileList.files:
    d = pq(filename=items)
    res = d('nomination')
    dataSets = [[res.eq(i).attr(x)
                 for x in attrs] for i in range(len(res))]
    resultList = data.append(dataSets)

frames = pd.DataFrame(resultList)
print(frames)

--
(pyquery)sayth@sayth-E6410:~/Projects/pyquery$ python jqxml.py samples/*.xml
['samples/20160319RHIL0_edit.xml', 'samples/20160402RAND0.xml', 
'samples/20160409RAND0.xml', 'samples/20160416RAND0.xml']
Empty DataFrame
Columns: []
Index: []
(pyquery)sayth@sayth-E6410:~/Projects/pyquery$ 
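
A likely reason for the empty DataFrame above: list.append changes the list in place and returns None, so resultList is None when it reaches pd.DataFrame. Below is a minimal sketch of accumulating rows across files. It uses made-up rows in place of the pyquery results, and the per_file mapping and collect_rows name are hypothetical, for illustration only:

```python
# Hypothetical stand-in for the rows pyquery would extract from each file.
per_file = {
    "a.xml": [["1", "Horse A"], ["2", "Horse B"]],
    "b.xml": [["3", "Horse C"]],
}


def collect_rows(per_file):
    """Flatten the per-file row lists into one list of rows."""
    data = []
    for rows in per_file.values():
        # extend() adds each row individually; append() would nest the
        # whole per-file list as a single element.
        data.extend(rows)
    return data


data = []
result = data.append(["1", "Horse A"])  # append works in place
print(result)                  # None
print(collect_rows(per_file))  # one flat list of [id, horse] rows
```

Passing the accumulated data list, rather than the return value of append, to pd.DataFrame should then give one row per nomination.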

Thanks

Sayth
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Why are my files in in my list - os module used with sys argv

2016-04-19 Thread Peter Otten
Sayth Renshaw wrote:

> Thanks for the insight, after doing a little reading I found this post
> which uses both argparse and glob and attempts to cover the windows and
> bash expansion of wildcards,
> http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html

I hope you read the comment section of that page carefully.
On Linux your script's behaviour will be surprising.




Re: Why are my files in in my list - os module used with sys argv

2016-04-19 Thread Sayth Renshaw
On Tuesday, 19 April 2016 23:21:42 UTC+10, Sayth Renshaw  wrote:
> On Tuesday, 19 April 2016 18:17:02 UTC+10, Peter Otten  wrote:
> > Steven D'Aprano wrote:
> > 
> > > On Tue, 19 Apr 2016 09:44 am, Sayth Renshaw wrote:
> > > 
> > >> Hi
> > >> 
> > >> Why would it be that my files are not being found in this script?
> > > 
> > > You are calling the script with:
> > > 
> > > python jqxml.py samples *.xml
> > > 
> > > This does not do what you think it does: under Linux shells, the glob
> > > *.xml will be expanded by the shell. Fortunately, in your case, you have
> > > no files in the current directory matching the glob *.xml, so it is not
> > > expanded and the arguments your script receives are:
> > > 
> > > 
> > > "python jqxml.py"  # not used
> > > 
> > > "samples"  # dir
> > > 
> > > "*.xml"  # mask
> > > 
> > > 
> > > You then call:
> > > 
> > > fileResult = filter(lambda x: x.endswith(mask), files)
> > > 
> > > which looks for file names which end with a literal string (asterisk, dot,
> > > x, m, l) in that order. You have no files that match that string.
> > > 
> > > At the shell prompt, enter this:
> > > 
> > > touch samples/junk\*.xml
> > > 
> > > and run the script again, and you should see that it now matches one file.
> > > 
> > > Instead, what you should do is:
> > > 
> > > 
> > > (1) Use the glob module:
> > > 
> > > https://docs.python.org/2/library/glob.html
> > > https://docs.python.org/3/library/glob.html
> > > 
> > > https://pymotw.com/2/glob/
> > > https://pymotw.com/3/glob/
> > > 
> > > 
> > > (2) When calling the script, avoid the shell expanding wildcards by
> > > escaping them or quoting them:
> > > 
> > > python jqxml.py samples "*.xml"
> > 
> > (3) *Use* the expansion mechanism provided by the shell instead of fighting 
> > it:
> > 
> > $ python jqxml.py samples/*.xml
> > 
> > This requires that you change your script
> > 
> > from pyquery import PyQuery as pq
> > import pandas as pd
> > import sys
> > 
> > fileResult = sys.argv[1:]
> > 
> > if not fileResult:
> >     print("no files specified")
> >     sys.exit(1)
> >
> > for file in fileResult:
> >     print(file)
> >
> > for items in fileResult:
> >     try:
> >         d = pq(filename=items)
> >     except FileNotFoundError as e:
> >         print(e)
> >         continue
> >     res = d('nomination')
> >     # you could move the attrs definition before the loop
> >     attrs = ('id', 'horse')
> >     # probably a bug: you are overwriting data on every iteration
> >     data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
> > 
> > I think this is the most natural approach if you are willing to accept the 
> > quirk that the script tries to process the file 'samples/*.xml' if the 
> > samples directory doesn't contain any files with the .xml suffix. Common 
> > shell tools work that way:
> > 
> > $ ls samples/*.xml
> > samples/1.xml  samples/2.xml  samples/3.xml
> > $ ls samples/*.XML
> > ls: cannot access samples/*.XML: No such file or directory
> > 
> > Unrelated: instead of working with sys.argv directly you could use argparse 
> > which is part of the standard library. The code to get at least one file is
> > 
> > import argparse
> > 
> > parser = argparse.ArgumentParser()
> > parser.add_argument("files", nargs="+")
> > args = parser.parse_args()
> > 
> > print(args.files)
> > 
> > Note that this doesn't fix the shell expansion oddity.
> 
> Hi
> 
> Thanks for the insight, after doing a little reading I found this post which 
> uses both argparse and glob and attempts to cover the windows and bash 
> expansion of wildcards, 
> http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html
> 
> import argparse
> from glob import glob
>
> def main(file_names):
>     print(file_names)
>
> if __name__ == "__main__":
>     parser = argparse.ArgumentParser()
>     parser.add_argument("file_names", nargs='*')
>     # nargs='*' tells it to combine all positional arguments into a single list
>     args = parser.parse_args()
>     file_names = list()
>
>     # go through all of the arguments and replace ones with wildcards with
>     # the expansion; a string without a wildcard is returned as is only if
>     # that path exists, otherwise glob returns an empty list
>     for arg in args.file_names:
>         file_names += glob(arg)
>
>     main(file_names)
> 
> This is way beyond my needs for such a tiny script, but I think this is the
> Flask developers' Python CLI creation package, Click, which is based on
> optparse: http://click.pocoo.org/5/why/#why-not-argparse
> 
> 
> > # probably a bug: you are overwriting data on every iteration
> > data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
> 
> Thanks for picking this up. I will have to append to it on each iteration for
> each attribute.
> 
> Thank You
> 
> Sayth

Scratch that bit about the code from
http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html
I can't get it to work, but it is a good general direction.

Re: Why are my files in in my list - os module used with sys argv

2016-04-19 Thread Sayth Renshaw
On Tuesday, 19 April 2016 18:17:02 UTC+10, Peter Otten  wrote:
> Steven D'Aprano wrote:
> 
> > On Tue, 19 Apr 2016 09:44 am, Sayth Renshaw wrote:
> > 
> >> Hi
> >> 
> >> Why would it be that my files are not being found in this script?
> > 
> > You are calling the script with:
> > 
> > python jqxml.py samples *.xml
> > 
> > This does not do what you think it does: under Linux shells, the glob
> > *.xml will be expanded by the shell. Fortunately, in your case, you have
> > no files in the current directory matching the glob *.xml, so it is not
> > expanded and the arguments your script receives are:
> > 
> > 
> > "python jqxml.py"  # not used
> > 
> > "samples"  # dir
> > 
> > "*.xml"  # mask
> > 
> > 
> > You then call:
> > 
> > fileResult = filter(lambda x: x.endswith(mask), files)
> > 
> > which looks for file names which end with a literal string (asterisk, dot,
> > x, m, l) in that order. You have no files that match that string.
> > 
> > At the shell prompt, enter this:
> > 
> > touch samples/junk\*.xml
> > 
> > and run the script again, and you should see that it now matches one file.
> > 
> > Instead, what you should do is:
> > 
> > 
> > (1) Use the glob module:
> > 
> > https://docs.python.org/2/library/glob.html
> > https://docs.python.org/3/library/glob.html
> > 
> > https://pymotw.com/2/glob/
> > https://pymotw.com/3/glob/
> > 
> > 
> > (2) When calling the script, avoid the shell expanding wildcards by
> > escaping them or quoting them:
> > 
> > python jqxml.py samples "*.xml"
> 
> (3) *Use* the expansion mechanism provided by the shell instead of fighting 
> it:
> 
> $ python jqxml.py samples/*.xml
> 
> This requires that you change your script
> 
> from pyquery import PyQuery as pq
> import pandas as pd
> import sys
> 
> fileResult = sys.argv[1:]
> 
> if not fileResult:
>     print("no files specified")
>     sys.exit(1)
>
> for file in fileResult:
>     print(file)
>
> for items in fileResult:
>     try:
>         d = pq(filename=items)
>     except FileNotFoundError as e:
>         print(e)
>         continue
>     res = d('nomination')
>     # you could move the attrs definition before the loop
>     attrs = ('id', 'horse')
>     # probably a bug: you are overwriting data on every iteration
>     data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
> 
> I think this is the most natural approach if you are willing to accept the 
> quirk that the script tries to process the file 'samples/*.xml' if the 
> samples directory doesn't contain any files with the .xml suffix. Common 
> shell tools work that way:
> 
> $ ls samples/*.xml
> samples/1.xml  samples/2.xml  samples/3.xml
> $ ls samples/*.XML
> ls: cannot access samples/*.XML: No such file or directory
> 
> Unrelated: instead of working with sys.argv directly you could use argparse 
> which is part of the standard library. The code to get at least one file is
> 
> import argparse
> 
> parser = argparse.ArgumentParser()
> parser.add_argument("files", nargs="+")
> args = parser.parse_args()
> 
> print(args.files)
> 
> Note that this doesn't fix the shell expansion oddity.

Hi

Thanks for the insight, after doing a little reading I found this post which 
uses both argparse and glob and attempts to cover the windows and bash 
expansion of wildcards, 
http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html

import argparse
from glob import glob

def main(file_names):
    print(file_names)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("file_names", nargs='*')
    # nargs='*' tells it to combine all positional arguments into a single list
    args = parser.parse_args()
    file_names = list()

    # go through all of the arguments and replace ones with wildcards with
    # the expansion; a string without a wildcard is returned as is only if
    # that path exists, otherwise glob returns an empty list
    for arg in args.file_names:
        file_names += glob(arg)

    main(file_names)
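
One behaviour worth checking in the snippet above: glob returns an empty list for any argument that matches no existing path, so such arguments silently disappear from file_names. Below is a hedged sketch of an expansion helper that keeps unmatched arguments so they can be reported later. The expand_args name is hypothetical and is not taken from the blog post:

```python
from glob import glob


def expand_args(args):
    """Expand wildcard patterns, keeping arguments that match nothing."""
    file_names = []
    for arg in args:
        matches = sorted(glob(arg))
        # glob() returns [] when nothing matches; fall back to the literal
        # argument so a missing file is reported later instead of vanishing.
        file_names.extend(matches if matches else [arg])
    return file_names


print(expand_args(["no_such_file_for_this_demo.xml"]))
```

Note that when the shell has already expanded the pattern, running glob again treats any *, ? or [ in a real file name as a wildcard.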

This is way beyond my needs for such a tiny script, but I think this is the Flask
developers' Python CLI creation package, Click, which is based on optparse:
http://click.pocoo.org/5/why/#why-not-argparse


> # probably a bug: you are overwriting data on every iteration
> data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]

Thanks for picking this up. I will have to append to it on each iteration for
each attribute.

Thank You

Sayth


Re: Why are my files in in my list - os module used with sys argv

2016-04-19 Thread Peter Otten
Steven D'Aprano wrote:

> On Tue, 19 Apr 2016 09:44 am, Sayth Renshaw wrote:
> 
>> Hi
>> 
>> Why would it be that my files are not being found in this script?
> 
> You are calling the script with:
> 
> python jqxml.py samples *.xml
> 
> This does not do what you think it does: under Linux shells, the glob
> *.xml will be expanded by the shell. Fortunately, in your case, you have
> no files in the current directory matching the glob *.xml, so it is not
> expanded and the arguments your script receives are:
> 
> 
> "python jqxml.py"  # not used
> 
> "samples"  # dir
> 
> "*.xml"  # mask
> 
> 
> You then call:
> 
> fileResult = filter(lambda x: x.endswith(mask), files)
> 
> which looks for file names which end with a literal string (asterisk, dot,
> x, m, l) in that order. You have no files that match that string.
> 
> At the shell prompt, enter this:
> 
> touch samples/junk\*.xml
> 
> and run the script again, and you should see that it now matches one file.
> 
> Instead, what you should do is:
> 
> 
> (1) Use the glob module:
> 
> https://docs.python.org/2/library/glob.html
> https://docs.python.org/3/library/glob.html
> 
> https://pymotw.com/2/glob/
> https://pymotw.com/3/glob/
> 
> 
> (2) When calling the script, avoid the shell expanding wildcards by
> escaping them or quoting them:
> 
> python jqxml.py samples "*.xml"

(3) *Use* the expansion mechanism provided by the shell instead of fighting 
it:

$ python jqxml.py samples/*.xml

This requires that you change your script

from pyquery import PyQuery as pq
import pandas as pd
import sys

fileResult = sys.argv[1:]

if not fileResult:
    print("no files specified")
    sys.exit(1)

for file in fileResult:
    print(file)

for items in fileResult:
    try:
        d = pq(filename=items)
    except FileNotFoundError as e:
        print(e)
        continue
    res = d('nomination')
    # you could move the attrs definition before the loop
    attrs = ('id', 'horse')
    # probably a bug: you are overwriting data on every iteration
    data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]

I think this is the most natural approach if you are willing to accept the 
quirk that the script tries to process the file 'samples/*.xml' if the 
samples directory doesn't contain any files with the .xml suffix. Common 
shell tools work that way:

$ ls samples/*.xml
samples/1.xml  samples/2.xml  samples/3.xml
$ ls samples/*.XML
ls: cannot access samples/*.XML: No such file or directory

Unrelated: instead of working with sys.argv directly you could use argparse 
which is part of the standard library. The code to get at least one file is

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("files", nargs="+")
args = parser.parse_args()

print(args.files)

Note that this doesn't fix the shell expansion oddity.
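
To illustrate the note above, here is a small sketch that feeds argparse an explicit argument list, as the shell would pass it when a pattern matches nothing. The build_parser helper is a hypothetical wrapper around the argparse code shown earlier:

```python
import argparse


def build_parser():
    """Parser that requires at least one file argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument("files", nargs="+")
    return parser


# Simulate what the script receives when bash leaves the pattern unexpanded.
args = build_parser().parse_args(["samples/*.XML"])
print(args.files)  # ['samples/*.XML']
```

The literal pattern arrives as an ordinary file name, so the script still has to cope with a path that does not exist.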




Re: Why are my files in in my list - os module used with sys argv

2016-04-18 Thread MRAB

On 2016-04-19 00:44, Sayth Renshaw wrote:

Hi

Why would it be that my files are not being found in this script?

from pyquery import PyQuery as pq
import pandas as pd
import os
import sys

if len(sys.argv) == 2:
 print("no params")
 sys.exit(1)

dir = sys.argv[1]
mask = sys.argv[2]

files = os.listdir(dir)

fileResult = filter(lambda x: x.endswith(mask), files)

# d = pq(filename='20160319RHIL0_edit.xml')
data = []

for file in fileResult:
 print(file)

for items in fileResult:
 d = pq(filename=items)
 res = d('nomination')
 attrs = ('id', 'horse')
 data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]

 # from nominations
# res = d('nomination')
# nomID = [res.eq(i).attr('id') for i in range(len(res))]
# horseName = [res.eq(i).attr('horse') for i in range(len(res))]

# attrs = ('id', 'horse')

frames = pd.DataFrame(data)
print(frames)


I am running this from the bash prompt as

(pyquery)sayth@sayth-E6410:~/Projects/pyquery$ python jqxml.py samples *.xml

my directory structure

(pyquery)sayth@sayth-E6410:~/Projects/pyquery$ ls -a
.  ..  environment.yml  .git  .gitignore  #jqxml.py#  jqxml.py  samples

and samples contains

(pyquery)sayth@sayth-E6410:~/Projects/pyquery/samples$ ls -a
.   20160319RHIL0_edit.xml  20160409RAND0.xml
..  20160402RAND0.xml   20160416RAND0.xml

yet I get no files out of the print statement.

Ideas?

I don't use Linux, but I think it might be a problem with what you have 
on the command line. I believe that Linux expands wildcarded names, so 
what you might be getting is "samples" followed by all the names in the 
current directory that match "*.xml".


Even if that isn't the case, and mask is "*.xml", the filtering that 
you're doing is asking for those names that end with "*.xml"; you might 
find a name that ends with ".xml", but I doubt you'll ever find one that 
ends with "*.xml"!
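
A further point, not raised in the replies and assuming the script runs under Python 3: filter returns a one-shot iterator, so the loop that prints the names would leave nothing for the second loop to process. A minimal sketch with made-up file names:

```python
files = ["a.xml", "b.xml", "notes.txt"]  # made-up directory listing

lazy = filter(lambda name: name.endswith(".xml"), files)
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # nothing left to iterate over

# Materialising the result once avoids the problem.
xml_files = [name for name in files if name.endswith(".xml")]

print(first_pass)   # ['a.xml', 'b.xml']
print(second_pass)  # []
print(xml_files)    # ['a.xml', 'b.xml']
```

os.listdir also returns bare names, so each name would need os.path.join(dir, name) before it can be opened from another directory.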




Re: Why are my files in in my list - os module used with sys argv

2016-04-18 Thread Steven D'Aprano
On Tue, 19 Apr 2016 09:44 am, Sayth Renshaw wrote:

> Hi
> 
> Why would it be that my files are not being found in this script?

You are calling the script with:

python jqxml.py samples *.xml

This does not do what you think it does: under Linux shells, the glob *.xml
will be expanded by the shell. Fortunately, in your case, you have no files
in the current directory matching the glob *.xml, so it is not expanded and
the arguments your script receives are:


"python jqxml.py"  # not used

"samples"  # dir

"*.xml"  # mask


You then call:

fileResult = filter(lambda x: x.endswith(mask), files)

which looks for file names which end with a literal string (asterisk, dot,
x, m, l) in that order. You have no files that match that string.

At the shell prompt, enter this:

touch samples/junk\*.xml

and run the script again, and you should see that it now matches one file.
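
The difference between a literal suffix test and a wildcard match can be sketched with the standard fnmatch module. The file names below are made up for illustration:

```python
from fnmatch import fnmatch

names = ["20160402RAND0.xml", "junk*.xml", "readme.txt"]
mask = "*.xml"

# str.endswith treats the mask as literal text, asterisk included.
literal = [n for n in names if n.endswith(mask)]
# fnmatch interprets * as "any run of characters", like the shell does.
wildcard = [n for n in names if fnmatch(n, mask)]

print(literal)   # ['junk*.xml']
print(wildcard)  # ['20160402RAND0.xml', 'junk*.xml']
```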

Instead, what you should do is:


(1) Use the glob module:

https://docs.python.org/2/library/glob.html
https://docs.python.org/3/library/glob.html

https://pymotw.com/2/glob/
https://pymotw.com/3/glob/


(2) When calling the script, avoid the shell expanding wildcards by escaping
them or quoting them:

python jqxml.py samples "*.xml"
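
Combining suggestions (1) and (2), here is a minimal sketch of how the script might build its file list when called with a directory and a quoted mask. The find_files name is hypothetical:

```python
import glob
import os


def find_files(directory, mask):
    """Return the sorted paths in directory whose names match mask."""
    return sorted(glob.glob(os.path.join(directory, mask)))


# Corresponds to: python jqxml.py samples "*.xml"
print(find_files("samples", "*.xml"))
```

Because glob returns full paths such as samples/20160402RAND0.xml, the results can be passed straight to pq(filename=...) without joining the directory name again.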



-- 
Steven
