Re: [Tutor] pipes and redirecting
On 27/05/14 21:01, Adam Gold wrote:
> dd if=/home/adam/1 bs=4k conv=noerror,notrunc,sync | pbzip2 > 1.img.bz2
>
> The first thing I do is break it into two assignments

And that's the start of the problem because it should be three: the first command, the second command and the output file.

> ddIf = shlex.split("dd if=/home/adam/1 bs=4k conv=noerror,notrunc,sync")
> compress = shlex.split("pbzip2 > /home/adam/1.img.bz2")

    compress = "pbzip2"
    outfile = open('/home/adam/1.img.bz2', 'w')

The redirection symbol is not something subprocess can use as an argument.

> p1 = subprocess.Popen(ddIf, stdout=subprocess.PIPE)
> p2 = subprocess.Popen(compress, stdin=p1.stdout, stdout=subprocess.PIPE)

Use the output file here:

    p2 = subprocess.Popen(compress, stdin=p1.stdout, stdout=outfile)

> I think that the '>' redirect needs to be dealt with using the subprocess
> module as well but I can't quite put the pieces together. I'd appreciate
> any guidance. Thanks.

Alternatively, read the output into a variable using communicate() and then write it out to the file manually at the end.

[You might also be able to use shell=True but that introduces other issues. But I don't know whether using shell includes redirection.]

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
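Alan's alternative is to collect the output with communicate() and write the file yourself. It could be sketched as below. This is an illustration under assumptions, not code from the thread. The helper name `pipe_and_save` is made up, the code targets Python 3, and communicate() holds the entire output in memory, so it only suits modest outputs, not a large disk image.

```python
import subprocess


def pipe_and_save(first_cmd, second_cmd, out_path):
    """Run first_cmd | second_cmd, collect second_cmd's output with
    communicate(), then write it to out_path ourselves.

    Hypothetical helper for illustration; buffers all output in memory.
    """
    p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second_cmd, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so p1 can get SIGPIPE if p2 exits early.
    p1.stdout.close()
    data, _ = p2.communicate()   # read everything p2 writes, then wait for it
    rc1 = p1.wait()
    with open(out_path, "wb") as fout:   # the output is bytes
        fout.write(data)
    return rc1, p2.returncode
```

For the original task the call would look something like `pipe_and_save(["dd", "if=/home/adam/1", "bs=4k", "conv=noerror,notrunc,sync"], ["pbzip2"], "/home/adam/1.img.bz2")`.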
Re: [Tutor] I am having difficulty grasping 'generators'
On 27May2014 15:27, Degreat Yartey yarteydegre...@gmail.com wrote:
> I am studying python on my own (i.e. I am between the beginner and
> intermediate level) and I haven't met any difficulty until I reached the
> topic 'Generators and Iterators'.
> I need an explanation as simple as using the expression 'print ()', in
> this case 'yield'.
> Python 2.6 here!
> Thank you.

Generators are functions that do a bit of work and then yield a value, then a bit more and so on. This means that you call them once. What you get back is an iterator, not the normal function return value.

Whenever you use the iterator, the generator function runs until it hits a yield statement, and the value in the yield statement is what you get for that iteration. Next time you iterate, the function runs a bit more, until it yields again, or returns (end of function, and that causes end of iteration).

So the function doesn't even run until you ask for a value, and then it only runs long enough to find the next value.

Example (all code illustrative only, untested):

Suppose you need to process every second line of a file. You might write it directly like this:

    def munge_lines(fp):
        ''' Do stuff with every second line of the already-open file `fp`.
        '''
        lineno = 0
        for line in fp:
            lineno += 1
            if lineno % 2 == 0:
                print lineno, line,

That should read lines from the file and print every second one with the line number.

Now suppose you want something more complex than every second line, especially something that requires keeping track of some state. In the example above you only need the line number, and using it still consumes 2 of the 3 lines in the loop body. A more common example might be lines between two markers.

The more of that you embed in the munge_lines function, the more it will get in the way of seeing what the function actually does.
So a reasonable thing might be to write a function that gets the requested lines:

    def wanted_lines(fp):
        wanted = []
        between = False
        for line in fp:
            if between:
                if 'end_marker' in line:
                    between = False
                else:
                    wanted.append(line)
            elif 'start_marker' in line:
                between = True
        return wanted

This reads the whole file and returns a list of the wanted lines, and munge_lines might then look like this:

    for line in wanted_lines(fp):
        print line

However:
  - that reads the whole file before returning anything
  - it has to keep all the lines in the list "wanted"

Slow in response, heavy in memory cost, and unworkable if fp actually doesn't end (e.g. reading from a terminal, or a pipeline, or...).

What you'd really like is to get each line as needed. We can rewrite wanted_lines as a generator:

    def wanted_lines(fp):
        between = False
        for line in fp:
            if between:
                if 'end_marker' in line:
                    between = False
                else:
                    yield line
            elif 'start_marker' in line:
                between = True

All we've done is use yield instead of the append, and remove the wanted list and the return statement. The calling code is the same.

To see the difference, put a print in wanted_lines as the first line of the for loop. With the list version you will see all the prints run before you get the list back. With the generator you will see the print run just before each value you get back.

Cheers,
Cameron Simpson c...@zip.com.au
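Cameron's suggested experiment is to put a print as the first line of the for loop. It can be run without a real file, because fp only needs to be an iterable of lines. In this Python 3 sketch the `trace` list and the function names `wanted_lines_list`, `wanted_lines_gen` and `demo` are hypothetical stand-ins for those prints.

```python
def wanted_lines_list(fp, trace):
    """List version: reads every line before returning anything."""
    wanted = []
    between = False
    for line in fp:
        trace.append("read " + line.strip())   # stands in for the print
        if between:
            if 'end_marker' in line:
                between = False
            else:
                wanted.append(line)
        elif 'start_marker' in line:
            between = True
    return wanted


def wanted_lines_gen(fp, trace):
    """Generator version: reads only as far as the next wanted line."""
    between = False
    for line in fp:
        trace.append("read " + line.strip())
        if between:
            if 'end_marker' in line:
                between = False
            else:
                yield line
        elif 'start_marker' in line:
            between = True


def demo(func):
    trace = []
    lines = ["start_marker\n", "a\n", "b\n", "end_marker\n"]
    for line in func(lines, trace):
        trace.append("got " + line.strip())
    return trace


print(demo(wanted_lines_list))
print(demo(wanted_lines_gen))
```

In the list version, every "read" entry comes before the first "got". In the generator version, the two interleave one value at a time.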
Re: [Tutor] pipes and redirecting
On 27May2014 21:01, Adam Gold a...@gmx.com wrote:
> I'm trying to run the following unix command from within Python as opposed
> to calling an external Bash script (the reason being I'm doing it multiple
> times within a for loop which is running through a list):
>
> dd if=/home/adam/1 bs=4k conv=noerror,notrunc,sync | pbzip2 > 1.img.bz2

First off, one expedient way to do this is to generate a shell script and pipe it into sh (or "sh -uex", my preferred error-sensitive invocation).

    # universal_newlines=True lets print() write text to the pipe in Python 3
    p1 = subprocess.Popen(["sh", "-uex"], stdin=subprocess.PIPE,
                          universal_newlines=True)
    for num in range(1, 11):
        print("dd if=/home/adam/%d bs=4k conv=noerror,notrunc,sync | pbzip2 > %d.img.bz2"
              % (num, num), file=p1.stdin)
    p1.stdin.close()
    p1.wait()

Any quoting issues aside, this is surprisingly useful. Let the shell do what it is good at. And NOTHING you've said here requires using bash. Use sh and say sh; it is very portable and bash is rarely needed for most stuff.

However, I gather beyond expediency, you want to know how to assemble pipelines using subprocess anyway. So...

> The first thing I do is break it into two assignments (I know this isn't
> strictly necessary but it makes the code easier to deal with):
>
> ddIf = shlex.split("dd if=/home/adam/1 bs=4k conv=noerror,notrunc,sync")
> compress = shlex.split("pbzip2 > /home/adam/1.img.bz2")

This is often worth doing regardless. Longer lines are harder to read.

> I have looked at the docs here (and the equivalent for Python 3)
> https://docs.python.org/2/library/subprocess.html. I can get a 'simple'
> pipe like the following to work:
>
> p1 = subprocess.Popen(["ps"], stdout=PIPE)
> p2 = subprocess.Popen(["grep", "ssh"], stdin=p1.stdout, stdout=subprocess.PIPE)
> p1.stdout.close()
> output = p2.communicate()[0]

If you don't care about the stdout of p2 (and you don't, based on your dd|pbzip2 example above) and you have left p2's stdout alone so that it goes to your normal stdout (e.g. the terminal), then you don't need to waste time with .communicate. I almost never use it myself. As the doco says, prone to deadlock.
I prefer to just do the right thing explicitly myself, as needed.

> I then try to adapt it to my example:
>
> p1 = subprocess.Popen(ddIf, stdout=subprocess.PIPE)
> p2 = subprocess.Popen(compress, stdin=p1.stdout, stdout=subprocess.PIPE)
> p1.stdout.close()
> output = p2.communicate()[0]
>
> I get the following error:
>
> pbzip2: *ERROR: File [>] NOT found! Skipping...
> ---
> pbzip2: *ERROR: Input file [/home/adam/1.img.bz2] already has a .bz2
> extension! Skipping
>
> I think that the '>' redirect needs to be dealt with using the subprocess
> module as well but I can't quite put the pieces together. I'd appreciate
> any guidance. Thanks.

It is as you expect. Consider what the shell does with:

    pbzip2 > 1.img.bz2

It invokes the command pbzip2 (no arguments) with its output attached to the file 1.img.bz2.

So first up: stay away from shlex. It does _not_ do what you need.

Shlex knows about shell string quoting. It does not know about redirections. It is handy for parsing minilanguages of your own concoction where you want to be able to quote strings with spaces. It is not a full-on shell parser.

So it (may) serve you well for the dd invocation because there are no redirections. But for your usage, so would the .split() method on a string, or even better: don't you already know the arguments for your dd? Just fill them out directly rather than backtracking from a string.

However, your recipe is very close. Change:

    p2 = subprocess.Popen(compress, stdin=p1.stdout, stdout=subprocess.PIPE)

into:

    p2 = subprocess.Popen(["pbzip2"], stdin=p1.stdout, stdout=open("1.img.bz2", "w"))

Because p2 is writing to 1.img.bz2 you don't need to muck about with .communicate either. No output to collect, no input to supply.

See where that takes you.

Cheers,
Cameron Simpson c...@zip.com.au
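Cameron's diagnosis is easy to check: shlex.split() passes '>' through as an ordinary argument, which is why pbzip2 went looking for a file by that name. The sketch below also builds the dd arguments as a plain list, as he suggests. `dd_args` is a hypothetical helper, and the /home/adam/N paths simply follow Adam's example.

```python
import shlex

# shlex understands quoting, not redirection: '>' is just another word.
tokens = shlex.split("pbzip2 > /home/adam/1.img.bz2")
print(tokens)  # ['pbzip2', '>', '/home/adam/1.img.bz2']


def dd_args(num):
    """Return the dd argument list for input file number `num`."""
    return ["dd", "if=/home/adam/%d" % num, "bs=4k",
            "conv=noerror,notrunc,sync"]


# Same list as parsing the string, without the parsing step.
print(dd_args(1) == shlex.split("dd if=/home/adam/1 bs=4k conv=noerror,notrunc,sync"))  # True
```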
[Tutor] HTML Parsing
Hello Friends,

I am using Python 3.3.3 on Windows 7. I would like to know the best method for HTML parsing. For example, I want to connect to www.yahoo.com and get all the tags and their values.

Thanks.

Warm Regards,
*Mitesh H. Budhabhatti*
Cell# +91 99040 83855
Re: [Tutor] I am having difficulty grasping 'generators'
I am completely new to programming!

On May 27, 2014 10:54 PM, R. Alan Monroe amon...@columbus.rr.com wrote:
> > I need an explanation so simple as using the expression 'print ()', in
> > this case 'yield'.
> > Python 2.6 here!
>
> Ever write any C programs with static variables? Generators can be
> explained in those terms if you have experience with them.
>
> Alan
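Here is Alan Monroe's analogy made concrete. A generator's local variables survive between values, much as a C function's `static` variable survives between calls. One difference: each generator object gets its own copy of that state, while a C static is shared by every call. This is a small Python 3 sketch, and `call_counter` is a made-up name.

```python
def call_counter():
    """Yield 1, 2, 3, ...; `count` keeps its value between next() calls."""
    count = 0
    while True:
        count += 1
        yield count


first = call_counter()
print(next(first), next(first))   # 1 2
second = call_counter()           # a fresh generator starts from scratch
print(next(second))               # 1
```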
Re: [Tutor] I am having difficulty grasping 'generators'
I really love this explanation... That means functions just run till they finish their duty, then return... and generators just generate one value at a time until the 'for' statement asks for __next__().

On May 28, 2014 8:37 AM, Cameron Simpson c...@zip.com.au wrote:
> [snip]
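Degreat's summary is close to how it works. Roughly speaking, a for loop calls iter() on what it is given, then calls next() on the result until StopIteration is raised. The built-in next() calls the iterator's __next__() in Python 3, or its next() method in Python 2. The Python 3 sketch below spells that out by hand. It is a simplification for illustration, not the interpreter's actual implementation, and `for_each` and `up_to_three` are hypothetical names.

```python
def for_each(iterable, body):
    """Roughly what `for x in iterable: body(x)` does behind the scenes."""
    it = iter(iterable)        # a generator object is its own iterator
    while True:
        try:
            x = next(it)       # run the generator up to its next yield
        except StopIteration:  # the generator function has returned
            break
        body(x)


def up_to_three():
    yield 0
    yield 1
    yield 2


seen = []
for_each(up_to_three(), seen.append)
print(seen)  # [0, 1, 2]
```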
Re: [Tutor] I am having difficulty grasping 'generators'
This means that '...' should generally contain a manipulator, then yield generates from where it stopped... *getting it*

Thanks for the explanation though! It seems so simple to digest. Thank you...

On May 28, 2014 1:09 AM, Danny Yoo d...@hashcollision.org wrote:
> On Tue, May 27, 2014 at 12:27 PM, Degreat Yartey yarteydegre...@gmail.com wrote:
>> I am studying python on my own (i.e. I am between the beginner and
>> intermediate level) and I haven't met any difficulty until I reached the
>> topic 'Generators and Iterators'.
>> I need an explanation as simple as using the expression 'print ()', in
>> this case 'yield'.

You can think of a generator as almost like a function, except it can return, not just once, but multiple times.

Because it can return multiple times, if we squint at it enough, it acts like a _sequence_, just like the other sequence-like things in Python like files and lists and tuples. That is, as a sequence, it's something that we can walk down, element by element. We can loop over it.

For example, let's say that we wanted to represent the same sequence as that of range(5). Here's one way we can do it with a generator:

    def upToFive():
        yield 0
        yield 1
        yield 2
        yield 3
        yield 4

Let's try it:

    >>> sequence = upToFive()
    >>> next(sequence)
    0
    >>> next(sequence)
    1
    >>> next(sequence)
    2
    >>> next(sequence)
    3
    >>> next(sequence)
    4
    >>> next(sequence)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>> for x in upToFive():
    ...     print("I see %d" % x)
    ...
    I see 0
    I see 1
    I see 2
    I see 3
    I see 4

Now this is a toy example. If we wanted range(5), we'd just say range(5) and be done with it.

What's neat about generators is that they make it easy to build these sequences while pretending that we're writing a plain function.
All of the even numbers, for example, form a sequence that we can make with a generator:

    def onlyEvens():
        n = 0
        while True:
            yield n
            n = n + 2

Let's try running it:

    >>> sequence = onlyEvens()
    >>> next(sequence)
    0
    >>> next(sequence)
    2
    >>> next(sequence)
    4
    >>> next(sequence)
    6

And note that this sequence doesn't stop! We can keep calling next() on it and it will continue to run.

We _can_ write a loop to run over such infinite sequences, but we'll also have to make sure to stop it manually: it won't exhaust otherwise, so doing something like:

    for n in onlyEvens():
        ...

better have something in the ... that interrupts or returns, or else that loop will never end.
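One standard way to take a bounded slice of an infinite generator such as onlyEvens() is itertools.islice, which stops asking for values once it has enough. Below is a Python 3 sketch; islice is also available in Python 2.6.

```python
from itertools import islice


def onlyEvens():
    n = 0
    while True:
        yield n
        n = n + 2


# islice pulls exactly five values, so the endless loop is never a problem.
first_five = list(islice(onlyEvens(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```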
Re: [Tutor] HTML Parsing
On 28/05/14 11:42, Mitesh H. Budhabhatti wrote:
> Hello Friends,
>
> I am using Python 3.3.3 on Windows 7. I would like to know the best
> method for HTML parsing. For example, I want to connect to
> www.yahoo.com and get all the tags and their values.

The standard library contains a parser module: html.parser

It can do what you want, although it's a non-trivial exercise. Basically you define event handler functions for each type of parser event. In your case you need handlers for starttag and data, and maybe endtag. Within the start-tag handler you can check the tag name to determine the tag type and read its attributes, so it typically looks like:

    def handle_starttag(self, name, attributes):
        if name == 'p':
            pass  # process paragraph tag
        elif name == 'tr':
            pass  # process table row
        # etc...

However, you might find it easier to use BeautifulSoup, which is a third-party package you need to download. Soup tends to handle badly formed HTML better than the standard parser and works by reading the whole HTML document into a tree-like structure which you can access, search or traverse...

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos
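The following is a minimal html.parser sketch along the lines Alan describes, with handlers for start tags and data. It is Python 3 only, the class name `TagCollector` is hypothetical, and it parses a literal string rather than fetching a page.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record (tag, attributes) for each start tag and the text found."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        if data.strip():              # skip whitespace-only text
            self.text.append(data.strip())


parser = TagCollector()
parser.feed('<p class="intro">Hello <a href="/x">link</a></p>')
parser.close()
print(parser.tags)  # [('p', {'class': 'intro'}), ('a', {'href': '/x'})]
print(parser.text)  # ['Hello', 'link']
```

To parse a live page, you would feed it the decoded result of urllib.request.urlopen(url).read().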
Re: [Tutor] I am having difficulty grasping 'generators'
On 28/05/14 11:52, Degreat Yartey wrote:
> This means that '...' should generally contain a manipulator then yield
> generates from where it stopped...*getting it*

It would help if you deleted the irrelevant bits so we can see which '...' you mean.

I'm guessing it was this comment, right at the end, by Danny:

> We _can_ write a loop to run over such infinite sequences, but we'll
> also have to make sure to stop it manually: it won't exhaust otherwise,
> so doing something like:
>
>     for n in onlyEvens():
>         ...
>
> better have something in the ... that interrupts or returns, or else
> that loop will never end.

Notice that the ... here is not part of the generator. So the ... here refers to how you process the output of the generator: the yielded values. So in this example case you'd have something like:

    for n in onlyEvens():
        if n > 1000: break     # prevent infinite loop
        # now process the evens that we are interested in.

The 'break' could also be a 'return' if the loop were inside a function.

And it doesn't have to be a direct value check as shown; the condition could be based on some external condition or even user input. The important point is to make sure that you have some way to exit the loop if the generator is infinite.

hth
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos
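Alan notes that the 'break' could be a 'return' inside a function. In Python 3 that might look like this. `evens_up_to` is a hypothetical name, and onlyEvens() is Danny's generator, repeated so the block runs on its own.

```python
def onlyEvens():
    n = 0
    while True:
        yield n
        n = n + 2


def evens_up_to(limit):
    """Collect even numbers from the infinite generator, up to `limit`."""
    result = []
    for n in onlyEvens():
        if n > limit:
            return result   # leaves the loop and the function together
        result.append(n)    # process the evens we are interested in


print(evens_up_to(10))  # [0, 2, 4, 6, 8, 10]
```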
Re: [Tutor] I am having difficulty grasping 'generators'
Dear all, I'd like to thank everyone for the answers on this list. Alan Gauld is a fine writer of excellent introductory material on Python, and so are a few other members of this list. So it is always enlightening to read what you all write. Keep up the good work.

All the best,
hilton

On Wed, May 28, 2014 at 8:57 AM, Alan Gauld alan.ga...@btinternet.com wrote:
> [snip]
Re: [Tutor] pipes and redirecting
> On 27/05/14 21:01, Adam Gold wrote:
> [snip]

Thanks Alan, yes, I realise now I needed a third assignment to make this work. I actually had an exchange with subscriber 'eryksun' yesterday who did a great job of pointing me in the right direction. As a bit of a noob, I think I replied to the individual rather than the list, hence it doesn't seem to be in the thread. For the benefit of the archives I append below eryksun's initial reply to me (there was a bit of follow-up but nothing too important).
Send p2's stdout to a file:

    import subprocess
    import shlex

    ddIf = shlex.split("dd if=/home/adam/1 bs=4k conv=noerror,notrunc,sync")
    compress = "pbzip2"
    filename = "/home/adam/1.img.bz2"

    p1 = subprocess.Popen(ddIf, stdout=subprocess.PIPE)
    with p1.stdout as fin, open(filename, "w") as fout:
        p2 = subprocess.Popen(compress, stdin=fin, stdout=fout)

    ret1 = p1.wait()
    ret2 = p2.wait()

Does this work?
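eryksun's recipe generalizes naturally to any two commands, and the Python 3 sketch below is one way to wrap it. `pipe_to_file` is a hypothetical name. The output file is opened in binary mode ('wb') because compressed data is bytes, although the child writes straight to the file descriptor either way.

```python
import subprocess


def pipe_to_file(first_cmd, second_cmd, out_path):
    """Run the equivalent of `first_cmd | second_cmd > out_path`.

    Returns the exit codes of both commands.
    """
    p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    with p1.stdout as fin, open(out_path, "wb") as fout:
        p2 = subprocess.Popen(second_cmd, stdin=fin, stdout=fout)
    # Leaving the with-block closes the parent's copies of the pipe and the
    # file; the children keep their own, and p1 can get SIGPIPE if p2 exits.
    return p1.wait(), p2.wait()
```

For Adam's case the call would be something like `pipe_to_file(["dd", "if=/home/adam/1", "bs=4k", "conv=noerror,notrunc,sync"], ["pbzip2"], "/home/adam/1.img.bz2")`, where exit codes of 0 mean success.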
Re: [Tutor] HTML Parsing
> I am using Python 3.3.3 on Windows 7. I would like to know the best
> method for HTML parsing. For example, I want to connect to
> www.yahoo.com and get all the tags and their values.

For this purpose, you may want to look at the APIs that the search engines provide, rather than try to web-scrape the human-focused web pages. Otherwise, your program will probably be fragile to changes in the structure of the web site.

A search for search APIs comes up with hits like these:

    https://developer.yahoo.com/boss/search/
    https://developers.google.com/web-search/docs/#fonje_snippets
    http://datamarket.azure.com/dataset/bing/search
    https://pypi.python.org/pypi/duckduckgo2

If you can say more about what you're planning to do, perhaps someone has already provided a programmatic interface to it.
Re: [Tutor] HTML Parsing
Mitesh H. Budhabhatti mitesh.budhabha...@gmail.com wrote in message:

(Please post in text email, not html. It doesn't matter for most people on this particular message, but it's the polite thing to do.)

I see others have answered the programming question, but there's a separate one. What is the license of the particular site, yahoo in this case? For an occasional scrape, nobody's likely to mind. But if you plan any volume, using the official API is more polite.

--
DaveA
[Tutor] How parse files in function of number of lines
Dear all!

I have two example files:

tmp.csv:
namevalue root mark34 yes

tmp2.csv:
namevalue root

I want to print a different text if I have more than one row and if I have only one row. My code is this:

    with open("tmp.csv") as p:
        header = p.next()
        for i in p:
            print i
        g = ()
        if not g:
            print header

mark34 yes no

I want to obtain the header string only where I have only the header. How can I do this? Thanks for your great patience and help!
Re: [Tutor] I am having difficulty grasping 'generators'
I'm not going to add too much more to all the replies here already, but one of my students did record a quick 6-minute video in one of my courses where I explained generators. Hopefully you find it useful! It's about halfway down the page at http://cyberwebconsulting.com.

(Also, for those learning Python and in the San Francisco area, I'm offering another intensive 3-day course mid-summer; more info is on the same page. Ping me privately for more details or if you have questions!)

Cheers,
--Wesley

On Tue, May 27, 2014 at 12:27 PM, Degreat Yartey yarteydegre...@gmail.com wrote:
> I am studying python on my own (i.e. I am between the beginner and
> intermediate level) and I haven't met any difficulty until I reached the
> topic 'Generators and Iterators'.
> I need an explanation as simple as using the expression 'print ()', in
> this case 'yield'.
> Python 2.6 here!
> Thank you.

--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
A computer never does what you want... only what you tell it.
+wesley chun http://google.com/+WesleyChun : wescpy at gmail : @wescpy http://twitter.com/wescpy
Python training consulting : http://CyberwebConsulting.com
Core Python books : http://CorePython.com
Python blog: http://wescpy.blogspot.com
Re: [Tutor] How parse files in function of number of lines
On 28/05/14 20:16, jarod...@libero.it wrote:
> Dear all!
> I have two example files:
> tmp.csv:
> namevalue root mark34 yes
>
> tmp2.csv
> namevalue root

I understood down to here.

> I want to print a different text if I have more than one row and if I
> have only one row.

This is not clear. Where do you want to print this text? What kind of text? More than one row where? In file 1? File 2? Or both?

> My code is this:
> with open("tmp.csv") as p:
>     header = p.next()

Probably easier to use readline().

>     for i in p:
>         print i
>     g = ()

I've no idea what you think this is doing? It creates an empty tuple so will always evaluate to False.

>     if not g:
>         print header
> mark34 yes no

huh

> I want to obtain only where I have only the header the header string?
> How can I do this? Thanks for your great patience and help!

You might want to investigate the csv module and in particular the DictReader class. It might be easier for your purposes.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos
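Following Alan's pointer to csv.DictReader, here is one possible Python 3 sketch that tells a header-only file apart from one with data rows. It assumes the files are tab-separated with the header on the first line. That is a guess, since the layout did not survive in the messages above. `read_table` and `report` are hypothetical names, and io.StringIO stands in for open("tmp.csv", newline="").

```python
import csv
import io


def read_table(fp, delimiter="\t"):
    """Return (header, rows); rows is empty when only the header is present."""
    reader = csv.DictReader(fp, delimiter=delimiter)
    rows = list(reader)              # one dict per data row
    return reader.fieldnames, rows   # fieldnames is None for an empty file


def report(fp):
    header, rows = read_table(fp)
    if rows:
        for row in rows:
            print(row)
    else:
        print("only the header:", header)


report(io.StringIO("name\tvalue\nmark34\tyes\n"))
report(io.StringIO("name\tvalue\n"))
```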