Re: [Tutor] Problem using lxml
Many thanks, Martin! I had indeed skipped creating the tree object and a
few other things you pointed out. Here is my finished simple code that
actually works:

    from lxml import html
    import requests

    page = requests.get("http://joplin.craigslist.org/search/w4m")
    tree = html.fromstring(page.text)
    titles = tree.xpath('//a[@class="hdrlnk"]/text()')
    try:
        for title in titles:
            print title
    except:
        pass

Pretty simple. Thanks for the help!

On Sat, Aug 22, 2015 at 4:20 PM Martin A. Brown wrote:
> Hi there Anthony,
>
> > I'm pretty new to lxml but I pretty much thought I'd understood
> > the basics. However, for some reason, my first attempt at using it
> > is failing miserably.
> >
> > Here's the deal:
> >
> > I'm parsing a specific page on Craigslist (
> > http://joplin.craigslist.org/search/rea) and trying to retrieve the
> > text of each link on that page. When I do an "inspect element" in
> > Firefox, a sample anchor link looks like this:
> >
> > FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)
> >
> > The code I'm using to try to get the link text is this:
> >
> > from lxml import html
> > import requests
> >
> > page = requests.get("http://joplin.craigslist.org/search/rea")
>
> You are missing something here that takes the page.content, parses
> it and creates a variable called tree.
>
> > titles = tree.xpath('//a[@title="hdrlnk"]/text()')
>
> And, your xpath is incorrect. Play with this in the interactive
> browser and you will be able to correct your xpath. I think you
> will notice from the example anchor link above that the attribute of
> the HTML elements you want to grab is "class", not "title".
> Therefore:
>
>   titles = tree.xpath('//a[@class="hdrlnk"]/text()')
>
> Is probably closer.
>
> > print titles
> >
> > The last line, where it supposedly will print the text of each anchor
> > returns [].
> >
> > I can't seem to figure out what I'm doing wrong. lxml seems pretty
> > straightforward but I can't seem to get this down.
>
> Again, I'd recommend playing with the data in an interactive console
> session. You will be able to figure out exactly which xpath gets
> you the data you would like, and then you can drop it into your
> script.
>
> Good luck,
>
> -Martin
>
> --
> Martin A. Brown
> http://linux-ip.net/
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Problem using lxml
Hi there Anthony,

> I'm pretty new to lxml but I pretty much thought I'd understood
> the basics. However, for some reason, my first attempt at using it
> is failing miserably.
>
> Here's the deal:
>
> I'm parsing a specific page on Craigslist (
> http://joplin.craigslist.org/search/rea) and trying to retrieve the
> text of each link on that page. When I do an "inspect element" in
> Firefox, a sample anchor link looks like this:
>
> FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)
>
> The code I'm using to try to get the link text is this:
>
> from lxml import html
> import requests
>
> page = requests.get("http://joplin.craigslist.org/search/rea")

You are missing something here that takes the page.content, parses
it and creates a variable called tree.

> titles = tree.xpath('//a[@title="hdrlnk"]/text()')

And, your xpath is incorrect. Play with this in the interactive
browser and you will be able to correct your xpath. I think you
will notice from the example anchor link above that the attribute of
the HTML elements you want to grab is "class", not "title".
Therefore:

  titles = tree.xpath('//a[@class="hdrlnk"]/text()')

Is probably closer.

> print titles
>
> The last line, where it supposedly will print the text of each anchor
> returns [].
>
> I can't seem to figure out what I'm doing wrong. lxml seems pretty
> straightforward but I can't seem to get this down.

Again, I'd recommend playing with the data in an interactive console
session. You will be able to figure out exactly which xpath gets
you the data you would like, and then you can drop it into your
script.

Good luck,

-Martin

--
Martin A. Brown
http://linux-ip.net/
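A minimal sketch of the point Martin makes above: an attribute predicate
in an XPath expression only matches when it names the attribute the
element actually carries. To keep it runnable without network access or
third-party packages, this uses the standard library's
xml.etree.ElementTree as a stand-in for lxml. The markup in SAMPLE
(structure and href values) is made up for illustration and is not taken
from the Craigslist page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; the real markup is not
# shown in the thread, so the structure and href values are assumptions.
SAMPLE = """
<html><body>
  <p><a href="/rea/1.html" class="hdrlnk">FIRST OPEN HOUSE TOMORROW</a></p>
  <p><a href="/rea/2.html" class="hdrlnk">Second listing</a></p>
  <p><a href="/about" class="footer">about</a></p>
</body></html>
"""

tree = ET.fromstring(SAMPLE.strip())

# Wrong attribute name: no <a> has title="hdrlnk", so nothing matches.
by_title = [a.text for a in tree.findall(".//a[@title='hdrlnk']")]

# Right attribute name: the listing links carry class="hdrlnk".
by_class = [a.text for a in tree.findall(".//a[@class='hdrlnk']")]

print(by_title)
print(by_class)
```

With lxml itself, the corresponding query is the one from the thread,
tree.xpath('//a[@class="hdrlnk"]/text()'), which returns the link text
directly.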
Re: [Tutor] Problem using lxml
On Sat, Aug 22, 2015 at 5:05 PM, Anthony Papillion wrote:
> Hello Everyone,
>
> I'm pretty new to lxml but I pretty much thought I'd understood the
> basics. However, for some reason, my first attempt at using it is
> failing miserably.
>
> Here's the deal:
>
> I'm parsing a specific page on Craigslist (
> http://joplin.craigslist.org/search/rea) and trying to retrieve the
> text of each link on that page. When I do an "inspect element" in
> Firefox, a sample anchor link looks like this:
>
> FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)
>
> The code I'm using to try to get the link text is this:
>
> from lxml import html
> import requests
>
> page = requests.get("http://joplin.craigslist.org/search/rea")
> titles = tree.xpath('//a[@title="hdrlnk"]/text()')
> print titles
>
> The last line, where it supposedly will print the text of each anchor
> returns [].
>
> I can't seem to figure out what I'm doing wrong. lxml seems pretty
> straightforward but I can't seem to get this down.
>
> Can anyone make any suggestions?
>
> Thanks!
> Anthony

Not an answer, but have you checked out Beautiful Soup? It is a great
html parsing tool, with a good tutorial:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/

--
Joel Goldstick
http://joelgoldstick.com
[Tutor] Problem using lxml
Hello Everyone,

I'm pretty new to lxml but I pretty much thought I'd understood the
basics. However, for some reason, my first attempt at using it is
failing miserably.

Here's the deal:

I'm parsing a specific page on Craigslist (
http://joplin.craigslist.org/search/rea) and trying to retrieve the
text of each link on that page. When I do an "inspect element" in
Firefox, a sample anchor link looks like this:

FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)

The code I'm using to try to get the link text is this:

    from lxml import html
    import requests

    page = requests.get("http://joplin.craigslist.org/search/rea")
    titles = tree.xpath('//a[@title="hdrlnk"]/text()')
    print titles

The last line, where it supposedly will print the text of each anchor
returns [].

I can't seem to figure out what I'm doing wrong. lxml seems pretty
straightforward but I can't seem to get this down.

Can anyone make any suggestions?

Thanks!
Anthony
Re: [Tutor] filtering listed directories
In a message of Sat, 22 Aug 2015 14:32:56 +0100, Alan Gauld writes:
>But maybe some questions on a Tix (or Tk) forum might
>get more help? Once you know how to do it in native
>Tcl/Tk/Tix you can usually figure out how to do it
>in Python.
>
>--
>Alan G

I asked the question on tkinter-discuss, but the question hasn't shown
up yet.

In the meantime, I have found this:
http://www.ccs.neu.edu/research/demeter/course/projects/demdraw/www/tickle/u3/tk3_dialogs.html

which looks like, if we converted it to tkinter, would do the job,
since all it wants is a list of files.

I have guests coming over for dinner, so it will be much later before
I can work on this. (And I will be slow -- so if you are a wizard at
converting tk to tkinter, by all means feel free to step in here. :) )

Laura
Re: [Tutor] filtering listed directories
On 22/08/15 11:43, Laura Creighton wrote:
>> How can I filter out these hidden directories?
>> Help(tkFileDialog) doesn't help me as it just shows **options, but
>> doesn't show what these options might be.
>
> tix (tkinter extensions) https://wiki.python.org/moin/Tix
> have some more file dialogs, so maybe there is joy there.

There is a FileSelectDialog in Tix that has a dircmd option
according to the Tix documentation.

However, I've played about with it and can't figure out how
to make it work!

There is also allegedly a 'hidden' check-box subwidget
that controls whether hidden files are shown. Again I
couldn't find how to access this.

But maybe some questions on a Tix (or Tk) forum might
get more help? Once you know how to do it in native
Tcl/Tk/Tix you can usually figure out how to do it
in Python.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
Re: [Tutor] Can someone explain this to me please
On Fri, Aug 21, 2015 at 1:04 PM, Jon Paris wrote:
>
> import sys
> x = sys.maxsize
> print ("Max size is: ", x)
> y = (x + 1)
> print ("y is", type(y), "with a value of", y)
>
> Produces this result:
>
> Max size is: 9223372036854775807
> y is <class 'int'> with a value of 9223372036854775808
>
> I was expecting it to error out but instead it produces a value
> greater than the supposed maximum while still keeping it as an int.
> I’m confused. If sys.maxsize _isn’t_ the largest possible value then
> how do I determine what is?

sys.maxsize is the "maximum size lists, strings, dicts, and many other
containers can have". This value is related to the theoretical maximum
of Python's built-in arbitrary precision integer type (i.e. long in
Python 2 and int in Python 3), which can be thought of as a
'container' for 15-bit or 30-bit "digits". For example, in a 64-bit
version of Python 3 that's compiled to use 30-bit digits in its int
objects, the limit is about

    (sys.maxsize bytes) // (4 bytes / 30bit_digit)
        * (9 decimal_digits / 30bit_digit)
    == 20752587082923245559 decimal digits.

In practice you'll get a MemoryError (or probably a human impatience
KeyboardInterrupt) long before that.

sys.maxint (only in Python 2) is the largest positive value for Python
2's fixed-precision int type. In pure Python code, integer operations
seamlessly transition to using arbitrary-precision integers, so you
have no reason to worry, practically speaking, about reaching the
"largest possible value".

As a matter of trivia, sys.maxint in CPython corresponds to the
maximum value of a C long int. In a 64-bit Windows process, a C long
int is 32-bit, which means sys.maxint is 2**31 - 1. In every other
supported OS, sys.maxint is 2**63 - 1 in a 64-bit process.
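A short sketch of the behaviour described above, assuming Python 3
(where int is the arbitrary-precision type). The variable names are
illustrative, and the digit estimate simply repeats the arithmetic from
the message; it only has that meaning on a build that uses 30-bit
digits.

```python
import sys

x = sys.maxsize          # largest container size / index, not the largest int
y = x + 1                # still an int: Python 3 ints have arbitrary precision

print("y is", type(y), "with a value of", y)

# Integers keep growing well past sys.maxsize without overflowing.
big = x ** 3
print(big > y > x)

# The estimate from the message: one 30-bit digit occupies 4 bytes and
# holds roughly 9 decimal digits.
approx_decimal_digits = sys.maxsize // 4 * 9
print(approx_decimal_digits)

# sys.int_info reports how this particular build stores int digits.
print(sys.int_info.bits_per_digit, sys.int_info.sizeof_digit)
```

Checking sys.int_info first is a reasonable way to see whether the
30-bit assumption holds on a given interpreter.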
Re: [Tutor] filtering listed directories
In a message of Sat, 22 Aug 2015 12:20:31 +1000, Chris Roy-Smith writes:
>Hi,
>environment: Python 2.7, Ubuntu 12.4 Linux
>
>I am trying to get the list of directories shown by
>tkFileDialog.askdirectory to not show hidden files (starting with .)
>
>this code results in lots of hidden directories listed in the interface
>making things harder than they need to be for the user.
>
>#! /usr/bin/python
>import Tkinter, tkFileDialog
>root = Tkinter.Tk()
>root.withdraw()
>
>dirname = tkFileDialog.askdirectory(parent=root,initialdir="/home/chris/",title='Pick a directory')
>
>How can I filter out these hidden directories?
>Help(tkFileDialog) doesn't help me as it just shows **options, but
>doesn't show what these options might be.

The options are listed here:
http://effbot.org/tkinterbook/tkinter-file-dialogs.htm
or
http://infohost.nmt.edu/tcc/help/pubs/tkinter/web/tkFileDialog.html

Unfortunately, they do not help. There is all sorts of help for
'only show things that match a certain pattern' but not for
'only show things that do not match a certain pattern'. (Or maybe
my pattern-making skill is at fault, but I don't think so.)

tix (tkinter extensions) https://wiki.python.org/moin/Tix
have some more file dialogs, so maybe there is joy there.

This seems utterly crazy to me -- you surely aren't the first person
who wanted to exclude certain directories in a file dialog. I will
look more later this afternoon in my old tkinter files. I must have
wanted to do this at one point, mustn't I?

puzzled,
Laura
[Tutor] filtering listed directories
Hi,
environment: Python 2.7, Ubuntu 12.4 Linux

I am trying to get the list of directories shown by
tkFileDialog.askdirectory to not show hidden files (starting with .)

this code results in lots of hidden directories listed in the interface
making things harder than they need to be for the user.

    #! /usr/bin/python
    import Tkinter, tkFileDialog
    root = Tkinter.Tk()
    root.withdraw()

    dirname = tkFileDialog.askdirectory(parent=root,initialdir="/home/chris/",title='Pick a directory')

How can I filter out these hidden directories?
Help(tkFileDialog) doesn't help me as it just shows **options, but
doesn't show what these options might be.
Re: [Tutor] Do not understand why test is running.
boB Stepp wrote:

> In the cold light of morning, I see that in this invocation, the path
> is wrong. But even if I correct it, I get the same results:
>
> e:\Projects\mcm>py -m unittest ./test/db/test_manager.py
[...]
> ValueError: Empty module name

Make sure that there are files

./test/__init__.py
./test/db/__init__.py

and then try

py -m unittest test.db.test_manager

> e:\Projects\mcm>py ./test/db/test_manager.py
> Traceback (most recent call last):
>   File "./test/db/test_manager.py", line 16, in <module>
>     import mcm.db.manager
> ImportError: No module named 'mcm'

Make sure the parent directory of the mcm package (I believe this is
E:\Projects\mcm) is in your PYTHONPATH, then try again.
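The layout this advice assumes can be sketched as follows. The script
builds a throwaway copy of the test/db tree in a temporary directory and
runs unittest with a dotted module name from the directory that contains
the test package. The test body is a placeholder, not boB's real test,
and the temporary directory stands in for the project directory.

```python
import os
import subprocess
import sys
import tempfile

# Placeholder test module; the real test_manager.py is not shown in the thread.
TEST_SOURCE = (
    "import unittest\n"
    "\n"
    "class TestPlaceholder(unittest.TestCase):\n"
    "    def test_truth(self):\n"
    "        self.assertTrue(True)\n"
)

project = tempfile.mkdtemp()
db_dir = os.path.join(project, "test", "db")
os.makedirs(db_dir)

# Each directory on the dotted path needs an __init__.py to be a package.
for package_dir in (os.path.join(project, "test"), db_dir):
    open(os.path.join(package_dir, "__init__.py"), "w").close()

with open(os.path.join(db_dir, "test_manager.py"), "w") as handle:
    handle.write(TEST_SOURCE)

# Equivalent of running "py -m unittest test.db.test_manager" from the
# project directory: a dotted module name, not a file path.
result = subprocess.run(
    [sys.executable, "-m", "unittest", "test.db.test_manager"],
    cwd=project,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)
print(result.returncode)
print(result.stdout.decode())
```

A temporary directory is used so that nothing in a real project is
touched; in practice the two __init__.py files would simply be created
once in the existing tree.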
Re: [Tutor] Complications Take Two (Long) Frustrations.
In a message of Sat, 22 Aug 2015 17:00:55 +1000, "Steven D'Aprano" writes:
>On Fri, Aug 21, 2015 at 11:29:52PM +0200, Roel Schroeven wrote:
>> Joel Goldstick schreef op 2015-08-21 23:22:
>> >so:
>> >print -max(-A, -B)
>>
>> That's what I mean, yes. I haven't tried it, but I don't see why it
>> wouldn't work.
>
>It won't work with anything which isn't a number:
>
>py> min("hello", "goodbye")
>'goodbye'
>
>
>But the max trick fails:
>
>py> -max(-"hello", -"goodbye")
>Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>TypeError: bad operand type for unary -: 'str'
>
>
>If you want to write your own min without using the built-in, there is
>only one correct way to do it that works for all objects:
>
>def min(a, b):
>    if a < b: return a
>    return b
>
>Well, more than one way -- you can change the "a < b" to "a <= b" if you
>prefer. Or reverse the test and use >, or similar, but you know what I
>mean.
>
>--
>Steve

Yes, but I think the OP's problem is that he has a fool for a
teacher, or a course designer at any rate. For some reason the author
thinks that the fact that min(A, B) == -max(-A, -B) (for integers) is
very, very clever. And somehow the teacher hasn't learnt that his or
her job is to make students question 'clever programming' while not
destroying the enthusiasm of any students who come up with clever
solutions on their own.

Cleverness is the consolation prize in this business -- what you want
to write is code that demonstrates wisdom, not cleverness. Clever
solutions are fun to write, though. But remember:

    Everyone knows that debugging is twice as hard as writing a
    program in the first place. So if you're as clever as you can be
    when you write it, how will you ever debug it?
        -- Brian Kernighan, The Elements of Programming Style

So the reflex you want to develop is 'I just did something clever.
Hmmm. Maybe _too_ clever. Let's see ...'

The cleverer you are as a person, the more you have to develop this
reflex, because after all, somebody much less clever -- or
experienced -- than you are may have to fix a bug in your code some
day.

So the min(A, B) == -max(-A, -B) trick has everything to do with
'Watch me pull a rabbit out of this hat' and nothing to do with
'good programming style'.

Too much education of the sort that rewards cleverness and penalises
wisdom means we end up with a lot of smart people in this world who
have managed to get the idea that 'Wisdom is something that only
stupid people need. It is optional for smart people, and I am smart
enough to do without!' Some people _never_ unlearn this one. My
family is, alas, full of them.

Laura
Re: [Tutor] Complications Take Two (Long) Frustrations.
On Fri, Aug 21, 2015 at 11:29:52PM +0200, Roel Schroeven wrote:
> Joel Goldstick schreef op 2015-08-21 23:22:
> >so:
> >print -max(-A, -B)
>
> That's what I mean, yes. I haven't tried it, but I don't see why it
> wouldn't work.

It won't work with anything which isn't a number:

py> min("hello", "goodbye")
'goodbye'


But the max trick fails:

py> -max(-"hello", -"goodbye")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for unary -: 'str'


If you want to write your own min without using the built-in, there is
only one correct way to do it that works for all objects:

def min(a, b):
    if a < b: return a
    return b

Well, more than one way -- you can change the "a < b" to "a <= b" if you
prefer. Or reverse the test and use >, or similar, but you know what I
mean.

--
Steve
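A small sketch contrasting the two approaches discussed above. The name
my_min is chosen here only to avoid shadowing the built-in; the body is
the same comparison-based function Steve gives.

```python
def my_min(a, b):
    """Return the smaller of a and b using only the < comparison."""
    if a < b:
        return a
    return b


# For numbers, the negation trick and the comparison agree.
print(my_min(3, 7), -max(-3, -7))

# For strings, only the comparison-based version works.
print(my_min("hello", "goodbye"))

try:
    -max(-"hello", -"goodbye")
except TypeError as error:
    # Unary minus is not defined for str, so the trick fails here.
    negation_error = error
    print("TypeError:", error)
```

The comparison-based version relies on nothing but the < operator, which
is why it carries over to any pair of mutually comparable objects.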