Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Dear Peter Otten,

I typed in the code as you suggested just now (6.28 pm, Sunday 12th July 2015), rather than copying and pasting it. This is the result I got:

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> with open("C:\Beautiful Soup\ecologicalpyramid.html","r")as f:
... soup = BeautifulSoup(f,"lxml")
  File "<stdin>", line 2
    soup = BeautifulSoup(f,"lxml")
       ^
IndentationError: expected an indented block
>>> soup = BeautifulSoup(f,"lxml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'f' is not defined
>>>

The first time I typed in the second line, I got the IndentationError; the second time I typed in exactly the same code, I got "NameError: name 'f' is not defined".

-- https://mail.python.org/mailman/listinfo/python-list
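At the `...` prompt the console does not indent for you: the body of the `with` statement needs leading spaces (four is conventional), and an empty line ends the block. Because the `with` line was rejected, `f` was never created, which is why the retyped line raised a NameError. A minimal sketch of the same steps written as a script, assuming a throwaway demo file in the temp directory rather than the real one:

```python
import io
import os
import tempfile

# Hypothetical demo file standing in for C:\Beautiful Soup\ecologicalpyramid.html.
path = os.path.join(tempfile.mkdtemp(), "ecologicalpyramid.html")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"<ul><li><div>plants</div></li></ul>")

with io.open(path, "r", encoding="utf-8") as f:
    # Every statement that uses f is indented under the "with" line.
    # With Beautiful Soup installed this line would be: soup = BeautifulSoup(f, "lxml")
    markup = f.read()

# Back at the outer level: the file is now closed, but markup keeps its text.
print(markup)
```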
Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Dear Peter Otten,

Yes, I have been copying and pasting, as it saves typing. I do get 'expected an indented block' errors, but I regarded them as a small price to pay for the time and energy saved. Also, the console sometimes rejects pasted lines for indentation reasons best known to itself, yet accepts exactly the same line when I enter it on the following line of input. Maybe it is a built-in feature of Python's to discourage copying and pasting.
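The indentation errors are not an anti-paste feature: the console applies the same rule to every line, typed or pasted. Pasting usually fails because the leading spaces do not come through, tabs and spaces get mixed, or a blank line inside the pasted block ends it early. A small sketch using the built-in `compile()`, which checks the syntax without running anything (so the file need not exist):

```python
from __future__ import print_function

# The same two lines the console sees, without and with indentation.
unindented = 'with open("ecologicalpyramid.html") as f:\nsoup = f.read()\n'
indented = 'with open("ecologicalpyramid.html") as f:\n    soup = f.read()\n'


def compiles(source):
    """Return True if Python accepts the source; compiling does not run it."""
    try:
        compile(source, "<stdin>", "exec")
        return True
    except IndentationError as err:
        print("rejected:", err.msg)  # exact wording varies between Python versions
        return False


print(compiles(unindented))  # False: the body is not indented
print(compiles(indented))    # True: the same text plus four spaces
```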
Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Dear Peter Otten,

Incidentally, you have discovered a fault: my copy of 'ecologicalpyramid.html' differed from the one given in the text in its first few lines, re:

plants 10 algae 10
plants 10 algae 10

I have removed the line to the right of the HTML code in the lower version. Now there is a string ("plants") after the <li class="producerlist"> tag. Sorry about that. However, as you said, the input code as quoted in the text still won't return 'plants', re:

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid,","lxml")
>>> producer_entries = soup.find("ul")
>>> print(producer_entries.li.div.string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'li'
>>>
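Beautiful Soup treats a string argument as the markup itself, not as a filename. So `BeautifulSoup("C:\Beautiful Soup\ecological_pyramid,", "lxml")` parses the path text, which contains no `<ul>` tag; `soup.find("ul")` then returns `None`, and `None.li` raises the AttributeError. The usual fix is to pass the open file object, as in `BeautifulSoup(f, "lxml")` inside the `with` block. The sketch below uses the standard library's HTML parser as a stand-in for Beautiful Soup (an assumption, so that it runs without bs4) to show that the path string holds no tags at all:

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class TagCollector(HTMLParser):
    """Record the name of every start tag the parser sees."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(markup):
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


path_string = r"C:\Beautiful Soup\ecologicalpyramid.html"
file_contents = "<ul><li><div>plants</div></li></ul>"  # hypothetical file contents

print(start_tags(path_string))    # []: the path is plain text, so there is no <ul>
print(start_tags(file_contents))  # ['ul', 'li', 'div']
```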
Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Dear Peter Otten,

Thank you for your reply. I have not yet gone very far into its detail, as the Python console does not seem to recognise the name 'f' given to it; see the output below:

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> with open("C:\Beautiful Soup\ecologicalpyramid.html","r")as f:
>>> soup = BeautifulSoup(f, "lxml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'f' is not defined
>>>
Re: Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Dear Mark Lawrence,

Thank you for your advice. I take it that I use the input you suggest for the line:

soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid.html",lxml")

Seeing as I have to give the file's full address, I therefore have to modify your

soup = BeautifulSoup(ecological_pyramid,"lxml")

to

soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid," "lxml")

otherwise I get:

>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>> ecological_pyramid:
>>> soup = BeautifulSoup(ecological_pyramid,"lxml")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ecological_pyramid' is not defined

So, with the input therefore as:

>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>> ecological_pyramid:
>>> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid,","lxml")
>>> producer_entries = soup.find("ul")
>>> print(producer_entries.li.div.string)

I still get the following output from the console:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'li'
>>>

As is probably evident, my question is: what problem does Python have finding the required HTML code within the 'ecologicalpyramid' HTML file, or more specifically, why does it respond that there is no such attribute as 'li'?

Incidentally, I have installed all the xml, lxml, html and html5 tree builders/parsers. I am using lxml as that is the parser specified in the text. I may as well quote the text on the page in question in 'Getting Started with Beautiful Soup':

'Since producers come as the first entry for the <ul> tag, we can use the find() method, which normally searches for only the first occurrence of a particular tag in a BeautifulSoup object. We store this in producer_entries. The next line prints the name of the first producer. From the previous HTML diagram we can understand that the first producer is stored inside the first <div> tag of the first <li> tag that immediately follows the first <ul> tag, as shown in the following code:

plants 10

So after running the preceding code, we will get plants, which is the first producer, as the output.' (page 30)
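The book's `producer_entries.li.div.string` walks down the tree: the first `<ul>`, then the first `<li>` inside it, then the first `<div>` inside that, then its text. For well-formed markup, the standard library's ElementTree can make the same walk explicit. The markup below is a hypothetical minimal stand-in for the top of ecologicalpyramid.html, not the book's file:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the producers list.
markup = """<html><body>
<ul>
  <li><div>plants</div><div>10</div></li>
  <li><div>algae</div><div>10</div></li>
</ul>
</body></html>"""

root = ET.fromstring(markup)
producer_entries = root.find(".//ul")       # first <ul> only, like soup.find("ul")
first_li = producer_entries.find("li")      # first <li>, like producer_entries.li
print(first_li.find("div").text)            # plants, like .li.div.string
print(len(producer_entries.findall("li")))  # 2: findall() returns every match
```

One difference to keep in mind: Beautiful Soup's `.li` searches all descendants, whereas ElementTree's `find("li")` looks only at direct children; in this markup both pick the same element.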
Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Dear Programmers,

Thank you for your advice about giving the console the full address of the HTML file so that it can access it. The console seems to accept the code to that extent, but when I input the two lines of code intended to locate the required word, the console rejects them, re:

AttributeError: 'NoneType' object has no attribute 'li'

However, the document 'EcologicalPyramid.html' does contain the words 'li' and 'ul' in its text. I am not sure how the input is arranged to output 'plants', which is also in the document's text, but that is the word the code is meant to elicit. I enclose the pertinent input and output from the console, and the HTML code for the document 'EcologicalPyramid.html'. Thank you in advance for your help.

>>> with open("C:\Beautiful Soup\ecologicalpyramid.html","r") as
>>> ecological_pyramid: soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid.html","lxml")
... producer_entries = soup.find("ul")
  File "<stdin>", line 2
    producer_entries = soup.find("ul")
    ^
SyntaxError: invalid syntax
>>> producer_entries = soup.find("ul")
>>> print (producer_entries.li.div.string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'li'

plants 10 algae 10 deer 1000 deer 1000 rabbit 2000 fox 100 bear 100 lion 80 tiger 50
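A side note on the full address: inside an ordinary Python string a backslash starts an escape sequence, so a Windows path only survives intact when none of its backslash pairs happens to be a valid escape. A raw string (the `r` prefix) or forward slashes, which Windows also accepts, avoids the problem. A small sketch using `ntpath`, the standard module that applies Windows path rules on any system:

```python
import ntpath  # Windows path rules; importable on any operating system

folder = r"C:\Beautiful Soup"  # raw string: every backslash is kept as typed
path = ntpath.join(folder, "ecologicalpyramid.html")
print(path)  # C:\Beautiful Soup\ecologicalpyramid.html

# Without the r prefix, "\t" and "\n" silently become a tab and a newline.
risky = "C:\temp\new.html"
print(len(risky) == len(r"C:\temp\new.html"))  # False: two characters were swallowed
```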
Re: Python console rejects an object reference, having made an object with that reference as its name in previous line
Dear Michael Torrie,

Thanks for pointing out to me that it is not a syntax problem. The thing is, there is a file called 'EcologicalPyramid.html'. I put it in a folder called 'Soup', as the text advised on page 28. For what it's worth, I also moved the Windows Command Prompt to that folder (re: cd Soup) as instructed on page 30, and put a duplicate of 'EcologicalPyramid.html' in the Python 2.7 directory. So where ought I to put this HTML file so that the Python console will find it?

Thank you for your attention.
Yours,
Simon
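A relative name such as "ecologicalpyramid.html" is looked up in the Python process's current working directory, not in the Python installation folder and not in whatever folder some other Command Prompt window has been moved to. If Python is started from a Command Prompt window, it begins in that window's folder. A small sketch for checking, from the `>>>` prompt, where Python will actually look:

```python
import os


def where_python_looks(filename):
    """Return the full path Python will try for a relative filename,
    and whether a file actually exists there."""
    full_path = os.path.abspath(filename)  # the name joined onto os.getcwd()
    return full_path, os.path.isfile(full_path)


print(os.getcwd())  # the folder that relative names start from
print(where_python_looks("ecologicalpyramid.html"))
```

If the second value is False, either move the file to the folder printed first, or open it by its full path.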
Re: Python console rejects an object reference, having made an object with that reference as its name in previous line
@Steven D'Aprano, I input the following to Python 2.7 and got this:

>>> from bs4 import BeautifulSoup
>>> with open("ecologicalpyramid.html","r") as ecological_pyramid:
... soup= next(ecological_pyramid,"lxml")
... producer_entries = soup.find("ul")
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'ecologicalpyramid.html'
>>>

I kept to your instructions to press Enter after the fourth line and before the fifth line, i.e. between the indented block and the unindented one, which, as above, doesn't give me a chance to actually input the fifth line. I tried it both ways, i.e. pressing Enter after the fourth line and before the fifth, or just pressing Enter after the fourth line and then after the fifth; either way I never get to input the fifth line, because before I can, I get an error return.
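Two notes on that session. The IOError means no file of that name exists in Python's current folder. And `next(ecological_pyramid, "lxml")` does not build a soup: `next()` reads one line from the file, and its second argument is only the value handed back if the file is already exhausted. A sketch with a hypothetical two-line file:

```python
import io
import os
import tempfile

# Hypothetical two-line file standing in for ecologicalpyramid.html.
path = os.path.join(tempfile.mkdtemp(), "ecologicalpyramid.html")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(u"<html>\n<ul><li><div>plants</div></li></ul>\n")

with io.open(path, "r", encoding="utf-8") as ecological_pyramid:
    first = next(ecological_pyramid, "lxml")  # reads ONE line; "lxml" is only a default
print(repr(first))  # the first line of the file, as a plain string
```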
Re: Python console rejects an object reference, having made an object with that reference as its name in previous line
Dear Jussi and Billy,

I have changed the input in accordance with your advice, re:

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> with open("ecologicalpyramid.html","r") as ecological_pyramid:
... soup = next(ecological_pyramid,"lxml")
... producer_entries = soup.find("ul")
... print(producer_entries.li.div.string)
... print(producer_entries.li.div.string)
  File "<stdin>", line 5
    print(producer_entries.li.div.string)
    ^
SyntaxError: invalid syntax
>>> print (producer_entries.li.div.string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'producer_entries' is not defined
>>> from bs4 import BeautifulSoup
>>> with open("ecologicalpyramid.html","r") as ecological_pyramid:
... soup = next(ecological_pyramid,"lxml")
... producer_entries = soup.find("ul")
... print(producer_entries.li.div.string)
...

As you can no doubt see, the last line, indented as it is, does not provide the output that the book's text says it will return, i.e. the word 'plants'. If I do not indent it, I get an 'invalid syntax' error, and retyping the line on its own then gives a NameError stating that 'producer_entries' is not defined. But the code in the previous line is meant to define just that, isn't it?
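Two things are going on here. At the `...` prompt the block only runs once Enter is pressed on an empty line, so until then nothing is printed and nothing is defined; an unindented line typed inside the block is the SyntaxError. And once the block does run, `soup` will be a line of text rather than a BeautifulSoup object, so `soup.find("ul")` is the ordinary string method. A sketch, assuming the file's first line is `<html>`:

```python
# What next(...) handed back in the session above: one line of text, a plain str.
soup = "<html>\n"

producer_entries = soup.find("ul")      # str.find, not BeautifulSoup's find()
print(producer_entries)                 # -1, an int: "ul" does not occur in that line
print(hasattr(producer_entries, "li"))  # False, so .li would raise AttributeError
```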
Re: Python console rejects an object reference, having made an object with that reference as its name in previous line
I had another attempt at inputting the code, perhaps with the right indentation. I still get an error return, but not one that indicates that the code has not been read, as you suggested, re:

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> with open("ecologicalpyramid.html","r") as ecological_pyramid:
... soup = BeautifulSoup(ecological_pyramid,"lxml")
... producer_entries = soup.find("ul")
  File "<stdin>", line 3
    producer_entries = soup.find("ul")
    ^
SyntaxError: invalid syntax
>>> from bs4 import BeautifulSoup
  File "<stdin>", line 1
    from bs4 import BeautifulSoup

If, as you suggest, I leave a blank line after the "with open(...)" line, the console returns an error; if I leave a blank line after the "soup = ..." line that comes after it, again I get an error return. My only point is that, with the above input, the console's response does not seem to imply that soup has not been defined.

You recommend that I put all the code into a file and then run it. How do I do that? I am new to Python, as you might have gathered.

Thank you for your help.
Yours,
Simon
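Putting the code in a file avoids the console's continuation-prompt rules entirely. Save it with a text editor (for example as pyramid.py, a hypothetical name, in C:\Soup next to the HTML file); then, at the Windows Command Prompt, type `cd C:\Soup` followed by `python pyramid.py ecologicalpyramid.html` (or `C:\Python27\python.exe ...` if `python` is not on PATH). This sketch uses only the standard library's HTML parser, so it runs without Beautiful Soup; with bs4 installed, the body of `first_producer()` could instead be the book's lines.

```python
from __future__ import print_function

import sys

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class FirstProducer(HTMLParser):
    """Capture the text of the first <div> directly inside an <li> inside a <ul>.

    Assumes simply nested markup without unclosed tags such as <br>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.open_tags = []  # tags currently open, outermost first
        self.result = None   # the text being looked for
        self.capturing = False

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        if self.result is None and self.open_tags[-3:] == ["ul", "li", "div"]:
            self.capturing = True
            self.result = ""

    def handle_endtag(self, tag):
        if tag == "div":
            self.capturing = False
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        if self.capturing:
            self.result += data


def first_producer(path):
    with open(path) as html_file:
        parser = FirstProducer()
        parser.feed(html_file.read())
        parser.close()
    return parser.result


if __name__ == "__main__":
    if len(sys.argv) > 1:
        print(first_producer(sys.argv[1]))
    else:
        print("usage: python pyramid.py ecologicalpyramid.html")
```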
Python console rejects an object reference, having made an object with that reference as its name in previous line
Dear Python programmers,

Having typed the command given in the text, cd Soup, into the Windows console, and having put the file 'EcologicalPyramid.html' into the directory 'Soup' on the C drive, I input the following code to the Python console in accordance with the instructions on page 30 of 'Getting Started with Beautiful Soup':

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> with open("ecologicalpyramid.html","r") as ecological_pyramid:
... soup = BeautifulSoup(ecological_pyramid,"lxml")
... producer_entries = soup.find("ul")
^
SyntaxError: invalid syntax
>>> producer_entries = soup.find("ul")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'soup' is not defined
>>>

So I cannot proceed with the next line, which would have been:

print(producer_entries.li.div.string)

which (according to the book) would have given the output:

plants

Maybe that is getting a bit far ahead, but I can't quite see where I have gone wrong: 'soup' has been defined as an object made from the file 'EcologicalPyramid.html'. I hope you can help me on this point.

Yours,
Simon
Re: Text Code(from 'Getting Started in Beautiful Soup' re: cd Soup , returns 'Syntax Error, invalid syntax'
Thanks, guys.

This book keeps switching between the Python console and the Windows command prompt without telling you, but it is the only book out there on Beautiful Soup, so I have got to put up with it. There are more problems with it, but I will start a new thread about them, as I don't know whether they are related to the above or not.

Yours,
Simon.
Text Code(from 'Getting Started in Beautiful Soup' re: cd Soup , returns 'Syntax Error, invalid syntax'
At the start of Chapter 3 of 'Getting Started with Beautiful Soup' it says to create an HTML file, 'ecologicalpyramid.html', which I have already done, re:

plants 10 algae 10 deer 1000 deer 1000 rabbit 2000 fox 100 bear 100 lion 80 tiger 50

and it ran okay in 'Explorer'. The text then says to save it to a folder named 'Soup', which I have done. On the next page (30) it says to navigate to that folder with the following code in the Python console:

cd Soup

However, the console rejects that code with the following return:

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> cd Soup
  File "<stdin>", line 1
    cd Soup
          ^
SyntaxError: invalid syntax
>>>

Thank you for reading; I hope you can help.

Yours,
Simon Evans
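`cd Soup` is a Windows Command Prompt command, not Python, which is why the `>>>` prompt reports a SyntaxError. Either type it at the `C:\...>` prompt before starting `python`, or change folder from inside Python with `os.chdir()`. A sketch, using a temporary folder in place of the real C:\Soup (in practice the call would be `os.chdir(r"C:\Soup")`):

```python
import os
import tempfile

# Hypothetical stand-in for C:\Soup.
soup_folder = os.path.join(tempfile.mkdtemp(), "Soup")
os.mkdir(soup_folder)

os.chdir(soup_folder)  # the Python equivalent of typing "cd Soup" in cmd
print(os.getcwd())     # relative names like "ecologicalpyramid.html" now resolve here
```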
Re: python 2.7 and unicode (one more time)
Hi Peter Otten,

re: 'There is no assignment soup_atag = whatever, but there is one to atag. The whole session should work when you omit the offending line

> atag = soup_atag.a

or insert soup_atag = soup before it.'

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> html_atag = """Test html a tag example
... http://www.packtpub.com'>Home
... >> soup = BeautifulSoup(html_atag,'lxml')
>>> atag = soup.aprint(atag)
>>> atag = soup.a
>>> print(atag)
http://www.packtpub.com'>Home
>>> type(atag)
>>> tagname = atag.name
>>> print tagname
a
>>> atag.name = 'p'
>>> print (soup)
Test html a tag example http://www.packtpub.com'>Home
>>> atag.name = 'p'
>>> print(soup)
Test html a tag example http://www.packtpub.com'>Home
>>> atag.name = 'a'
>>> print(soup)
Test html a tag example http://www.packtpub.com'>Home
>>> soup_atag = soup
>>> atag = soup_atag.a
>>> print (atag['href'])
http://www.packtpub.com'>Home
>>

Thank you.
Yours,
Simon.
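Binding `soup_atag = soup` first removes the NameError. The remaining oddity, an href printed as `http://www.packtpub.com'>Home`, is what you would expect if the attribute had been typed with a double quote at the start and a single quote at the end, so that the value runs on to the next double quote. With matching quotes the value is just the URL. A standard-library sketch of reading that attribute (bs4's `soup.a['href']` reads the same thing):

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


collector = HrefCollector()
collector.feed('<a href="http://www.packtpub.com">Home</a>')  # quotes match
collector.close()
print(collector.hrefs)  # ['http://www.packtpub.com']
```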
Tag objects in Beautiful Soup
Re: 'Accessing the Tag object from Beautiful Soup' (pages 22-25 of 'Getting Started with Beautiful Soup')

So far the code runs in Python 2.7 as given in the book, re:

>>> html_atag = """Test html a tag example
... http://www.packtpub.com'>Home
... >> soup = BeautifulSoup(html_atag,'lxml')
>>> atag = soup.a
>>> print(atag)
http://www.packtpub.com'>Home</a> <a href=" http="">
>>> type(atag)
>>>
>>> tagname = atag.name
>>> print tagname
a
>>> atag.name = 'p'
>>> print (soup)
Test html a tag example http://www.packtpub.com'>Home</a> <a href=" http="">

Then, under the next subheading, 'Attributes of a Tag object', the text reads:

atag = soup_atag.a
print (atag['href'])
#output
http://www.packtpub.com

However, when I put this code to the console I get an error at the first line, re:

>>> atag = soup_atag.a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'soup_atag' is not defined
>>>

Can anyone tell me where I am going wrong, or where the text is wrong? So far the given code has run okay, and I have put to the console everything the text tells me to.

Thank you for reading.
Simon Evans
Re: How do you download and install HTML5TreeBuilder ?
re:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>pip install html5lib
Downloading/unpacking html5lib
  Running setup.py (path:c:\users\intela~1\appdata\local\temp\pip_build_Intel Atom\html5lib\setup.py) egg_info for package html5lib
Downloading/unpacking six (from html5lib)
  Downloading six-1.8.0-py2.py3-none-any.whl
Installing collected packages: html5lib, six
  Running setup.py install for html5lib
Successfully installed html5lib six
Cleaning up...

C:\Users\Intel Atom>

Thanks, Mark.
How do you download and install HTML5TreeBuilder ?
Dear Programmers,

I have installed the HTMLParserTreeBuilder and LXMLTreeBuilder downloads for my Python 2.7 installation, using the 'pip install' procedure in the Windows console. I downloaded the HTML5 files, put them in my Python 2.7 directory and went through the 'pip install' procedure, but this did not work. I do not know whether that is because a different procedure must be followed for HTML5, or because I downloaded the wrong files. The files I downloaded and attempted to install were the following three:

html5lib-0.999(1).tar.gz
html5lib-0.999.tar.gz
HTMLParser-0.0.2.tar.gz

The Windows 7 console returned the following in response:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>pip install HTML5
Downloading/unpacking HTML5
  Could not find any downloads that satisfy the requirement HTML5
Cleaning up...
No distributions at all found for HTML5
Storing debug log for failure in c:\users\intela~1\appdata\local\temp\tmp4pxazz

C:\Users\Intel Atom>pip install HTML5
Downloading/unpacking HTML5
  Could not find any downloads that satisfy the requirement HTML5
Cleaning up...
No distributions at all found for HTML5
Storing debug log for failure in c:\users\intela~1\appdata\local\temp\tmp81fbka

C:\Users\Intel Atom>pip install HTML5
Downloading/unpacking HTML5
  Could not find any downloads that satisfy the requirement HTML5
Cleaning up...
No distributions at all found for HTML5
Storing debug log for failure in c:\users\intela~1\appdata\local\temp\tmphaw01m

C:\Users\Intel Atom>

I suppose my main conundrum is where I can download a version of the HTML5 tree builder that will install using pip. It doesn't help that HTML5 also happens to be the name of some video editing software.

Thank you for reading.

PS: If anyone is upset about 'one-line paragraphs' and other such petulances, then please decline to respond; as far as I'm concerned such trivialities are beside the point and of no help, so vent your ire elsewhere.

Yours,
Simon Evans.
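The package on the Python Package Index is called html5lib, not HTML5, which is why `pip install HTML5` finds nothing; `pip install html5lib` at the Command Prompt is the usual command. Beautiful Soup's "html.parser" option uses the parser that ships with Python, so it needs no separate download. A small sketch for checking, from the `>>>` prompt, which parser libraries this Python can import:

```python
import importlib


def available_parsers():
    """Report which parser libraries Beautiful Soup could use here."""
    status = {"html.parser": True}  # built into Python; nothing to install
    for module_name in ("lxml", "html5lib"):
        try:
            importlib.import_module(module_name)
            status[module_name] = True
        except ImportError:
            status[module_name] = False
    return status


print(available_parsers())
```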
Re: Installing Parsers/Tree Builders to, and accessing these packages from Python2.7
I typed 'pip install html5lib' into the cmd console but again got an error return. I thought one of the participants was unhappy about single line spacing (re: 'single line paragraphs'). Okay, I will go back to single line spacing; I don't think it is all that important, really. Anyway, this is my console's response:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>pip install html5lib
'pip' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\Intel Atom>
Re: Installing Parsers/Tree Builders to, and accessing these packages from Python2.7
I typed 'pip install html5lib' into the Python 2.7 console and got:

>>> pip install html5lib
  File "<stdin>", line 1
    pip install html5lib
              ^
SyntaxError: invalid syntax
>>>

I am not sure what you mean about 'single line paragraphs'. I put my text into double line spacing in my last missive, but I left the code input/output in single line spacing, as that is how it reads from the console; after all, who am I to alter it? Regarding 'context', if you are referring to the text I am using, it is 'Getting Started with Beautiful Soup' by Vineeth G. Nair. For what it's worth, some of the subsequent code in the book runs, but not all, and I think this may be due to the parser installation problem. I want to work through the book (112 pages) without any errors being returned.
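`pip` is a separate program run from the Windows Command Prompt, not a Python statement, hence the SyntaxError at `>>>`. If the Command Prompt instead says 'pip' is not recognized, the Scripts folder (typically C:\Python27\Scripts) is not on PATH; running pip through the interpreter itself, as `python -m pip install html5lib`, sidesteps that, provided the installed pip is recent enough to support `-m`. This sketch only builds that command for whichever interpreter is running it:

```python
import sys


def pip_install_command(package):
    """Return the argument list that runs pip with this same interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


cmd = pip_install_command("html5lib")
# Paste the printed line at the Command Prompt, not at >>>
# (wrap the first part in quotes if its path contains spaces).
print(" ".join(cmd))
# Or run it from Python: import subprocess; subprocess.check_call(cmd)
```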
Re: Installing Parsers/Tree Builders to, and accessing these packages from Python2.7
What I meant to say was that I can't get the html5 or the html parsers to install, although I have their downloads in their respective folders in my Downloads directory.
Re: Installing Parsers/Tree Builders to, and accessing these packages from Python2.7
Oh, I don't mind quoting console output; I just thought I'd be sparing you unnecessary detail. Output was going nicely as I input code from 'Getting Started with Beautiful Soup'. Even where the author reckoned things would go wrong because lxml was not installed, things went right, because I had already installed it, re:

page 17

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> url = "http://www.packtpub.com/books"
>>> page = urllib2.urlopen(url)
>>> soup_packtpage = BeautifulSoup(page)
>>> with open("foo.html","r") as foo_file:
... soup_foo = Soup(foo_file)
  File "<stdin>", line 2
    soup_foo = Soup(foo_file)
           ^
IndentationError: expected an indented block
>>> soup_foo= BeautifulSoup("foo.html")

page 18

>>> print(soup_foo)
foo.html
>>> soup_url = BeautifulSoup("http://www.packtpub.com/books")
>>> print(soup_url)
http://www.packtpub.com/books
>>> helloworld = "Hello World"
>>> soup_string = BeautifulSoup(helloworld)
>>> print(soup_string)
Hello World

page 19: no code in the text on this page

page 20

>>> soup_xml = BeautifulSoup(helloworld,features= "xml")
>>> soup_xml = BeautifulSoup(helloworld,"xml")
>>> print(soup_xml)
Hello World
>>> soup_xml = BeautifulSoup(helloworld,features = "xml")
>>> print(soup_xml)
Hello World
>>>

Then at the bottom of page 20 it says 'we should install the required parsers using easy_install, pip or setup.py install', but as I can't get the html or html5 parser downloads to install, the code halfway down the next page returns the standard response about the requisite parser needing to be installed, re:

page 21

>>> invalid_html = '>> soup_invalid_html = BeautifulSoup(invalid_html,'lxml')
>>> print(soup_invalid_html)
>>> soup_invalid_html = BeautifulSoup(invalid_html,'html5lib')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\bs4\__init__.py", line 155, in __init__
    % ",".join(features))
ValueError: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?
>>>
Re: Installing Parsers/Tree Builders to, and accessing these packages from Python2.7
I have the html5lib-0.999.tar.gz and HTMLParser-0.0.2.tar.gz files in my Downloads folder; the problem is how to install them into Python 2.7. The lxml-3.3.3.win32-py2.7 download is an .exe file, which installs when you click on it, but obviously the html and html5 installations are not so straightforward.
Re: Installing Parsers/Tree Builders to, and accessing these packages from Python2.7
Dear Mark Lawrence,

I have tried inputting the code in the first link, re:

>>> import lxml
>>> import lxml.etree
>>> import bs4.builder.htmlparser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named htmlparser
>>> import bs4.builder._lxml
>>> import bs4.builder.html5lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named html5lib
>>>

which tells me lxml is installed, but that neither html nor html5 is installed.
Re: Installing Parsers/Tree Builders to, and accessing these packages from Python2.7
I have clicked on 'setup.py' in the html5lib-0.999 folder and got a Python console for a few seconds. This may have been the installation of the HTML5 parser/tree builder; I will have to try again the code that did not work previously, and hopefully it will work now.
Re: Installing Parsers/Tree Builders to, and accessing these packages from Python2.7
Dear Terry Reedy,

I am using Windows 7. I put the HTML tree builder / html5lib library into the Python 2.7 folder. I read that the LXML tree builder / lxml installs itself automatically into the Python 2.7 installation, which is why I am not having difficulty with that one. I don't think it really matters where the lxml download ended up; all I want is to know how to install it so that it works. I cannot get any feedback because it isn't working; all I get is the automated built-in response asking whether I need a tree builder/parser appropriate to the input, or words to that effect. What I want to know is how to get this lxml tree builder/parser to run, i.e. what the procedure is for installing the lxml download so that it will run, or what code I should put to my Python console to get it to run, seeing as the input suggested by the download site does not work.

Maybe I should rephrase my question: how do I install LXMLTreeBuilder/lxml, and how do I download and install HTMLParserTreeBuilder and LXMLTreeBuilderForXML for my Python 2.7, please?

I can post the traceback, but all it says is that it doesn't recognise any input with 'html5lib' in it. I will post the console response if it is important, but I can't see how it is relevant to my request, which is how to get these tree builders/parsers to install and run.
Installing Parsers/Tree Builders to, and accessing these packages from Python2.7
Hi Programmers,

I have downloaded and installed LXMLTreeBuilder/lxml, and can access it from Python 2.7. However, I have also downloaded HTMLTreeBuilder/html5lib but cannot get the console to recognise the download, even using the code the download site suggests. I did put it in the Python 2.7 directory, but, unlike the HTML one, it isn't recognised, so the import statement returns an error. Can anyone tell me how I might proceed so that these tree builders/parsers will work in my Python console? I will also have to install HTMLParserTreeBuilder/html.parser and LXMLTreeBuilderForXML/lxml, but best to cross that bridge when I come to it, as they say.

Thank you for reading. I look forward to hearing from you.
Yours,
Simon Evans
Code to Python 27 prompt to access a html file stored on C drive
Dear Programmers,

I want to access an HTML file on my C drive from the Python 2.7 prompt. All the examples I come across seem to require the HTML file to be on a server, rather than on the same computer's C drive. I want to do this as a prerequisite to writing web-scraping code, surmising that if I can get the Python 2.7 prompt (with the Beautiful Soup, urllib and Requests downloads installed) to output pertinent HTML code from an HTML document, then I can use similar code to output HTML from URLs such as 'RacingPost.com', 'SportingLife.com', 'Oddschecker.com' and 'Bestbetting.com', which is what I am interested in working on.

Hope you can help.
Yours,
Simon Evans.
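No server is needed. Beautiful Soup accepts an open file or its contents, so the simplest route is `open()`; if you want to exercise the same `urlopen()` call you will later point at web sites, a `file:` URL works too. A sketch that should run on Python 2.7 or 3; the function names are illustrative, not part of any library:

```python
import os
from contextlib import closing

try:
    from urllib.request import pathname2url, urlopen  # Python 3
except ImportError:
    from urllib import pathname2url                   # Python 2.7
    from urllib2 import urlopen


def read_local_html(path):
    """Return the raw bytes of a local HTML file: no server, no URL."""
    with open(path, "rb") as html_file:
        return html_file.read()


def read_local_html_via_url(path):
    """Fetch the same file through a file: URL, the way a web page is fetched."""
    uri = "file:" + pathname2url(os.path.abspath(path))
    with closing(urlopen(uri)) as response:
        return response.read()
```

Either result can then be handed to `BeautifulSoup(..., "lxml")`.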
Re: Suitable Python code to scrape specific details from web pages.
On Tuesday, August 12, 2014 9:00:30 PM UTC+1, Simon Evans wrote:
> Dear Programmers,
>
> I have been looking at the You tube 'Web Scraping Tutorials' of Chris Reeves. [...]

Dear Programmers,

Thank you for your responses. I have installed Beautiful Soup and I have the 'Getting Started with Beautiful Soup' book, but I can't seem to make any progress with it; I am too thick to make much use of it. I was hoping I could scrape specified items off web pages without using it.

I have also installed Requests. Is there any code you can suggest that can access the sort of web page values I have referred to, such as odds, names of runners and so on, from the 'Inspect Element' or page source HTML of www.Racingpost.com?
Suitable Python code to scrape specific details from web pages.
Dear Programmers,

I have been looking at the YouTube 'Web Scraping Tutorials' of Chris Reeves. I have tried a few of his Python programs at the Python 2.7 command prompt, but altered them from accessing data such as Dow Jones index links to accessing the details I would be interested in from the 'Racing Post' on a daily basis. However, the code in the example I am going to give does not return the information I am seeking: instead of returning the given odds on a horse, it only returns [], which isn't much use. I would be glad if you could tell me where I am going wrong.

Yours faithfully,
Simon Evans.

>>>import urllib
>>>import re
>>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?race_id=600048r_date=2014-05-08#raceTabs=sc_")
htmltext = htmlfile.read()
regex = '1http://www.racingpost.com/horses/horse_home.sd?horse_id=758752"onclick="scorecards.send("horse_name":):return Html.popup(this, {width:695,height:800})"title="Full details about this HORSE">Lively Baron9/4F'
>>>pattern = re.compile(regex)
>>>odds=re.findall(pattern,htmltext)
>>>print odds
[]
>>>

>>>import urllib
>>>import re
>>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?race_id=600048r_date=2014-05-08#raceTabs=sc_")
>>>htmltext = htmlfile.read()
>>>regex = ''
>>>pattern = re.compile(regex)
>>>odds=re.findall(pattern,htmltext)
>>>print odds
[]
>>>
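In a regular expression, characters such as `.`, `?`, `(`, `)` and `{` are operators rather than literal text, so a pattern copied straight from the page source rarely matches the page itself (and a stray line break or space inside it will not match either). `re.escape()` turns literal text into a safe pattern, and a capture group returns only the part you want. A sketch on a hypothetical fragment, not the real Racing Post markup:

```python
import re

# Hypothetical fragment of a race-card page.
html = 'href="card.sd?race_id=600048">Lively Baron</a> 9/4F'

literal = "card.sd?race_id=600048"
print(re.findall(literal, html))             # []: "?" made the preceding "d" optional
print(re.findall(re.escape(literal), html))  # ['card.sd?race_id=600048']

# Escape the fixed text, then capture just the odds that follow the horse's name.
odds_pattern = re.escape("Lively Baron</a> ") + r"(\d+/\d+F?)"
print(re.findall(odds_pattern, html))        # ['9/4F']
```

For anything beyond small, stable fragments, a parser such as Beautiful Soup is usually less brittle than regular expressions over HTML.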
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Dear Programmers,

I noticed a couple of typos in my previous message, so I have now corrected them as follows:

Dear Programmers,

As anticipated, it has not been too long before I have encountered further difficulty. At the top of page 16, 'Getting Started with Beautiful Soup' gives code to be input, but I am not sure whether it goes to the Python or the Windows command prompt; both seem to be resistant to it. The code input is:

helloworld = "Hello World"
soup_string = BeautifulSoup(helloworld)

To the Windows command prompt this gives:
--
SyntaxError: invalid syntax
>>> helloworld = "HelloWorld"
>>> soup_string = BeautifulSoup(helloworld)
Traceback (most recent call last):
  File "", line 1, in
NameError: name 'BeautifulSoup' is not defined
--

I have been told by one of the programmers that I ought to be inputting this to the Python command prompt (the book doesn't specify), but it doesn't take there either:
--
>>>helloworld = HelloWorld"
>>>soup_string = BeautifulSoup(helloworld)
Traceback (most recent call last):
  File "", line 1, in
NameError: name 'BeautifulSoup' is not defined
>>>
--

Looking at the bottom of page 16, there is more code to be input, which again does not take at either the Windows command prompt or the Python command prompt:

import urllib2
from bs4 import BeautifulSoup
url = "http://www.packtpub.com/books";
page = urllib2.urlopen(url)
soup_packtpage = BeautifulSoup(page)

At the Windows command prompt it returns:
--
>>>import urllib2
Traceback (most recent call last):
  File "", line 1, in
ImportError: No module named 'urllib2'
>>>
--

At the Python command prompt it returns:
--
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> url = "http://www.packtpub.com/books";
>>> page = urllib2.urlopen(url)
Traceback (most recent call last):
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
-

Anyway, I hope you can tell me what is amiss; there is no point in my proceeding with the book (about 111 pages all told) until I find out why it won't take. I realise I have been told to learn Python in order to make things less painful, but I don't see why code written in the book does not take. Thank you for reading.
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Dear Programmers,

As anticipated, it has not been too long before I have encountered further difficulty. At the top of page 16, 'Getting Started with Beautiful Soup' gives code to be input, but I am not sure whether it goes to the Python or the Windows command prompt; both seem to be resistant to it. The code input is:

helloworld = "Hello World"
soup_string = BeautifulSoup(helloworld)

To the Windows command prompt this gives:
--
SyntaxError: invalid syntax
>>> helloworld = "HelloWorld"
>>> soup_string = BeautifulSoup(helloworld)
Traceback (most recent call last):
  File "", line 1, in
NameError: name 'BeautifulSoup' is not defined
--

I have been told by one of the programmers that I ought to be inputting this to the Python command prompt (the book doesn't specify), but it doesn't take there either:
--
>>>helloworld = HelloWorld"
>>>soup_string = BeautifulSoup(helloworld)
Traceback (most recent call last):
  File "", line 1, in
NameError: name 'BeautifulSoup' is not defined
>>>
--

Looking at the bottom of page 16, there is more code to be input, which again does not take at either the Windows command prompt or the Python command prompt:

import urllib2
from bs4 import BeautifulSoup
url = "http://www.packtpub.com/books";
page = urllib2.urlopen(url)
soup_packtpage = BeautifulSoup(page)

At the Windows command prompt it returns:
--
>>>import urllib2
Traceback (most recent call last):
  File "", line 1, in
ImportError: No module named 'urllib2'
>>>
--

At the Python command prompt it returns:
--
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> url = "http://www.packtpub.com/books";
>>> page = urllib2.urlopen(url)
Traceback (most recent call last):
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
-

Anyway, I hope you can tell me what is amiss; there is no point in my proceeding with the book (about 111 pages all told) until I find out why it won't take. I realise I have been told to learn Python in order to make things less painful, but I don't see why code written in the book does not take. Thank you for reading.

I thought I might as well include it, so that you can see where things are going astray. The Windows command prompt:
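A minimal Python 3 sketch of the page-16 code with both problems addressed, under two assumptions: `BeautifulSoup` only exists after `from bs4 import BeautifulSoup` has run in the same Python session (typed at the `>>>` prompt, not at the Windows `C:\...>` prompt), and the 403 may be the server refusing urllib's default User-Agent, which is a guess rather than a confirmed diagnosis. The header value and the helper names `build_request`, `make_soup` and `fetch_soup` are illustrative. In Python 2.7 the equivalent request class is `urllib2.Request`.

```python
import urllib.request

try:
    from bs4 import BeautifulSoup  # must run before BeautifulSoup is used, else NameError
except ImportError:
    BeautifulSoup = None  # bs4 is not installed for the interpreter running this code

# Assumption: the 403 comes from the site rejecting urllib's default
# "Python-urllib/x.y" User-Agent. The value below is only an example.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; book-exercise-script)"}


def build_request(url):
    """Create a request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers=HEADERS)


def make_soup(markup):
    """Parse markup with bs4's built-in html.parser backend."""
    if BeautifulSoup is None:
        raise RuntimeError("bs4 is not installed for this Python interpreter")
    return BeautifulSoup(markup, "html.parser")


def fetch_soup(url, timeout=10):
    """Download url (needs network access) and parse the response body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as page:
        return make_soup(page.read())


helloworld = "Hello World"
if BeautifulSoup is not None:
    soup_string = make_soup(helloworld)
    print(soup_string.get_text())

# soup_packtpage = fetch_soup("http://www.packtpub.com/books")  # may still be refused
```

Naming the parser explicitly also keeps the result the same whichever optional parsers (such as lxml) happen to be installed.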
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Dear Programmers,

I downloaded PeaZip, which doesn't remove the file/folder hierarchy. I unzipped the download with it, input the same command to the console, and it installed Beautiful Soup 4 okay:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>c:\Python27\python setup.py install
running install
running build
running build_py
creating build
creating build\lib
creating build\lib\bs4
copying bs4\dammit.py -> build\lib\bs4
copying bs4\element.py -> build\lib\bs4
copying bs4\testing.py -> build\lib\bs4
copying bs4\__init__.py -> build\lib\bs4
creating build\lib\bs4\builder
copying bs4\builder\_html5lib.py -> build\lib\bs4\builder
copying bs4\builder\_htmlparser.py -> build\lib\bs4\builder
copying bs4\builder\_lxml.py -> build\lib\bs4\builder
copying bs4\builder\__init__.py -> build\lib\bs4\builder
creating build\lib\bs4\tests
copying bs4\tests\test_builder_registry.py -> build\lib\bs4\tests
copying bs4\tests\test_docs.py -> build\lib\bs4\tests
copying bs4\tests\test_html5lib.py -> build\lib\bs4\tests
copying bs4\tests\test_htmlparser.py -> build\lib\bs4\tests
copying bs4\tests\test_lxml.py -> build\lib\bs4\tests
copying bs4\tests\test_soup.py -> build\lib\bs4\tests
copying bs4\tests\test_tree.py -> build\lib\bs4\tests
copying bs4\tests\__init__.py -> build\lib\bs4\tests
running install_lib
creating c:\Python27\Lib\site-packages\bs4
creating c:\Python27\Lib\site-packages\bs4\builder
copying build\lib\bs4\builder\_html5lib.py -> c:\Python27\Lib\site-packages\bs4\builder
copying build\lib\bs4\builder\_htmlparser.py -> c:\Python27\Lib\site-packages\bs4\builder
copying build\lib\bs4\builder\_lxml.py -> c:\Python27\Lib\site-packages\bs4\builder
copying build\lib\bs4\builder\__init__.py -> c:\Python27\Lib\site-packages\bs4\builder
copying build\lib\bs4\dammit.py -> c:\Python27\Lib\site-packages\bs4
copying build\lib\bs4\element.py -> c:\Python27\Lib\site-packages\bs4
copying build\lib\bs4\testing.py -> c:\Python27\Lib\site-packages\bs4
creating c:\Python27\Lib\site-packages\bs4\tests
copying build\lib\bs4\tests\test_builder_registry.py -> c:\Python27\Lib\site-packages\bs4\tests
copying build\lib\bs4\tests\test_docs.py -> c:\Python27\Lib\site-packages\bs4\tests
copying build\lib\bs4\tests\test_html5lib.py -> c:\Python27\Lib\site-packages\bs4\tests
copying build\lib\bs4\tests\test_htmlparser.py -> c:\Python27\Lib\site-packages\bs4\tests
copying build\lib\bs4\tests\test_lxml.py -> c:\Python27\Lib\site-packages\bs4\tests
copying build\lib\bs4\tests\test_soup.py -> c:\Python27\Lib\site-packages\bs4\tests
copying build\lib\bs4\tests\test_tree.py -> c:\Python27\Lib\site-packages\bs4\tests
copying build\lib\bs4\tests\__init__.py -> c:\Python27\Lib\site-packages\bs4\tests
copying build\lib\bs4\__init__.py -> c:\Python27\Lib\site-packages\bs4
byte-compiling c:\Python27\Lib\site-packages\bs4\builder\_html5lib.py to _html5lib.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\builder\_htmlparser.py to _htmlparser.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\builder\_lxml.py to _lxml.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\builder\__init__.py to __init__.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\dammit.py to dammit.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\element.py to element.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\testing.py to testing.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\tests\test_builder_registry.py to test_builder_registry.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\tests\test_docs.py to test_docs.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\tests\test_html5lib.py to test_html5lib.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\tests\test_htmlparser.py to test_htmlparser.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\tests\test_lxml.py to test_lxml.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\tests\test_soup.py to test_soup.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\tests\test_tree.py to test_tree.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\tests\__init__.py to __init__.pyc
byte-compiling c:\Python27\Lib\site-packages\bs4\__init__.py to __init__.pyc
running install_egg_info
Writing c:\Python27\Lib\site-packages\beautifulsoup4-4.1.0-py2.7.egg-info

c:\Beautiful Soup>

Thank you for your thoughtful help. I am sure I will be needing more, though, in the not too distant future.
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
I have input the above code by copying and pasting it into the IDLE Python console, as the Python 2.7 command prompt is fussy about the indentation on the eleventh line down: if I indent it, it replies that the indentation is unexpected, and if I don't, it says an indentation is expected. However, when I get to the next lines of code in the IDLE prompt, i.e.:

C:\Users\Intel Atom>cd "c:\Beautiful Soup"
c:\Beautiful Soup>c:\Python27\python setup.py install

again it does not recognise 'bs4'. I think having used 'Just unzip it' instead of WinZip may have caused this problem in the first place, as when I looked at the WinZip version at a local net café, it did have a folder hierarchy. However, I wanted, and still want, to avoid the £25 fee for WinZip, which nowadays you can't seem to get around. I never asked for the darn files to be zipped, so why ought I to pay to have them unzipped? That is my contention.
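On the unzipping problem: the download the book links to is a `.tar.gz` tarball, and Python's standard `tarfile` module can unpack it with the folder hierarchy intact, so a paid tool is not strictly needed. A minimal Python 3 sketch; the helper name `extract_tarball` and the example paths are hypothetical.

```python
import os
import tarfile
import tempfile


def extract_tarball(archive_path, dest_dir=None):
    """Unpack a .tar.gz archive, keeping its folder hierarchy.

    Returns the destination folder (a fresh temporary folder if none is given).
    """
    if dest_dir is None:
        dest_dir = tempfile.mkdtemp()
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)  # only extract archives from sources you trust
    return dest_dir


# Hypothetical usage; the archive name stands in for whichever release was downloaded:
# extract_tarball(r"C:\Downloads\beautifulsoup4-4.x.y.tar.gz", r"C:\Beautiful Soup")
```

Source tarballs usually contain a single top-level folder named after the release, so `setup.py` and `bs4` end up one level down inside the destination, and that subfolder is the one to `cd` into.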
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
I downloaded the get-pip.py file and saved it to the same folder on my C drive into which the Beautiful Soup 4 download was unzipped. I changed directory to that folder at the Command Prompt, as you instructed in step 2. I input the command you gave in step 3, which returned some output, quoted below. I then input the command you gave in step 4, but the console seems to reject or not recognise 'pip' as a command. Quoting the actual prompt response will explain things better than I can:
---
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>python get-pip.py
Downloading/unpacking pip from https://pypi.python.org/packages/py2.py3/p/pip/pip-1.5.5-py2.py3-none-any.whl#md5=03a932d6f82a3887d8de1cdb837c87ed
Installing collected packages: pip
Found existing installation: pip 1.5.4
Uninstalling pip:
Successfully uninstalled pip
Successfully installed pip
Cleaning up...

c:\Beautiful Soup>pip install beautifulsoup4
'pip' is not recognized as an internal or external command,
operable program or batch file.

c:\Beautiful Soup>

Perhaps I oughtn't to have downloaded the pip file to the same directory as Beautiful Soup? I will try transferring the file to another folder and running the commands you gave again.
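`'pip' is not recognized` most likely means the folder holding `pip.exe` (typically `C:\Python27\Scripts` for this install, which is an assumption) is not on PATH; the folder `get-pip.py` was saved in should not matter. With a reasonably recent pip, running it as a module of a specific interpreter sidesteps PATH entirely, e.g. `c:\Python27\python -m pip install beautifulsoup4` at the Windows prompt. The Python 3 sketch below only builds that command; `pip_install_command` and `pip_install` are hypothetical helper names.

```python
import subprocess
import sys


def pip_install_command(package, python=sys.executable):
    """Build ['<python>', '-m', 'pip', 'install', package] for a given interpreter."""
    return [python, "-m", "pip", "install", package]


def pip_install(package, python=sys.executable):
    """Run the command (needs network access); returns pip's exit code, 0 on success."""
    return subprocess.call(pip_install_command(package, python))


# Prints: C:\Python27\python.exe -m pip install beautifulsoup4
print(" ".join(pip_install_command("beautifulsoup4", r"C:\Python27\python.exe")))
```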
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Dear Ian, and other programmers,

Thank you for your advice. I am resending the last message because this twattish cut-and-paste facility on my computer has a knack of chopping off one's original message; I will try to convey the right message this time.

I have removed the original Beautiful Soup 4 download that I had unzipped to my Beautiful Soup directory on the C drive. I downloaded the latest version of Beautiful Soup 4 from the Crummy site, unzipped it, moved the contents of the unzipped directory into my Beautiful Soup directory, and again had the same output to my console:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>c:\Python27\python setup.py install
running install
running build
running build_py
error: package directory 'bs4' does not exist

c:\Beautiful Soup>
---
I have made a note of all the contents of the downloaded and unzipped BS4, i.e. the contents of my Beautiful Soup folder on the C drive, which are as follows:
---
init, _html5lib, _htmlparser, _lxml, 6.1, AUTHORS, conf, COPYING, dammit, demonstration_markup, element, index.rst, Makefile, NEWS, PKG-INFO, README, setup, test_builder_registry, test_docs, test_html5lib, test_htmlparser, test_lxml, test_soup, test_tree, testing, TODO

I can see no bs4 folder within the contents. I cannot see any setup.py file either, but this is how I downloaded it. I am only following the instructions as suggested, and I do not understand why it is not working. I hope someone can point me in the right direction, as I seem to be stuck, and I don't think it has much bearing on my fluency, or lack of it, with Python.
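One way to see where `setup.py` and the `bs4` package actually ended up is to search the unzipped tree for a folder that contains both; that is the folder to `cd` into before running `setup.py install`. A minimal Python 3 sketch (the helper name `find_package_root` is hypothetical). An empty result is consistent with the `package directory 'bs4' does not exist` error. Windows Explorer may also hide file extensions, so `setup` in the listing above could well be `setup.py`.

```python
import os


def find_package_root(top, package="bs4"):
    """Return every folder under top holding both setup.py and a package subfolder.

    Such a folder is the one to cd into before running 'python setup.py install'.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        if "setup.py" in filenames and package in dirnames:
            matches.append(dirpath)
    return matches


# Hypothetical usage:
# print(find_package_root(r"C:\Beautiful Soup"))
```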
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
I have removed the original Beautiful Soup 4 download that I had unzipped to my Beautiful Soup directory on the C drive. I downloaded the latest version of Beautiful Soup 4 from the Crummy site, unzipped it, moved the contents of the unzipped directory into my Beautiful Soup directory, and again had the same output to my console:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>c:\Python27\python setup.py install

c:\Beautiful Soup>
---
I have made a note of all the contents of the downloaded and unzipped BS4, i.e. the contents of my Beautiful Soup folder on the C drive, which are as follows:
---
init, _html5lib, _htmlparser, _lxml, 6.1, AUTHORS, conf, COPYING, dammit, demonstration_markup, element, index.rst, Makefile, NEWS, PKG-INFO, README, setup, test_builder_registry, test_docs, test_html5lib, test_htmlparser, test_lxml, test_soup, test_tree, testing, TODO

I can see no bs4 folder within the contents. I cannot see any setup.py file either, but this is how I downloaded it. I am only following the instructions as suggested, and I do not understand why it is not working. I hope someone can point me in the right direction, as I seem to be stuck, and I don't think it has much bearing on my fluency, or lack of it, with Python.
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
I did download the latest version of Beautiful Soup 4 from the download site, as the book suggested.
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Dear Ian,

The book does recommend using Python 2.7 (see the bottom line of page 10), and it also recommends using Beautiful Soup 4. You are right in that I had placed the unzipped BS4 folder within another folder, so I moved the contents of the inner folder to the outer folder. The console can now access the contents of the Beautiful Soup folder, but it is still having problems with them, as the last output to my console demonstrates:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>c:\Python27\python setup.py install
running install
running build
running build_py
error: package directory 'bs4' does not exist

c:\Beautiful Soup>
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Thank you for your advice. I did buy a book on Python, 'Hello Python', but the code in it wouldn't run, so I returned it to the shop for a refund. I am going to visit the local library to see if they have any books on Python. I am familiar with Java and Pascal, and from looking at a few YouTube videos on the subject, I thought Python was not much different and shared many of the same concepts (variables, initialisation, expressions, methods and so on), but I realise there is no point in walking backwards in new territory.
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
The version of Python the book seems to refer to is 2.7; see the bottom of page 10: 'Pick the Path variable and add the following section to the Path variable: ;C:\PythonXY for example C:\Python27'.

The version of Beautiful Soup seems to be Beautiful Soup 4, as the top of page 12 states: '1. Download the latest tarball from https://pypi.python.org/packages/source/b/beautifulsoup4/.'

I have downloaded the Beautiful Soup 4 version and unzipped it to a folder called 'Beautiful Soup' on the C drive. I am using the Python 2.7 console and IDLE; I have removed the 3.4 version. All the same, I seem to be having difficulties again, as the console won't accept the command it accepted yesterday with the previous version of BS. I realise I would not have had this problem if I had input the 'Hello World' code at the Python console, but as I said, the text never specifically said 'change to the Python 2.7 console'. I thought the problem was with the BS version and so changed it, but now I can't even get as far as I had before changing it. Anyhow, be that as it may, this is the console response to my input:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>Beautiful Soup>c:\Python27\python setup.py install
'Beautiful' is not recognized as an internal or external command,
operable program or batch file.

c:\Beautiful Soup>
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Hi Ian,

Thank you for your help. Yes, that is the book by Vineeth J Nair. At the top of page 12, step 1 says: '1. Download the latest tarball from https://pypi.python.org/packages/source/b/beautifulsoup4/.' So yes, the version the book deals with is Beautiful Soup 4. I am using Python 2.7; I have removed Python 3.4. Also, at the bottom of page 10, Mr Nair states: 'Pick the Path variable and add the following section to the Path variable: ;C:\PythonXY for example C:\Python27', which tells me that the Python version cited in the book must be 2.7.

I downloaded Beautiful Soup 4 last night and unzipped it with 'Just unzip it' to a folder I called Beautiful Soup, the same as I did with the previous Beautiful Soup download. The console output is below, showing that I am now facing the same conundrum as yesterday, before changing my version of Beautiful Soup:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>Beautiful Soup>c:\Python27\python setup.py install
'Beautiful' is not recognized as an internal or external command,
operable program or batch file.

c:\Beautiful Soup>
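The `c:\Beautiful Soup>` at the start of a line is the prompt that cmd prints, not part of the command, so only the text after the `>` should be typed; typing `Beautiful Soup>c:\Python27\python ...` makes cmd look for a program called `Beautiful`, which matches the error above. The same install step can also be launched from Python by passing the folder as the working directory. A minimal Python 3 sketch; `run_setup_install` is a hypothetical helper and the paths are those from the post.

```python
import subprocess
import sys


def run_setup_install(package_dir, python=sys.executable):
    r"""Run 'python setup.py install' with package_dir as the working directory.

    Equivalent to typing only   c:\Python27\python setup.py install
    after cmd has printed the prompt   c:\Beautiful Soup>
    Returns the exit code (0 means success).
    """
    return subprocess.call([python, "setup.py", "install"], cwd=package_dir)


# Hypothetical usage with the paths from the post above:
# run_setup_install(r"c:\Beautiful Soup", python=r"c:\Python27\python.exe")
```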
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
But wait a moment: 'BeautifulSoup4 works with 2.6+ and 3.x' (Terry Reedy). Doesn't 2.6+ include 2.7, which is what I'm using with BeautifulSoup4?
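Yes: "2.6+" means 2.6 or any later version, and Python compares version tuples element by element, so 2.7 qualifies. A small sketch; `meets_minimum` is a hypothetical helper. Later Beautiful Soup releases dropped Python 2 support, so the "2.6+" statement applies to the releases current at the time of the thread.

```python
import sys


def meets_minimum(version, minimum=(2, 6)):
    """True if version (e.g. sys.version_info or (2, 7, 6)) is at least minimum.

    Tuples compare element by element: (2, 7) > (2, 6) because 7 > 6.
    """
    return tuple(version[:len(minimum)]) >= tuple(minimum)


print(meets_minimum((2, 7, 6)))  # True: 2.7 satisfies "2.6+"
print(meets_minimum((3, 4, 0)))  # True
print(meets_minimum((2, 5, 4)))  # False
print(meets_minimum(sys.version_info))
```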
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
On Monday, May 12, 2014 12:19:24 AM UTC+1, Simon Evans wrote:
> [snip]

Oh, I think I see: I should be using Python 3.4 now, with BS4?
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Yeah, well, at no point does the book say to start inputting the code mentioned at the Python command prompt rather than the Windows command prompt, but thank you for your guidance anyway.

I have downloaded the latest version of Beautiful Soup 4, but am again facing problems with the second line of code:
---
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>c:\Python27\python setup.py install
c:\Python27\python: can't open file 'setup.py': [Errno 2] No such file or directory

That was the same command I used before, which installed okay (see above). Can anyone tell me where I am going wrong? Thanks.
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
I have downloaded Beautiful Soup 3, and I am using Python 2.7. I understand from your message that I ought to use Python 2.6 or Python 3.4 with Beautiful Soup 4; the book I am using, 'Getting Started with Beautiful Soup', is for Beautiful Soup 4. Therefore I gather I must re-download Beautiful Soup and get version 4, dispose of my Python 2.7 and reinstall Python 3.4. I am sure I can do this, but doesn't the above information suggest that the only Python version left that might work with Beautiful Soup 3 would be Python 2.7? That is the configuration I have at present, though I am not perfectly happy with it, as it is not taking code in the book (meant for BS4) such as the following on page 16:

helloworld = "Hello World"

i.e.:

c:\Beautiful Soup>helloworld = "Hello World"
'helloworld' is not recognized as an internal or external command,
operable program or batch file.

I take it that this response is due to using code meant for BS4 with Python 2.6/3.4, rather than BS3 with Python 2.7, which is what I am currently using. If so, I will change the configuration.
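`helloworld = "Hello World"` is Python, so cmd (the `c:\Beautiful Soup>` prompt) rejects it whichever Beautiful Soup version is installed; it belongs at the Python `>>>` prompt. The two generations also install under different import names: Beautiful Soup 3 as a module called `BeautifulSoup` (`from BeautifulSoup import BeautifulSoup`) and Beautiful Soup 4 as a package called `bs4` (`from bs4 import BeautifulSoup`), so the book's BS4 imports will not find a BS3 install. A minimal Python 3 sketch that reports which of them the running interpreter can import; `installed_generations` is a hypothetical helper.

```python
import importlib.util

# Import names differ between generations:
#   Beautiful Soup 3 -> module "BeautifulSoup" (from BeautifulSoup import BeautifulSoup)
#   Beautiful Soup 4 -> package "bs4"          (from bs4 import BeautifulSoup)
CANDIDATES = {"Beautiful Soup 3": "BeautifulSoup", "Beautiful Soup 4": "bs4"}


def installed_generations():
    """List the Beautiful Soup generations this interpreter can import."""
    return [label for label, module in sorted(CANDIDATES.items())
            if importlib.util.find_spec(module) is not None]


print(installed_generations())
```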
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Dear Chris Angelico,

Yes, you are right: I did install Python 3.4 as well as 2.7. I have removed Python 3.4 and input the command you suggested, and it looks like it has installed properly, returning the following output:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>c:\Python27\python setup.py install
running install
running build
running build_py
creating build
creating build\lib
copying BeautifulSoup.py -> build\lib
copying BeautifulSoupTests.py -> build\lib
running install_lib
copying build\lib\BeautifulSoup.py -> c:\Python27\Lib\site-packages
copying build\lib\BeautifulSoupTests.py -> c:\Python27\Lib\site-packages
byte-compiling c:\Python27\Lib\site-packages\BeautifulSoup.py to BeautifulSoup.pyc
byte-compiling c:\Python27\Lib\site-packages\BeautifulSoupTests.py to BeautifulSoupTests.pyc
running install_egg_info
Writing c:\Python27\Lib\site-packages\BeautifulSoup-3.2.1-py2.7.egg-info

c:\Beautiful Soup>

Would that things were as straightforward as they are in the books, but anyway, thank you very much for your assistance; I'd still be typing the zillionth variation on the first line without your help. I don't doubt, though, that I will be coming unstuck in the not too distant future. Until then, thank you again for your selfless help.
Re: How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
Thank you, everyone who replied, for your help. The command prompt console accepts the first line of code but doesn't seem to accept the second. I have altered it a little, but it is not having any of it. I quote my console input and output here, as it can probably explain things better than I can:

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Intel Atom>cd"c:\Beautiful Soup"
The filename, directory name, or volume label syntax is incorrect.

C:\Users\Intel Atom>cd "c:\Beautiful Soup"

c:\Beautiful Soup>python setup.py install.
  File "setup.py", line 22
    print "Unit tests have failed!"
                                  ^
SyntaxError: invalid syntax

c:\Beautiful Soup>python setup.py install"
  File "setup.py", line 22
    print "Unit tests have failed!"
                                  ^
SyntaxError: invalid syntax

c:\Beautiful Soup>

I have tried writing "python setup.py install", i.e. putting the statement in inverted commas, but the console still seems to reject it:

c:\Beautiful Soup>"python setup. py install"
'"python setup. py install"' is not recognized as an internal or external command,
operable program or batch file.

c:\Beautiful Soup>

Again, I hope you Python practitioners can help. I am only on page 12 and have another 99 pages to go, so I can only hope it gets easier.
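The `SyntaxError` on `print "Unit tests have failed!"` suggests that the bare `python` command started Python 3, which rejects the Python 2 `print` statement used in that `setup.py`. A quick check of which interpreter a bare `python` runs is to save the sketch below as a `.py` file and run it with `python` at the Windows prompt. It works on both 2.7 and 3.x; `describe_interpreter` is a hypothetical helper.

```python
from __future__ import print_function  # lets Python 2.7 accept print(...) too
import sys


def describe_interpreter():
    """Return the path and major.minor version of the Python running this code."""
    return "%s (Python %d.%d)" % (sys.executable, sys.version_info[0], sys.version_info[1])


# Python 3 rejects the statement form  print "Unit tests have failed!"
# with a SyntaxError; the function-call form below works on 2.7 and 3.x.
print(describe_interpreter())
```

Calling the interpreter by its full path, as in `c:\Python27\python setup.py install`, avoids depending on which `python` comes first on PATH.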
How do I access 'Beautiful Soup' on python 2.7 or 3.4 , console or idle versions.
I am new to Python, but my main interest is to use it for web scraping. I have downloaded Beautiful Soup and have followed the instructions in the 'Getting Started with Beautiful Soup' book, but my Python installations keep returning errors, so I can't get started. I have unzipped Beautiful Soup to a folder of the same name on my C drive, in accordance with the first two steps on page 12 of that book, but then I proceed to navigate to the program as in step three, i.e.:

"Open up the command line prompt and navigate to the folder where you have unzipped the folder as follows:
cd Beautiful Soup
python setup python install"

This returns, on my Python 2.7:

>>> cd Beautiful Soup
  File "", line 1
    cd Beautiful Soup
               ^
SyntaxError: invalid syntax
>>>

I also get:

>>> cd Beautiful Soup
SyntaxError: invalid syntax
>>>

in my IDLE Python 2.7 version, and the same goes for the Python 3.4 installations. I hope someone can help. Thanks in advance.
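`cd` is a Windows command-prompt command rather than Python, which is why both the `>>>` prompt and IDLE report `SyntaxError`; the book's step three is meant for the Windows `C:\...>` prompt (e.g. `cd "c:\Beautiful Soup"`, as in the later posts). If the working directory does need changing from inside Python, `os.chdir` is the equivalent. A minimal sketch; `change_to` is a hypothetical helper.

```python
import os


def change_to(folder):
    """Python-side equivalent of the cmd command  cd "folder"; returns the new cwd."""
    os.chdir(folder)
    return os.getcwd()


# Hypothetical usage at the >>> prompt (the raw string keeps the backslash literal):
# change_to(r"C:\Beautiful Soup")
```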