Re: [Tutor] sort list alphabetically
Ok, that's the point. I think I meant case-sensitive. There are some ways described here that will help me out. Yes, the list is sorted when I print it out. It was my fault, sorry guys. Thank you a lot.

mac

___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] sort list alphabetically
Hello, I have a list with the dirs/files from the current path. When I use sort() to sort the list alphabetically, the list is still unsorted. How do I use it?

    dirs_files = os.listdir(os.getcwd())
    print dirs_files
    dirs_files.sort()
    print dirs_files

Thank you.
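A note for the archive: as the follow-up above suggests, the list usually *is* sorted, just case-sensitively, because plain sort() orders all uppercase letters before all lowercase ones. A minimal sketch of a case-insensitive sort (shown in Python 3 syntax; the sample names are made up):

```python
import os

# Plain sorting is case-sensitive: uppercase sorts before lowercase.
names = ["banana", "Apple", "Cherry"]
print(sorted(names))                 # -> ['Apple', 'Cherry', 'banana']

# A key function gives the case-insensitive order most people expect:
print(sorted(names, key=str.lower))  # -> ['Apple', 'banana', 'Cherry']

# The same applies to directory listings:
entries = sorted(os.listdir(os.getcwd()), key=str.lower)
```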
[Tutor] again... regular expression
Ok, there is an error I made. The links in the HTML page start with good.php, so there was no way to ever find a link.

    re_site = re.compile(r"good\.php.+'")
    for a in file:
        z = re_site.search(a)
        if z is not None:
            print z.group(0)

This gives me every line starting with "good.php", but it does not stop at the first ' at the end; there are more tags and text that end with ' too. So how can I tell a regex to stop at the first ' found after good.php?

Thank you.

> Hello. I want to parse a website for links of this type:
>
> http://www.example.com/good.php?test=anything&egal=total&nochmal=nummer&so=site&seite=22";>
>
>     re_site = re.compile(r'http://\w+.\w+.\w+./good.php?.+";>')
>     for a in file:
>         z = re_site.search(a)
>         if z is not None:
>             print z.group(0)
>
> I still don't understand regular expressions. I tried some other expressions
> but didn't get them to work.
>
> The end of the link is ">, so it should not be a problem to extract the
> link, but it is.
>
> Thank you for the help.
>
> mac
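For the archive, the usual answer to "stop at the first '": make the quantifier non-greedy with `+?`, or use a negated character class. A sketch in Python 3 syntax; the sample line is invented:

```python
import re

line = "href='good.php?test=1&seite=22' class='x' alt='y'"

# .+? is non-greedy: it stops at the FIRST following quote
lazy = re.search(r"good\.php.+?'", line)
print(lazy.group(0))    # -> good.php?test=1&seite=22'

# Equivalent and often clearer: "any run of non-quote characters"
strict = re.search(r"good\.php[^']*'", line)
print(strict.group(0))  # -> good.php?test=1&seite=22'
```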
[Tutor] again... regular expression
Hello. I want to parse a website for links of this type:

    http://www.example.com/good.php?test=anything&egal=total&nochmal=nummer&so=site&seite=22";>

    re_site = re.compile(r'http://\w+.\w+.\w+./good.php?.+";>')
    for a in file:
        z = re_site.search(a)
        if z is not None:
            print z.group(0)

I still don't understand regular expressions. I tried some other expressions but didn't get them to work.

The end of the link is ">, so it should not be a problem to extract the link, but it is.

Thank you for the help.

mac
Re: [Tutor] code improvement for beginner ?
Danny Yoo wrote:
> On Sat, 8 Oct 2005, lmac wrote:
>
>> Ok. Here we go. Wanted to start my page long ago. Now is the right time.
>>
>> http://daderoid.freewebspace24.de/python/python1.html
>
> Hi lmac,
>
> I'll pick out some stuff that I see; I'm sure others will be happy to give
> comments too. I'll try to make sure that all the criticism I give is
> constructive in nature, and if you have questions on any of it, please
> feel free to ask about it.
>
> I'll concentrate on the imgreg() function first. The declaration of
> 'images' as a global variable looks a little weird. I do see that
> 'imgreg' feeds into 'images'. Not a major issue so far, but you might
> want to see if it's possible to do without the global, and explicitly pass
> in 'images' as another parameter to imgreg. Globals just bother me on
> principle. *grin*

The thing is, I want to download the images after I have fetched the pages. So I thought I would use a global variable so that it is still in scope at the end of the script.

> You may want to document what 'patt' and 'search' are meant to be. A
> comment at the top of imgreg, like:
>
>     """imgreg searches for a pattern 'patt' within the text 'search'. If
>     a match exists, adds it to the set of images, and returns 1. Else,
>     returns 0."""
>
> will help a lot. Documenting the intent of a function is important,
> because people are forgetful. No programming language can prevent memory
> loss: what we should try to do is to compensate for our forgetfulness.
> *grin*
>
> Looking at pageimgs(): I'm not sure what 't' means in the open statement:
>
>     f = open(filename, "rt")
>
> and I think that 't' might be a typo: I'm surprised that Python doesn't
> complain. Can anyone confirm this? I think you may have tried to do "r+"
> mode, but even then, you probably don't: you're just reading from the
> file, and don't need to write back to it.
>
> Looking further into pageimgs(): again, I get nervous about globals. The
> use of the 'r1' global variable is mysterious.
> I had to hunt around to
> figure out what it was near the middle of the program.
>
> If anything, I'd recommend naming your global variables with more meaning.
> A name like 'image_regex_patterns' will work better than 'r1'. Also, it
> looks like pageimgs() is hardcoded to assume 'r1' has three regular
> expressions in it, as it calls imgreg three times for each pattern in r1:
>
>     if imgreg(r1[0],a) == 1:
>         continue
>     if imgreg(r1[1],a) == 1:
>         continue
>     imgreg(r1[2],a)
>
> and that looks peculiar. Because this snippet of code is also
> copy-and-pasted around line 106, it appears to be a specific kind of
> conceptual task that you're doing to register images.
>
> I think that the use of 'r1' and 'imgreg' should be intrinsically tied.
> I'd recommend revising imgreg() so that when we register images, we don't
> have to worry that we've called it on all the regular expressions in r1.
> That is, let imgreg worry about it, not clients: have imgreg go through
> r1[0:3] by itself.
>
> If we incorporate these changes, the result might look something like
> this:
>
>     ###
>     image_regex_patterns = map(re.compile,
>                                [r'http://\w+.\w+.\w+/i.+.gif',
>                                 r'http://\w+.\w+.\w+/i.+.jpg',
>                                 r'http://\w+.\w+.\w+/i.+.png'])

This one is very good. I have stumbled over map() a few times but didn't know how to use it. This makes it easier.

>     def imgreg(search):
>         """Given a search text, looks for image urls for registration. If
>         a new one can be found, returns 1. Otherwise, returns 0.
>
>         Possible bug: does not register all images in the search text, but only
>         the first new one it can find.
>         """
>         for r in image_regex_patterns:
>             z = r.search(search)
>             if z != None:
>                 x = z.group(0)
>                 if x not in images:
>                     images.append(x)
>                     return 1
>         return 0
>     ###

The purpose of storing the images in a global list was to download all the images after the pages were saved, and not to download an image again if it had already been saved.
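For the archive, one way to take Danny's suggestion a step further and drop the global entirely: pass the collection in explicitly and let a single function loop over all patterns. A sketch in Python 3 syntax; the patterns and names here are illustrative, not from the original script:

```python
import re

# Illustrative image-URL patterns, compiled once
IMAGE_PATTERNS = [re.compile(p) for p in (
    r"http://\S+\.gif",
    r"http://\S+\.jpg",
    r"http://\S+\.png",
)]

def register_images(text, seen):
    """Add every not-yet-seen image URL found in text to the set `seen`.

    Returns the number of newly registered URLs, so callers never need
    to loop over the patterns themselves.
    """
    new_count = 0
    for pattern in IMAGE_PATTERNS:
        for url in pattern.findall(text):
            if url not in seen:
                seen.add(url)
                new_count += 1
    return new_count

seen = set()
page = "see http://img.example/i/a.gif and http://img.example/i/a.gif again"
print(register_images(page, seen))  # -> 1 (the duplicate is skipped)
```

Using a set instead of a list makes the "already saved?" check cheap, and keeps the duplicate-download guard in one place.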
Re: [Tutor] code improvement for beginner ?
Ok, here we go. I wanted to start my page long ago; now is the right time.

http://daderoid.freewebspace24.de/python/python1.html

Thank you.
[Tutor] code improvement for beginner ?
Hi there, I wonder if I could post some of my scripts and have someone tell me whether there is a better way to code the problem, in the style of a teaching class. ;-) Or is this mailing list only for specific questions? Thanks.
Re: [Tutor] find data in html file
> Message: 5
> Date: Fri, 30 Sep 2005 10:32:41 -0400
> From: Kent Johnson <[EMAIL PROTECTED]>
> Subject: Re: [Tutor] find data in html file
>
> lmac wrote:
>>> It's not this simple. The whole thing is that I try to use ebay.de for
>>> fetching websites when I give an article number. Downloading the site for
>>> a specific article is no problem. But getting data like price, bidders,
>>> shipment etc. without the official eBay API is hard. Maybe someone has a
>>> ready-made solution?
>>>
>>> Thanks anyway. I tried htmllib. It is a very good lib, but I can't get it
>>> to work because it has nothing for the data I want to get; it is for HTML
>>> tags. And I want to store the data in my own XML files (which is what I
>>> am going to do once I get the data).
>
> You can try BeautifulSoup which is designed for screen-scraping:
> http://www.crummy.com/software/BeautifulSoup/
>
> But looking at the source for an eBay page it looks challenging. I wonder why
> you don't use the eBay API to get the information you want? It seems to be
> free for up to 10,000 requests a month and there is a python package to access
> it.
>
> Kent
>
> Message: 6
> Date: Fri, 30 Sep 2005 15:55:26 +0100
> From: paul brian <[EMAIL PROTECTED]>
> Subject: Re: [Tutor] find data in html file
> To: Python Tutor
>
>>> But getting data like price, bidders, shipment etc. without the official
>>> eBay API is hard. Maybe someone has a ready-made solution?
>
> Ebay specifically change around their HTML codes, tags and formatting
> (in quite a clever way) to stop people doing exactly what you are
> trying to do. I think it changes every month.
>
> Like people say, use the API - you need to become an "ebay developer"
> (signup) and can use your own code or the python-ebay thing for free
> in "the sandbox", but must pay $100 or so to have your code verified
> as "not likely to scrunch our servers" before they give you a key for
> the real world.
>
> It's a bit of a pain, so I just hacked turbo-ebay a while back and made
> do. Worked quite well really.
>
> --
> Paul Brian
> m. 07875 074 534
> t. 0208 352 1741

I will look into it (BeautifulSoup). At eBay I now have a DevAccount, but if I read it correctly, that covers only the sandbox, not the real eBay database, which means I have no access to actual ongoing auctions. Am I right? 10,000 is more than enough. The other thing is that I want to write this under Linux; I only use Linux for Internet surfing etc., and the eBay API is a Windows DLL. Of course, pyEbay works under Linux too. Thanks for the tips. I think I'll throw in the towel and do everything by hand ("Ich schmeiß die Flinte ins Korn und mache alles manuell"). ;-)
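For the archive: if the API route stays closed, the standard library's HTML parser can at least pair a label with the text that follows it, without regexes. A rough sketch in Python 3 syntax (`html.parser` is the modern successor to the old htmllib); the markup below is invented, and as noted above, real eBay pages are far messier and change often:

```python
from html.parser import HTMLParser

class LabelValueExtractor(HTMLParser):
    """Collects the text chunks of a page so that a labeled value
    (e.g. the text right after 'Startpreis:') can be looked up."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

    def value_after(self, label):
        # Return the text chunk immediately following the given label
        for i, chunk in enumerate(self.chunks):
            if chunk == label and i + 1 < len(self.chunks):
                return self.chunks[i + 1]
        return None

parser = LabelValueExtractor()
parser.feed("<tr><td>Startpreis:</td><td>EUR 1,00</td></tr>")
print(parser.value_after("Startpreis:"))  # -> EUR 1,00
```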
Re: [Tutor] find data in html file
Date: Wed, 28 Sep 2005 09:25:53 +0100
From: Ed Singleton <[EMAIL PROTECTED]>
Subject: Re: [Tutor] find data in html file

On 27/09/05, lmac <[EMAIL PROTECTED]> wrote:
>> Hi there, I have a basic question. If I want to read some kind of data out
>> of a line in an HTML file, where I know the start tag and the end tag, how
>> do I recognize whether it spans more than one line?
>>
>> Example:
>>
>> Some textlinktext . DATA etc.
>>
>> I would use >text as the starting tag to localize the beginning of the
>> DATA, and then the ending tag of the DATA. But if there is a \n, then it
>> spans more than one line.
>
> Hopefully it's just a typo or something, but you appear to have your ending
> </tr> and </td> tags the wrong way round. You should be closing the cell
> before you close the row.
>
> How do you want to get the data out? This case is simple enough that you
> could do a lazy (non-greedy) regex statement for it. Something like
> "([\s|\S]+?)" would do it.
>
> Ed

It's not this simple. The whole thing is that I try to use ebay.de for fetching websites when I give an article number. Downloading the site for a specific article is no problem. But getting data like price, bidders, shipment etc. without the official eBay API is hard. Maybe someone has a ready-made solution?

Thanks anyway. I tried htmllib. It is a very good lib, but I can't get it to work because it has nothing for the data I want to get; it is for HTML tags. And I want to store the data in my own XML files (which is what I am going to do once I get the data).
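Ed's lazy-regex suggestion, spelled out for the archive: with re.DOTALL (or a class like [\s\S]), a non-greedy group captures a cell's contents even when they span several lines. A small sketch in Python 3 syntax with invented markup:

```python
import re

html = "<td>link</td>\n<td> DATA line one\n DATA line two </td>"

# (.*?) is non-greedy; re.DOTALL lets '.' also match newlines,
# so a cell that spans several lines is still captured whole.
cells = re.findall(r"<td>(.*?)</td>", html, re.DOTALL)
print(cells[0])    # -> link
print(len(cells))  # -> 2
```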
[Tutor] find data in html file
Hi there, I have a basic question. If I want to read some kind of data out of a line in an HTML file, where I know the start tag and the end tag, how do I recognize whether it spans more than one line?

Example:

Some textlinktext . DATA etc.

I would use >text as the starting tag to localize the beginning of the DATA, and then the ending tag of the DATA. But if there is a \n, then it spans more than one line.

I hope I explained well what I am going for. English is not my native language. Thank you.
[Tutor] find() function an Tupel. Always returns -1.
Hi there, I have a problem with a tuple and the find() function. I know these keywords are in the document I am searching, but find() always returns -1. Thanks for the help.

fedora_user

    #!/usr/bin/python
    # -*- coding: utf_8 -*-

    import string
    import sys
    import os
    import urllib

    anf_bez = ('Startpreis:','Restzeit:','Angebotsbeginn:','Übersicht:',
               'Artikelstandort:','Versand nach:','Artikelnummer:','Kategorie')
    end_bez = ('','MESZ','MESZ','Gebote','','','','')

    # article number whose info is to be saved
    artikelno = `sys.argv[1:2]`
    artikelno = artikelno[2:-2]

    if len(artikelno) != 0:
        TARGET_DIR = "/opt/internet/eBay/"
        EBAY_HTTP = "http://cgi.ebay.de/ws/eBayISAPI.dll?ViewItem&item="
        EBAY_PAGE = EBAY_HTTP + artikelno
        SAVE_PAGE = "eBay-artikel" + artikelno + ".html"
        SAVE_PAGE = os.path.join(TARGET_DIR, SAVE_PAGE)

        # fetch and save the web page
        urllib.urlretrieve(EBAY_PAGE, SAVE_PAGE)

        # open the page and search it
        file = open(SAVE_PAGE, "rt")
        for a in file:
            asi = 0    # search index for 'anf_bez'
            esi = 0    # search index for 'end_bez'
            while asi < 8:
                anf = -1
                end = -1
                anf = a.find( anf_bez[asi] )    # <-- always returns -1, never finds anything?
                if anf != -1:
                    end = a[anf].find( end_bez[esi] )
                    if end != -1:
                        print a[anf:end]
                asi = asi + 1
                esi = esi + 1
        print asi, esi
        print EBAY_PAGE
        print SAVE_PAGE
    else:
        print "Artikelnummer als Argument übergeben."
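For the archive, a likely culprit in the snippet above is not only the first find() (which can fail if the saved page's encoding doesn't match the script's, e.g. for 'Übersicht:') but the second: `a[anf]` is a single character, so `a[anf].find(end_bez[esi])` searches a one-character string, and `a[anf:end]` then slices with mismatched indices. A corrected sketch of the intended scan, in Python 3 syntax with shortened, illustrative marker tuples:

```python
anf_bez = ("Startpreis:", "Restzeit:")   # start markers (shortened)
end_bez = ("EUR", "MESZ")                # hypothetical end markers

def extract_values(line):
    """Return the text between each start marker and its end marker."""
    results = []
    for start, end in zip(anf_bez, end_bez):
        anf = line.find(start)
        if anf == -1:
            continue
        # Search the REST OF THE LINE for the end marker,
        # not the single character line[anf]:
        ende = line.find(end, anf + len(start))
        if ende != -1:
            results.append(line[anf + len(start):ende].strip())
    return results

print(extract_values("x Startpreis: 1,00 EUR y"))  # -> ['1,00']
```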
[Tutor] long int in list as argument for seek() function
Hi there, I want to use a long int from a list which I got from my function find_lineno(). But I get the error below, and I don't understand why I cannot use this long as an argument. Where do I find good documentation on errors, so that I can completely understand what the heck is going on? Many thanks.

ERROR:

    Traceback (most recent call last):
      File "./extrmails.py", line 42, in ?
        inputfile.seek(0,li)
    IOError: [Errno 22] Invalid argument

CODE:

    inputfile = open("mails", "rt")

    def reset_inputfile():
        inputfile.seek(0,0)

    def find_lineno(string):
        f = -1
        a = "start"
        found_lines = []
        reset_inputfile()
        while len(a) != 0:
            a = inputfile.readline()
            f = a.find(string)
            if f != -1:
                found_lines.append(inputfile.tell())
        return found_lines

    from_lineno = find_lineno("From:")
    subj_lineno = find_lineno("Subject:")

    print len(subj_lineno)
    print len(from_lineno)

    reset_inputfile()
    for li in subj_lineno:
        inputfile.seek(0,li)    # <-- ???
        ...
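For the archive, the error comes from the argument order: `seek(offset, whence)` takes the position first, so `seek(0, li)` asks for whence=li, which is not one of 0/1/2, hence EINVAL. Note also that tell() reports the position *after* the line just read, so it is easiest to record tell() *before* each readline(). A sketch in Python 3 syntax, using io.StringIO as a stand-in for the mail file:

```python
import io

# Stand-in for open("mails"); a real file behaves the same way
inputfile = io.StringIO("From: a\nSubject: hello\nFrom: c\n")

# Record the position BEFORE each readline(), so it marks the line start
positions = []
pos = inputfile.tell()
line = inputfile.readline()
while line:
    if "Subject:" in line:
        positions.append(pos)
    pos = inputfile.tell()
    line = inputfile.readline()

for p in positions:
    inputfile.seek(p)   # offset first; whence defaults to 0 (file start)
    print(inputfile.readline().rstrip())  # -> Subject: hello
```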