[Tutor] short url processor
Hey folks,

I'm rewriting a short URL processor for my job. I originally wrote it as a multi-threaded Perl script, which works, but has socket problems that cause memory leaks. Since I'm rebuilding it to use memcache, and since I was learning Python outside of work anyway, I figured I'd rewrite it in Python.

I'm using BaseHTTPServer, overriding do_GET and do_POST, and want to set up a custom logging mechanism so I don't have to rewrite a separate log parser, which I'll eventually rewrite in Python as well.

The problem I'm having, though, is that the BaseHTTPServer setup is outputting what appears to be an Apache-style log to STDOUT, and the logging.debug or logging.info calls I make in the code are also going to STDOUT, despite my attempt to use logging.basicConfig() overrides and setting a filename, etc.

Here's the basics of what I'm doing. Forgive my code, I've already been told it's ugly; I'm new to Python and come from a background of Perl/PHP.

    import struct
    import string,cgi,time
    import psycopg
    import logging
    import re
    import memcache
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
    from time import strftime,localtime

    class clientThread(BaseHTTPRequestHandler):

        def log_my_request(self,method,request,short_url,http_code,long_url,cached,notes):
            logging.debug("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s",
                self.client_address[0],
                time.strftime("%Y-%m-%d %H:%M:%S",localtime()),
                method,     # get or post
                request,    # requested entity
                short_url,  # matching short_url based on entity, if any
                http_code,  # 200, 301, 302, 404, etc
                long_url,   # url to redirect to, if there was one
                cached,     # 'hit', 'miss', 'miss-db', 'error'
                notes       # extra notes for the log file only
            )
            return

        def do_GET(self):
            # logic goes here for finding a short url from memcache, then writing the appropriate
            # output data to the socket, then logging happens:
            self.log_my_request(getpost,orig_short_url,short_url,'302',long_url,'hit','')
            return

    def main():
        if mc.get('dbcheck'): # memcache already has some data
            print("memcache already primed with data")
        else: # nothing in memcache, so load it up from database
            print('Connecting to PG')
            cur.execute("SELECT count(*) FROM short_urls")
            mycount = cur.fetchone()
            print("fetching %s entries" % mycount)
            cur.execute("SELECT short_url,long_url FROM short_urls")
            giant_list = cur.fetchall()

            # cache a marker that tells us we've already initialized memcache with db data
            mc.set('dbcheck','databasetest',0)

            # I'm sure there's a MUCH more efficient way of doing this ... multi-set of some sort?
            for i in giant_list:
                if i[0]:
                    if i[1]:
                        mc.set(i[0], i[1])

            print("finished retrieving %s entries plus set up a new dictionary with all values" % mycount)

        #{{ set up the socket, bind to port, and wait for incoming connections
        try:
            server = HTTPServer(('',8083), clientThread)
            print 'short url processing has begun'

            # this is where I try to tell Python that I only want my message in my log:
            # no INFO:username prefix, etc., and also to write it to a file
            logging.basicConfig(level=logging.DEBUG)
            logging.basicConfig(format='%(message)s', filename='/tmp/ian.txt')

            server.serve_forever()
        except KeyboardInterrupt:
            print '^C received, shutting down server'
            server.socket.close()

My code runs without any errors, though I have left some code out of this email that I didn't feel was relevant, such as the logic of seeing if a short URL exists in memcache, trying to fetch from the db if there was no match, and, if the db lookup also fails, force-deleting short URLs from memcache based on other instructions, that sort of thing. None of it deals with logging or the BaseHTTPServer code.

To recap, the code runs and redirects are working, but ALL output goes to STDOUT. I can understand that print statements would go to STDOUT, but the BaseHTTPServer seems to want to write the Apache-style log to STDOUT, and my logging.info() call also prints to STDOUT instead of my file.

I'd love to hear any thoughts from people that have had to deal with this.
The logging is the last piece of the puzzle for me. Thanks, Ian ___ Tutor maillist - Tutor@python.org To
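
A note on the logging setup in the message above: logging.basicConfig() only configures the root logger when it has no handlers yet, so when it is called twice in a row the second call (the one carrying format and filename) is ignored, and the handler installed by the first call writes to stderr. Below is a minimal sketch of a single combined call, written for Python 3. The names reset_root_logger, configure_logging and demo, the temporary file, and the sample log line are made up for illustration and are not from the original script.

```python
import logging
import os
import tempfile


def reset_root_logger():
    """Detach and close every handler currently on the root logger."""
    root = logging.getLogger()
    for handler in list(root.handlers):
        root.removeHandler(handler)
        handler.close()


def configure_logging(path):
    """Configure the root logger once: DEBUG level, bare messages, to a file."""
    # basicConfig() does nothing if the root logger already has handlers,
    # so clear out anything an earlier call may have installed.
    reset_root_logger()
    logging.basicConfig(level=logging.DEBUG,
                        format="%(message)s",
                        filename=path)


def demo():
    """Write one hypothetical log line and return the file's contents."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        configure_logging(path)
        logging.debug("10.0.0.1\tGET\t/abc\t302")
        reset_root_logger()  # closes the file handler before reading
        with open(path) as fh:
            return fh.read()
    finally:
        os.remove(path)
```

With level, format and filename passed in one call, the record ends up in the file with no "DEBUG:root:" prefix.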
Re: [Tutor] short url processor
ian douglas ian.doug...@iandouglas.com wrote:

> outputting what appears to be an apache-style log to STDOUT, but the
> logging.debug or logging.info calls I make in the code are also going
> to STDOUT despite my attempt to use logging.basicConfig() overrides
> and setting a filename, etc.

I don't know anything about BaseHTTPServer and not much about the logging module, however some thoughts are...

How do you know they are going to stdout? Are you sure they aren't going to stderr, with stderr showing up in the same place as stdout (usually the default)? Have you tried redirecting stderr to a file, for example?

As I say, just some thoughts,

Alan G.

___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
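
The stdout/stderr question can be checked from inside Python as well as with shell redirects. A logging.StreamHandler created without a stream argument writes to sys.stderr (this is also what basicConfig() installs when no filename is given), and BaseHTTPRequestHandler.log_message() writes to sys.stderr too. A small Python 3 sketch that captures both streams separately; the logger name "stream_demo" and the function name capture_streams are made up for illustration.

```python
import contextlib
import io
import logging


def capture_streams():
    """Return (stdout_text, stderr_text) produced by print and by logging."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        # No stream argument: the handler binds to the current sys.stderr.
        handler = logging.StreamHandler()
        logger = logging.getLogger("stream_demo")
        logger.setLevel(logging.DEBUG)
        logger.propagate = False  # keep the root logger out of the picture
        logger.addHandler(handler)
        try:
            print("from print")
            logger.debug("from logging")
        finally:
            logger.removeHandler(handler)
    return out.getvalue(), err.getvalue()
```

On a terminal both streams are shown together, which is why the two kinds of output can look as if they share stdout.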
Re: [Tutor] short url processor
On 05/13/2011 05:03 PM, Alan Gauld wrote:

> How do you know they are going to stdout? Are you sure they aren't
> going to stderr and stderr is not mapped to stdout (usually the
> default). Have you tried redirecting stderr to a file for example?
>
> As I say, just some thoughts,

Thanks for your thoughts, Alan. I had done some testing with command-line redirects and forget which it was; I think my debug log was going to stdout and the Apache-style log was going to stderr, or the other way around...

After a handful of guys in the #python IRC channel very nearly convinced me that all but 3 stdlib libraries are utter worthless crap, and told me to download and use third-party frameworks just to fix a simple logging issue, I overrode log_request() and log_message() as such:

    class clientThread(BaseHTTPRequestHandler): #[[[

        def log_request(self, code='-', size='-'):
            return

        def log_message(self, format, *args):
            open(LOGFILE, "a").write("%s\n" % (format%args))

... and now the only logging going on is my own, and it's logged to my external file. Overriding log_request means that BaseHTTPServer no longer outputs its Apache-style log, and overriding log_message means my other logging.info() and logging.debug() messages go out to my file as expected.

-id
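
A variation on the two overrides above, sketched for Python 3 (where BaseHTTPServer lives in http.server). As far as I know, log_message() only receives the server's own messages (log_request() and log_error() call it), while logging.debug() and logging.info() go wherever the logging handlers send them, so this sketch routes the server's messages through a dedicated file logger instead of reopening the file on every call. The names make_file_logger, QuietHandler, access_log and demo are hypothetical, as is the logger name "shorturl".

```python
import logging
import os
import tempfile
from http.server import BaseHTTPRequestHandler  # BaseHTTPServer in Python 2


def make_file_logger(path, name="shorturl"):
    """Return a logger that writes bare messages to `path` and nowhere else."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # do not hand records on to the root logger
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


class QuietHandler(BaseHTTPRequestHandler):
    access_log = None  # assign a logger before the server starts

    def log_request(self, code="-", size="-"):
        # Suppress the built-in Apache-style access line.
        return

    def log_message(self, format, *args):
        # Server-generated messages (e.g. from log_error) go to the file.
        self.access_log.info(format % args)


def demo():
    """Exercise the overrides without opening a socket; return the log text."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    QuietHandler.access_log = make_file_logger(path, name="shorturl_demo")
    handler = QuietHandler.__new__(QuietHandler)  # skip socket-based __init__
    handler.log_request(302)
    handler.log_message("code %d, message %s", 404, "Not Found")
    with open(path) as fh:
        return fh.read()
```

In a real server the class would be passed to HTTPServer(('', 8083), QuietHandler) after access_log is assigned.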
Re: [Tutor] short url processor
On 05/13/2011 05:03 PM, Alan Gauld wrote:

> As I say, just some thoughts,

I *am* curious, Alan, whether you or anyone else on the list are able to help me make this a little more efficient:

    cur.execute("SELECT short_url,long_url FROM short_urls")
    giant_list = cur.fetchall()

    for i in giant_list:
        if i[0]:
            if i[1]:
                mc.set(i[0], i[1])

At present, we have about two million short URLs in our database, and I'm guessing there's a much smoother way of iterating through 2M+ rows from a database and cramming them into memcache. I imagine there's a map function in there that could be much more efficient?

v2 of our project will be to join our short_urls table with its 'stats' table counterpart, so that I only fetch the top 10,000 URLs (or some other smaller quantity). Until we get to that point, I need to speed up the restart time if this script ever needs to be restarted. This is partly why v1.5 was to put the database entries into memcache, so we wouldn't need to reload the db into memory on every restart.

Thanks,
Ian
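
On the "multi-set of some sort" idea: the python-memcached client has a set_multi() method that takes a dict and returns the keys it failed to store, so one possible approach is to send the pairs in batches rather than one mc.set() per row. A rough sketch follows; batched_pairs, prime_cache and the batch size of 1000 are made up for illustration, and FakeCache is only a stand-in so the example runs without a memcached server.

```python
from itertools import islice


def batched_pairs(rows, size):
    """Yield dicts of up to `size` (short_url, long_url) pairs, skipping empties."""
    pairs = ((short, long_) for short, long_ in rows if short and long_)
    while True:
        chunk = dict(islice(pairs, size))
        if not chunk:
            return
        yield chunk


def prime_cache(mc, rows, size=1000):
    """Store rows with one set_multi() call per batch; return how many were stored."""
    stored = 0
    for chunk in batched_pairs(rows, size):
        failed = mc.set_multi(chunk)  # list of keys that could not be stored
        stored += len(chunk) - len(failed)
    return stored


class FakeCache(object):
    """Stand-in for memcache.Client, recording calls in a plain dict."""

    def __init__(self):
        self.data = {}
        self.calls = 0

    def set_multi(self, mapping):
        self.calls += 1
        self.data.update(mapping)
        return []  # nothing failed
```

Whether this is actually faster for 2M+ rows would need measuring against the real server; the sketch only shows the batching itself.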
Re: [Tutor] short url processor
On 05/14/2011 03:49 AM, ian douglas wrote:

> for i in giant_list:
>     if i[0]:
>         if i[1]:
>             mc.set(i[0], i[1])

Until Alan comes with a more rounded answer, I'd suggest something along the lines of

    [mc.set(x, y) for (x, y) in giant_list if x and y]

I'm writing this from memory, but check list comprehensions in the documentation. Anyway, there are map, reduce and similar functions in Python, but I think that in Python 3.x reduce has to be imported from functools (map is still a builtin).

Now, the real question would be: can you use the cursor as an iterator (but without hitting the database for each new record)? Then you can skip the worst part, which is loading all the values into giant_list. Just an idea for Alan and the others to answer.

Nick
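
On the cursor question: the Python DB-API defines cursor.fetchmany(size), which makes it possible to walk a result set in batches without building giant_list first. The sketch below uses the standard-library sqlite3 module and an in-memory table purely so it is runnable; the sample rows, the iter_rows and demo names, and the plain dict standing in for memcache are all assumptions for illustration. Whether rows are really streamed from the server or buffered on the client at execute() time depends on the driver (psycopg2, for instance, needs a named cursor for server-side streaming).

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time, pulling them from the cursor in batches."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        for row in rows:
            yield row


def demo():
    """Load a tiny made-up table and copy complete pairs into a dict."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE short_urls (short_url TEXT, long_url TEXT)")
    conn.executemany("INSERT INTO short_urls VALUES (?, ?)",
                     [("a", "http://example.com/1"),
                      ("b", None),
                      ("c", "http://example.com/3")])
    cur = conn.cursor()
    cur.execute("SELECT short_url, long_url FROM short_urls ORDER BY short_url")
    cache = {}  # stands in for the memcache client
    for short, long_ in iter_rows(cur, batch_size=2):
        if short and long_:
            cache[short] = long_
    conn.close()
    return cache
```

The loop body is where mc.set(short, long_) would go in the original script.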