On Nov 10, 3:34 am, Mark Shroyer <[EMAIL PROTECTED]> wrote:
> On 2007-11-10, Jonathan Gardner <[EMAIL PROTECTED]> wrote:
> > What would I have done? I wouldn't have had an age matching class. I
> > would have had a function that, given the datetime and a range
> > specification, would return true or false. Then I would've written
> > another function for matching emails. Again, it takes a specification
> > and the email and returns true or false.
>
> There isn't much difference between
>
>     match_calendar_month(2007, 11, message)
>
> and
>
>     m = CalendarMonthMatcher(2007, 11)
>     m.match(message)

Yes, there isn't a world of difference between the two. But there is a
world of difference between those and:

    match(message, before=date(2007, 12, 1), after=date(2007, 11, 1))

And you can add parameters as needed. In the end, you may have a lot of
parameters, but only one match function and only one interface.

> <snip>
>
> But take for example two of my app's mailbox actions -- these aren't
> their real names, but for clarity let's call them ArchiveByMonth and
> SaveAttachmentsByMonth. The former moves messages from previous
> months into an archival mbox file ./archives/YYYY/MM.mbox
> corresponding to each message's month, and the latter saves message
> attachments into a directory ./attachments/YYYY/MM/. Each of these
> actions would work by using either match_calendar_month() or
> CalendarMonthMatcher().match() to perform its action on all messages
> within a given month; then it iterates through previous months and
> repeats until there are no more messages left to be processed.
>
> In my object-oriented implementation, this iteration is performed by
> calling m.previous() on the current matcher, much like the
> simplified example in my write-up. Without taking the OO approach,
> on the other hand, both types of actions would need to compute the
> previous month themselves; sure that's not an entirely burdensome
> task, but it really seems like the wrong place for that code to
> reside. (And if you tackle this by writing another method to return
> the requisite (year, month) tuple, and apply that method alongside
> wherever match_calendar_month() is used... well, at that point
> you're really just doing object-oriented code without the "class"
> keyword.)
>
> Furthermore, suppose I want to save attachments by week instead of
> month: I could then hand the SaveAttachmentsByPeriod action a
> WeekMatcher instead of a MonthMatcher, and the action, using the
> matcher's common interface, does the job just as expected.
> (This is an actual configuration file option in the application; the
> nice thing about taking an OO approach to this app is that there's a
> very straightforward mapping between the configuration file syntax
> and the actual implementation.)
>
> It could be that I'm still "thinking in Java," as you rather
> accurately put it, but here the object-oriented approach seems
> genuinely superior -- cleaner and, well, with better encapsulated
> functionality, to use the buzzword.

Or it could be that you are confusing two things with each other. Let
me try to explain it another way.

Think of all the points on a grid that is 100x100. There are 10,000
points, right? If you wanted to describe the position of a point, you
could name each point. You'd have 10,000 names. This isn't very good
because people would have to know all 10,000 names to describe a point
in your system. But it is simple, and it is really easy to implement.
But hey, we can just number the points 0 to 9999 and it gets even
simpler, right?

OR you could describe the points as an (x,y) pair. Now people only have
to remember 200 different names--100 for the columns, 100 for the rows.
Then if you used traditional numbers, they'd only have to be able to
count to 100.

Computer science is full of things like this. When you end up with
complexity, it is probably because you are doing something wrong. My
rule of thumb is if I can't explain it all in about 30 seconds, then it
is going to be a mystery to everyone but myself no matter how much
documentation I write.

How do you avoid complexity? You take a step back, identify patterns,
or pull different things apart from each other (like rows and columns),
and try to find the most basic principles to guide the entire system.

The very fact that you are talking about months (and thus days and
weeks and years and centuries, etc...) and not generic dates means you
have some more simplifying to do in your design elsewhere as well.
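
To make the "generic dates" idea concrete, here is a rough sketch. The
names month_range and in_range are made up for illustration and are not
from your app, and the half-open [start, end) convention is my own
assumption. The point is only that a calendar month is one way of
producing an ordinary date range:

```python
from datetime import date


def month_range(year, month):
    """Return the half-open range [start, end) covering one calendar month.

    Hypothetical helper, for illustration only.
    """
    start = date(year, month, 1)
    # December rolls over to January of the following year.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def in_range(day, start, end):
    """True if start <= day < end (assumed half-open convention)."""
    return start <= day < end


start, end = month_range(2007, 11)
print(in_range(date(2007, 11, 10), start, end))  # True
print(in_range(date(2007, 12, 1), start, end))   # False
```

A week, a quarter, or "the last 10 days" would each be another small
function returning the same kind of (start, end) pair; nothing that
consumes the range needs to know where it came from.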

Rewrite SaveAttachmentsByMonth so that it calls a more generic
SaveAttachmentsByDateRange function. Or better yet, have it call
FilterEmailsByDateRange, ExtractAttachment, and SaveAttachment. Or even
better, have it call FilterEmailsBySpecification(date_from=X, date_to=Y)
and SaveAttachment.

Do you see the point? Your big function SaveAttachmentsByMonth is kind
of like point number 735. It's easier to describe it as the point at
(7,35) than as a single number. It's better to talk about the most
basic functionality -- saving emails and filtering emails -- rather
than talking about big concepts.

I call this concept "orthogonality" after the same concept in linear
algebra. It's just easier when you are dealing in an orthogonal basis
-- or a set of functions that do simple things and don't replicate each
other's functionality.

Your users will appreciate it as well. While it may be nice to have a
shiny button that saves attachments by month, they'd rather be able to
specify the date ranges they'd like to use (hours? days? weeks?
quarters?) and what they'd like to save (the attachments, the entire
email, etc.). (Better yet, what they'd like to *do*.)

> > If I really wanted to pass around the specifications as objects, I
> > would do what the re module does: have one generic object for all the
> > different kinds of age matching possible, and one generic object for
> > all the email objects possible. These would be called,
> > "AgeMatchSpecification", etc... These are noun-y things. Here,
> > however, they are really a way of keeping your data organized so you
> > can tell that that particular dict over there is an
> > AgeMatchSpecification and that one is an EmailMatchSpecification. And
> > remember, the specifications don't do the matching--they merely tell
> > the match function what it is you wanted matched.
>
> Oddly enough, the re module was sort of my inspiration here:
>
>     my_regex = re.compile("abc")
>     my_regex.match("some string")
>
> (Sure, re.compile() is a factory function that produces SRE_Pattern
> instances rather than the name of an actual class, but it's still
> used in much the same way.)

Except we don't have different kinds of re expressions for different
kinds of matching. One spec to handle everything is good enough, and
it's much simpler. If you have the time, try to see what people did
before regex took over the world. In fact, try writing a text parser
that doesn't use one general regex function. You'll quickly discover
why one method with one very general interface is the best way to
handle things.

> > Now, part of the email match specification would probably include bits
> > of the date match specification, because you'd want to match the
> > various dates attached to an email. That's really not rocket science
> > though.
> >
> > There wouldn't be any need to integrate the classes anymore if I did
> > it that way. Plus, I wouldn't have to remember a bunch of class names.
> > I'd just have to remember the various parameters to the match
> > specification for age matching and a different set of parameters for
> > the email matching.
>
> You're sort of missing the bigger picture of this application,
> although that's entirely not your fault as I never fully described
> it to begin with. The essence of this project is that I have a
> family of mailbox actions (delete, copy, archive to mailbox, archive
> by time period, ...) and a family of email matching rules (match
> read messages, match messages with attachments, match messages of a
> certain size, match messages by date, ...)
> of which matching by date is only one subtype -- but there are even
> many different ways to match by date (match by number of days old,
> match by specific calendar month, match by specific calendar month
> *or older*, match by day of the week, ...); not to mention arbitrary
> Boolean combinations of other matching rules (and, or, not).
>
> My goal is to create a highly configurable and extensible app, in
> which the user can mix and match different "action" and "matcher"
> instances to the highest degree possible. And using class
> definitions really facilitates that, to my Java-poisoned mind. For
> example, if the user writes in the config file
>
>     actions = (
>         (
>             # Save attachments from read messages at least 10 days old
>             mailbox => (
>                 path => '/path/to/maildir',
>                 type => 'maildir',
>             ),
>             match => (
>                 type => And,
>                 p => (
>                     type => MarkedRead,
>                     state => True,
>                 ),
>                 q => (
>                     type => DaysOld,
>                     days => 10,
>                 ),
>             ),
>             action => (
>                 type => SaveAttachments,
>                 destination => '/some/directory/',
>             ),
>         ),
>     )
>
> (can you tell I've been working with Lighttpd lately?)
>
> then my app can easily read in this dictionary and map the
> user-specified actions directly into Matcher and Action instances;
> and this without me having to write a bunch of code to process
> boolean logic, matching types, action parameters, and so on into a
> program flow that has a structure needlessly divergent from the
> configuration file syntax. It also means that, should a user
> augment the program with his own Matcher or Action implementation,
> as I intend to make it easy to do, then those implementations can be
> used straightaway without even touching the code for the
> configuration file reader.

See, you are thinking in general terms, but you are writing a specific
implementation. In other words, you're talking about the problem the
right way, but you're trying to write the code in a different way.
Coming from a C, perl, or Java background, this is to be expected.
Those languages are strait-jackets that impose themselves on your very
thoughts. But in Python, the code should read like pseudo-code. Python
*is* pseudo-code that compiles, after all.

You don't need many classes because the branching logic -- the bits of
the program that say, "I'm filtering by days and not months" -- can be
contained in one bigger function that calls a very general
sub-function. There's no need to abstract that bit out into a class. No
one is going to use it but the match routine. Just write the code that
does it and be done with it.

In fact, by writing classes for all these branches in the program
logic, you are doing yourself a disservice. When you return to this
code 3 weeks from now, you'll find that all the class declarations,
metaclasses, and syntactic sugar are getting in the way of seeing what
is really happening. That is always bad and should be avoided, just
like flowery language and useless decorum should be avoided.

> As for the decision to use a metaclass proxy to my AgeSpec classes,
> I'm fully prepared to admit wrongdoing there. But I still believe
> that an object-oriented design is the best approach to this problem,
> at least considering my design goals. Or am I *still* missing the
> point?

No, you're arguing about the thing I wanted to argue about, so you see
the point I am trying to make. It's painful to realize that all those
years learning OO design and design patterns just to make Java usable
are wasted in the world of Python. I understand that because I invested
years mastering C++ and perl before discovering Python. Your solace
comes when you embrace that and see how much simpler life really is
when the language gets out of your way.
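
For what it's worth, here is a minimal sketch of the "one match
function, one interface" idea from the top of this message. Everything
in it is an assumption for illustration: messages are plain dicts with
made-up "date", "read" and "attachments" keys, the read and
has_attachments keywords are invented, and I've arbitrarily treated
`after` as inclusive and `before` as exclusive.

```python
from datetime import date


def match(message, before=None, after=None, read=None, has_attachments=None):
    """Return True if message meets every criterion that was supplied.

    Criteria left as None are ignored. The message layout (a dict with
    "date", "read" and "attachments" keys) is hypothetical.
    """
    if after is not None and message["date"] < after:
        return False
    if before is not None and message["date"] >= before:
        return False
    if read is not None and message["read"] != read:
        return False
    if (has_attachments is not None
            and bool(message["attachments"]) != has_attachments):
        return False
    return True


def save_attachments(messages, save, **spec):
    """Call save(attachment) for each attachment of each matching message."""
    for message in messages:
        if match(message, **spec):
            for attachment in message["attachments"]:
                save(attachment)


messages = [
    {"date": date(2007, 11, 10), "read": True, "attachments": ["a.pdf"]},
    {"date": date(2007, 10, 3), "read": True, "attachments": ["b.pdf"]},
    {"date": date(2007, 11, 20), "read": False, "attachments": []},
]
saved = []
save_attachments(messages, saved.append,
                 after=date(2007, 11, 1), before=date(2007, 12, 1), read=True)
print(saved)  # ['a.pdf']
```

Adding a new kind of criterion means adding one keyword and one `if`;
the action code (save_attachments here) never changes.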