First of all, thanks for your comments. On Sun, Aug 5, 2012 at 10:32 AM, Gaal Yahas <[email protected]> wrote:
> I wanted to mention one other thing -- how do you know this thing works? > > When you're developing it, you probably ran it several times. This is fine > to do occasionally, but it has a few problems: > - it puts load on Gabor's servers > - it's slow! you have to fetch a bunch or resources from the network every > time and uncompress them. > - when you have a bug, you have to figure out what's wrong from looking at > the output of the entire program. This can be hard. > - it's possible you have a bug in the statistics code that doesn't crash > the program, but which causes the wrong output! You'd never find it by > looking at the report, unless you had some prior knowledge about the actual > stats. > > Testing is a big subject, and I don't know how well you're already > familiar with it, so I'll just again give a few general remarks: > > - The main point I'm making here is that your script is a single flow of > code, and if you break it down to functions (sub get_page, sub analyze_page > etc.) you can cause only specific parts of your code to run, with input > that's under your control, and verify that the output is what you expect. > > - You might be tempted to comment out pieces of code / add flags like > $TEST or whatever -- it's okay here and there during development, but it > doesn't scale well down the line. You end up with messy code or things you > have to clean up before release, and have to repeat some work every time > you want to test something. > > - Whatever works. Perl has spectacular support for testing, but it means > it's easy to get overwhelmed by the options. Start with Test::More or > something else that's well-known. > > Right, I really need to start making tests for this and do it in parts, I think this will be for my next version of the script (tests and building it in smaller chunks). > On Sun, Aug 5, 2012 at 9:51 AM, Gaal Yahas <[email protected]> wrote: > >> Since you asked for comments, here are a few general ones. >> >> - You use autodie, but aren't telling it to capture errors in system. The >> most likely part of your program to fail is the network part (what you >> currently delegate to wget), so it's probably the one you want the most >> control over. Currently you'd fail silently. If you say "use autodie >> qw(:all)" you'd die on the first wget failure. What would you like to do? >> I'm guessing that ideally, you should retry (or have wget retry). But at >> the very least, you want good error messages. To summarize this point: >> don't give up contorl of error reporting in the heart of what you're trying >> to accomplish. >> >> > OK, so I removed the autodie, but can i use it somehow and still have my own warnings? (want to learn, so a link will be good enough). About the wget, I meant the program on linux distro's, not something on perl itself. How do I get the errors of a system error? I use system() for the wget and gzip so if there is an error i can see it in the terminal, but if I will be able to get the errors it can help if and when there will be problems and save them for a time I can look at them. > - You're using use strict, which is very good, but notice that the way >> your file handles are named, they are unchecked globals. Avoid >> all-uppercase, sigilless names (FILE and so one) if you want protection; >> use lexicals instead: >> >> open my $fh, "<", "myfile" or die ... >> >> (BTW, you have or die here and there, despite autodie. I'd recommend for >> now just getting rid of autodie entirely and doing all the errors yourself.) >> >> When you use lexical filehandles, the files are closed automatically when >> the variable goes out of scoped. This is great for reads. For writes, you >> should always close the file yourself to check for errors. There are fancy >> ways of doing that automatically but never mind them for now. >> >> I understand from your comment that I can check for errors when closing a file, how? (again, if you prefer giving a link that's OK too). > - It's not unreasonable to use external tools in your script. Whatever >> gets the job done. Of course there are tradeoffs: you comment that you >> don't know if these tools are available on other platforms. Well, modules >> for gzip and http may not be available on all platforms either -- take a >> look at corelist if you want to make educated guesses about what to expect. >> >> Didn't understand what you mean, I used the linux programs wget and gzip in this script, thus i don't know if those programs are available for windows for example, thus I don't know if it will work on windows. > - Syntactically, this: >> >> if (CONDITION 1) { >> CODE 1 >> } if (CONDITION 2) { >> CODE 2 >> } if (CONDITION 3) { >> CODE 3 >> } ... >> >> checks all the conditions and may run all the branches. I think you >> meant "elsif" there. >> >> I use it this way because it doesn't get the right numbers and make mistakes if I use elsif (at least on some of the years, don't remember why). > - Semantically, what you're doing is extracting a bunch of values from a >> hash into a number of scalars. Why? This has a few costs: >> >> 1. You have to pick names for these. If you ever scratch your head about >> the name of a variable, and end up with something mundane like "first", >> something's wrong. (I mean, first is perfectly all right if its scope is >> very small, and within that block of code it's clear what it's intended to >> refer to. But here the scope of these variables is pretty much the whole >> program, and the reader is confounded with arbitrary distinctions like >> "first" and "firs". >> >> I have refactored the names, it's better, not great. The reason for the names is that $first is the one with the most mails on this year, so may be the name isn't great but I don't know if there is a much better choice. I am open to suggestions. 2. It's inefficient in code. You have a 20 lines of code that aren't >> furthering your goal as a programmer here. Couldn't the user of this data >> may as well just say $names{$some_name}? >> >> The point of these 20 lines is to get the numbers of the most active participants, I guess there are other ways (and may be better ways) but I don't know them yet. > 3. It's inefficient in performance. Your for loop runs as many times as >> there are elements in %names, and performs ten conditions. So you have 100 >> branches in your code where you didn't need any. In this case, it doesn't >> matter; but you have to keep these things in mind for when you have larger >> data structures. >> >> So, what you say is that i should find a way to make it run only on the "right" parts of my variable? the big question is how can I do that? > - Again on naming, "percent" vs. "per" -- the reader can't tell what these >> mangled names are for. >> >> - I see you're using "say", which is a new language feature. If you're >> doing that, you may as well know about and use given/when instead of >> if/else, though like I said, it's not actually required here. >> >> >> Keep having fun, >> And I hope this is useful, >> Gaal >> >> Thanks again for your comments, it's very helpful and I think I learnt some things already :) > On Sun, Aug 5, 2012 at 3:39 AM, moshe nahmias <[email protected]>wrote: >> >>> Hi all, >>> I just uploaded my first little program to github. I want your feedback >>> on the script. >>> the script is intended to make the statistics for mailing groups from >>> their archives, for now it only works on linux (or at least if wget and >>> gzip are installed). >>> >>> So, what do you think? >>> I know its not perfect so feel free to teach me, after all, that's one >>> of the reasons to publish this :) >>> the link: https://github.com/moshe742/mail-stats.git >>> >>> Moshe >>> >>> _______________________________________________ >>> Perl mailing list >>> [email protected] >>> http://mail.perl.org.il/mailman/listinfo/perl >>> >> >> >> >> -- >> Gaal Yahas <[email protected]> >> http://gaal.livejournal.com/ >> > > > > -- > Gaal Yahas <[email protected]> > http://gaal.livejournal.com/ > > _______________________________________________ > Perl mailing list > [email protected] > http://mail.perl.org.il/mailman/listinfo/perl >
_______________________________________________ Perl mailing list [email protected] http://mail.perl.org.il/mailman/listinfo/perl
