Re: [Boston.pm] 64 bit perl boost?
Double check where the limit is. It may well be 2 GB. Ben On 6/21/06, James Eshelman [EMAIL PROTECTED] wrote: Thanks, Sherm. It looks like there might be some benefit for high-end users who are likely to go beyond 4 GB of VM, but we can postpone it 'til then. - Original Message - From: Sherm Pendley [EMAIL PROTECTED] To: James Eshelman [EMAIL PROTECTED] Cc: boston-pm@pm.org Sent: Wednesday, June 21, 2006 11:10 AM Subject: Re: [Boston.pm] 64 bit perl boost? On Jun 21, 2006, at 10:23 AM, James Eshelman wrote: I have a large O-O perl system running on Fedora Core 3 (I know, it's old! - that's a separate subject) on Xeon 64-bit processors. The perl interpreter is only a 32-bit app. Anyone have an idea how much performance boost we're likely to get by recompiling everything for 64 bits? Does your app need more than 4 GB of virtual memory space? Does your app spend a significant amount of its time splitting huge numbers into 32-bit chunks so it can cope with them? If you answered no to these questions, don't bother recompiling. It won't help. sherm-- Cocoa programming in Perl: http://camelbones.sourceforge.net Hire me! My resume: http://www.dot-app.org ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Perl Curses clarification
I am going to second the suggestion to write an upload feature from a spreadsheet. You're not going to invent a better UI, and it is going to take you a lot more work. Plus there is no security issue here - anyone can do anything they want with the spreadsheet, but they can only enter it into your application if they have the data. BTW, I'm going to suggest that you not use flat files for this. Use a database, if only something trivial like SQLite. Your program will be simpler and faster. If your barrier is that you don't understand databases, well, now is the time to learn. Cheers, Ben On 6/1/06, Janet Marie Jackson [EMAIL PROTECTED] wrote: Thanks to those who have answered. Let me clarify a bit more what I need to do. We want to use $USER to verify a valid user before running the program, so this is very unlikely to go on the web or have a web interface. If a teaching assistant's personal account is compromised, we're really in deep you-know-what - otherwise, it's our best choice for security. The program will not be accessed by anyone other than the course staff. There will be a back-end flat file (probably CSV) listing the current students, their basic info, and their homework scores, one per line. When the user logs in, he/she will be presented with a menu along these lines: 1. View scores 2. Enter scores 3. Exit Your choice: The user can run the program by adding student names to the command line, in which case the choice will include only those students specified. Otherwise, based on $USER, that person's section will be included. If the choice is to view scores, all students (or those from the args) and their scores will be shown. If the choice is to enter scores, the user is asked which homework, then the program will allow the user to enter a score for each student for the stated homework. At the end, the updates are written back to the file. The feature to enter scores is where I'm stuck...
I'm debating trying to use something like curses to display all the student names and to allow the user to navigate with arrow keys to enter scores, vs. displaying one student at a time with a request for the score (which doesn't need curses, but takes more time) - or some other variant of either of these. At the same time, if the user doesn't want to go through the entire section, he/she CAN specify only certain students. I want this to be comfortable and convenient to use, so can't decide which approach is preferable. Thanks for your ideas! Jan ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
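For the one-student-at-a-time variant, a minimal sketch is below; no curses needed. The sub reads from whatever handle it is given, and the names, prompts, and layout are all invented for illustration, not taken from Jan's actual program:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical score-entry loop, one student at a time.
# Reads from the handle it is given, so the real program can pass
# \*STDIN and \*STDOUT while a test can pass in-memory handles.
sub enter_scores {
    my ($in, $out, $homework, @students) = @_;
    my %score;
    for my $name (@students) {
        print {$out} "Score for $name on $homework (blank to skip): ";
        defined(my $line = <$in>) or last;   # EOF ends entry early
        chomp $line;
        next if $line eq '';                 # blank line skips this student
        $score{$name} = $line;
    }
    return \%score;
}
```

In the real program the caller would then merge the returned hash back into the CSV (or SQLite) records.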
Re: [Boston.pm] version of perl that can use scalar filehandles
Use scalar filehandles? You're probably thinking of 5.6.0, which was the first that would let you autovivify filehandles. As far as I know, through the 5.x series you could always do: my $fh = do { local *FH }; open($fh, $somefile) or die "Can't read '$somefile': $!"; If you didn't remember the do-local trick, you could always use Symbol and then call gensym to get your typeglob. As for passing old-style filehandles, both of the following syntaxes are likely to work: call_function(*FILEHANDLE); call_function(\*FILEHANDLE); Cheers, Ben On 5/23/06, Greg London [EMAIL PROTECTED] wrote: more importantly, what is the syntax for passing a filehandle into a routine if it is FILEHANDLE instead of $FILEHANDLE? From: [EMAIL PROTECTED] on behalf of Greg London Sent: Tue 5/23/2006 4:10 PM To: boston-pm@mail.pm.org Subject: [Boston.pm] version of perl that can use scalar filehandles what was the earliest version of perl that would allow you to use scalar filehandles? open(my $fh, $filename); instead of open(FILEHANDLE, $filename); ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
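A self-contained sketch of both idioms follows; the scratch-file path is invented purely so the example can run on its own:

```perl
use strict;
use warnings;
use Symbol qw(gensym);    # core module; predates 5.6 by a long way

# Scratch file so the example is self-contained (path is invented):
my $path = "/tmp/fh_demo_$$";
open(my $w, '>', $path) or die "Can't write '$path': $!";
print $w "alpha\nbeta\n";
close $w;

# A sub that takes a filehandle, however it was created:
sub first_line {
    my $fh = shift;
    my $line = <$fh>;
    chomp $line;
    return $line;
}

# Old-style bareword handle, passed as a glob ref:
open(FILE, $path) or die "Can't read '$path': $!";
my $from_glob = first_line(\*FILE);
close FILE;

# Pre-5.6 scalar handle via Symbol::gensym:
my $fh = gensym();
open($fh, $path) or die "Can't read '$path': $!";
my $from_gensym = first_line($fh);
close $fh;

unlink $path;
```

Both calls hand the sub something it can read from with `<$fh>`, which is the whole point: the called code does not care how the handle was made.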
Re: [Boston.pm] version of perl that can use scalar filehandles
Do not weep. What changed in 5.6 was that it started autovivifying them. Just make the following conversion: open(my $fh, $file) ... becomes my $fh = do { local *FH }; open($fh, $file) ... and your problem is fixed. Cheers, Ben On 5/23/06, Greg London [EMAIL PROTECTED] wrote: 5.6? (weeps) well, that'll never happen. I'll have to recode with *GLOBS. (weeps some more) Thanks for all the replies. Greg From: Ricker, William [mailto:[EMAIL PROTECTED] Sent: Tue 5/23/2006 4:23 PM To: Greg London Cc: boston-pm@mail.pm.org Subject: RE: [Boston.pm] version of perl that can use scalar filehandles more importantly, what is the syntax for passing a filehandle into a routine if it is FILEHANDLE instead of $FILEHANDLE? open(FILEHANDLE, $filename) or die "trying $!"; open(my $fh, $filename); Autovivification of uninitialized scalar filehandles was added in 5.6.0 http://search.cpan.org/~nwclark/perl-5.8.8/pod/perl56delta.pod QUOTE File and directory handles can be autovivified Similar to how constructs such as $x->[0] autovivify a reference, handle constructors (open(), opendir(), pipe(), socketpair(), sysopen(), socket(), and accept()) now autovivify a file or directory handle if the handle passed to them is an uninitialized scalar variable. This allows constructs such as open(my $fh, ...) and open(local $fh, ...) to be used to create filehandles that will conveniently be closed automatically when the scope ends, provided there are no other references to them. This largely eliminates the need for typeglobs when opening filehandles that must be passed around, as in the following example: sub myopen { open my $fh, @_ or die "Can't open '@_': $!"; return $fh; } { my $f = myopen("/etc/motd"); print <$f>; # $f implicitly closed here } /QUOTE 5.6.0 also added 3-arg open($fh, $mode, $filename) for better safety against injection etc. Which means 5.5.x was the version that couldn't. -=- Bill Not speaking for the Firm.
___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] version of perl that can use scalar filehandles
On 5/23/06, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT If you didn't remember the do local trick, you could always use Symbol BT and then call gensym to get your typeglob. i have used Symbol::gensym for years and it is fine for this. it comes with perl5 from way back before 5.6 (not sure how old it is). in fact i still use it in some modules as i want them to be backward compatible with older perls. BT As for passing old-style filehandles, both of the following syntaxes BT are likely to work: BT call_function(*FILEHANDLE); BT call_function(\*FILEHANDLE); i prefer the ref version but inside the called sub it won't make a difference and the code will mostly be the same. but there is one difference which is whether you can do OO calls on the handle. you may need to load IO::Handle (or one of its many subclasses) to get that support. That isn't a difference, at least not with current versions of Perl. You can do OO calls on the handle after using either syntax to pass it in. Which methods are available depends on what modules have been loaded. perl -le 'sub foo {$fh = shift; $fh->print("hello")} foo(*STDOUT)' perl -MFileHandle -le 'sub foo {$fh = shift; $fh->print("hello")} foo(*STDOUT)' I don't know whether that flexibility goes back to Perl 5.005, though. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
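Spelled out as a small script, the point about method calls working on either form might look like this (the sub name and the in-memory capture handle are my additions for the demo):

```perl
use strict;
use warnings;
use IO::Handle;    # supplies print() and friends as methods

# The sub body is identical whether the handle arrives as a glob,
# a glob ref, or a lexical handle.
sub greet {
    my $fh = shift;
    $fh->print("hello\n");
}

greet(*STDOUT);     # glob
greet(\*STDOUT);    # glob ref

# Same call path with an in-memory handle (5.8+), to capture output:
open my $mem, '>', \my $captured or die $!;
greet($mem);
close $mem;
```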
Re: [Boston.pm] Put similarities in code and differences in data
Code Complete talks about this. And many other things. The main obstacle is getting people to actually READ it. (And after that, to try to APPLY it.) Cheers, Ben On 4/4/06, Tolkin, Steve [EMAIL PROTECTED] wrote: Thank you Charlie. That is the idea I am trying to get across. Do you have any suggestions about how to get developers to see the benefits of writing programs this way? Any specific books, techniques, etc.? Any pitfalls to be aware of? Thanks, Steve -- Steve TolkinSteve . Tolkin at FMR dot COM508-787-9006 Fidelity Investments 82 Devonshire St. M3L Boston MA 02109 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates. Steve -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charlie Reitzel Sent: Tuesday, April 04, 2006 9:18 AM To: boston-pm@mail.pm.org Subject: Re: [Boston.pm] Put similarities in code and differences in data Not really. I believe it is intended to mean data driven programming as Jeremy mentioned earlier. To me, data driven programming means use lotsa lookup tables, the contents of which are user tweakable. As simple as it sounds, it can be an effective technique to let you quickly adapt a system as requirements evolve - without code changes. Having found this hammer early in my programming career, I find a great many nails. Early days in any new design are spent setting up a lookup table table, along with utility routines for reporting, validation, UI picking values (one or several), etc. It may be a use case, but I don't think this is quite the same thing as the subject of this thread which, as Uri says, is a general approach to analysis. At 09:00 AM 4/4/2006 -0400, [EMAIL PROTECTED] wrote: hi ( 06.04.04 08:46 -0400 ) Tolkin, Steve: The difference is that I am trying to find a quote that focuses on the benefits of using data in a special way, as control data, to determine the specific execution path taken by the code. 
um, isn't this the scientific method? -- \js oblique strategy: how would you have done it? ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
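As a concrete illustration of the lookup-table style Charlie describes - differences in data, shared mechanics in code - here is a minimal dispatch-table sketch; the formats and handlers are invented for the example:

```perl
use strict;
use warnings;

# The per-format differences live in the table; the control flow
# that uses them is written once.
my %split_for = (
    csv => sub { [ split /,/,  $_[0] ] },
    tsv => sub { [ split /\t/, $_[0] ] },
    psv => sub { [ split /\|/, $_[0] ] },
);

sub parse_record {
    my ($format, $line) = @_;
    my $handler = $split_for{$format}
        or die "Unknown format '$format'";
    return $handler->($line);
}
```

Adding a new format is a one-line table entry, not a new branch in the code - which is exactly the "quickly adapt as requirements evolve" property being discussed.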
Re: [Boston.pm] How do I wait for all child processes to terminate?
On 3/31/06, Kripa Sundar [EMAIL PROTECTED] wrote: Dear Ben, Thanks for the detailed reply to my query. If my questions below can be answered by online docs, please feel free to point me to them. I read through the following docs before my previous email. But I am still mostly in the dark: * man -s 2 for fork(), wait(), waitpid() and kill() * perldoc -f for fork(), wait(), waitpid() and kill() * perldoc perlipc Those are the right online docs. The perldocs for IPC::Open2 and IPC::Open3 are also good. Looking at examples helps a lot. The Cookbook is a good source. You can find an example at http://www.perlmonks.org/?node_id=28870. You can look at Parallel::ForkManager. etc. Unfortunately this tends to be a topic where you either get it or don't. And even when you get it, figuring out bugs is frustrating. 1 until -1 == wait(); You'll need something more complex if you want to track the children's exit statuses (very useful for debugging). That idiom is good to know. But I *do* need to track exit statuses (stati?). Please see my pseudo-code below. So much for the simple answer... If you [use POSIX] you can $kid = waitpid(-1, WNOHANG); to poll to see if a kid needs to be reaped. [...] I've seen this verb reap in this context, but don't know what it means. When and how do I reap a kid? How is reaping different from kill()ing it? Terminology time. When a process is doing stuff, we say that it is alive. When it finishes everything it needs to do, it dies. After it dies it becomes a zombie process, meaning that the process is dead but not gone. In particular it needs to tell its parent what its exit status was. The process finally goes away when it delivers that message, and so we call asking for that exit status "reaping". So reaping your child just means finding out its exit status so that it can finally finish. Which happens when you call wait or waitpid.
When I say, poll to see if a kid needs to be reaped, I mean, check whether any child process has an exit status to tell me. Rather than worry about whether you are a child/parent for the rest of your code, I usually put an exit() here. [...] Sorry, I don't follow at all. When you are fork()ing my usual idiom is this: if (my $child_pid = fork()) { # Do parent stuff. } else { # Do child stuff. exit(); } # Do more parent stuff. That exit() guarantees that the child process can't accidentally execute code that it is not supposed to execute. Explicitly managing children and forking tends to be a lot of work. Unless you really need the complexity, I find that it tends to be easier to take the poor man's approach and do system calls and open up pipes. I'd much rather do system calls, if I can figure out how to wait for the children to finish up. The code sample that I provided at http://www.perlmonks.org/?node_id=28870 might be good enough for you then. All I really want is: system("something $_ &") for 1..5; wait_for_all_children; # 1 until -1 == wait; might suffice. compute_summary_of_children_activities; But that won't really work, will it? system("something $_ &") will launch something as a background job, and then come back in a flash to tell me that I don't have any child. So wait_for_all_children won't have anything at all to wait for. Depending on what you mean by work, that will work. That is, the code will run, jobs will be launched, children will be reaped, etc. But you'll have no way to tie the children you reap to the jobs you ran. Which makes it hard to summarize what the children did. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
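One way to get that summary is to fork the jobs directly instead of backgrounding them through the shell, recording pid-to-job as you go; every reaped pid then maps back to its job. A sketch, with the jobs and their exit statuses invented for the demo:

```perl
use strict;
use warnings;

my %job_of;                         # pid => job id
for my $job (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        $job_of{$pid} = $job;       # parent: remember the mapping
    } else {
        # child: the real work would go here
        exit($job % 2);             # fake exit status for the demo
    }
}

my %status_of;                      # job id => exit status
while (%job_of) {
    my $pid = wait();
    last if $pid == -1;
    my $job = delete $job_of{$pid};
    $status_of{$job} = $? >> 8;     # high byte of $? is the exit status
}
```

After the loop, %status_of holds exactly the per-job summary that the system("... &") approach cannot give you.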
Re: [Boston.pm] How do I wait for all child processes to terminate?
On 3/30/06, Kripa Sundar [EMAIL PROTECTED] wrote: Hello all, I thought this was fairly simple (and it probably is). But I am not able to figure out how I can fork() off, say, five child processes, and wait for all of them to terminate. 1 until -1 == wait(); You'll need something more complex if you want to track the children's exit statuses (very useful for debugging). Is the code below on the right track? Is it as simple as wait for @children? I think this will let me wait N times, where N is the number of children launched. Am I right? TIA. Is there a simpler wait-for-all-descendants primitive? What if some child has children who do not get properly cleaned up? Those grandkids are now your responsibility. If some of my children are hanging, can I detect that, and take action to terminate them? If you use POSIX ":sys_wait_h"; then on most modern systems you can $kid = waitpid(-1, WNOHANG); to poll to see if a kid needs to be reaped. If you keep track of which kids you launched when, and which ones you've reaped, you can decide when you think that the child needs to be terminated, and then use the built-in kill function to send a signal to that child. (The right signal will kill it, with or without giving it a chance to do cleanup.) my @children; # Keep track of my children's PIDs. When I've needed to do this kind of logic, I find that hashes work better. Easier to insert/delete as I launch/reap. for (1..5) { my $pid = fork; if ($pid) { # I am in the parent. push @children, $pid; } elsif (defined $pid) { do_some_child_stuff($_); Rather than worry about whether you are a child/parent for the rest of your code, I usually put an exit() here. But note that you may need to worry about child/parent stuff in END blocks because Perl 5.8 causes END blocks to run on exits. If you dislike that rule, then POSIX::exit still does the old behaviour. } else { warn "Warning: fork() #$_ failed"; } } # for (1..5) if (@children) { # I am in the parent.
if ($wait_order eq 'FIFO') { waitpid $_, 0 for @children; print "I waited for my children in FIFO order.\n"; } else { wait for @children; print "I waited for my children without imposing an order.\n"; } do_some_adult_stuff; } # if (@children) Random tips about this stuff. Explicitly managing children and forking tends to be a lot of work. Unless you really need the complexity, I find that it tends to be easier to take the poor man's approach and do system calls and open up pipes. Also note that a very common source of problems is having open sockets across a fork. For instance if you have a database connection, you may have race conditions between what happens when the parent is talking to the database and what happens when the child is shutting down. Keep your eyes out for that because such races happen easily and can be hard to diagnose. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
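The detect-hanging-children part can be sketched with the WNOHANG polling Ben describes, using a hash of launch times; the deliberately "hanging" child, the one-second deadline, and the choice of SIGTERM are all illustrative:

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

my %started;                        # pid => launch time
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        $started{$pid} = time();
    } else {
        sleep 60 if $n == 3;        # this child "hangs" for the demo
        exit 0;
    }
}

my $reaped = 0;
while (%started) {
    my $pid = waitpid(-1, WNOHANG); # poll; returns 0 if nobody is done
    if ($pid > 0) {
        delete $started{$pid};
        $reaped++;
        next;
    }
    # Nobody reapable: terminate anyone past the deadline.
    for my $p (keys %started) {
        kill 'TERM', $p if time() - $started{$p} > 1;
    }
    sleep 1;
}
```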
Re: [Boston.pm] Regex warning
On 3/11/06, Joel Gwynn [EMAIL PROTECTED] wrote: I know I've done this before, but I'm not sure what I'm doing differently today. I'm trying to capture a simple command-line option like so: my $debug = 0; if (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; # Error here } But I keep getting "Use of uninitialized value in concatenation (.) or string" when I try to do something with the debug variable. How can $1 not be initialized? If it's matching, then it should have a value, no? That looks like a bug to me. But you can work around it as follows: while (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; last; } Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Regex warning
On 3/11/06, Ben Tilly [EMAIL PROTECTED] wrote: On 3/11/06, Joel Gwynn [EMAIL PROTECTED] wrote: I know I've done this before, but I'm not sure what I'm doing differently today. I'm trying to capture a simple command-line option like so: my $debug = 0; if (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; # Error here } But I keep getting "Use of uninitialized value in concatenation (.) or string" when I try to do something with the debug variable. How can $1 not be initialized? If it's matching, then it should have a value, no? That looks like a bug to me. But you can work around it as follows: Correcting myself, I don't think it is a bug. $1 is dynamically scoped. In your construct above, that means that when grep ends, $1 is cleaned up. while (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; last; } And I think this works because the inner part of the while loop executes while the grep is still executing. Which is a Perl optimization to avoid generating a long temporary list in this situation. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Regex warning
On 3/11/06, Joel Gwynn [EMAIL PROTECTED] wrote: Correcting myself, I don't think it is a bug. $1 is dynamically scoped. In your construct above, that means that when grep ends, $1 is cleaned up. while (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; last; } And I think this works because the inner part of the while loop executes while the grep is still executing. Which is a Perl optimization to avoid generating a long temporary list in this situation. Cheers, Ben Did you test this? I get the same error with this construct. Nope. :-( I thought I had, but I was testing something slightly different. So disregard that bit of idiocy. However I did test this: for (@ARGV) { if (/--debug=(\d+)/) { $debug = $1; print "debug: $1\n"; } } and it works because you're looking at $1 in the right scope. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
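Another scope-safe variant, for what it's worth: have map return the capture itself, so nothing depends on when $1 gets reset. The sub name is invented:

```perl
use strict;
use warnings;

# $1 is read inside the map block, i.e. in the scope where the
# match just happened, and the value (not $1) is what escapes.
sub debug_level {
    my ($level) = map { /^--debug=(\d+)$/ ? $1 : () } @_;
    return defined $level ? $level : 0;
}
```

Called as `debug_level(@ARGV)`, it returns the first --debug value, or 0 if the option was not given.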
Re: [Boston.pm] parsing CSV string with a comma in it
This is not nearly as simple as people think. Text::CSV can do it, but the example code in the documentation isn't right. (It won't handle embedded returns.) Text::CSV_XS does do it correctly out of the box with its getline function, but needs a binary install. You can implement getline with Text::CSV something like this:

package Text::CSV;
use Carp qw(croak);
sub getline {
    my ($self, $fh) = @_;
    my $line = <$fh> or return;
    until ($self->parse($line)) {
        my $additional = <$fh>;
        if ($additional) {
            $line .= $additional;
        } else {
            croak("File terminated in the middle of a line.");
        }
    }
    return $self->fields;
}

And another alternative is that Text::xSV will handle this in pure Perl out of the box. Personally I tend to use that, but then again I'm biased. :-) Cheers, Ben On 2/28/06, Alex Brelsfoard [EMAIL PROTECTED] wrote: Hello all, I know there's gotta be a nice and easy way to do this. Basically take, for example, the following file: --FILE-- item1a,item2a,"item3a part1, item3a part2",item4a,item5a,item6a item1b,item2b,"item3b part1, item3b part2",item4b,item5b,item6b .. --FILE-- So, when reading in this file I need to parse each line out into the proper segments: [open file] while (<FILE>) { my ($item1,$item2,$item3,$item4,$item5,$item6) = [parse $_] } [close file] What would be the best way to handle this? Should I use something like Text::CSV to handle this? I would prefer to not need an extra module, but am willing to use one if necessary. Thanks. --Alex ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
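If installing a module is the sticking point, the core module Text::ParseWords can handle data shaped like Alex's sample - quoted fields with embedded commas - though not fields with embedded newlines:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module, nothing to install

# One record from the sample file; parse_line splits on the
# delimiter while respecting double-quoted fields.
my $record = 'item1a,item2a,"item3a part1, item3a part2",item4a,item5a,item6a';
my @fields = parse_line(',', 0, $record);   # keep=0 strips the quotes
```

For files that can contain newlines inside quoted fields, stick with Text::CSV_XS or Text::xSV as discussed above.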
Re: [Boston.pm] daemonizing a perl script
On 2/21/06, Uri Guttman [EMAIL PROTECTED] wrote: JM == John Macdonald [EMAIL PROTECTED] writes: [...] JM Of course, detecting that a log switch of some sort has occurred JM doesn't ensure that you will be able to tell if more than one JM has occurred very quickly (from your frame of reference - JM that might mean that your tailing program got paused for a JM long time instead). well, most tailing doesn't care about how much has changed. tailing just wants to find and return the appended text. whether it returns large chunks or many lines isn't a function of the log file but of the tailing code. I think you're missing John's point. His point is that if 2 log switches happen while you're not looking, all the stuff that was written to the log between those switches is elsewhere and you'll never realize that it was ever in the log. This is not normally an issue. (Typically logs might rotate, say, once a day. And the tailing process checks every minute or so. But it theoretically can happen, and there is no good solution for it.) Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] daemonizing a perl script
On 2/20/06, Bob Rogers [EMAIL PROTECTED] wrote: From: Ranga Nathan [EMAIL PROTECTED] [...] On the other hand, doing nice -n-1 myscript would run it at a slightly higher-than-default priority, which might allow it to swap in more quickly when the workload picked up. This would work best if the actions were fairly lightweight, as myscript will hog the CPU while it's running. (I haven't tried this recipe myself, though.) The problem with a script swapping out is that it takes I/O to swap it back in. Changing the priority just gives it better access to the CPU, which doesn't help one bit in how quickly I/O lets it get swapped back in. There are a few solutions to getting swapped out. The best is to add RAM. Second, you can guarantee that the script wakes up regularly and does something. Third, if you're using Linux 2.6 or later, play around with the swappiness parameter. You can control this either by echo 60 > /proc/sys/vm/swappiness or by adding vm.swappiness = 60 to /etc/sysctl.conf. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] pre-extending strings for apache/mod-perl application
Didn't we just have this discussion? It is extremely hard for pre-extending strings to result in actual performance improvements, and at best you can get a very small win in return for a lot of work. In fact the extra effort of having to track where you are in the string manually almost certainly *loses* performance. So don't do it. Ben On 1/10/06, Donald Leslie {74279} [EMAIL PROTECTED] wrote: I have an apache/mod-perl application that can result in large xml strings which are then transformed by xslt into html. A database query can result in an xml string with a length greater than 300,000. In a normal perl allocation you can pre-extend the string to prevent repeated new allocations and copies. Does anyone know what happens in a mod-perl application? Does pre-extending have any benefit? Don Leslie ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] More Perl Style
On 12/20/05, Federico Lucifredi [EMAIL PROTECTED] wrote: Hello Guys, More Perl Style lessons for me, if anyone wants to chip in. Following is the script on the chopping block today - in the comments the parts that I did not manage to elegantize as much as I wanted. use Term::ANSIColor qw(:constants); use strict; Where is warnings? #CONFIGURATION my $timelen = 3; my @commands = ( '/bin/netstat -ape | /usr/bin/wc -l', '/usr/bin/lsof | wc -l', '/bin/ps ax -L | wc -l' ); I would indent this like so:

my @commands = (
    '/bin/netstat -ape | /usr/bin/wc -l',
    '/usr/bin/lsof | wc -l',
    '/bin/ps ax -L | wc -l',
);

See Code Complete's advice on indentation for why. my @triggers = qw( 0 0 0); #- for (my $flag = 1; $flag;) #hate truth as a number... Gah. 1. NEVER use flag as the name of a flag. One of the most common programming errors is to accidentally reverse the meaning of a flag. Make the name of the flag a yes/no question that the value of the flag is the answer to. Then you will never again make that mistake. 2. Avoid C-style for loops whenever you have an alternative. You have an alternative here - use a while or until loop. I would write the above as:

my $last_loop;
until ($last_loop) {
    ...

That clearly documents intent. { my $date = `/bin/date -R`; #three lines seem much for this - any more elegant way to chomp things up? chomp $date; print $date . "\t\t"; Is there a reason to not use Perl's built-in localtime() function? If you need specific formatting, you can save a line by chomping while assigning. chomp(my $date = `/bin/date -R`); Be careful about stacking functions too much, you might have fewer lines but you are doing as much. However this combination is so common that I suspect that it mentally chunks. for (my $i = 0; $i < @commands; $i++) Again, avoid C-style for loops. I would write the above as:

for my $i (0..$#commands) {
    ...
{ my $cmd = $commands[$i]; #I don't like these lookups, but I don't see how to foreach this one my $trig = $triggers[$i]; If you really don't like the lookups you can use Tye McQueen's Algorithm::Loops and do a mapcar here. I personally think that the lookups are better. my $result = `$cmd` or die("could not execute command: $cmd\n"); chomp $result; $result == $trig ? print ON_RED, $result, RESET : print $result; print "\t"; $flag = 0 if ($result == $trig); # finish the internal round, terminate the external. Any nicer way to do it? Nope. If you wanted to terminate the inner or both, then you could just use last. But if you want to terminate the outer, you need to keep track of state. However see my previous comments about the name of your flag. } print "\n"; } Hum - the mail client is insisting on wrapping at 80 chars - usually nice but very appropriate to mess up things here :D Code that runs into problems when wrapped at 80 characters needs to be reformatted and/or rewritten. :-P Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
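Pulling those suggestions together, one possible shape for the loop. The command runner is passed in as a coderef - that is my addition, not part of Federico's design, purely so the loop body can be exercised without shelling out - and $max_rounds guards against running forever:

```perl
use strict;
use warnings;
use Term::ANSIColor qw(:constants);

sub monitor {
    my ($run, $commands, $triggers, $max_rounds) = @_;
    my $done;
    my $rounds = 0;
    while (!$done && $rounds < $max_rounds) {
        $rounds++;
        for my $i (0 .. $#$commands) {
            my $result = $run->($commands->[$i]);
            if ($result == $triggers->[$i]) {
                print ON_RED, $result, RESET;
                $done = 1;          # finish this round, then stop
            } else {
                print $result;
            }
            print "\t";
        }
        print "\n";
    }
    return $rounds;
}
```

In the real script the runner would be something like `sub { chomp(my $out = `$_[0]`); $out }`, with the poster's @commands and @triggers passed in.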
Re: [Boston.pm] pre-extending strings for apache/mod-perl application
On 12/13/05, Aaron Sherman [EMAIL PROTECTED] wrote: On Mon, 2005-12-12 at 16:44 -0500, Uri Guttman wrote: DL == Donald Leslie {74279} [EMAIL PROTECTED] writes: DL I have an apache/mod-perl application that can result in large DL xml strings which are then transformed by xslt into html. A DL database query can result in an xml string with a length greater DL than 300,000. In a normal perl allocation you can pre-extend the DL string to prevent repeated new allocations and copies. Does anyone DL know what happens in a mod-perl application? Does pre-extending DL have any benefit? i can't tell you about any mod-perl issues but in general pre-extending in perl doesn't gain you as much as you would think. the reason is that some storage isn't really truly freed to the main pool when its ref count goes to 0. perl will keep it around in anticipation of it being reallocated for this same item in a future call or loop iteration. so it is effectively doing the usual doubling of its size to grow into the first large string, and then it is already pre-extended the rest of the time via reusing the previous buffer. Well, just in one trial, it does look like the data gets moved with every substantial growth. [...] The key word is *substantial*. I am too lazy to look at the source code for the factor for strings, but Perl widely uses a strategy of always allocating a fixed factor more space than has currently been requested for a wide variety of data structures. The result is that if you grow a data structure incrementally, the sum of the costs of moving the data forms a geometric series, which sums up to no more than a constant times the final size of the data structure. This constant is usually small enough that it isn't worth the effort it would take to remove it.
If you really think that it *is* worth the effort that it would take to remove that insignificant overhead, then I'm going to go out on a limb and say that you have an application which shouldn't be written in Perl. OK, I won't look up the source, but I will demonstrate what I am talking about. It seems from this that strings are slightly different than arrays, but the effect is similar:

#!/usr/bin/perl -l
$s = "";
my $old_loc;
for (1..10_000_000) {
    $s .= 1;
    my $loc = loc($s);
    if ($loc != $old_loc) {
        print "Len: " . length($s) . "; loc $loc";
        $old_loc = $loc;
    }
}
sub loc { unpack "I", pack "p", $_[0] }
__END__
Len: 1; loc 135673368
Len: 12; loc 135709104
Len: 36; loc 135708936
Len: 52; loc 135698368
Len: 68; loc 135671296
Len: 172; loc 135702616
Len: 2060; loc 135713768
Len: 134156; loc 3083567112
Len: 135156; loc 3083427848
Len: 274420; loc 3083149320
Len: 552948; loc 3082592264
Len: 1110004; loc 3081478152
Len: 2224116; loc 3079249928
Len: 4452340; loc 3074793480
Len: 8908788; loc 3065880584

In case you're interested, that came out to 1.58 copies/character. If you tried to assign a long string to pre-extend, truncate, then incrementally assign the one that you were really interested in, it would come out to 2 copies/character, and you're losing in your attempt to optimize! Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Pretty Graphs with Perl
On 12/6/05, Alex Brelsfoard [EMAIL PROTECTED] wrote: Does anyone who has/does use GD::Graph know if there's an easy way to embed the output graphs into HTML. Basically I'd like to be able to print a bunch of HTML, then the graph, then some more HTML. I've got the graph coming out all fine and dandy. You have to print out HTML that includes an embedded image, and make the URL for that image be served by your program that prints out the graph. One warning if you shell out in a CGI script: be ABSOLUTELY sure that you send headers before calling the shell. A very common mistake is to print your header, then call the shell, not realizing that your print just saved data in a buffer and then waits to send it until either the buffer is full or your program ends. The result is that in your code you see the header printed before the graph, but Apache receives the graph before the header and gets upset at you. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
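[Editor's note: a minimal sketch of the two-script arrangement described above. The script names (page.cgi, graph.cgi) are hypothetical and the PNG bytes are faked; with GD::Graph the image script would return $graph->plot(\@data)->png instead.]

```perl
use strict;
use warnings;

# page.cgi: prints HTML whose <img> URL points back at the graph script.
sub html_page {
    return <<'HTML';
Content-type: text/html

<html><body>
<p>Some HTML before the graph...</p>
<img src="graph.cgi?report=42">
<p>...and some more HTML after it.</p>
</body></html>
HTML
}

# graph.cgi: serves only the image.  $| = 1 flushes the header at once,
# so even if this script shells out, Apache never sees child output
# arrive before the header does.
sub serve_graph {
    local $| = 1;
    binmode STDOUT;
    print "Content-type: image/png\r\n\r\n";
    # system("some_external_plotter");  # safe now: header already sent
    print fake_png_bytes();
}

# Stand-in for GD::Graph output, so the sketch is self-contained.
sub fake_png_bytes { return "\x89PNG..." }

print html_page();
```

The $| = 1 line is the fix for the buffering trap described above: flushing forces the header onto the wire before any child process can write.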
Re: [Boston.pm] Fwd: parrot now available as a Debian package
On 11/18/05, John Macdonald [EMAIL PROTECTED] wrote: On Fri, Nov 18, 2005 at 04:16:18PM -0500, Uri Guttman wrote: [...] However, as I recall, NT was being developed for the Alpha at one point - I think it was available commercially for a while and not just internal to MS. Not too surprising, actually, since a large chunk of the original NT design team was hired away from DEC (Dave Cutler et al). [...] This is true. It is an amusing irony that one of the initial design goals for NT was to be highly portable to different chip architectures, while Linux was designed to take full advantage of 386-specific features. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Affordable web hosting service that offers mod_perl?
On 10/26/05, Sherm Pendley [EMAIL PROTECTED] wrote: On Oct 26, 2005, at 11:15 AM, Tom Metro wrote: I've often wondered if this greater power in mod_perl has been a hindrance rather than a help to the Perl web development community. Would we have been better off if, in addition to mod_perl (for the rare cases when you do need low-level access), there was a mod_perl_embed, or something like that, that was restricted in ability and focused on the needs of the typical site developer? An admin doesn't have to allow the developer full access to everything mod_perl can do. A common configuration used with mod_perl is to service only static HTML requests on the main server, and proxy requests to mod_perl serviced URLs to a separate server instance that's listening on a high port. This proxy set-up can be set up on a virtual server basis, with each virtual server having its own mod_perl instance. That does require a separate Apache instance, but not a dedicated server. The mod_perl Apache, being a separate instance, would not need to run as nobody or www - it could run with the user's permissions. Or, in a more secure setup, an admin could create two users, joe and joe_perl, and add them both to joe_group. Joe would use joe to log in, and the mod_perl server would run as joe_perl. Joe could then allow the server access to specific files through the use of group permissions. [...] The memory requirements of Apache with mod_perl are such that you can serve far fewer users per machine this way than you can with PHP. Given how competitive the shared hosting business is, it would be hard to do this at a competitive price. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
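[Editor's note: the split Sherm describes can be sketched in configuration. Everything concrete below (port 8042, the /app path, the joe_perl user) is invented for illustration; consult the mod_proxy and mod_perl docs for the real directives on your Apache version.]

```apache
# Front-end Apache: serves static HTML itself, proxies dynamic URLs to
# the per-user mod_perl instance listening on a high port.
ProxyPass        /app/ http://127.0.0.1:8042/app/
ProxyPassReverse /app/ http://127.0.0.1:8042/app/

# Back-end httpd.conf for the mod_perl instance (a separate httpd,
# not a separate machine), running with the user's own permissions:
#
#   Listen 127.0.0.1:8042
#   User   joe_perl
#   Group  joe_group
#   <Location /app>
#       SetHandler perl-script
#       PerlResponseHandler MyApp
#   </Location>
```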
Re: [Boston.pm] Perl style regex in shell command for matching across lines?
On 10/19/05, Ranga Nathan [EMAIL PROTECTED] wrote: I need to check ftp logs (see below) for successful transfer of files. This is a bash script someone else wrote and I need to modify it. I want to use a Perl style regex like /^125.*?baxp\.caed.*?\n250/i in any ?grep or sed or awk whichever can do this. I tried grep and egrep - they seem to match only one line at a time. I am unable to match for \n inside the pattern. They only match one line at a time. That is what they do. What shell utility would do it? I don't want to bring in the perl interpreter just for this! Thanks for the help. I would use Perl. But you can try this: grep -A 2 -i '^125.*baxp\.caed' FILE_HERE | grep -B 2 '^250' It will separate matches with lines containing --. Though if you want to do anything interesting with them, it would be easier to do it in Perl, because EVERY tool in the basic Unix toolset assumes that lines of data are the data of interest, so data that goes across a line boundary is a PITA to handle. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
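[Editor's note: for comparison, the check is short in Perl once you slurp the log. The sample log lines below are invented; the regex is the one from the question.]

```perl
use strict;
use warnings;

# Invented sample of an ftp log; real logs will differ.
my $log = <<'LOG';
125 Sending data set BAXP.CAED.FILE1
250 Transfer completed successfully.
125 Sending data set OTHER.FILE
550 Transfer failed.
LOG

# /m lets ^ anchor at each line start; without /s, .*? cannot cross a
# newline, so the explicit \n is the only way from the 125 line to the
# 250 line.  The list-assignment trick counts the /g matches.
my $count = () = $log =~ /^125.*?baxp\.caed.*?\n250/gim;
print "$count successful transfer(s)\n";
```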
Re: [Boston.pm] threads and sockets
On 10/6/05, Jeremy Muhlich [EMAIL PROTECTED] wrote: Has anyone here written a serious threaded server in perl? I can't seem to find any threads + sockets examples anywhere. I have some stuff working with Thread::Pool but there are problems. (I can elaborate if anyone wants me to...) Why are you trying to write a threaded server in Perl? If you want performance, I would strongly suggest using a pre-fork model, or else use another language. In Perl when you spawn a thread, Perl makes a copy of virtually all data in that thread to avoid race conditions. This is a big, slow step. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] threads and sockets
On 10/6/05, Jeremy Muhlich [EMAIL PROTECTED] wrote: On Thu, 2005-10-06 at 18:36 -0400, Uri Guttman wrote: even the people who wrote the threads code in perl disavow them, so i wouldn't even try to do any heavy threading in perl. instead i recommend an event loop server which is stable, faster and easier to code for in most situations. you can use stem, poe, event.pm or even io::select as the main event loop. The problem is that I'm writing an RPC server that itself needs to make RPC calls. I can't be blocking on new clients connecting or existing clients sending requests while the server-side procedure is making its own RPC call out to somewhere else. I don't think an event loop would help, because a computationally slow procedure or one that makes a further RPC call would still block other clients. You can do this with an event loop and multiple processes. The RPC server doesn't make RPC calls. Instead it sends a message to a child process that makes the RPC call. The child process then sends a message back to the RPC server when it has the answer. The RPC server can now use a select loop to cycle through getting RPC requests, forwarding them to children, getting responses, and writing back to the clients. This is conceptually similar to what you might do with a multi-threaded server - you're just using processes rather than threads. An alternate strategy is to not have a single RPC server. Instead do as Apache's prefork model does, and have multiple processes. Each time you get an RPC call, one process processes the request and other processes remain available to service new requests. Unless overhead is a huge concern, I'd personally use the alternate strategy. I'd then avoid writing all of the multi-process logic by using Apache for that piece, and mod_perl to process requests/send responses. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
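[Editor's note: a minimal sketch of the parent/child split described above, with an invented "RPC" that just doubles a number. The parent waits in select() via IO::Select and never blocks indefinitely; the child is free to make slow, blocking calls.]

```perl
use strict;
use warnings;
use IO::Select;
use IO::Handle;
use Socket;

# One bidirectional pipe between parent and child.
socketpair(my $to_child, my $to_parent, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$to_child->autoflush(1);
$to_parent->autoflush(1);

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {                           # child: the blocking worker
    close $to_child;
    while (defined(my $req = <$to_parent>)) {
        chomp $req;                        # a real child would make a slow,
        print {$to_parent} $req * 2, "\n"; # blocking RPC call right here
    }
    exit 0;
}

close $to_parent;                          # parent: the event-driven side
print {$to_child} "21\n";                  # forward a client's request

my $sel    = IO::Select->new($to_child);
my $answer = '';
if ($sel->can_read(10)) {                  # select() instead of blocking
    $answer = <$to_child>;
    chomp $answer;
}
close $to_child;                           # child sees EOF and exits
waitpid($pid, 0);
print "child answered: $answer\n";
```

A real server would keep a pool of children and a queue of outstanding requests, but the select-plus-pipe skeleton is the same.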
Re: [Boston.pm] threads and sockets
On 10/6/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT On 10/6/05, Jeremy Muhlich [EMAIL PROTECTED] wrote: [...] BT You can do this with an event loop and multiple processes. BT The RPC server doesn't make RPC calls. Instead it sends a message to BT a child process that makes the RPC call. The child process then sends BT a message back to the RPC server when it has the answer. The RPC BT server can now use a select loop to cycle through getting RPC BT requests, forwarding them to children, getting responses, and writing BT back to the clients. you don't even need children to do non-blocking rpc calls. if you do the protocol yourself and it is over a socket (as it should be), you can do async rpc calls. but if you are using a typical library that hardcodes a sync rpc protocol, then you are stuck. this is a major issue i have with most protocol implementations, especially on cpan. they are written with sync i/o and never think about supporting async. what they don't realize is that async can easily emulate sync but it is almost impossible for sync to emulate async. I assumed that he wouldn't want to rewrite things like database drivers, so I was assuming that he'd be stuck. After all, if he was using async protocols, then he'd have never complained about blocking calls. BT An alternate strategy is to not have a single RPC server. Instead do BT as Apache's prefork model does, and have multiple processes. Each BT time you get an RPC call, one process processes the request and other BT processes remain available to service new requests. preforking is just an optimization of a forking server. if you distribute the processes over a farm of machines, then you can make the main server just connect to the processes on demand or in advance. The key feature that I was pointing to was having multiple processes, not the pre-forking optimization. I specified prefork to distinguish it from the threading model that Apache 2 also offers. 
BT Unless overhead is a huge concern, I'd personally use the alternate BT strategy. I'd then avoid writing all of the multi-process logic by BT using Apache for that piece, and mod_perl to process requests/send BT responses. and what if you aren't using http for the clients? what if you want to support a cli or a gui client? i really hate how apache (1 or 2) is being touted as the next great application platform and savior. cramming all those different modules (apache and perl) into the insane mess of apache is asking for trouble. anyone ever heard of config file hell? or colliding modules that are too tightly coupled to clean up? If you're building the system from scratch, you can use any protocol that you want for the clients. It doesn't matter what kind of clients you have. For instance I am using Ubuntu at the moment. It uses an http interface to fetch package information and packages when you want to upgrade. But this http interface is not accessed through a browser. Instead you can access it through the pure CLI interface of apt, through the curses interface of aptitude, or through the GUI interface of the Synaptic Package Manager. The protocol spoken over the wire is separate from the front end. As for your complaints about config file hell, there are plenty of known techniques to keep complex Apache applications cleanly organized. If you are going to have 50 million applications in the same Apache process, then you have a problem. But if you're talking about a single application which you're considering devoting multiple machines to, then the necessary overhead to set up Apache for your needs is less than the overhead to roll your own application server, or to write event loops for everything. Before you disagree with the last, note that I'm counting as overhead having to implement asynchronous wheels because you don't like the blocking that happens with the synchronous one on CPAN... but what do i know? 
just doing event loops for over 20 years on many platforms and in many langs. :) hell, i even integrated c kernel threads into an event loop so i could do blocking ops in the threads and the main code was all event driven. but perl threads just can't cut the mustard like c threads. And this makes anything that I've said any less true? Note that I'm not saying that Apache is a perfect strategy. I'm not saying that it has no drawbacks. I'm just saying that it is a workable strategy in many situations, and it is the strategy that I'd be inclined to use fairly often. If you want perfection, then it is clearly the wrong way to go. But the perfect is the enemy of the good, and it is a pretty good solution. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] threads and sockets
On 10/6/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: you don't even need children to do non-blocking rpc calls. if you do the protocol yourself and it is over a socket (as it should be), you can do async rpc calls. but if you are using a typical library that hardcodes a sync rpc protocol, then you are stuck. this is a major issue i have with most protocol implementations, especially on cpan. they are written with sync i/o and never think about supporting async. what they don't realize is that async can easily emulate sync but it is almost impossible for sync to emulate async. BT I assumed that he wouldn't want to rewrite things like database BT drivers, so I was assuming that he'd be stuck. After all, if he was BT using async protocols, then he'd have never complained about blocking BT calls. true about dbi and other sync modules. but i won't assume anything since the OP has not given a full problem spec (yet). Perspective. For what I do, interesting data is almost all stored in the database, so useful programs are going to want to use DBI. I also, as I noted, saw key phrases in the question that suggested synchronous libraries were anticipated. Finally, even if everything can be made asynchronous now, allowing for synchronous calls in your architecture is a piece of future-proofing - some day a management dictate may come down to integrate with some piece of software that has a convenient synchronous interface already, but no asynchronous one. Some applications, because of what they do, environment, etc, can guarantee that asynchronous will never be an issue. But a great many cannot. preforking is just an optimization of a forking server. if you distribute the processes over a farm of machines, then you can make the main server just connect to the processes on demand or in advance. BT The key feature that I was pointing to was having multiple BT processes, not the pre-forking optimization. 
I specified prefork BT to distinguish it from the threading model that Apache 2 also BT offers. i also like the process farm idea and have used it many times. There are many ways to skin this cat. BT Unless overhead is a huge concern, I'd personally use the alternate BT strategy. I'd then avoid writing all of the multi-process logic by BT using Apache for that piece, and mod_perl to process requests/send BT responses. but what if you already had a tool in perl that let you do all the async communications with no new coding needed? and it can do application servers as well? :) I already indicated reasons to pick a design where synchronous calls are not an issue. An additional reason to avoid cooperative multitasking is scalability - multiple processes allow you to benefit from multiple CPUs (both real and virtual), and make the migration to multiple machines easier. A cooperatively multitasked program can only use one CPU. (But cooperative multitasking benefits from being able to communicate between tasks very directly. However if you do too much of that, it is easy to accidentally introduce race conditions. Particularly if you try to be asynchronous everywhere.) and what if you aren't using http for the clients? what if you want to support a cli or a gui client? i really hate how apache (1 or 2) is being touted as the next great application platform and savior. cramming all those different modules (apache and perl) into the insane mess of apache is asking for trouble. anyone ever heard of config file hell? or colliding modules that are too tightly coupled to clean up? BT If you're building the system from scratch, you can use any protocol BT that you want for the clients. It doesn't matter what kind of clients BT you have. sure. again we have no proper spec so i won't speculate. My point remains. Wanting to support a CLI or GUI client does not prevent you from using http. Your suggesting that it does is a red herring. [...] 
that still doesn't mean using apache and http for app serving is a good idea. http is stateless so it makes for a bad protocol for when you need multiple remote operations on a single session. sure you can work around it (cookies) but that is still a workaround. Stateless is indeed a drawback to http. And working around it is one of the pieces of overhead to this approach that I alluded to when I said, if you don't mind the overhead. BT Before you disagree with the last, note that I'm counting as overhead BT having to implement asynchronous wheels because you don't like the BT blocking that happens with the synchronous one on CPAN... hmm, what if the wheels are there and rounder than the square ones you currently have? What rounder wheel than DBI do you have to offer me? [...] BT And this makes anything that I've said any less true? no, but it shows that threads and events can work together
Re: [Boston.pm] Trying to learn about the problems with eval()
On 8/16/05, Tim King [EMAIL PROTECTED] wrote: Ben Tilly wrote: I agree that using eval here is wrong. But I still don't see action at a distance. You can argue about whether it is action at a distance, but you have tight coupling between the internals of make_generator and the string passed into it that was generated from very far away. Correct. A function that uses eval() does so in its own context. I don't see what makes this a problem with eval(). Rather, it's a concern that affects the software design. Well, this thread started with someone looking for reasons not to use eval. Pointing out that certain ways of trying to use eval affect the software design in negative ways is a good reason not to use eval in some situations. The problem in this example is that make_generator doesn't make a generator. I know full well what the problem is. And the result is that you cannot consider using code generation for a situation like this. Not if you want to make a generator as you have tried, no. My point was that this is not a problem with eval(); it's a problem with your proposed design. You can place the blame wherever you want to. But with a more capable version of eval, you can make the proposed design work. Of course since Perl doesn't *have* a capable enough version of eval, you can't make it work in Perl... I use eval() when I don't know what the code is until runtime, or to execute code generated in one place in a context generated from another. But I can't recall a case like the latter that wasn't also an instance of the former.

sub foo {
    my $x = 7;
    my $generator = sub { ++$x };
    print $generator->() for 1..5;
}

But in general you're now prevented from adding various kinds of syntactic sugar, manipulating the passed-in code before compiling it, etc. If you really need to manipulate the code, you apply the syntactic sugar when generating the code string. Then you invoke eval() in whatever context the code should execute. 
The problem is that the place where you'd like to centralize the code manipulation, calling eval, catching errors etc is in one place, and the code that you'd like to manipulate is in another. You *can* achieve the desired flow of control, but the only way that I can think of doing it in Perl is to require the caller to pass in sub {eval(shift)} (which will do an eval in the caller's context). That's an ugly construct to have to throw in when you were trying to create syntactic sugar. Or alternatively, you can pass to the code-generator an evaluation context that the generated code will use. One might use a technique like this when using Perl as a scripting language within a Perl program, for example. I suspect that you mean something like the sub {eval(shift)} that I mentioned above. Note that $generator... is a true coroutine. It is not a coroutine. It is a closure. It is both a closure and a coroutine. If it is a coroutine, then you should be able to return multiple times and restart the call each time. (Traditionally done with a yield operator.) I don't see that capability there... Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] CGI::Prototype
On 8/15/05, Ricker, William [EMAIL PROTECTED] wrote: [...] CGI::Prototype offers a _different_ way of factoring out the you always had to write this glue code code. Catalyst uses the Perl Attributes annotations to factor out glue-code, which is classy demonstration that attributes are a good idea. CGI::Prototype uses prototypical (instance-based, or nonce-class) inheritance. [...] Question on that. My understanding is that attributes are processed at CHECK time, which means that code using attributes would not work right if loaded after CHECK has run. In web development this could mean that Apache::Reload and attributes would *not* play well together. (Which would be a pretty big drawback for me in developing a mod_perl application.) Is my understanding outdated? Thanks, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Trying to learn about the problems with eval()
On 8/15/05, Kripa Sundar [EMAIL PROTECTED] wrote: Dear Ben, another bad point about eval is that it can access and modify lexicals and globals anywhere in the code. so that can lead to action at a distance and very hard to find bugs. [...] I'm not sure if this is what is referred to, but it applies. If this is dynamic code where the string to be evaled is passed in from elsewhere, then one problem is that you might wind up picking up lexicals in the scope of the eval, and being unable to pick up lexicals in the scope where you tried to create the eval. Closures would get this right. Ruby solves this problem by giving you a number of variations on eval, one of which evals things in the context of your caller. Still not perfectly general, but much more likely to be right in this instance. Do you mean examples like below? --\/BEGIN-\/-- % perl -le 'my $x = 7; my $str = q{print ++$x}; {my $x = 11; eval $str}' 12 % --/\-END--/\-- Close, but I meant more like this:

sub foo {
    my $x = 7;
    my $generator = make_generator(q{++$x});
    print $generator->() for 1..5;
}

sub make_generator {
    my $action = shift;
    my $x = 11;
    eval qq{
        sub {
            # Some interesting code here...
            $action;
        }
    };
}

IMHO the current behaviour is intuitive. And I certainly don't see action at a distance. The person who thinks that the '$x' inside $str is referring to the currently visible $x (value 7) is simply mistaken. Likewise the person who thinks that the inner $x will remain untouched by the eval(). (But maybe this latter is what Uri is referring to as action at a distance.) I agree that Perl's behaviour is logical. However it is inconvenient. And from the point of view of the person who is trying to use make_generator, it causes internal details to matter too much. A workaround, of course, is to tell the person to use global variables. Which works except for the variables that happen to be used internally in make_generator, which the person doing the calling should not need to know about but does. 
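[Editor's note: for contrast, a closure-based make_generator, taking a code ref instead of a string, binds $x where the reader expects, which is why the thread keeps pointing at closures. This sketch reuses the names from the example above but is not code from the thread.]

```perl
use strict;
use warnings;

# Takes a code ref, not a string: no eval, no scope surprises.
sub make_generator {
    my $action = shift;
    return sub { $action->() };
}

sub foo {
    my $x = 7;
    # The block is compiled in foo's scope, so ++$x means *this* $x,
    # not any lexical hiding inside make_generator.
    my $gen = make_generator(sub { ++$x });
    return join ' ', map { $gen->() } 1 .. 5;
}

print foo(), "\n";   # increments foo's own $x: 8 9 10 11 12
```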
Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] count substrings within strings
On 8/9/05, Ronald J Kimball [EMAIL PROTECTED] wrote: On Tue, Aug 09, 2005 at 09:07:16PM -0700, Stephen A. Jarjoura wrote: [...] Did I miss some obvious, and easier method? Basically, you need a loop. s///g allows you to hide the loop, but is less efficient because you're updating the string. You could use m//g with an explicit loop instead: $transfer_count++ while $buffer =~ m/transfer/g; Beware when generalizing the above. How many copies of hihi are in hihihihi? How many copies of .* in .*.*.*? The latter can be fixed with the proper escapes. The former can be fixed either by using index() to do your searching, or by using pos() in the loop to set the match position to just after the start of the match that you last found. (Unless overlapping matches are not allowed for your problem.) Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
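[Editor's note: a sketch of the pos() technique described above. count_overlapping is an invented helper name, and \Q...\E handles the ".*" escaping problem as a side effect.]

```perl
use strict;
use warnings;

# Count occurrences of $needle in $haystack, overlaps included, by
# rewinding pos() to one past the start of each match.  @- holds the
# start offsets of the last match.
sub count_overlapping {
    my ($needle, $haystack) = @_;
    my $count = 0;
    while ($haystack =~ /\Q$needle\E/g) {
        $count++;
        pos($haystack) = $-[0] + 1;   # resume just after the match start
    }
    return $count;
}

# "hihi" occurs 3 times in "hihihihi" when overlaps count (2 otherwise).
print count_overlapping('hihi', 'hihihihi'), "\n";
```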
Re: [Boston.pm] more syntax checking
On 7/18/05, Uri Guttman [EMAIL PROTECTED] wrote: KS == Kripa Sundar [EMAIL PROTECTED] writes: KS Dear Uri, [...] subs can come into existence at any time and be handled by AUTOLOAD and such. so there is no easy compile time way to check that at the moment. [...] KS It is the same principle as for any other stricture. The user is KS asking perl to flag some *legal* usage as being unacceptable for KS her/his purpose. but perl can't divine if subs exist before they are called because some module may do things differently. just look at AUTOLOAD. So sometimes Perl gets it very wrong. That's OK; if people choose to turn on optional warnings, they are choosing the new behaviour. KS For the user who does not want dynamically defined routines, it KS should be trivial for the compiler to honour a suitable pragma KS (say, use strict sub_definitions) and die if it sees that there KS are some subroutine invocations without any compile-time KS definitions. but will that honor modules that are used? that pragma can be lexically scoped but it still is an issue. what about calling something in BEGIN but before it is compiled? perl can't detect that until it tries because something else in the BEGIN block could define the function. So? You choose to turn on a pragma that breaks on valid code. Your choice, and you'll catch the gotchas pretty fast in development. KS An invocation before a declaration is legal Perl, but gets flagged KS under use strict subs. An invocation without a definition should be KS equally easy to flag (although, of course, perl would have to wait KS until the end of compilation to do so). if you invoke with foo() then you don't get that strict problem. the issue is how can you tell a sub will be defined when a call to it is compiled. if you call subs later in the source file, they won't be defined at the time of the call being compiled, so that fails. you can then force predeclaring of subs in that case (like c) but most perl hackers will hate that. 
i know what you want but i don't see any easy way to do it that will satisfy most people. Then most Perl hackers will not use a stricture like this. Though I think that more will than you realize. I happen to like the idea of having this there as an option. Right now it isn't an option, so if you want to catch subroutine typos the best that you can do is declare functions like this: my $foo = sub { ... }; and then call them with: $foo->(@args); which will make everyone else want to kill you. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
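[Editor's note: a tiny illustration of that last workaround; the names are invented. The point is that a misspelled lexical fails at compile time under strict, while a misspelled bareword sub call survives until the line actually runs.]

```perl
use strict;
use warnings;

# The sub lives in a lexical, so every call site names a variable that
# "use strict" checks at compile time.
my $greet = sub { "hello, $_[0]!" };
print $greet->('world'), "\n";

# A typo such as $gret->('world') is a compile-time "Global symbol"
# error; a typo like greeet('world') only blows up when reached.
```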
Re: [Boston.pm] Geo::Coder::US RE: GoogleGeoCoder
On 6/16/05, Joel Gwynn [EMAIL PROTECTED] wrote: [...] When you get right down to it, this Boston neighborhood thing is just confusing. I work in Dorchester but management likes to put Boston on the stationery, which is confusing because there's an identical address in Boston proper, just with a different zip code. Are there any other cities that have similar naming schizophrenia? Q: What do you call a Boston tourist? A: Lost. Similarity is in the eye of the beholder. But Denver is infamous for having many streets whose names are almost the same. You can never know which street someone is talking about until you get to the final St or Ave or etc. For instance in Denver, Colorado you can stand at the intersection of Colorado St and Colorado Ave. However, that notwithstanding, it is nowhere near Boston in being difficult to find your way around. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Image Uploading
On 6/8/05, Alex Brelsfoard [EMAIL PROTECTED] wrote: I know I asked a similar question a while back, but I'm compelled to try again. I have an existing script, not using CGI to take in parameters handed to the script. I would now like to have this script upload a file to the server, but not be forced to convert the entire file to using CGI. For the record, converting the entire file would probably be a Good Thing. This is the code I've used in the past:

use CGI;
# create new instantiation of a form query.
my $query = new CGI;

open (UPLOAD, ">$filepath")
    || Error("Could not open file for writing: $filepath");
my $picture = $query->upload('photo');
if ($picture) {
    while ($bytesread = read($picture, $buffer, 1024)) {
        print UPLOAD $buffer;
    }
} else {
    Error("Picture ($picture) is undefined.<br> File: $photoName<br>"
        . "For some reason I cannot read in this file.");
}
close (UPLOAD);

But as you can see, it's using CGI. Any suggestions of how to upload an image similarly without having to rewrite the rest of my code? Here is a stupid trick. Slurp STDIN into a scalar. Then use IO::Scalar to tie STDIN to that scalar. Open CGI, then you can seek to position 0 in STDIN, and run your (probably worse) form handling code. Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
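[Editor's note: the same trick can be written with 5.8's in-memory filehandles instead of IO::Scalar. The form body below is fake; the point is only that STDIN becomes seekable, so both the legacy parser and CGI.pm can read the same bytes.]

```perl
use strict;
use warnings;

# Pretend this was slurped from the real STDIN with { local $/; <STDIN> }.
my $body = "photo=kitten.png&caption=hello";

# Replace STDIN with an in-memory handle over the saved bytes.
close STDIN;
open STDIN, '<', \$body or die "reopen STDIN: $!";

my $first_pass = do { local $/; <STDIN> };   # legacy form-parsing code
seek STDIN, 0, 0;                            # rewind...
my $second_pass = do { local $/; <STDIN> };  # ...so CGI->new can re-read

print "both passes saw the same body\n" if $first_pass eq $second_pass;
```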
Re: [Boston.pm] Image Uploading
On 6/8/05, Mark J. Dulcey [EMAIL PROTECTED] wrote: Alex Brelsfoard wrote: I know I asked a similar question a while back, but I'm compelled to try again. I have an existing script, not using CGI to take in parameters handed to the script. I would now like to have this script upload a file to the server, but not be forced to convert the entire file to using CGI. Any suggestions of how to upload an image similarly without having to rewrite the rest of my code? You're doing HTTP upload, so you're going to need some sort of module to do that. CGI.pm is certainly the most common one. A quick search of CPAN didn't turn up any obvious candidates to replace it, but maybe someone out there will know of one. CGI::Simple is also out there if you don't want to use the HTML generation. (Usually using the HTML generation is a bad idea, but there are some simple tasks for which it makes sense.) However that would not solve the original problem, which is how to add support for uploads without having to rewrite the parts of the script that does its own form processing. (If past experience is anything to judge from, it probably does its own form processing very badly.) But this brings to mind an alternate solution. Which is to write a few functions that present the interface that the old code expects using CGI under the hood to do them, then remove the old form processing code. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Can you better this?
On 5/11/05, John Tsangaris [EMAIL PROTECTED] wrote: I was asked to provide the 73rd occurrence of a sequence of numbers, with the numbers 12345. Each number can be used only once, and there are a possible 120 combinations. I was called by a client to figure this out for them, since one of their 2nd grade children was required to provide the answer to this question. I only had a couple of minutes so I pulled this code out of my sleeve to get the answer. But, I'm curious to find out if there is a sleeker way to get the answer and full sequence (preferably more advanced than my 2nd grade answer). Do I smell golf? perl -le 'map/(.).*\1/||print,glob"{1,2,3,4,5}"x5' This works on Unix. On Windows you'll have to switch quotes around. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
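[Editor's note: an expanded, non-golfed version of the same idea, with invented variable names: glob's brace expansion generates every 5-digit string over 1..5 in order, the backreference regex rejects any string with a repeated digit, and the 73rd survivor is the answer.]

```perl
use strict;
use warnings;

# glob expands "{1,2,3,4,5}" x 5 into all 5**5 = 3125 strings in
# lexicographic order; /(.).*\1/ matches any string that repeats a
# digit, so the grep keeps exactly the 120 permutations.
my @perms = grep { !/(.).*\1/ } glob '{1,2,3,4,5}' x 5;
print scalar(@perms), " permutations; the 73rd is $perms[72]\n";
```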
Re: [Boston.pm] Simultaneous redirect to STDOUT File?
On 5/9/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT On 5/9/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT Be aware that IO::Tee has limitations. It only works for output that BT goes through Perl's IO system. In particular if your program makes BT a system call, the child process will NOT see the tee. i bet you can work around that by saving STDOUT, reopening it on IO::Tee and having IO::Tee output to the file and the saved STDOUT. i leave implementing this as an exercise to the reader. but using shell to do this is probably the easiest as you can just use tee and all stdout piped to it (from the perl program or its subprocesses) will get teed. as larry says, the shell has to be useful for something! BT You'd lose that bet. i am not sure about that. it might need some hacking to do it. BT IO::Tee is implemented through tying a filehandle inside of Perl. BT The entire mechanism only makes sense from within Perl. A BT launched subprocess (or poorly written XS code) goes through a BT fileno that the operating system knows about. Since the OS does BT not know about Perl's abstractions of I/O, there is no way to get BT the OS to direct output through them. you can then do the STDOUT dup stuff yourself and then bind IO::Tee to that. by closing STDOUT and reopening it to a pipe you create, all the child processes will output to that pipe since they will see it as fd 1. you have to fork and have that child read the other side of the pipe and use IO::Tee in there. like i said, not simple but doable. this is effectively what the shell does when you pipe anyway. This is just a version of the alternate fork and postprocess that I said would work (and you left out of your reply). But if you're going to do that, then IO::Tee is a red herring - it is easier to loop over filehandles yourself. The heavy lifting is being done by the operating system. See the cookbook for a sample implementation. 
another totally different approach is to use one of my perl sayings, print rarely, print late. too much code is written with direct calls to print (with or without explicit handles). when you print late, you just build up all your output in strings with .= and then just return it to the caller. only at the highest level where the actual print decisions are really made do you finally call print. this is also faster as print is very slow as it invokes all manner of stdio/perlio code each time it is called. appending to a buffer is very fast and clean. so if you did it this way, the top level would be like: We're now getting into optimization, so this is platform dependent. First of all, be aware that while .= is fast in Perl, in many other high-level languages the equivalent is slow. For instance try to create the string "hello world\n" x 1_000_000 with a simple appending loop in Perl, JavaScript, Ruby, Java and Python. Using the default string implementation this is very slow in every language but Perl. Making it fast requires jumping through various sets of hoops. How many and which ones depends on the language. Java has a StringBuffer class that does the trick. In JavaScript you can accumulate strings in an array and then join it. Unfortunately if the array gets too big then you run into GC overhead. So then you have to start accumulating into an array and joining parts of the array early. (Ugh.) Secondly, even in Perl I'd expect print to be faster than accumulating with .= and printing once. Let's try it:

$ time perl -e 'print "hello world\n" for 1..1_000_000' > /dev/null

real    0m0.379s
user    0m0.380s
sys     0m0.000s

$ time perl -e '$s .= "hello world\n" for 1..1_000_000; print $s' > /dev/null

real    0m0.752s
user    0m0.600s
sys     0m0.150s

$ perl -v
This is perl, v5.8.4 built for i386-linux-thread-multi
[...]

Why did this happen? Well when you print, most of the time what it does is shove the data on a buffer.
If said buffer passes over some threshold (e.g. 2K) then it actually writes it to the pipe. All of your output has to go through this process, so adding a level of Perl buffering is pure overhead. Having to buffer all of it is more overhead still. (Incidentally in this case, syswrite is slightly faster than print.) Or at least *should* be. In older Perls by default you went through the OS stdio stuff, and the hand-off from Perl to the OS could be slow. Depending on your platform, that is. (Linux was slow IIRC.) So you may have once done a benchmark and made an optimization conclusion and then never noticed that it has now become dated. (This has happened to me plenty of times...)

use File::Slurp;
my $text = do_lots_of_work_and_return_all_the_text();
print $text;
write_file( $tee_file, $text ) if $tee_file;

it makes for a very good api too in all
Re: [Boston.pm] Simultaneous redirect to STDOUT File?
On 5/10/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: [...] BT Maintainability is more important than optimization. I often use BT this strategy for maintenance reasons. Going full-cycle, one way BT to accomplish all of this without changing code is to tie to a BT filehandle that accumulates data and prints it later. but what if you don't want to print it but log it or send it as a message? what if you want a status sub to be useful in many different ways? making it use a handle or printing directly limits your flexibility and control. delaying printing until you are ready also means you can use write_file which is faster than print as it bypasses perlio. With tie you can do all of that. It may involve some hoops, but you can do it. You may need to write your own Tie class though. The key point was without changing code. I should have been more explicit about that. This is a strategy to consider if you have existing code and wish to refactor in a way which is inconsistent with how it was intended to work. I would not normally choose to write new code on that plan. As for performance, again I consider optimization less important than maintainability until proven otherwise. Besides, in my experience the bulk of I/O time tends to be spent waiting for resources (another process, filesystems etc) rather than stdio buffering. Cheers, Ben
Re: [Boston.pm] Simultaneous redirect to STDOUT File?
Be aware that IO::Tee has limitations. It only works for output that goes through Perl's IO system. In particular if your program makes a system call, the child process will NOT see the tee. Cheers, Ben On 5/9/05, Duane Bronson [EMAIL PROTECTED] wrote: If it's Unix-only, you can open("| tee output.log") and write to that. And search.cpan.org tells me there's IO::Tee. Or you could use something like log4perl which I think allows you to configure multiple appenders of which one can be stdout and another can be a log file. That might be overkill, though. Palit, Nilanjan wrote: I want to redirect print output to both stdout and a file at the same time. I can think of writing a sub that executes 2 print statements (one each to stdout and the filehandle), but I was hoping someone has a more elegant solution. Thanks, -Nilanjan -- Sincerely *Duane Bronson* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] http://www.nerdlogic.com/ 453 Washington St. #4A, Boston, MA 02111 617.515.2909
Re: [Boston.pm] Simultaneous redirect to STDOUT File?
On 5/9/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT Be aware that IO::Tee has limitations. It only works for output that BT goes through Perl's IO system. In particular if your program makes BT a system call, the child process will NOT see the tee. i bet you can work around that by saving STDOUT, reopening it on IO::Tee and having IO::Tee output to the file and the saved STDOUT. i leave implementing this as an exercise to the reader. but using shell to do this is probably the easiest as you can just use tee and all stdout piped to it (from the perl program or its subprocesses) will get teed. as larry says, the shell has to be useful for something! You'd lose that bet. IO::Tee is implemented through tying a filehandle inside of Perl. The entire mechanism only makes sense from within Perl. A launched subprocess (or poorly written XS code) goes through a fileno that the operating system knows about. Since the OS does not know about Perl's abstractions of I/O, there is no way to get the OS to direct output through them. If you want to avoid the shell, the cookbook has a fork recipe for postprocessing your own output. Cheers, Ben
Re: [Boston.pm] [getting OT] Controlling Windows with Perl?
While you can't uninstall IE, you can reduce its exposure to the web. A friend of mine developed a lockdown approach that included installing Mozilla, removing all visible signs of IE, pointing IE at a proxy server, and creating a login script that continually repoints IE at said proxy server. This proxy server allows through a couple of things that really need it (eg Microsoft Update), and otherwise displays a static page telling you to use a better browser. The proxy server is a necessary step because you can't actually remove IE. Last I heard the lab that he runs (which is used by a bunch of teenagers) had avoided getting any significant virus infections in over a year. He is proud of that fact, but complains that achieving that goal takes a *lot* more work than it should. Cheers, Ben On Tue, 22 Mar 2005 00:20:03 -0500, Anthony R. J. Ball [EMAIL PROTECTED] wrote: Windows cannot really live without IE, too many things embed it. I have just been playing with Macromedia Breeze and it obviously uses embedded IE to talk to the Macromedia site in its powerpoint plugin. Like it or not, the only way to uninstall IE is to uninstall Windows... Hrm... doesn't sound like an awful idea ;) On Mon, Mar 21, 2005 at 09:14:41PM -0800, Ranga Nathan wrote: Accessing the internet when you are logged on as administrator is like inviting AIDS (sorry, this sounds drastic but it is :) ). At home where I don't have too much security, I always log on as a common low-privilege user while on the internet. Using Mozilla is always wise. I can not believe that there is still no way to remove IE from Windows. The worst nightmare is some casino site that attaches to IE like a leech! I even called those folks one day and they refuse to own up to anything! __ Ranga Nathan / CSG Systems Programmer - Specialist; Technical Services; BAX Global Inc.
Irvine-California Tel: 714-442-7591 Fax: 714-442-2840 Bob Rogers [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 03/21/2005 07:03 PM To Ben Tilly [EMAIL PROTECTED] cc boston-pm@pm.org, Ranga Nathan [EMAIL PROTECTED] Subject Re: [Boston.pm] [getting OT] Controlling Windows with Perl? From: Ben Tilly [EMAIL PROTECTED] Date: Mon, 21 Mar 2005 18:21:38 -0800 And now that there is serious venture capital behind adware, some of the more difficult security exploits are getting hit hard. For instance I've heard that internal Windows messages have *no* security infrastructure. Any application can send a message to any other application and there is no way for the recipient to figure out who the message is really from. (To exploit this you have to send the right message to the right application when it is expecting to see a message that can be confused with yours.) That is correct. It is apparently easy to subvert apps such as antivirus that run as Administrator via their GUI, if they are foolish enough to present a GUI on a less-privileged desktop. But if you're using IE as your trojan horse, and you already have enough control over it to send messages to other app windows, then you have full access to the privs of the IE user, so why bother? Odds are it's a home system, and you won't even have to get Administrator privs in order to install adware, spyware, etc. A friend who supports a lot of small businesses is predicting that by the end of this year, Windows will essentially be unusable on the Internet. This seems extreme to me, but I don't keep track of these things, he does, and he has pretty good insight into the industry. It seems extreme to me, too, even if we were just talking about home systems. If I understand correctly, this window message thing is a fundamental design flaw in the older Windows APIs, but there is current technology that addresses the problem. Unfortunately, it is less convenient for users, so the trick will be to get vendors to switch to using it.
But if it threatens to hit MS in their pocketbook, it will happen. But then, I do my best to ignore Windows, and have been largely successful at it, so I'm hardly an expert. -- Bob Rogers http://rgrjr.dyndns.org/ -- www.suave.net - Anthony Ball - [EMAIL PROTECTED] OSB - http://rivendell.suave.net/Beer -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= To find fault is easy; to do better may be difficult. - Plutarch
Re: [Boston.pm] Controlling Windows with Perl?
On Tue, 22 Mar 2005 19:02:51 +, David Cantrell [EMAIL PROTECTED] wrote: On Mon, Mar 21, 2005 at 06:21:38PM -0800, Ben Tilly wrote: [...] A fun issue is popups. Everything works fine and then someday there is an unexpected error and a popup stops everything in its tracks. Sure, you can probably put in some kind of search for popups that makes them go away, but do you dare? Until you see them do you know whether it is safe to ignore this popup? The issue here isn't even gui vs command line, the same problem was the bane of expect scripts. It is fairly simple to teach the computer what happens if everything goes right. But in manipulating someone else's user interface you discover boundary cases the hard way - one by one. The result is very fragile. Whenever your test script meets something it has not been taught to expect that is either a bug in what you are testing, or a bug in the test script. In either case, the only correct thing to do is stop. I almost commented on this in my previous email. What you say is absolutely correct for test scripts. It most emphatically is NOT correct for production jobs. When you have a chain of dependencies, halting the whole process dead in its tracks at the first sign of trouble leads to a process that does not complete. Instead you want to log anything that seems minor, pause and alert someone on anything that seems major (or is unknown), and be able to be told to continue, told to stop, and be able to be restarted from a known clean state without having lost all of your work so far. If you have frequent intermediate states that you can restart from, then it may be OK to always stop and just get restarted from there. Adding that extra logic adds a lot of complexity. Adding it while working remotely through your (possibly incompletely understood) interface adds a lot more complexity. And once written, even minor UI tweaks are likely to cause havoc, so you cannot easily upgrade the program being driven. 
If you're calling a library (by contrast), that extra logic is usually easier to add. (For one thing you don't have to categorize errors - they already come as warnings or fatal.) The API tends to be a lot simpler than the UI was, and API changes from one version to the next are far less likely to cause problems, so you are much more able to upgrade. If your job is testing software, then driving a UI is necessary, there is no other way to test the UI. If your job is maintaining a production system, you really want a programmatic API. Cheers, Ben PS Disclaimer: I've had to run nightly production jobs. I have never been responsible for automated testing. Assume appropriate biases.
Re: [Boston.pm] Controlling Windows with Perl?
On Mon, 21 Mar 2005 08:04:31 -0600 (CST), [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I've seen programs that can monitor your keystrokes and mouse clicks, etc, in order to replay them against the operating system. Does perl have the ability to do something like that? Yes. The purpose of my search is that I want to automate certain responsibilities which necessitate using windows based programs, but not being a Windows programmer, I have no clue on how to do this. I don't know if it's possible, or if perl can do the trick. But I'm hoping someone else does. Danger alert! Danger alert! What you'll find is that you can write the script, and it will mostly work, but there will be constant issues. For instance someone will walk in to look at the batch job, will jiggle the mouse, then everything breaks. Command line functionality is not an option as many of the programs are gui only. Many gui programs can be manipulated through Win32::OLE. Many gui programs have a command-line replacement, or can be rewritten in Perl. Both approaches will be far more reliable than trying to drive a gui programmatically. For an example, let's say I wanted to write a script that would open quickmail on my system, click the new message button, type in some stuff in the window, and then click send... There are Perl modules that allow you to send mail directly. Using them will be far simpler and more reliable. Trust me on this. Am I off in la-la land, or can this be done, and be done with perl? It can. Been there, done that, have the scars. Which is why I'm telling you to only use that as a method of last resort. Cheers, Ben
Re: [Boston.pm] Controlling Windows with Perl?
On Mon, 21 Mar 2005 17:49:45 -0800, Ranga Nathan [EMAIL PROTECTED] wrote: You would use Windows Scripting tool for that. Check out WSH (Windows Scripting Host). There are many macros that do just that and as it was pointed out, this has caused many security exploitations in windows. And now that there is serious venture capital behind adware, some of the more difficult security exploits are getting hit hard. For instance I've heard that internal Windows messages have *no* security infrastructure. Any application can send a message to any other application and there is no way for the recipient to figure out who the message is really from. (To exploit this you have to send the right message to the right application when it is expecting to see a message that can be confused with yours.) A friend who supports a lot of small businesses is predicting that by the end of this year, Windows will essentially be unusable on the Internet. This seems extreme to me, but I don't keep track of these things, he does, and he has pretty good insight into the industry. There is software like Win Runner (Mercury tools I think) and Load Runner that do this kind of thing for repeated testing of Windows applications. You should be able to do this in Perl too. You will be playing keystrokes to get to the buttons, basically like screen-scraping. Sounds like a lot more work than, say, finding the right module to send email directly. A fun issue is popups. Everything works fine and then someday there is an unexpected error and a popup stops everything in its tracks. Sure, you can probably put in some kind of search for popups that makes them go away, but do you dare? Until you see them do you know whether it is safe to ignore this popup? The issue here isn't even gui vs command line, the same problem was the bane of expect scripts. It is fairly simple to teach the computer what happens if everything goes right.
But in manipulating someone else's user interface you discover boundary cases the hard way - one by one. The result is very fragile. Cheers, Ben
Re: [Boston.pm] HTML Renderer
On Tue, 8 Mar 2005 14:02:47 -0500 (EST), Chris Devers [EMAIL PROTECTED] wrote: [...] Similarly -- and this way lies madness, I admit up front -- just run the script on a system that can use AppleScript or COM (or WSH or whatever it is, I'm not a Windows programmer) to just automate interacting with a regular browser like Firefox or Safari, and save the result that way. If you run it on OSX, you can go straight from this to a PDF file for free. I've done this on Windows for web pages that were IE only. It was a small PITA to get running (you have to install a driver to print to PDF files and there were some magic parameters that had to be set by hand in IE so that it would print to a file), but not that hard. What was hard was that it was unreliable, and every so often needed to be kicked. Which was OK since it was a batch process that produced a bunch of them that were stored as files. (I would NOT do this for an interactive web page!) I was very happy when those web pages got cleaned up so that we could switch to html2pdf instead. Cheers, Ben
Re: [Boston.pm] Perl popularity and interview questions
On Mon, 7 Mar 2005 08:22:04 -0800, Palit, Nilanjan [EMAIL PROTECTED] wrote: -Original Message- From: Greg London [mailto:[EMAIL PROTECTED] Sent: Monday, March 07, 2005 11:17 AM As for the triple-plus operator ;) I'd think perl would take x, do a ++ on it, get 2, and then do the +1 on it to get three. But oh well. just won't use that in my code. No. In Perl (or C), $x++ = use then increment, whereas ++$x = increment then use. Thus the expression will use the existing value of x (1) to compute the value of y then increment x itself. And in C++. However, note that there x++ implies an implicit cloning operation - you need the original value to return as well as the incremented one. If your constructor is heavyweight, it can be much better to write ++x, and avoid that clone (which cannot be optimized away by the compiler because there may be nontrivial semantic effects). I consider it very ironic that C++ demonstrates that ++C is a better way to write that idiom. (Also I'd prefer a language that was improved before I used it...) Cheers, Ben
Re: [Boston.pm] Perl Popularity and Interview Questions
On Mon, 07 Mar 2005 08:36:56 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Mon, 2005-03-07 at 01:51 -0500, James Freeman wrote: [...] If you know more trivia than I do (I've yet to see that), then I would hire you on the spot. Let's turn this into, Let's try to stump Aaron! Here are a few tries from me: 1. What is the output of the following, and why? package Foo; our $_; print grep {$_} -5..5; 2. Explain what's special about "0 but true" and why that's never actually needed. 3. Why don't you get a warning from: perl -we 'ignore useless use of a constant' 4. As Larry points out in perlsyn, he has never felt the need to use goto LABEL in Perl. (With good reason, as was first proven in the 70's, any algorithm that can be written with goto can be written, with the same efficiency, with named loop control.) Why, then, did he include the feature? 5. Who made the syntax $somesub->() work, and why? Cheers, Ben
Re: [Boston.pm] why popularity matters
On Thu, 3 Mar 2005 14:46:19 -0500 (EST), Greg London [EMAIL PROTECTED] wrote: Chris Devers said: I think it would be nice if Perl were more popular. I don't think advocacy is a bad thing. I don't think certification, or courses, are unreasonable. But of the ways I can think of to make Perl more popular, I'm not sure that any of these will be more effective than simply writing great software that a lot of people benefit from: setting up a certification program; setting up a marketing campaign; ranting about the matter endlessly on mailing lists; shouting down people who think that having more great software would be a good thing. It seems that the people who want to talk about certification are the ones getting shouted down here. I think that you're seriously misrepresenting what is happening here. --- Hey, maybe we could create certification as a way to advocate perl. Shut up. certification doesn't prove anything. I think responses are more along the lines of, certification introduces a lot of problems, and we don't see how you'll make a certification become accepted. But it might help advocate perl in the business world. Shut up. the business world is an idiot if it doesn't use perl when it should. I think responses are more along the lines of, we don't think that the business world pays as much attention as you think. If you think that it does, then please explain the success, past and present, of C, C++, PHP and Perl. But if they pay attention to certification, why not give it to them? Shut up. Certification only makes money for the certification company. Again, we're dubious that companies pay as much attention as you think that they do, and it comes with costs. But what if O'Reilly could offer free web-based certification? Shut up and do it yourself. I think that several have mentioned the cheap (used to be free) web-based certification at http://www.brainbench.com/.
You have not explained why we should expect another one to have significantly better uptake than that one. I can't do it myself, that's why I'm bringing it to the Perl Monger list. Shut up and advocate then, and stop arguing on the mailing list. Strange how that works. You don't feel that you can tackle the task and so argue on a mailing list, most of the members of whom are in no better position to do it than you are. What do you expect to happen? You then find out that this is a common source of discussion, and a lot of people who are in better positions to do something about it than you are also dubious about it. But you don't seem to be trying to understand why, you're just frustrated that we are not acting on it. Again, would you predict this to be useful? If you really wanted you could say, I'm going to tackle this, I need help, anyone who wants to help me please sign up here. That would be more likely to go somewhere. Better yet, I pointed you at a past discussion which showed you that there are prominent people who agree with you. You could go to one of them (Tim Maher comes to mind) and say, I understand that you're interested in getting Perl certification off of the ground, is there any way that I can help? He has several advantages over you. He runs a training business, he is well-known within the community, he has a better idea than you do of who is likely to help and who isn't. That option sounds depressingly effective, you might actually get somewhere. But no. Instead you're spending time talking about how important this is without actually doing anything about it. And then you're wondering why it is going nowhere. For an excellent overview of why this usually won't work I highly recommend reading _The Logic of Collective Action_. --- The mailing list is a shared channel. I don't shout people down because they're talking about something I don't want them to talk about. If you don't like certification, fine. 
If you have some historical information to give about it, fine. If you have some knowledge about certification attempts, fine. I think that I gave you all of the things that you say are fine. In fact I've given you all of them in this post. But after a certain point, the conversation moved from pointing out all the problems with certification, to attempting to push the conversation off the mailing list completely. There was at least one request to Ron to kill this thread, maybe two, I can't remember. And now most of the resistance is coming from people who have the attitude of Just fucking do it, stop talking about it. Ron's request was to stop personal attacks in this thread. Which is fair, and may likely have been directed at something that I said. As for continued resistance, at some point if a discussion is going nowhere, it makes sense to drop it. Some people seem to have broken out in hives at the mere mention of certification and are foaming at the mouth to the point where they cannot remain silent while a couple people talk about
Re: [Boston.pm] OT: O'Reilly
On Thu, 03 Mar 2005 00:50:34 +, Federico Lucifredi [EMAIL PROTECTED] wrote: Hey Ben, How do you feel when you have a nice process in place through which people are supposed to contact you, and customers keep on persisting in trying to get direct numbers to inside contacts? I tend to get irritated by that, but YMMV. Maybe a random editor will be like me, maybe not. I am not trying to go *around* the process, I am just trying to get some advice from someone more in the know than myself, and someone on the inside is ideal to answer two or three stoopid questions before I send things in through the appropriate official channels. Well from O'Reilly's point of view you certainly are going around the process, they have somewhere that they want you to start with to reach them, and you want to go contact an insider instead. I have no idea how specific employees there will feel about it though. I was asking because, yeah, I can also figure that chromatic and Rael are editors there, but I am, indeed, concerned about bugging them out of the blue. Enough said. You might try contacting any O'Reilly author instead to get feedback/a better idea who might be sympathetic. You must have missed Brian's talk of 'bribing' two weeks ago -- I am not going that far (yet!) =) I definitely did miss that talk. Remember, I'm only possibly going to move to Boston, right now I'm in Santa Monica. PS: given how friendly the ppl at Pearson/AW seem to be, O'Reilly must really be under a deluge of proposals like Uri noted! Well, they certainly are popular. (Authors know that any given title is likely to sell a lot better if it is published by O'Reilly than it will when published by someone else.) Cheers, Ben
Re: [Boston.pm] perl6/pugs
On Tue, 1 Mar 2005 16:02:08 -0500, Adam Turoff [EMAIL PROTECTED] wrote: On Mon, Feb 28, 2005 at 03:39:30PM -0500, Gyepi SAM wrote: It must be: I am using LISP, after a long hiatus, and really liking it. I simply did not appreciate its power upon introduction six years ago. Yep. I never fully understood closures until I used them in Perl. After that, Lisp and Scheme were no big deal. [Except for the Y-Combinator, and ((call/cc call/cc) (call/cc call/cc)); those still make my brain hurt]. The Y-Combinator made my brain hurt until I figured it out. Heavy use of continuations still make my brain hurt, and I'm favourable to the opinion that a continuation is like a goto, only worse. Here's an explanation of the Y-Combinator. It won't work in Perl because Perl doesn't do lexical binding of input parameters. JavaScript does and most should know that, so I'll do it in JavaScript. Our goal is to be able to write a recursive function of 1 variable using only functions of 1 variable and no assignments, defining things by name, etc. (Why this is our goal is another question, let's just take this as the challenge that we're given.) Seems impossible, huh? As an example, let's implement factorial. Well step 1 is to say that we could do this easily if we cheated a little. Using functions of 2 variables and assignment we can at least avoid having to use assignment to set up the recursion.

// Here's the function that we want to recurse.
X = function (recurse, n) {
  if (0 == n)
    return 1;
  else
    return n * recurse(recurse, n - 1);
};

// This will get X to recurse.
Y = function (builder, n) {
  return builder(builder, n);
};

// Here it is in action.
Y( X, 5 );

Now let's see if we can cheat less. Well firstly we're using assignment, but we don't need to. We can just write X and Y inline.

// No assignment this time.
function (builder, n) {
  return builder(builder, n);
}(
  function (recurse, n) {
    if (0 == n)
      return 1;
    else
      return n * recurse(recurse, n - 1);
  },
  5
);

But we're using functions of 2 variables to get a function of 1 variable. Can we fix that? Well a smart guy by the name of Haskell Curry has a neat trick: if you have good higher order functions then you only need functions of 1 variable. The proof is that you can get from functions of 2 (or more in the general case) variables to 1 variable with a purely mechanical text transformation like this:

// Original
F = function (i, j) { ... };
F(i, j);

// Transformed
F = function (i) { return function (j) { ... }};
F(i)(j);

where ... remains exactly the same. (This trick is called currying after its inventor. The language Haskell is also named for Haskell Curry. File that under useless trivia.) Now just apply this transformation everywhere and we get our final version.

// The dreaded Y-combinator in action!
function (builder) { return function (n) {
  return builder(builder)(n);
}}(
  function (recurse) { return function (n) {
    if (0 == n)
      return 1;
    else
      return n * recurse(recurse)(n - 1);
  }})(
  5
);

Feel free to try it. alert() that return, tie it to a button, whatever. That code calculates factorials, recursively, without using assignment, declarations, or functions of 2 variables. (But trying to trace how it works is likely to make your head spin. And handing it over, without the derivation, just slightly reformatted, will result in code that is sure to baffle and confuse.) You can replace the 4 lines that recursively define factorial with any other recursive function that you want. /me wonders how different the world would be if EvilLarry didn't let map and filter^Wgrep slip into Perl... I'm rather more thankful for closures. After all, being list-oriented, I can define map/grep quite easily. But closures I need to be in the language...
Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] RPM building (was: Bottom Up)
On Tue, 1 Mar 2005 18:01:13 -0500, Gyepi SAM [EMAIL PROTECTED] wrote: On Tue, Mar 01, 2005 at 03:16:06PM -0500, Duane Bronson wrote: [...] I don't know of any CPAN distributions. However, if you are on an RPM based system, you might try my ovid program http://search.cpan.org/~gyepi/Ovid-0.06/ovid which recursively converts CPAN modules into rpms by following dependencies. It makes a normally painful and tedious task very easy. It's rpm specific because that's what I usually use, but that needn't be. The Debian equivalent is dh-make-perl. I haven't used it extensively, so I don't know how well it works. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] OT: O'Reilly
On Tue, 01 Mar 2005 23:35:40 +, Federico Lucifredi [EMAIL PROTECTED] wrote: Hello Uri, I have a bookish request: does anybody have an editorial contact at O'Reilly I can exchange a few ideas with? I am cooking a proposal for them and I need a few tips here and there. BT I'd start with http://www.oreilly.com/oreilly/author/intro.html. been there, done that. What I need is someone to talk to *before* I send them the proposal, hence my hope someone might have an editor's email. How do you feel when you have a nice process in place through which people are supposed to contact you, and customers keep on persisting in trying to get direct numbers to inside contacts? I tend to get irritated by that, but YMMV. Maybe a random editor will be like me, maybe not. You could lurk on use.perl.org and figure out that it looks like chromatic and gnat work at O'Reilly. Then contact them and see if they're interested in helping you. You might irritate them, you might get good advice. I don't know. I'm going to guess that they'll tell you to start with http://www.oreilly.com/oreilly/author/intro.html. When your proposal gets there, it doesn't have to be perfect. If they think that it has promise, they'll work with you on it. Note that when I say, it has promise, I mean that it fits into their idea of what they want their catalog to look like. A great idea for something that they have something pretty close to will lose to an average proposal for something that they feel is a hole in their offerings. and contact manning.com as well. they are open to proposals too. if you can't find the contact i should have some info still. I will keep that in mind, but right now I think this is such a fit for ORA that I have a hard time thinking of going to another publisher. ORA may or may not agree. As I noted, the quality of the idea is not the only factor in their decision. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] (also) Perl
On Tue, 1 Mar 2005 14:55:58 -0600 (CST), Alex Brelsfoard [EMAIL PROTECTED] wrote: My impression is that the language which is making the most inroads on traditional Perl areas is PHP. Is that because of the wonderful certifications that PHP has which Perl doesn't? Or is it because PHP is seen as easier to get started with than Perl? Also PHP has the huge advantage that hosting environments allow it to be used in a shared hosting environment, while mod_perl requires dedicated servers. (That is because PHP is less capable, so it is hard for one site to cause problems for other sites running in the same Apache process.) Are you telling me that this DOESN'T keep you up at nights? I know I'm exaggerating, but this is partly what gets me riled up: that simply because something is easier to get started with it's better. Hell the PHP documentation itself explains why it's easier to get started with: it gives su-root permissions on install. So you don't need to configure anything. Just sit down and play; no worries about not being able to do anything, because you basically have root. I'm sorry, but I'm not willing to take on that huge a security hole just to make the setup process easier. To me this just gives me more reason to fight harder to tell managers that Perl is the way to go. If everything that depressed me kept me up at nights, I'd never get any sleep. It sounds to me like you should read the classic essay, Worse is Better: http://www.dreamsongs.com/WIB.html No matter how much I may wish it otherwise, the world will be as it is regardless of what I can do. So I'll try to educate my corner and then survive as best I can. Suppose that we try this and it doesn't work. Does the argument then become that we need to get our certification backed by someone prominent because a certification that nobody has heard of is proving to be useless? We're just trying to find ways to communicate to managers that know nothing about Perl. This is just one idea. 
And I think well worth TRYING. If certification had no potential downsides, then I'd cheer you. But it has potential downsides that concern me, so I won't. Fortunately, unlike worse is better, what I'd like to have happen will happen naturally without any effort on my part. Also I have a different theory. My theory is that the non-savvy manager is going to ask someone he trusts for an opinion, who is either going to be someone whose competence has been proven (less likely), or is going to be someone else of about the same position and abilities (more likely). In neither case does the existence of a certification enter into the process. Well if he is about the same position and abilities then the certification program will be advertising to him/her as well. Here are some questions to ask yourself about this. - How much money do you wish to spend on advertising? - Where do you expect that money to come from? - Would that be a cost-effective use of that money? - Will the people whose money you're expecting to use agree? So now I need to take an endless stream of training from an approved source? Well, as was explained before. Certification is only PART of the hiring process. If you get one certificate and then spend years working with perl you obviously don't need another certificate. Your experience will trump your certificate at that point. I'm trying to see how this certificate does more than being able to put on your resume, I've taken these courses from trainer X. I've seen people say that on their resumes, and I paid attention. I did not necessarily recommend the hire, but you don't need a recognizable certificate to realize value from training. And remember to give the correct answer on a test even when I think it is wrong? (Quick: is our a good thing? Read http://www.perlmonks.org/?node_id=48379 before answering. Yet as cool feature of the day I'm sure that a certification would have required me to talk up how great it was.)
Well, chalk that up to the proper design of the certification program. At this point we're past deciding whether or not to DO the certification. We're at the point of deciding how best to do the certification. If you're going to dream of a certification, why not dream of a perfectly administered one? My point is that existing certifications are notorious for having specific shortcomings. Unless you give me a good reason to believe that this would be different, I'm going to believe that your certification would be as bad as the rest. A certification that has very prominent and vocal opponents within the community is likely to have an uphill battle to acceptance. A certification that didn't have enough support for people to learn what they need to pass it is going to find that the hill is looking more like a cliff. I thought we were discussing this because we were already looking up said hill. And my
Re: [Boston.pm] perl6/pugs
Ruby is easier for Perl people to get into than Haskell. By the same token, learning Ruby will expand your horizons less than Haskell. Which is preferable depends on your point of view. Cheers, Ben On Mon, 28 Feb 2005 13:49:59 -0500, Benjamin Kram [EMAIL PROTECTED] wrote: I just grabbed binary of Haskell. I'm thinking of poking around with that as well, and Ruby... b On Mon, Feb 28, 2005 at 01:40:35PM -0500, Kenneth A Graves wrote: On Mon, 2005-02-28 at 13:32, Aaron Sherman wrote: On Mon, 2005-02-28 at 12:51, Benjamin Kram wrote: Has anyone had a chance to play with pugs? I just svned down a copy and was going to toy with it a bit. Only a little bit. I am, however, sure that the correct way to boost the popularity of your favorite niche language is to write a compiler / interpreter in it for a popular language. Pugs will certainly boost Haskell in this way ;-) I haven't gotten around to playing with Pugs yet, but I did build Haskell this weekend. It's a functional-programming conspiracy. --kag -- it would be horrid to be robbed by the wrong kind of people -archy Don Marquis, the big bad wolf, 1935 ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] advocacy
On Mon, 28 Feb 2005 16:04:34 -0500, Tom Metro [EMAIL PROTECTED] wrote: Sean Quinlan wrote: [...] If Amazon, Yahoo, Ticketmaster, etc. are already using Perl in a big way, then why not put effort into making that more visible? One way is through a silly button campaign. Built with Perl, Powered by Perl, whatever. We've all seen them around for other products. I've even seen them for Perl, though I don't recall there being any standard or effort to encourage them. If such a thing existed, the next step would be getting the big name users of Perl to put them on their sites. It's better that an IT manager notices that Amazon uses Perl when he is shopping for books, than having a page somewhere on perl.com that lists Amazon among the big names. This step is easier said than done. Small companies have an incentive to not talk about their technology very much. As for why, Paul Graham put it well in http://www.paulgraham.com/avg.html: And so, I'm a little embarrassed to say, I never said anything publicly about Lisp while we were working on Viaweb. We never mentioned it to the press, and if you searched for Lisp on our Web site, all you'd find were the titles of two books in my bio. This was no accident. A startup should give its competitors as little information as possible. If they didn't know what language our software was written in, or didn't care, I wanted to keep it that way. When it comes to large companies, that real estate becomes valuable territory and they're not going to donate it for free. The technology you use is an internal decision. It has no relevance to the customer. What is the business case for putting it out there? If you're going to ruin your branding by advertising something, any healthy business is going to advertise something that makes them money. Think about things this way. If you were selling, say, cars, how much would you expect to have to pay to get that advertising on those pages? That's the amount that you're asking them to give away. 
What's your business case for doing so? That it would be great for you? There are lots of people and groups who could say that. What about for them? Another approach would be to get people from these companies to contribute articles to general IT publications. It's great that some of them show up at Perl conferences, but that's preaching to the converted. They do from time to time. However even that is not really in the company's interest. It is very much in the interest of the person who gets published since it looks good on the resume. But it means that this valuable employee either is likely to cost more or may leave. And probably doesn't help your core business. (Unless you're in a business like consulting, in which case you're likely to value the publicity.) Actively discouraging employees from publishing is likely to cause them to leave as well, so smart companies don't discourage. But they don't generally encourage either, and aren't about to start. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] OT: O'Reilly
I'd start with http://www.oreilly.com/oreilly/author/intro.html. Cheers, Ben On Tue, 01 Mar 2005 01:52:29 +, Federico Lucifredi [EMAIL PROTECTED] wrote: Hello fellow Mongers, I have a bookish request: does anybody have an editorial contact at O'Reilly I can exchange a few ideas with? I am cooking a proposal for them and I need a few tips here and there. Best - Federico _ -- 'Problem' is a bleak word for challenge - Richard Fish Muad'Dib of Caladan (Federico L. Lucifredi)- Harvard University BU http://metcs.bu.edu/~lucifred ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] short-listing languages for applications software development
On Fri, 25 Feb 2005 09:04:51 -0500, James Linden Rose, III [EMAIL PROTECTED] wrote: On Friday, February 25, 2005, at 08:28 AM, Tolkin, Steve wrote: I think this is the best point that has been advanced in favor of using perl: Amazon, Google, Yahoo, Morgan Stanley all use Perl in production ... Does anyone have additional details, e.g. the names of the projects, number of servers, number of users, estimated cost, estimated savings by using perl, etc. That kind of additional detail would usually be considered proprietary, and hence is unlikely to become public knowledge. I think it was mentioned in the book eBoys that the guy who founded Ebay (Iranian guy whose name escapes me) wrote Ebay in Perl... and aside from that, I wrote KanjiCafe.com's Ice Mocha in Perl as well (^_^). Pierre Omidyar. The same thing was described in The Perfect Store. But when they needed to scale, they went to C++. So that's not a very good advertisement for Perl. However I've heard rumor that eBay recently acquired Rent.com, which apparently is written in Perl... Other well-known companies in the LA area who are using Perl in a big way include Ticketmaster and City Search. You can find more success stories at http://perl.oreilly.com/news/success_stories.html Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] (also) Perl
On Fri, 25 Feb 2005 15:51:46 -0500, James Linden Rose, III [EMAIL PROTECTED] wrote: On Friday, February 25, 2005, at 03:04 PM, Alex Brelsfoard wrote: I think part of the problem is that it is an open source system that doesn't have a fund for advertising. I think if we simply saw some commercials on tv talking about Perl, or telling about all its success stories. Heck even if they're just like the Intel commercials simply saying Yeah, here we are. We're Perl. We're cool. Yeah, so like us. It wouldn't take many to make a difference. /me thinks of all of the dot coms who had advertising policies that resembled that. All failed of course, because they were wrong... Perl isn't completely without commercial allies. Being the dominant publisher of Perl related texts, it has certainly been in O'Reilly's interest to promote its use. That aside, over the last 10 years, the number of shared CGI scripts written in perl and available to the web developing community is vast. I'm sure it dwarfs all other languages. I'm not sure that what is available in Perl dwarfs what is available in PHP. Furthermore shared CGI scripts tend to be truly awful. (There are, admittedly, some exceptions.) What Perl is really lacking is a widely recognized, widely accessible certification program. When you hire Java programmers they walk in the door with papers proving that somebody said they know what they're doing. Perl is generally practiced outside this whole vetting process. Welcome to the routine debate about whether Perl should have a certification program. You're free to start one, but you'll have a lot of trouble getting prominent people to sign on. That makes less technically experienced bosses woozy with fear. You know you're a genius with Perl, but no 3rd party has printed up a certificate telling your employer this. Actually in my experience the people who are most confident of their abilities tend to be mediocre at best.
Top notch people are generally aware of ways that they can be better. (If you don't spend time painfully aware that improvement is needed, then improvement doesn't tend to happen...) Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] (also) Perl
On Fri, 25 Feb 2005 19:18:54 -0500, Bogart Salzberg [EMAIL PROTECTED] wrote: On Feb 25, 2005, at 6:08 PM, Alex Brelsfoard wrote: Ideas? How about an alliance with Apple? Ditch AppleScript and replace it with Perl, marry Perl to a GUI and turn Mac users into Perl-hacking sysadmins. Does anyone know of a good book on database theory? Really. Joe Celko is well-regarded and has several books aimed at programmers at different levels. Pick one that you feel might be at your level. If you're using Oracle, I'll highly recommend anything that you feel is applicable by Thomas Kyte. (Many of his books are intended for DBAs, you probably don't want those.) Speaking personally, I don't have a ton of book recommends because I did most of my learning about SQL from co-workers. I suspect that many Perl programmers who use databases are in the same boat, which may be why you have been getting so few responses to that request. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: Debian v CPAN RE: [Boston.pm] Install problems
On Tue, 15 Feb 2005 06:24:51 + (GMT), Simon Wilcox [EMAIL PROTECTED] wrote: On Mon, 14 Feb 2005, Ben Tilly wrote: I've also been told that Module::Build doesn't do a good job for people who want to install a module into a personal directory - it tries to install it into the system directory and then fails if you don't have permissions. :-( That's not been my experience. install_base seems to do the right thing as far as I can tell. From the man page: install_base You can also set the whole bunch of installation paths by supplying the install_base parameter to point to a directory on your system. For instance, if you set install_base to /home/ken on a Linux system, you'll install as follows: lib = /home/ken/lib arch= /home/ken/lib/i386-linux script = /home/ken/scripts bin = /home/ken/bin bindoc = /home/ken/man/man1 libdoc = /home/ken/man/man3 Note that this is different from how MakeMaker's PREFIX parameter works. PREFIX tries to create a mini-replica of a site-style installation under the directory you specify, which is not always possible (and the results are not always pretty in this case). install_base just gives you a default layout under the directory you specify, which may have little to do with the installdirs=site layout. The exact layout under the directory you specify may vary by system - we try to do the sensible thing on each platform. So ./Build --install_base=/home/simonw should do what you want. Do you have other experience? Here's my experience. I'm using Module::Build with the compatibility layer in my modules. I had a user attempt to install my module into a personal directory using CPAN. Said user met with abysmal failure. Perhaps there was something obvious to do, but the user didn't find it, given my situation at the time (travelling with only minimal Internet access and no ability to test anything) I couldn't easily debug the problem, and I was therefore left unhappy.
Given that my needs are very simple (test that this plain Perl module with few or no dependencies works, then copy it to the appropriate system directory), I've become very doubtful that Module::Build is buying me anything other than another dependency which sometimes can cause trouble. Once I deal with a failed hard drive at home, I'm planning to switch my modules away from Module::Build. I originally switched to it because I liked the idea of making it easier to do installs on multiple platforms, including Windows. But my impression is that virtually nobody on Windows actually uses Module::Build, they aren't really starting to, it complicates life on Unix, and I don't see at this point that it is buying me anything. (I haven't seen it buying me anything for a very long time, but until I had a frustrated user that I couldn't easily help I had no reason to get rid of it.) I also have disagreements with the Module::Build team, particularly their attitude towards providing backwards compatibility with the rest of the world. You can find my opinions summed up at http://www.perlmonks.org/?node_id=354276. Feel free to disagree with me as much as you want. But please remember that I'm criticizing a social issue, not a technical one. You can have all of the technical choices right, but if you can't get people to adopt it, then you've lost. Yes, it might be great if the world+dog adopted Module::Build. But we'll never get to the promised land unless the necessary social dynamics line up to get world+dog on board. Regards, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Install problems
On Mon, 14 Feb 2005 03:39:21 -0500, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT My recollection says that Debian likes to place the BT modules that it installs in /usr/lib while the ones BT that you install go into /usr/local/lib. Guess which BT one is first on Perl's library path? BT This can cause problems when you've installed the BT needed version of a module but Debian has placed BT an older version somewhere earlier in your path. BT My solution is to configure CPAN to pass the BT install-time argument UNINST=1, which causes CPAN BT to delete the conflicting Debian version of a module BT on install. (If I'm going to CPAN, I know what I want to BT do and Debian doesn't know enough to manage the BT packages for me.) i think another solution would be to just rip out debian's /usr/lib/perl5 and /usr/bin/perl and install perl from source using /usr/local/lib. then all cpan modules will be properly installed there and perl will be in /usr/local/bin. also then you get to build perl the way you want. my suse 9 has perl built with threads which slows all programs down. I think that Perl is part of the Debian core system - they have a lot of system utilities that use Perl, and have been known to trip interesting bugs when they switch versions. Plus were I to take your suggestion then I'd have to do a bunch of research to find out what custom modules they expect to have installed, and install them myself. Furthermore I could get into interesting fun if in a system upgrade Debian decided to reinstall its own version of Perl after all. And I'm suddenly back into the original problem with no idea what broke, and what all I need to fix to get back. If you want to use a /usr/local/bin/perl on Debian, go ahead. But replacing theirs with yours (either in local or not) seems to me to be bad sysadmining. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: Debian v CPAN RE: [Boston.pm] Install problems
On Mon, 14 Feb 2005 10:37:03 -0500, Ricker, William [EMAIL PROTECTED] wrote: doesn't support Module::Build so any modules Ouch iirc even if they have a compatibility Makefile.PL. Double ouch. Maybe it needs a patch. Thanks for the warning, that may put me off adopting Module::Build. I've also been told that Module::Build doesn't do a good job for people who want to install a module into a personal directory - it tries to install it into the system directory and then fails if you don't have permissions. :-( Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Install problems
On Sat, 12 Feb 2005 14:38:23 -0500, Joel Gwynn [EMAIL PROTECTED] wrote: Thanks for your response. See below [...] 6. what has changed since that installation? I can't think of anything. I may have upgraded something using apt-get My recollection says that Debian likes to place the modules that it installs in /usr/lib while the ones that you install go into /usr/local/lib. Guess which one is first on Perl's library path? This can cause problems when you've installed the needed version of a module but Debian has placed an older version somewhere earlier in your path. 7. do you know about any other problems from others with Geo::Code::US (it has failed before according to CPAN) and Bundle::CPAN (looks okay)? Both failed. 8. do you have all the prerequisites already installed for these two modules? I thought the CPAN module was supposed to handle the dependencies. [...] The CPAN module is supposed to handle dependencies. Debian is supposed to handle dependencies. When the two argue about what to do and how to do it, you lose. And they can't both be in control. My solution is to configure CPAN to pass the install-time argument UNINST=1, which causes CPAN to delete the conflicting Debian version of a module on install. (If I'm going to CPAN, I know what I want to do and Debian doesn't know enough to manage the packages for me.) Good luck locating the conflicting module versions. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
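For reference, the UNINST=1 setting Ben describes can be made permanent from the CPAN shell. This is a sketch of such a session; `make_install_arg` is a real CPAN.pm configuration key, while `Some::Module` is just a placeholder:

```
$ perl -MCPAN -e shell
cpan> o conf make_install_arg UNINST=1
cpan> o conf commit
cpan> install Some::Module
```

With `make_install_arg` set to `UNINST=1`, `make install` removes shadowing copies of the module found elsewhere in @INC, such as the older Debian-packaged version described above.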
Re: [Boston.pm] Social Meeting Followup
On Thu, 20 Jan 2005 16:16:24 -0500, Ronald J Kimball [EMAIL PROTECTED] wrote: Fifteen Perl Mongers braved the cold last night to attend our Social Meeting at Fire+Ice in Harvard Sq., including our guest of honor, Ben Tilly. We ate lots of food, told bad jokes, and discussed Perl mind share, evolution, the history of Boston Perl Mongers, bioinformatics, the state of our web site, mathematics, and other topics. A few people stayed until the restaurant closed at 11pm (which is when the worst of the bad jokes were told!). And an enjoyable time was had by the guest of honor (aka the primary bad joke teller)! But I cannot speak for my victi^Hjoke recipients... Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] mind share
On Tue, 08 Mar 2005 22:37:29 -0500, William Goedicke [EMAIL PROTECTED] wrote: Dear Tom - I've thought a lot about why perl hasn't gained respect in the deployment/hiring marketplace. Tom == Tom Metro [EMAIL PROTECTED] writes: Tom This reminded me of something I've wondered about for a long Tom time. Why did PHP become as successful and popular as it is, Tom even though it mostly offers a subset of what Perl can Tom do. I think that PHP gained popularity for two reasons. It initially met a need, that is, to embed logic within html. Second, it was simple. And a third. It is so limited that hosting companies have no problem enabling PHP. Therefore if you want to use a $20 host you get the choice: use PHP and be fast or use Perl CGIs and be slow. But you can't use mod_perl unless you run your own server. Tom Similarly, Java, seemingly through the addition of servlets, Tom succeeded at enterprise web development, despite Perl having Tom been there first. It was more than that. There was a successful marketing campaign which portrayed security, deployability and state-of-the-artness. And don't forget that Java had the aura of corporate support before open source had any mindshare. (Sun began marketing Java before open source was even a phrase!) Tom Today mod_perl is only rarely recognized as being an Tom application server. But, among productivity focused programmers mod_perl is recognized as one of the best frameworks to deliver web applications. I'm not sure whether, at this point, there is much in practice to distinguish mod_perl from competitors like mod_python. I'm also not sure how many people have the mindset that mod_* is really an application server that just happens to work over the web really well. I'm also dubious of how well disseminated basic mod_perl best practices are. For instance how many know to use reverse proxies for performance? See http://perl.apache.org/docs/1.0/guide/strategy.html#Adding_a_Proxy_Server_in_http_Accelerator_Mode for details. 
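The reverse-proxy arrangement Ben points to in the mod_perl guide can be sketched in Apache configuration terms. This is a hedged illustration, not the guide's exact config: a lightweight front-end Apache (mod_proxy loaded) serves static files itself and forwards dynamic URLs to a heavyweight mod_perl back-end; the port number and URL prefix are made-up examples.

```
# Front-end httpd.conf fragment (assumed: mod_proxy is compiled in).
# Static content is served directly by this lightweight server;
# /perl/ requests are forwarded to the mod_perl back-end on port 8042.
ProxyPass        /perl/ http://localhost:8042/perl/
ProxyPassReverse /perl/ http://localhost:8042/perl/
```

The payoff is that slow clients tie up a cheap front-end process instead of a memory-heavy mod_perl one, which is the performance point the linked guide section makes.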
Tom More recently, there's Python [...] great success with its Tom own application server, Zope. As a predominantly perl programmer I must say I love zope and bemoan the lack of comparable CMS in perl. Well you can use Zope from within Perl: http://www.zope.org/Wikis/zope-perl/FrontPage Personally I don't like Zope. It made the mistake of pushing you to have code in an opaque repository that cannot be trivially integrated (or at least could not when I last checked, which was a long time ago) with standard revision control systems. They may have fixed this since I looked - they certainly have fixed a lot of potential problems, but from my point of view this is a deal breaker. For the same reason, no matter how tempting it is to have code in a database, don't. Ditto for your basic configuration information. See http://www.perlmonks.org/?node_id=394251 for a slightly longer rant about this. Tom And lastly, C#, which has borrowed ideas from Perl, Java, and Tom C++. Competing with the commercial software world is a whole different animal. We were already discussing Java which is part of the commercial software world. Tom All of these are aspects of the same theme - Perl losing Tom mindshare to other technologies. It started out as a quiet, Tom underground language (telling someone you programmed in Perl Tom back in the late 80's, early 90's just got a blank stare) and Tom is perhaps heading back there (I've noticed it getting Tom dropped off the list of programming languages listed on trade Tom magazine qualification forms). Siiggg.. You're right, of course, but, isn't that issue all about the battle with the commercial world. My impression is that the Perl job scene has been improving in the last couple of years. My other impression is that Perl has an unfortunately high proportion of programmers who have messed with basic CGI but do not understand programming very well.
Having said that, I'm a leader in a consulting firm and I'm struggling to convince my firm that we should develop a LAMP Enabling practice. I see tons of organic LAMP deployment occurring. The idea of my consulting product is that LAMP deployments are immature and that there's value-adding consulting in making LAMP deployments enterprise quality and by aligning them with strategic goals. I love using the phrase enterprise quality and I hate hearing it. Both for the same reason. You can mean anything you want by it, but the listener is likely to give it a very generous interpretation. :-/ Perl's strength, in my mind, is that it has enormous breadth. As an example: I write some app and after the fact realize I need to process barcodes. No problem. This is an important strength, but it becomes less important as projects become more significant. What I mean by that is this: for small projects you can
Re: [Boston.pm] When will the Jan meeting be?
On Fri, 31 Dec 2004 02:26:44 -0500, Sean Quinlan [EMAIL PROTECTED] wrote: On Fri, 2004-12-31 at 02:05, Ben Tilly wrote: Sunday? (Checks quickly.) I'll try to make the next Sun the 19th of Jan in 2014 if I'm around. DOH! Right my calendar is still on Dec. That's my cue I should have been off to bed a while ago! :) I suspected as much... In the meantime I'd love to make Thu the 19th of Jan in 2005 at, say, 7 pm... ;-) How about Wed the 19th of Jan 2005? ;-} Of course, at this point I can't be certain of much obviously, but at least we've gotten started! And to quote Homer, D'oh! My bedtime was also exceeded... Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] When will the Jan meeting be?
On Fri, 31 Dec 2004 11:58:09 -0500, Sean Quinlan [EMAIL PROTECTED] wrote: On Fri, 2004-12-31 at 11:43, Ronald J Kimball wrote: Sorry for the inconvenience! If we can quickly select an alternate day for Jan I'll try to get it scheduled ASAP. How about a technical meeting on Tuesday, Jan 25? Meanwhile, we can discuss what day would be best for future meetings. OK, I'll find us a room for the 25th. After some negotiation, we have agreed upon the 19th of January, 2005, being a Wednesday. Now we just need to decide upon a location. I'd favor somewhere we don't usually go to (i.e. not Boston Beer Works or Cambridge Brewing Company). LMAO! Well, I think Uri suggested Fire+Ice in Harvard Sq., which is certainly someplace new. Parking in Harvard is the pits, but access by T is good and parking can be found free or cheap around Davis Sq. or at Alewife if you have to drive and don't want to face Harvard. I think that I'll be near Mass General that week, which I think is near Harvard. (I don't know Boston, it may be on the other side of the city for all I know.) So Harvard is likely to be convenient for me. Non-Harvard alternates in Kenmore Sq. are Bertucci's and Uno's. Cheers, Ben
Re: [Boston.pm] multiple encodingd - utf8
On Wed, 29 Dec 2004 23:16:24 -0500, Jeff Finn [EMAIL PROTECTED] wrote: hey all, I have a group of files in a directory on a Linux box where the file names are either encoded with utf-8 or shift_jis. Unfortunately, not knowing Japanese, I have no idea which is which. Is there a way to go through the directory and determine how the filenames are encoded? Ultimately, I want to put this directory listing on the web, and I want the browser to be able to display the correct names of all files without having to manually toggle the character encoding. I've never used it but the Jcode module exports a getcode function that looks like it will do what you want. The documentation for Jcode suggests that it should be superseded in Perl 5.8 by the Encode module, but I didn't browse its documentation enough to verify that. Cheers, Ben
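Jcode's getcode aside, the core Encode module mentioned above can handle this two-way guess on its own. Here is a minimal sketch, under the assumption (from the original question) that the only two candidates are UTF-8 and Shift_JIS: a byte string that decodes as strict UTF-8 is classified as UTF-8, anything else falls back to Shift_JIS. Pure-ASCII names are valid in both, so they report as UTF-8, which is harmless here.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Guess whether a byte string is UTF-8 or Shift_JIS. Multi-byte
# Shift_JIS sequences are almost never valid UTF-8, so a strict
# UTF-8 decode is a cheap discriminator between the two.
sub guess_encoding {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() can modify its argument
    my $is_utf8 = eval { decode('UTF-8', $copy, FB_CROAK); 1 };
    return $is_utf8 ? 'utf-8' : 'shift_jis';
}
```

The Encode distribution also ships Encode::Guess, which generalizes this to an arbitrary list of candidate encodings.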
Re: [Boston.pm] Max hash key length
On Thu, 30 Dec 2004 18:02:07 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Wed, 2004-12-29 at 18:10, Ben Tilly wrote: Under normal circumstances, to get non-minuscule odds of having a collision somewhere between MD5 keys, you'd need about 2**64 keys. If you have less than, say, a billion keys then you can ignore that possibility for all practical intents and purposes. I understand risk assessment and the idea that nothing is 100% safe, but when you have a situation where you KNOW from day one that some keys will collide, and your data will be corrupted, you don't build that into your system if you have an easy out. Then I recommend that you never use rsync. As for me, I'm sometimes willing to accept the possibility of algorithm failures which are less than the odds of my program going wrong because of cosmic radiation. This is hashing 101. You hash, you bucket based on the hashes, and then you store a list at each bucket with a key and value tuple for a linear search. There are other ways to do it, but this is the classic. Yes, I'm familiar with this, and outlined it in a previous email in this thread. Of course, Perl does this for you. That extra time that I measured is almost certainly the time spent comparing the two strings, which your tie interface will also have to do because of collisions. Want to bet whether Perl spends more time in computing hash values or comparing strings? Cheers, Ben
[Boston.pm] Can anyone lend me a dorm fridge?
I'll be in New England from Jan 3 through Jan 22. It would be extremely nice for me to have a portable fridge so that I can keep my son's food in my hotel room. I'd rather not buy one for just a few weeks' stay. If anyone has a fridge that they can lend me for that period, please get back to me or Linda Julien ([EMAIL PROTECTED]). Thanks, Ben
Re: [Boston.pm] Can anyone lend me a dorm fridge?
On Fri, 31 Dec 2004 01:53:31 -0500, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT I said New England for a reason. I'll be in a number of hotels and a BT number of states. While I'd expect some hotels to work out, I don't BT think that I should plan on always being lucky, hence my desire for a BT portable fridge that I can take with me... <CLUEBAT>i get it!</CLUEBAT> well, calling them all is still possible. lugging around a dorm fridge is a pain unless you must have it. Calling them all means figuring out what they all are in advance. That takes something known as planning, which I've never been known for. Ironically I used to own a fridge that would fit the need, but threw it out because I moved into somewhere that had a real fridge. Cheers, Ben
Re: [Boston.pm] When will the Jan meeting be?
On Fri, 31 Dec 2004 01:30:46 -0500, Sean Quinlan [EMAIL PROTECTED] wrote: [...] In honor of Ben's visit I hereby propose a social event for Sun the 19th of Jan at say 7pm. any interesting new suggestions for a location? Given the season the closer to a T station the better. And of course a good beer selection is required! ;-} Sunday? (Checks quickly.) I'll try to make the next Sun the 19th of Jan in 2014 if I'm around. In the meantime I'd love to make Thu the 19th of Jan in 2005 at, say, 7 pm... ;-) -- Sean Quinlan [EMAIL PROTECTED]
Re: [Boston.pm] Max hash key length
On Wed, 29 Dec 2004 10:49:19 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Tue, 2004-12-28 at 13:46, Ian Langworth wrote: On 28.Dec.2004 01:14AM -0500, Tom Metro wrote: If you are concerned about the performance impact of long keys, and your application fits a write-once, read-many model, then you could always hash the hash keys. Say generate an MD5 digest of the key string, and then use the digest as the hash key. This might make a nice Tie:: module, if there already isn't one. But then again, tie itself is allegedly slow... No, that would defeat the point Or at least that's what I was going to say... I had a whole rationale typed up, but then I went to benchmark my hypothesis and I get this: $ perl -MBenchmark -e 'my $long = "a" x 10_000; my %x; timethis(100_000, sub {$x{$long}++}); print "Final: $x{$long}\n"' timethis 100000: 7 wallclock secs ( 6.54 usr + 0.00 sys = 6.54 CPU) @ 15290.52/s (n=100000) Final: 100000 $ perl -MBenchmark -e 'my $long = "a" x 10_000; my %x; timethis(100_000, sub {my $tmp = unpack("%32C*", $long) % 65535; $x{$tmp}++}); my $tmp = unpack("%32C*", $long) % 65535; print "Final: $x{$tmp}\n"' timethis 100000: 2 wallclock secs ( 2.16 usr + 0.00 sys = 2.16 CPU) @ 46296.30/s (n=100000) Final: 100000 Is there a bug in my code, or is there really that substantial a savings? The savings is somewhere between nothing and the ratio of lengths of string, so it is in the range that I expected. I see no obvious errors in your code. So I'd suspect that you're seeing what the savings looks like. Of course, there's a substantial problem with the above: hashes DO conflict. Your module would have to do the same conflict resolution that Perl's built-in hashing would do, and that's probably where the extra overhead comes in (though I admit I'm not seeing it... perhaps in comparing the long value to the original?) Think about what Perl has to do to do a hash lookup. 1. Compute a hash value. 
This is a calculation that goes character by character, and hence takes an amount of time that is proportional to the length of the key.

2. Figure out which bucket that hash value would go into.

This is a fixed numerical calculation.

3. Walk through the linked list that that bucket points to, checking whether or not you have the right value.

As an optimization Perl does this check by first comparing the hash value and string lengths and only then does it compare the strings for equality. Walking the list is (on average, if the hashing algorithm works well) a fixed-time operation. But testing for equality takes time proportional to the length of the key. So you see that (if the hashing algorithm does a good job of distributing keys to buckets), the time to access a hash element is independent of the number of things in the hash but proportional to the length of the hash key. In a case where collisions wouldn't be a real problem, I guess that's a non-issue, but those are rare cases. If your hashing algorithm does not take time proportional to the length of the thing to be hashed, then it is ignoring possible differences in big chunks of that thing. Cases where you'd be willing to do that are likely few and far between. Cheers, Ben
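The three steps above can be illustrated with a toy bucket-and-linked-list hash table. The hash function here is a simple djb2-style loop, not Perl's actual internal one, but it shows why step 1 and the final string comparison are both proportional to key length while the bucketing and (cheap) hash/length comparisons are constant time:

```perl
use strict;
use warnings;

my $NBUCKETS = 8;
my @buckets = map { [] } 1 .. $NBUCKETS;

sub hashval {                                # step 1: O(length of key)
    my ($key) = @_;
    my $h = 5381;                            # djb2-style, for illustration
    $h = ($h * 33 + ord($_)) % 2**32 for split //, $key;
    return $h;
}

sub store {
    my ($key, $value) = @_;
    my $h = hashval($key);
    push @{ $buckets[$h % $NBUCKETS] }, [$h, length($key), $key, $value];
}

sub fetch {
    my ($key) = @_;
    my $h = hashval($key);
    for my $entry (@{ $buckets[$h % $NBUCKETS] }) {  # steps 2 and 3
        # cheap comparisons (hash value, length) before the O(length) eq
        next unless $entry->[0] == $h and $entry->[1] == length($key);
        return $entry->[3] if $entry->[2] eq $key;
    }
    return undef;
}
```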
Re: [Boston.pm] Max hash key length
On Wed, 29 Dec 2004 13:13:22 -0800, Palit, Nilanjan [EMAIL PROTECTED] wrote: Folks, Thanks for the good ideas in the performance discussion. I'll try out the different suggestions. Now, regarding Tom Metro's original suggestion for using an MD5 Digest: I read that the original MD5 algorithm has known issues with collisions. Any experiences with how well Digest::MD5 does when used with many millions of keys? Do I need to test for collisions myself (at the expense of lost performance), or is it pretty well tested (or proved?) to stand up to an intensive application? FYI, known issues means that we have a known way to produce two files with the same MD5 hash that is faster than just looking for one. Under normal circumstances, to get non-minuscule odds of having a collision somewhere between MD5 keys, you'd need about 2**64 keys. If you have less than, say, a billion keys then you can ignore that possibility for all practical intents and purposes. That said, the suggestion of using MD5 keys is a non-starter for eliminating the performance issue. Calculating an MD5 hash of a string of length n is O(n). In fact _any_ decent hashing algorithm is going to take time proportional to the length of the string because if you try to take less time, then you have to skip parts of the string and then you can't notice changes in the skipped part of the string. Cheers, Ben
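The 2**64 figure is the usual birthday bound: with k random 128-bit values, the probability of any collision is roughly k(k-1)/2 out of 2**128 possible pairs, which only becomes appreciable around k = 2**64. A quick sanity check of the billion-key claim:

```perl
use strict;
use warnings;

# Approximate birthday-collision probability for k random 128-bit
# (MD5-sized) values: about k*(k-1)/2 pairs, each colliding with
# probability 1/2**128.
my $k = 1e9;                            # a billion keys
my $p = $k * ($k - 1) / 2 / 2**128;
printf "collision probability ~ %.3g\n", $p;   # astronomically small
```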
Re: [Boston.pm] Max hash key length
On Wed, 29 Dec 2004 18:54:41 -0500, Tom Metro [EMAIL PROTECTED] wrote: Ben Tilly wrote: That said, the suggestion of using MD5 keys is a non-starter for eliminating the performance issue. Calculating an MD5 hash of a string of length n is O(n). The qualifier I added to my suggestion of using MD5 was that the application be of a write-once, read-many nature, with respect to the keys. Thus once you generate the digest of the long key, you cache it. It isn't a universal solution. Where do you cache it, and given that you've cached it, why not cache the hash value instead? That is, if you're going to write once, read many times, you can do something like this: my $value_ref = \ $hash{$big_key}; # now access $$value_ref lots of times This strikes me as simpler and more efficient than the following: my $short_key = md5($big_key); # now access $hash{$short_key} lots of times The only problem with this scheme is that if $big_key is not in the hash it can be inconvenient. But even so, you may find that two regular hash lookups in Perl are faster than computing one md5 hash. (Or you may not - benchmark it.) Cheers, Ben
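A concrete sketch of that first pattern, with the caveat that taking \$hash{$big_key} autovivifies the slot when the key is absent, which is the inconvenience mentioned above:

```perl
use strict;
use warnings;

my $big_key = 'x' x 1000;
my %hash    = ($big_key => 0);

# Pay the O(length $big_key) hashing cost once...
my $value_ref = \$hash{$big_key};

# ...then every later access through the reference is constant time:
# no re-hashing of the long key and no md5() call required.
$$value_ref += 10 for 1 .. 5;
print "$hash{$big_key}\n";    # the hash sees updates made via the ref
```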
Re: [Boston.pm] When will the Jan meeting be?
On Tue, 28 Dec 2004 13:01:28 -0500, Ronald J Kimball [EMAIL PROTECTED] wrote: On Sun, Dec 19, 2004 at 11:35:45PM -0800, Ben Tilly wrote: As I said before, I'll be in Boston for part of January. January 19 would be particularly convenient for me to meet with Boston.pm people. If it is another time I could try to make it (but probably won't succeed). But I'll need a basic plan soonish because my access to a computer will be spotty through January. I just realized that we don't have a date reserved at BU for the January meeting. Ben, would you prefer meeting people at a social meeting or a technical meeting? I'd prefer meeting people at whichever kind of meeting would result in my meeting more people and having more of an opportunity to talk to them. :-) However if a technical meeting is better for that, don't look at me to present anything. I'm spending January taking care of my son, while my wife goes to residency interviews, and Jan 19 will be my first break from baby care all year. Cheers, Ben
Re: [Boston.pm] Max hash key length
On Mon, 27 Dec 2004 16:36:38 -0800, Palit, Nilanjan [EMAIL PROTECTED] wrote: I wanted to know if there are any limitations to the max key length used for hashes in Perl. Also, what are the performance implications, if any, of using long keys? I have an application that needs key lengths in the range of ~1000, but with relatively limited numbers of keys (few to low tens of thousands). There is no upper limit beyond access to RAM. The performance implication is that computing a hash value takes time proportional to the length of the key. So doing a hash lookup for a key of length 1000 could be up to 100 times slower than doing a hash lookup for a key of length 10. (It won't actually be 100 times as slow because there are other steps which take the same time, but without benchmarking it I don't know what the likely time is.) As always, the average time to access an element in a hash should be independent of the number of keys that you have. Cheers, Ben
[Boston.pm] When will the Jan meeting be?
As I said before, I'll be in Boston for part of January. January 19 would be particularly convenient for me to meet with Boston.pm people. If it is another time I could try to make it (but probably won't succeed). But I'll need a basic plan soonish because my access to a computer will be spotty through January. Thanks, Ben
Re: [Boston.pm] transposing rows and columns in a CSV file
On Sat, 13 Nov 2004 17:43:37 -0500, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT How was I confusing issues? What I meant is that calling mmap does BT not use significant amounts of RAM. (The OS needs some to track BT that the mapping exists, but that should be it.) Once you actually use BT the data that you mmapped in, file contents will be swapped in, and BT RAM will be taken, but not until then. mmap uses virtual ram which means physical ram and swap space. so mmap can suck up as much physical ram as you have if you allocate it. We are both right, and we are both wrong. The reality is that any such behaviour on the part of mmap is OS and implementation dependent. And an intelligent OS will make much of it configurable. Let me refer to my local Linux manpage. For this purpose I'd specify MAP_SHARED. In that case edits made to memory are made to the file, and there is no need to reserve RAM or swap. (Not that Linux would reserve RAM anyways, Linux allows overcommitting.) Something close to your described behaviour may happen if you use MAP_PRIVATE and don't specify MAP_NORESERVE. If you specify MAP_NORESERVE you won't use swap. (You might get a SIGSEGV if you make a write when RAM is not available.) Which reinforces the point that specific details about the side-effects of mmap (or any system call) are implementation dependent and should never be assumed. BT As for a 3 GB limit, now that you mention it, I heard something BT about that. But I didn't pay attention since I don't need it right now. BT I've also heard about Intel's large addressing extensions (keep 2GB BT in normal address space, page around the top 2 GB, you get 64 GB BT of addressible memory). I'm curious about how (or if) the two can BT cooperate. eww, that sounds like manual management of extended memory. like the days of overlays or even pdp-11 extended memory (which i used to access 22 bit address (yes, 22 bit PHYSICAL space) from a 16 bit cpu.). 
not something you want to deal with unless you have to. You got it. Worse yet, from the way that I see it, in the next 5 years our industry must decide whether the bad old days are going to return, and I don't know which way it will jump. The problem is that consumer computers will soon use over 4 GB of RAM. The obvious clean solution is to switch to 64 bit CPUs. But as soon as you go to 64-bit pointers, a lot of programs and data structures grow (worst case they double) so there is a big jump in memory needs. (Rather than being a bit over your needs, you are a lot over.) And when data bloats, moving that data around the computer slows down as well. Programmer pain is invisible. That size and performance hit is not. A couple of years ago Intel quietly added an addressing extension to allow for up to 64 GB of RAM, and then pursued a non-consumer 64-bit strategy (which flopped). The way that I read that is that Intel expects the consumer industry to do as it did for a decade after the 16 to 32 bit conversion should have happened - stick with smaller pointers and swallow the addressing difficulty. AMD's strategy was more public. They came out with a 64-bit CPU that solved a bunch of problems with the x86 architecture, meaning that if you switch to 64-bit mode there is a good chance that your code will speed up. Their CPU has been successful enough that Intel has been compelled to issue a copycat CPU. But which way the industry will jump is still unclear to me. To me the first good test is what happens when high-end games start having memory requirements that are too big for straightforward 32-bit access. (A limit which conventionally is placed at 2 GB, but can be stretched to somewhere between 2 GB and 4 GB.) Will they manually manage some big chunks of data, or will they require an AMD Athlon-compatible computer? as for the original problem, i keep saying that mmap will give little help if the input matrix won't fit into physical ram. 
once you start swapping (manually or virtually) all bets are off on speed of any transposition algorithm. you have to start counting disk accesses and once you do, who cares how it was done (mmap or whatever)? mmap with MAP_SHARED may reduce your RAM requirements, and improves your access speed. I agree that the difference is pretty marginal. As for your counting disk accesses, I've already pointed out that disk accesses are not created equal. If you really care about the performance of the application, you need to benchmark, not make simplistic estimates. Because unless you know a lot of detail about how the disk drive works, you can't easily predict what the actual performance will be. [...] BT However the over-committed allocation comment confuses me. BT Why would a single mmap result in over committing memory? you can allocate all the virtual space allowed
Re: [Boston.pm] transposing rows and columns in a CSV file
On Mon, 15 Nov 2004 15:58:11 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Sat, 13 Nov 2004 11:40:25 -0800, Ben Tilly [EMAIL PROTECTED] wrote: On Fri, 12 Nov 2004 23:04:46 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Fri, 2004-11-12 at 13:22 -0800, Ben Tilly wrote: [...] Um, mmap does not (well should not - Windows may vary) use any RAM You are confusing two issues. using RAM is not the same as allocating process address space. How was I confusing issues? Let me demonstrate: What I meant is that calling mmap does not use significant amounts of RAM. Calling mmap uses NO RAM. It doesn't interact with RAM at all. But does allocate (potentially huge) amounts of process address space, and reserves it in such a way that your process can no longer allocate it for uses like libc's memory allocator (which you access through functions like malloc). I feel like we are all talking past each other. Let's go back to basics. When I say RAM I mean the physical RAM on the computer. Whether or not that RAM is currently allocated to your process. So if you do something and that makes something else get paged out, then you've used RAM in my view. Whether that RAM is in pages that are attached to your process, or was used by the kernel I still see that as you using RAM. If you mmap a 3GB file (actually less than 3GB, but I'll use that number as an example for now) on an x86 linux box and then call malloc, you get back a NULL pointer because malloc will fail. This is actually not quite true. That malloc will likely work because it will be allocated from some existing page of address space that malloc's internal page allocator reserved before you called mmap, but that won't work for long. I'm aware of this and wasn't disputing it. (The OS needs some to track that the mapping exists, but that should be it.) Actually, no. 
The place that mmap is tracked is a) in the file descriptor table, which is outside of your 3GB process space in kernel-space and b) in the system page table, which is not in your address space at all, but in hardware. Where it is tracked doesn't concern me. That it is tracked, does. However I realize that I don't know enough about how the memory management is handled. I would think that this would be dynamic in some way - on creating a process the kernel should need to write very little data, but will then write more later. But I don't know enough to verify that one way or the other. Once you actually use the data that you mmapped in, file contents will be swapped in, and RAM will be taken, but not until then. RAM will be taken is a meaningless term here. Ignore RAM for purposes of this conversation. On the one hand you are saying that I'm confused about what I meant by a comment about using RAM, and on the other you are telling me that I am to ignore RAM for the purposes of this conversation, it is meaningless. There is a contradiction there. For discussing what *you* want to talk about it may be meaningless, but for discussing what *I* had been talking about it isn't. And for deciding whether or not I was confused it most definitely isn't. (Perhaps you're confused about what I was talking about?) It appears that you want to discuss what the world looks like to a process. For that I wholeheartedly agree, talking about what is in RAM is generally counterproductive, if the abstraction of virtual memory works, then you should never know or care about what is or is not in RAM. But I was talking about what things lead to resource consumption that could adversely affect a machine which is carrying out a particular computation. For that it matters a great deal whether particular operations are going to cause pages of RAM to be discarded and allocated for something else. Because when it comes to actual performance, the abstraction of virtual memory leaks badly. 
(And what I was saying about resource consumption is that mmap doesn't. Consume in meaningful amounts that is.) As for a 3 GB limit, now that you mention it, I heard something about that. But I didn't pay attention since I don't need it right now. Suffice to say that your process cannot be larger than 3GB under x86 Linux. There are extensions, options and hacks if you want to go larger, but after 3GB it gets very dicey. Are you saying that Linux does not give an user-level API to Intel's addressing extensions? Or that it does but you recommend against using it? I've also heard about Intel's large addressing extensions (keep 2GB in normal address space, page around the top 2 GB, you get 64 GB of addressible memory). I'm curious about how (or if) the two can cooperate. The ability to re-map memory like this is quite common, and the *OS* can take advantage of it, but as long as you're on an x86 and using 32-bit pointers, your one process can still only have 3GB of address space (4GB-1GB for system area). But it could, for example
Re: [Boston.pm] transposing rows and columns in a CSV file
On Mon, 15 Nov 2004 18:46:15 -0500 (EST), Dan Sugalski [EMAIL PROTECTED] wrote: On Mon, 15 Nov 2004, Ben Tilly wrote: On Mon, 15 Nov 2004 15:58:11 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Sat, 13 Nov 2004 11:40:25 -0800, Ben Tilly [EMAIL PROTECTED] wrote: On Fri, 12 Nov 2004 23:04:46 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Fri, 2004-11-12 at 13:22 -0800, Ben Tilly wrote: [Massive snippage of two ships passing in the night] Heh. :-) In Perl I'd expect it to be possible but fragile. If Parrot could make it possible and not fragile, that would be great. In Parrot it's quite robust. Parrot supports buffers as core PMC types. A buffer can refer to any part of memory with any read-only or copy-on-write semantics you like. That would be nice. Incidentally will Parrot also support efficiently building strings incrementally? I like the fact that in Perl 5 it is O($n) to do something like: $string .= "hello" for 1..$n; In most other languages that is quadratic, and I'm wondering what to expect in Perl 6. That's not O(n) in Perl 5, it's just smaller than O(n^2). The same's true for Parrot -- we've got mutable strings and generally over-allocate, so it's not going to be quadratic time. Neither, though, is it going to be linear. Expect somewhere in between. I don't have source-code in front of me, but my memory says that when you have to reassign a string in Perl 5, the amount of extra length that you give is proportional to the length of the string. (If that is not how Perl does it, then Perl darned well should!) In that case it really is O(n). What happens is that the total recopying work can be bounded above by a geometric series that converges to something O(n). Everything else is O(n). So the result is O(n). I gave you more details at http://www.perlmonks.org/?node_id=276051. (Hey, my math background has to be good for something...) Perl uses the same strategy for other data structures, including growing a hash and on various array operations. 
Of course the main reason that I'm familiar with this trick is that my 2 minuscule core contributions to Perl were performance enhancements from using this trick in places where it had not been previously used. (i.e. unshift and map.) Unlike you, I don't have any other performance knowledge to confuse me... Cheers, Ben
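The geometric series argument is easy to check numerically. A minimal simulation of the over-allocation strategy (doubling is used here for simplicity; Perl's actual growth factor differs) counts how many bytes ever get recopied across n single-character appends:

```perl
use strict;
use warnings;

my ($capacity, $length, $copied) = (1, 0, 0);
for (1 .. 100_000) {
    if ($length + 1 > $capacity) {
        $copied += $length;    # growing the buffer recopies its contents
        $capacity *= 2;
    }
    $length++;
}
# Total recopying is bounded by 1 + 2 + 4 + ... < 2n, hence O(n) overall.
print "recopied $copied bytes over $length appends\n";
```

Without over-allocation, every append would recopy the whole string and the total would be quadratic in n.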
Re: [Boston.pm] transposing rows and columns in a CSV file
On Fri, 12 Nov 2004 23:04:46 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Fri, 2004-11-12 at 13:22 -0800, Ben Tilly wrote: [...] Um, mmap does not (well should not - Windows may vary) use any RAM You are confusing two issues. using RAM is not the same as allocating process address space. Allocating process address space is, of course, required for mmap (same way you allocate address space when you load a shared library, which is also mmap-based under Unix and Unix-like systems). All systems have to limit address space at some point. Linux does this at 3GB up to 2.6.x where it becomes more configurable and can be as large as 3.5, I think. How was I confusing issues? What I meant is that calling mmap does not use significant amounts of RAM. (The OS needs some to track that the mapping exists, but that should be it.) Once you actually use the data that you mmapped in, file contents will be swapped in, and RAM will be taken, but not until then. As for a 3 GB limit, now that you mention it, I heard something about that. But I didn't pay attention since I don't need it right now. I've also heard about Intel's large addressing extensions (keep 2GB in normal address space, page around the top 2 GB, you get 64 GB of addressible memory). I'm curious about how (or if) the two can cooperate. To be clear, though, if you had 10MB of RAM, you could still mmap a 3GB file, assuming you allowed for over-committed allocation in the kernel (assuming Linux... filthy habit, I know). Exactly what I was referring to. However the over-committed allocation comment confuses me. Why would a single mmap result in over committing memory? mmap should not cause any more or less disk accesses than reading from the file in the same pattern should have. It just lets you do things like use Perl's RE engine directly on the file contents. Actually, no it doesn't as far as I know (unless the copy-on-write code got MUCH better recently). Where does a write happen? 
I was thinking in terms of using the RE engine (with pos) as a tokenizer. I was thinking that you'd use something like Sys::Mmap's mmap call directly so that there is a Perl variable that Perl thinks is a regular variable but which at a C level has its data at an mmapped location. Fragile, I know (because Perl doesn't know that it cannot reallocate the variable), but as long as you are careful to not cause it to be reallocated or copied, there should be no limitations on what you can do. Like I said, you probably won't get the win out of mmap in Perl that you would expect. In Parrot you would, but that's another story. In Perl I'd expect it to be possible but fragile. If Parrot could make it possible and not fragile, that would be great. Cheers, Ben
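A less fragile way to experiment with this today is the core PerlIO :mmap layer, which mmaps the file for I/O while Perl stays in charge of its own scalars (so none of the reallocation fragility above, though the slurped scalar is a copy rather than an alias of the mapping). A sketch of the regex-with-pos tokenizer idea, with the caveat that the :mmap layer is only available on platforms with mmap support:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file to tokenize.
my ($out, $file) = tempfile(UNLINK => 1);
print $out "alpha,beta,gamma\n";
close $out;

# The :mmap layer makes PerlIO read the file via mmap instead of read().
open my $in, '<:mmap', $file or die "can't open $file via :mmap: $!";
my $contents = do { local $/; <$in> };

# Use the regex engine as a tokenizer; /g with \G walks pos() forward.
my @tokens;
push @tokens, $1 while $contents =~ /\G([^,\n]+)[,\n]/gc;
print "@tokens\n";
```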
Re: [Boston.pm] transposing rows and columns in a CSV file
On Fri, 12 Nov 2004 07:38:57 -0500, Gyepi SAM [EMAIL PROTECTED] wrote: On Fri, Nov 12, 2004 at 02:11:37AM -0500, Aaron Sherman wrote: [...] I think mmap would be just as ideal in Perl and a lot less work too. Rather than indexing and parsing a *large* file, you must mmap and parse it. In fact, the CSV code, which was left as an exercise in your pseudo-code, would be the only code required. It depends on your definition of ideal. A Perl string is far more complex than a C string, and translating between the two adds complexity. It requires an external module and adds platform dependencies. I should point out though that mmap has a 2GB limit on systems without 64bit support. Such systems can't store files larger than that anyhow. This is at best 2/3 correct. First you're right that mmap has a 2 GB limit because it maps things into your address space, and so the size of your pointers limits what you can address. It is also correct that there are complications in handling large files on 32 bit systems. Most operating systems didn't handle that case. However today most 32 bit operating systems have support for large files, and Perl added the necessary hooks to take advantage of it several versions ago. So if you have a relatively up to date system, odds are very good that you don't have a 2 GB limit. Certainly not on Windows or Linux. Cheers, Ben
Re: [Boston.pm] transposing rows and columns in a CSV file
On Fri, 12 Nov 2004 10:05:27 -0500, Uri Guttman [EMAIL PROTECTED] wrote: GS == Gyepi SAM [EMAIL PROTECTED] writes: [...] this talk about mmap makes little sense to me. it may save some i/o and even some buffering but you still need the ram and mmap still causes disk accesses. Um, mmap does not (well should not - Windows may vary) use any RAM (other than what the memory manager needs to keep track of the fact that the mapping has happened). Using mmap does not imply any particular algorithm, it is an optimization saying that you're going to leave the ugly details of paging the file to/from your process to the OS. mmap should not cause any more or less disk accesses than reading from the file in the same pattern should have. It just lets you do things like use Perl's RE engine directly on the file contents. if the original file is too big for ram then the algorithm chosen must be one to minimize disk accesses and mmap doesn't save those. this is why disk/tape sorts were invented, to minimize the slow disk and tape accesses. so you would still need my algorithm or something similar regardless of how you actually get the data from disk to ram. and yes i have used mmap on many projects. I'm not sure what you mean by something similar, but yes, you'll need SOME algorithm to solve the problem. Which statement is so general as to be meaningless. I'm sure that there are some possible algorithms that you'd never have thought of. (Mostly because they're bad.) Disk/tape sorts were invented because back in the day there was not enough RAM to do anything useful and so everything had to go to disk. Of course once you're forced to go to disk, why not optimize it...? Of course this problem said to guarantee being able to do the sort, not necessarily to do it most efficiently. Therefore no single criterion - including disk accesses - necessarily MUST dominate your choice. Furthermore disk accesses are not created equal. There are multiple levels of cache between you and disk. 
Accessing data in a way that is friendly to cache will improve performance greatly. In particular, managing to access data sequentially is orders of magnitude faster than jumping around. The key is not how often you access the disk; it is how often your hard drive has to do a seek. When it needs to seek, it reads far more data than it was asked for and puts that in cache. When you read sequentially, most of your accesses come from cache, not disk. That is why databases use merge sort so much: it accesses data in exactly the way that hard drives are designed to be accessed most efficiently. A quicksort has fewer disk accesses, but far more of them cause an unwanted seek.

> when analyzing algorithm efficiency you must work out which is the
> slowest operation that has the steepest growth curve and work on
> minimizing it. since disk access is so much slower than ram access it
> becomes the key element rather than the classic comparison in sorts.

You must, must, must. What is this preoccupation with "must"? As I just pointed out, disk accesses are not all equal. Secondly, in many applications you will *parallelize* the slowest step, not minimize it. For instance, good databases not only like to use mergesort internally, they often distribute the job to several processes or threads that all work at once; that way, if one process is waiting on a disk read, others may be going at the same time. Thirdly, and most importantly, it is more important to make code work than to make it efficient. If a stupid solution will work and a smart one should be faster, code the stupid solution first.

> in a matrix transposition in ram, i would count the matrix accesses
> and/or copies of elements. with a larger matrix, then ram accesses
> would be key. my solution would load as much matrix into ram as
> possible (maybe using mmap but that is not critical anymore) and
> transpose it. then write the section out. that is 2 (large) disk
> accesses per chunk (or 1 per disk block).
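The in-RAM part of that per-chunk scheme is just an ordinary matrix transpose; here is a minimal Perl sketch of it (the chunking and section-file I/O are left out, and the `transpose` helper name is my own):

```perl
use strict;
use warnings;

# Transpose a matrix held in RAM as a list of row arrayrefs:
# element [r][c] of the input becomes element [c][r] of the output.
sub transpose {
    my @m = @_;
    my @t;
    for my $r (0 .. $#m) {
        for my $c (0 .. $#{ $m[$r] }) {
            $t[$c][$r] = $m[$r][$c];
        }
    }
    return @t;
}

# A 2x3 chunk becomes a 3x2 chunk.
my @t = transpose([1, 2, 3], [4, 5, 6]);
# @t is now ([1, 4], [2, 5], [3, 6])
```

In the full scheme each transposed chunk would then be written out as one section file, to be merged later.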
> then you do a merge (assuming you can access all the section files at
> one time) which is another disk access per section (or block). and one
> more to write out the final matrix (in row order). so that is
> O((2 + 2) * section_count) disk accesses which isn't too bad.

You said that you want to assume that we can access all section files at once. Well, suppose that I take a CSV file which is 100 columns by 10 million rows, transpose it, then try to transpose it again. Your assumption just broke. Maybe it would work for the person with the original problem, maybe not.

Here is the outline of a solution that avoids all such assumptions.

1. Run through the CSV file and output a file of lines in the format:

   $column,$row:$field

   You'll need to encode embedded newlines some way, for instance s/\\/\\\\/g; s/\n/\\n/g; - you may also want to pre-pad the columns and rows with some number of 0's so that an ASCII-betical sort does The Right Thing.

2. Sort the intermediate
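Step 1 can be sketched in a few lines of Perl. This assumes naive comma splitting (no quoted fields) and a `cell_lines` helper name of my own; note how the zero-padded keys make a plain ASCII sort come out column-major, which is exactly the transposed order:

```perl
use strict;
use warnings;

# Step 1 of the sort-based transpose: turn row-major CSV text into
# "col,row:field" lines whose ASCII sort order is column-major.
# Zero-padding the indices keeps the textual sort numeric; doubling
# backslashes before encoding newlines keeps the escaping reversible.
sub cell_lines {
    my ($csv_text, $width) = @_;
    my @out;
    my $row = 0;
    for my $line (split /\n/, $csv_text) {
        my $col = 0;
        for my $field (split /,/, $line, -1) {
            $field =~ s/\\/\\\\/g;    # escape backslashes first
            $field =~ s/\n/\\n/g;     # then encode embedded newlines
            push @out, sprintf "%0*d,%0*d:%s",
                $width, $col, $width, $row, $field;
            $col++;
        }
        $row++;
    }
    return @out;
}

# A 2x2 example: sorting the cell lines yields the cells column by column.
my @cells = sort cell_lines("a,b\nc,d", 3);
print "$_\n" for @cells;
```

For a file too big for RAM you would stream the input line by line instead of holding the text in a scalar, and hand the intermediate file to an external merge sort such as the system sort(1).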
[Boston.pm] Passing through
I'll be in town early next year and would like to meet some of the locals. Exact dates have not been nailed down yet, but I should be in Boston from January 14-21 or so. If a boston.pm meeting could happen in that time, I'd be interested in going. If nothing official happens, I'd be up for something unofficial. The best day for me probably will be Wed, Jan 19.

Cheers,
Ben

___
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm