Re: [Boston.pm] 64 bit perl boost?
Double check where the limit is. It may well be 2 GB. Ben On 6/21/06, James Eshelman [EMAIL PROTECTED] wrote: Thanks, Sherm. It looks like there might be some benefit for high-end users who are likely to go beyond 4 GB of VM, but we can postpone it 'til then. - Original Message - From: Sherm Pendley [EMAIL PROTECTED] To: James Eshelman [EMAIL PROTECTED] Cc: boston-pm@pm.org Sent: Wednesday, June 21, 2006 11:10 AM Subject: Re: [Boston.pm] 64 bit perl boost? On Jun 21, 2006, at 10:23 AM, James Eshelman wrote: I have a large O-O perl system running on Fedora Core 3 (I know, it's old! - that's a separate subject) on Xeon 64-bit processors. The perl interpreter is only a 32-bit app. Anyone have an idea how much performance boost we're likely to get by recompiling everything for 64 bits? Does your app need more than 4 GB of virtual memory space? Does your app spend a significant amount of its time splitting huge numbers into 32-bit chunks so it can cope with them? If you answered no to these questions, don't bother recompiling. It won't help. sherm-- Cocoa programming in Perl: http://camelbones.sourceforge.net Hire me! My resume: http://www.dot-app.org ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Perl Curses clarification
I am going to second the suggestion to write an upload feature from a spreadsheet. You're not going to invent a better UI, and it is going to take you a lot more work. Plus there is no security issue here - anyone can do anything they want with the spreadsheet, but they can only enter it into your application if they have the data. BTW, I'm going to suggest that you not use flat files for this. Use a database, if only something trivial like SQLite. Your program will be simpler and faster. If your barrier is that you don't understand databases, well, now is the time to learn. Cheers, Ben On 6/1/06, Janet Marie Jackson [EMAIL PROTECTED] wrote: Thanks to those who have answered. Let me clarify a bit more what I need to do. We want to use $USER to verify a valid user before running the program, so this is very unlikely to go on the web or have a web interface. If a teaching assistant's personal account is compromised, we're really in deep you-know-what - otherwise, it's our best choice for security. The program will not be accessed by anyone other than the course staff. There will be a back-end flat file (probably CSV) listing the current students, their basic info, and their homework scores, one per line. When the user logs in, he/she will be presented with a menu along these lines: 1. View scores 2. Enter scores 3. Exit Your choice: The user can run the program by adding student names to the command line, in which case the choice will include only those students specified. Otherwise, based on $USER, that person's section will be included. If the choice is to view scores, all students (or those from the args) and their scores will be shown. If the choice is to enter scores, the user is asked which homework, then the program will allow the user to enter a score for each student for the stated homework. At the end, the updates are written back to the file. The feature to enter scores is where I'm stuck...
I'm debating trying to use something like curses to display all the student names and to allow the user to navigate with arrow keys to enter scores, vs. displaying one student at a time with a request for the score (which doesn't need curses, but takes more time) - or some other variant of either of these. At the same time, if the user doesn't want to go through the entire section, he/she CAN specify only certain students. I want this to be comfortable and convenient to use, so can't decide which approach is preferable. Thanks for your ideas! Jan ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
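For the one-student-at-a-time variant, a minimal sketch is below; no curses needed. The sub reads from whatever handle it is given, and the names, prompts, and layout are all invented for illustration, not taken from Jan's actual program:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical score-entry loop, one student at a time.
# Reads from the handle it is given, so the real program can pass
# \*STDIN and \*STDOUT while a test can pass in-memory handles.
sub enter_scores {
    my ($in, $out, $homework, @students) = @_;
    my %score;
    for my $name (@students) {
        print {$out} "Score for $name on $homework (blank to skip): ";
        defined(my $line = <$in>) or last;   # EOF ends entry early
        chomp $line;
        next if $line eq '';                 # blank line skips this student
        $score{$name} = $line;
    }
    return \%score;
}
```

In the real program the caller would then merge the returned hash back into the CSV (or SQLite) records.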
Re: [Boston.pm] version of perl that can use scalar filehandles
Use scalar filehandles? You're probably thinking of 5.6.0, which was the first that would let you autovivify filehandles. As far as I know, through the 5.x series you could always do: my $fh = do { local *FH }; open($fh, $somefile) or die "Can't read '$somefile': $!"; If you didn't remember the do-local trick, you could always use Symbol and then call gensym to get your typeglob. As for passing old-style filehandles, both of the following syntaxes are likely to work: call_function(*FILEHANDLE); call_function(\*FILEHANDLE); Cheers, Ben On 5/23/06, Greg London [EMAIL PROTECTED] wrote: more importantly, what is the syntax for passing a filehandle into a routine if it is FILEHANDLE instead of $FILEHANDLE? From: [EMAIL PROTECTED] on behalf of Greg London Sent: Tue 5/23/2006 4:10 PM To: boston-pm@mail.pm.org Subject: [Boston.pm] version of perl that can use scalar filehandles what was the earliest version of perl that would allow you to use scalar filehandles? open(my $fh, $filename); instead of open(FILEHANDLE, $filename); ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
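A self-contained sketch of both idioms follows; the scratch-file path is invented purely so the example can run on its own:

```perl
use strict;
use warnings;
use Symbol qw(gensym);    # core module; predates 5.6 by a long way

# Scratch file so the example is self-contained (path is invented):
my $path = "/tmp/fh_demo_$$";
open(my $w, '>', $path) or die "Can't write '$path': $!";
print $w "alpha\nbeta\n";
close $w;

# A sub that takes a filehandle, however it was created:
sub first_line {
    my $fh = shift;
    my $line = <$fh>;
    chomp $line;
    return $line;
}

# Old-style bareword handle, passed as a glob ref:
open(FILE, $path) or die "Can't read '$path': $!";
my $from_glob = first_line(\*FILE);
close FILE;

# Pre-5.6 scalar handle via Symbol::gensym:
my $fh = gensym();
open($fh, $path) or die "Can't read '$path': $!";
my $from_gensym = first_line($fh);
close $fh;

unlink $path;
```

Both calls hand the sub something it can read from with `<$fh>`, which is the whole point: the called code does not care how the handle was made.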
Re: [Boston.pm] version of perl that can use scalar filehandles
Do not weep. What changed in 5.6 was that it started autovivifying them. Just make the following conversion: open(my $fh, $file) ... becomes my $fh = do { local *FH }; open($fh, $file) ... and your problem is fixed. Cheers, Ben On 5/23/06, Greg London [EMAIL PROTECTED] wrote: 5.6? (weeps) well, that'll never happen. I'll have to recode with *GLOBS. (weeps some more) Thanks for all the replies. Greg From: Ricker, William [mailto:[EMAIL PROTECTED] Sent: Tue 5/23/2006 4:23 PM To: Greg London Cc: boston-pm@mail.pm.org Subject: RE: [Boston.pm] version of perl that can use scalar filehandles more importantly, what is the syntax for passing a filehandle into a routine if it is FILEHANDLE instead of $FILEHANDLE? open(FILEHANDLE, $filename) or die "trying $!"; open(my $fh, $filename); Autovivification of uninitialized scalar filehandles was added in 5.6.0 http://search.cpan.org/~nwclark/perl-5.8.8/pod/perl56delta.pod QUOTE File and directory handles can be autovivified Similar to how constructs such as $x->[0] autovivify a reference, handle constructors (open(), opendir(), pipe(), socketpair(), sysopen(), socket(), and accept()) now autovivify a file or directory handle if the handle passed to them is an uninitialized scalar variable. This allows constructs such as open(my $fh, ...) and open(local $fh, ...) to be used to create filehandles that will conveniently be closed automatically when the scope ends, provided there are no other references to them. This largely eliminates the need for typeglobs when opening filehandles that must be passed around, as in the following example: sub myopen { open my $fh, @_ or die "Can't open '@_': $!"; return $fh; } { my $f = myopen("/etc/motd"); print <$f>; # $f implicitly closed here } /QUOTE 5.6.0 also added 3-arg open($fh, $mode, $filename) for better safety against injection etc. Which means 5.5.x was the version that couldn't. -=- Bill Not speaking for the Firm.
___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] version of perl that can use scalar filehandles
On 5/23/06, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT If you didn't remember the do local trick, you could always use Symbol BT and then call gensym to get your typeglob. i have used Symbol::gensym for years and it is fine for this. it comes with perl5 from way back before 5.6 (not sure how old it is). in fact i still use it in some modules as i want them to be backward compatible with older perls. BT As for passing old-style filehandles, both of the following syntaxes BT are likely to work: BT call_function(*FILEHANDLE); BT call_function(\*FILEHANDLE); i prefer the ref version but inside the called sub it won't make a difference and the code will mostly be the same. but there is one difference which is whether you can do OO calls on the handle. you may need to load IO::Handle (or one of its many subclasses) to get that support. That isn't a difference, at least not with current versions of Perl. You can do OO calls on the handle after using either syntax to pass it in. Which methods are available depends on what modules have been loaded. perl -le 'sub foo {$fh = shift; $fh->print("hello")} foo(*STDOUT)' perl -MFileHandle -le 'sub foo {$fh = shift; $fh->print("hello")} foo(*STDOUT)' I don't know whether that flexibility goes back to Perl 5.005, though. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
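Spelled out as a small script, the point about method calls working on either form might look like this (the sub name and the in-memory capture handle are my additions for the demo):

```perl
use strict;
use warnings;
use IO::Handle;    # supplies print() and friends as methods

# The sub body is identical whether the handle arrives as a glob,
# a glob ref, or a lexical handle.
sub greet {
    my $fh = shift;
    $fh->print("hello\n");
}

greet(*STDOUT);     # glob
greet(\*STDOUT);    # glob ref

# Same call path with an in-memory handle (5.8+), to capture output:
open my $mem, '>', \my $captured or die $!;
greet($mem);
close $mem;
```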
Re: [Boston.pm] Put similarities in code and differences in data
Code Complete talks about this. And many other things. The main obstacle is getting people to actually READ it. (And after that, to try to APPLY it.) Cheers, Ben On 4/4/06, Tolkin, Steve [EMAIL PROTECTED] wrote: Thank you Charlie. That is the idea I am trying to get across. Do you have any suggestions about how to get developers to see the benefits of writing programs this way? Any specific books, techniques, etc.? Any pitfalls to be aware of? Thanks, Steve -- Steve TolkinSteve . Tolkin at FMR dot COM508-787-9006 Fidelity Investments 82 Devonshire St. M3L Boston MA 02109 There is nothing so practical as a good theory. Comments are by me, not Fidelity Investments, its subsidiaries or affiliates. Steve -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charlie Reitzel Sent: Tuesday, April 04, 2006 9:18 AM To: boston-pm@mail.pm.org Subject: Re: [Boston.pm] Put similarities in code and differences in data Not really. I believe it is intended to mean data driven programming as Jeremy mentioned earlier. To me, data driven programming means use lotsa lookup tables, the contents of which are user tweakable. As simple as it sounds, it can be an effective technique to let you quickly adapt a system as requirements evolve - without code changes. Having found this hammer early in my programming career, I find a great many nails. Early days in any new design are spent setting up a lookup table table, along with utility routines for reporting, validation, UI picking values (one or several), etc. It may be a use case, but I don't think this is quite the same thing as the subject of this thread which, as Uri says, is a general approach to analysis. At 09:00 AM 4/4/2006 -0400, [EMAIL PROTECTED] wrote: hi ( 06.04.04 08:46 -0400 ) Tolkin, Steve: The difference is that I am trying to find a quote that focuses on the benefits of using data in a special way, as control data, to determine the specific execution path taken by the code. 
um, isn't this the scientific method? -- \js oblique strategy: how would you have done it? ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
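As a concrete illustration of the lookup-table style Charlie describes - differences in data, shared mechanics in code - here is a minimal dispatch-table sketch; the formats and handlers are invented for the example:

```perl
use strict;
use warnings;

# The per-format differences live in the table; the control flow
# that uses them is written once.
my %split_for = (
    csv => sub { [ split /,/,  $_[0] ] },
    tsv => sub { [ split /\t/, $_[0] ] },
    psv => sub { [ split /\|/, $_[0] ] },
);

sub parse_record {
    my ($format, $line) = @_;
    my $handler = $split_for{$format}
        or die "Unknown format '$format'";
    return $handler->($line);
}
```

Adding a new format is a one-line table entry, not a new branch in the code - which is exactly the "quickly adapt as requirements evolve" property being discussed.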
Re: [Boston.pm] How do I wait for all child processes to terminate?
On 3/31/06, Kripa Sundar [EMAIL PROTECTED] wrote: Dear Ben, Thanks for the detailed reply to my query. If my questions below can be answered by online docs, please feel free to point me to them. I read through the following docs before my previous email. But I am still mostly in the dark: * man -s 2 for fork(), wait(), waitpid() and kill() * perldoc -f for fork(), wait(), waitpid() and kill() * perldoc perlipc Those are the right online docs. The perldocs for IPC::Open2 and IPC::Open3 are also good. Looking at examples helps a lot. The Cookbook is a good source. You can find an example at http://www.perlmonks.org/?node_id=28870. You can look at Parallel::ForkManager. etc. Unfortunately this tends to be a topic where you either get it or don't. And even when you get it, figuring out bugs is frustrating. 1 until -1 == wait(); You'll need something more complex if you want to track the children's exit statuses (very useful for debugging). That idiom is good to know. But I *do* need to track exit statuses (stati?). Please see my pseudo-code below. So much for the simple answer... If you [use POSIX] you can $kid = waitpid(-1, WNOHANG); to poll to see if a kid needs to be reaped. [...] I've seen this verb reap in this context, but don't know what it means. When and how do I reap a kid? How is reaping different from kill()ing it? Terminology time. When a process is doing stuff, we say that it is alive. When it finishes everything it needs to do, it dies. After it dies it becomes a zombie process, meaning that the process is dead but not gone. In particular it needs to tell its parent what its exit status was. The process finally goes away when it delivers that message, and so we call asking for that exit status "reaping". So reaping your child just means finding out its exit status so that it can finally finish. Which happens when you call wait or waitpid.
When I say, poll to see if a kid needs to be reaped, I mean, check whether any child process has an exit status to tell me. Rather than worry about whether you are a child/parent for the rest of your code, I usually put an exit() here. [...] Sorry, I don't follow at all. When you are fork()ing my usual idiom is this: if (my $child_pid = fork()) { # Do parent stuff. } else { # Do child stuff. exit(); } # Do more parent stuff. That exit() guarantees that the child process can't accidentally execute code that it is not supposed to execute. Explicitly managing children and forking tends to be a lot of work. Unless you really need the complexity, I find that it tends to be easier to take the poor man's approach and do system calls and open up pipes. I'd much rather do system calls, if I can figure out how to wait for the children to finish up. The code sample that I provided at http://www.perlmonks.org/?node_id=28870 might be good enough for you then. All I really want is: system("something $_ &") for 1..5; wait_for_all_children; # 1 until -1 == wait; might suffice. compute_summary_of_children_activities; But that won't really work, will it? system("something $_ &") will launch something as a background job, and then come back in a flash to tell me that I don't have any child. So wait_for_all_children won't have anything at all to wait for. Depending on what you mean by work, that will work. That is, the code will run, jobs will be launched, children will be reaped, etc. But you'll have no way to tie the children you reap to the jobs you ran. Which makes it hard to summarize what the children did. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
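One way to get that summary is to fork the jobs directly instead of backgrounding them through the shell, recording pid-to-job as you go; every reaped pid then maps back to its job. A sketch, with the jobs and their exit statuses invented for the demo:

```perl
use strict;
use warnings;

my %job_of;                         # pid => job id
for my $job (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        $job_of{$pid} = $job;       # parent: remember the mapping
    } else {
        # child: the real work would go here
        exit($job % 2);             # fake exit status for the demo
    }
}

my %status_of;                      # job id => exit status
while (%job_of) {
    my $pid = wait();
    last if $pid == -1;
    my $job = delete $job_of{$pid};
    $status_of{$job} = $? >> 8;     # high byte of $? is the exit status
}
```

After the loop, %status_of holds exactly the per-job summary that the system("... &") approach cannot give you.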
Re: [Boston.pm] How do I wait for all child processes to terminate?
On 3/30/06, Kripa Sundar [EMAIL PROTECTED] wrote: Hello all, I thought this was fairly simple (and it probably is). But I am not able to figure out how I can fork() off, say, five child processes, and wait for all of them to terminate. 1 until -1 == wait(); You'll need something more complex if you want to track the children's exit statuses (very useful for debugging). Is the code below on the right track? Is it as simple as wait for @children? I think this will let me wait N times, where N is the number of children launched. Am I right? TIA. Is there a simpler wait-for-all-descendants primitive? What if some child has children who do not get properly cleaned up? Those grandkids are now your responsibility. If some of my children are hanging, can I detect that, and take action to terminate them? If you use POSIX ":sys_wait_h"; then on most modern systems you can $kid = waitpid(-1, WNOHANG); to poll to see if a kid needs to be reaped. If you keep track of which kids you launched when, and which ones you've reaped, you can decide when you think that the child needs to be terminated, and then use the built-in kill function to send a signal to that child. (The right signal will kill it, with or without giving it a chance to do cleanup.) my @children; # Keep track of my children's PIDs. When I've needed to do this kind of logic, I find that hashes work better. Easier to insert/delete as I launch/reap. for (1..5) { my $pid = fork; if ($pid) { # I am in the parent. push @children, $pid; } elsif (defined $pid) { do_some_child_stuff($_); Rather than worry about whether you are a child/parent for the rest of your code, I usually put an exit() here. But note that you may need to worry about child/parent stuff in END blocks because Perl 5.8 causes END blocks to run on exits. If you dislike that rule, then POSIX::exit still does the old behaviour. } else { warn "Warning: fork() #$_ failed"; } } # for (1..5) if (@children) { # I am in the parent.
if ($wait_order eq 'FIFO') { waitpid $_, 0 for @children; print "I waited for my children in FIFO order.\n"; } else { wait for @children; print "I waited for my children without imposing an order.\n"; } do_some_adult_stuff; } # if (@children) Random tips about this stuff. Explicitly managing children and forking tends to be a lot of work. Unless you really need the complexity, I find that it tends to be easier to take the poor man's approach and do system calls and open up pipes. Also note that a very common source of problems is having open sockets across a fork. For instance if you have a database connection, you may have race conditions between what happens when the parent is talking to the database and what happens when the child is shutting down. Keep your eyes out for that because such races happen easily and can be hard to diagnose. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
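The detect-hanging-children part can be sketched with the WNOHANG polling Ben describes, using a hash of launch times; the deliberately "hanging" child, the one-second deadline, and the choice of SIGTERM are all illustrative:

```perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

my %started;                        # pid => launch time
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        $started{$pid} = time();
    } else {
        sleep 60 if $n == 3;        # this child "hangs" for the demo
        exit 0;
    }
}

my $reaped = 0;
while (%started) {
    my $pid = waitpid(-1, WNOHANG); # poll; returns 0 if nobody is done
    if ($pid > 0) {
        delete $started{$pid};
        $reaped++;
        next;
    }
    # Nobody reapable: terminate anyone past the deadline.
    for my $p (keys %started) {
        kill 'TERM', $p if time() - $started{$p} > 1;
    }
    sleep 1;
}
```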
Re: [Boston.pm] Regex warning
On 3/11/06, Joel Gwynn [EMAIL PROTECTED] wrote: I know I've done this before, but I'm not sure what I'm doing differently today. I'm trying to capture a simple command-line option like so: my $debug = 0; if (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; # Error here } But I keep getting "Use of uninitialized value in concatenation (.) or string" when I try to do something with the debug variable. How can $1 not be initialized? If it's matching, then it should have a value, no? That looks like a bug to me. But you can work around it as follows: while (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; last; } Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Regex warning
On 3/11/06, Ben Tilly [EMAIL PROTECTED] wrote: On 3/11/06, Joel Gwynn [EMAIL PROTECTED] wrote: I know I've done this before, but I'm not sure what I'm doing differently today. I'm trying to capture a simple command-line option like so: my $debug = 0; if (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; # Error here } But I keep getting "Use of uninitialized value in concatenation (.) or string" when I try to do something with the debug variable. How can $1 not be initialized? If it's matching, then it should have a value, no? That looks like a bug to me. But you can work around it as follows: Correcting myself, I don't think it is a bug. $1 is dynamically scoped. In your construct above, that means that when grep ends, $1 is cleaned up. while (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; last; } And I think this works because the inner part of the while loop executes while the grep is still executing. Which is a Perl optimization to avoid generating a long temporary list in this situation. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Regex warning
On 3/11/06, Joel Gwynn [EMAIL PROTECTED] wrote: Correcting myself, I don't think it is a bug. $1 is dynamically scoped. In your construct above, that means that when grep ends, $1 is cleaned up. while (grep(/--debug=(\d+)/, @ARGV)) { $debug = $1; print "debug: $debug\n"; last; } And I think this works because the inner part of the while loop executes while the grep is still executing. Which is a Perl optimization to avoid generating a long temporary list in this situation. Cheers, Ben Did you test this? I get the same error with this construct. Nope. :-( I thought I had, but I was testing something slightly different. So disregard that bit of idiocy. However I did test this: for (@ARGV) { if (/--debug=(\d+)/) { $debug = $1; print "debug: $1\n"; } } and it works because you're looking at $1 in the right scope. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
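Another scope-safe variant, for what it's worth: have map return the capture itself, so nothing depends on when $1 gets reset. The sub name is invented:

```perl
use strict;
use warnings;

# $1 is read inside the map block, i.e. in the scope where the
# match just happened, and the value (not $1) is what escapes.
sub debug_level {
    my ($level) = map { /^--debug=(\d+)$/ ? $1 : () } @_;
    return defined $level ? $level : 0;
}
```

Called as `debug_level(@ARGV)`, it returns the first --debug value, or 0 if the option was not given.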
Re: [Boston.pm] parsing CSV string with a comma in it
This is not nearly as simple as people think. Text::CSV can do it, but the example code in the documentation isn't right. (It won't handle embedded returns.) Text::CSV_XS does do it correctly out of the box with its getline function, but needs a binary install. You can implement getline with Text::CSV something like this:

package Text::CSV;
use Carp qw(croak);
sub getline {
    my ($self, $fh) = @_;
    my $line = <$fh> or return;
    until ($self->parse($line)) {
        my $additional = <$fh>;
        if ($additional) {
            $line .= $additional;
        } else {
            croak("File terminated in the middle of a line.");
        }
    }
    return $self->fields;
}

And another alternative is that Text::xSV will handle this in pure Perl out of the box. Personally I tend to use that, but then again I'm biased. :-) Cheers, Ben On 2/28/06, Alex Brelsfoard [EMAIL PROTECTED] wrote: Hello all, I know there's gotta be a nice and easy way to do this. Basically take, for example, the following file: --FILE-- item1a,item2a,"item3a part1, item3a part2",item4a,item5a,item6a item1b,item2b,"item3b part1, item3b part2",item4b,item5b,item6b .. --FILE-- So, when reading in this file I need to parse each line out into the proper segments: [open file] while (<FILE>) { my ($item1,$item2,$item3,$item4,$item5,$item6) = [parse $_] } [close file] What would be the best way to handle this? Should I use something like Text::CSV to handle this? I would prefer to not need an extra module, but am willing to use one if necessary. Thanks. --Alex ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
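If installing a module is the sticking point, the core module Text::ParseWords can handle data shaped like Alex's sample - quoted fields with embedded commas - though not fields with embedded newlines:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module, nothing to install

# One record from the sample file; parse_line splits on the
# delimiter while respecting double-quoted fields.
my $record = 'item1a,item2a,"item3a part1, item3a part2",item4a,item5a,item6a';
my @fields = parse_line(',', 0, $record);   # keep=0 strips the quotes
```

For files that can contain newlines inside quoted fields, stick with Text::CSV_XS or Text::xSV as discussed above.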
Re: [Boston.pm] daemonizing a perl script
On 2/21/06, Uri Guttman [EMAIL PROTECTED] wrote: JM == John Macdonald [EMAIL PROTECTED] writes: [...] JM Of course, detecting that a log switch of some sort has occurred JM doesn't ensure that you will be able to tell if more than one JM has occurred very quickly (from your frame of reference - JM that might mean that your tailing program got paused for a JM long time instead). well, most tailing doesn't care about how much has changed. tailing just wants to find and return the appended text. whether it returns large chunks or many lines isn't a function of the log file but of the tailing code. I think you're missing John's point. His point is that if 2 log switches happen while you're not looking, all the stuff that was written to the log between those switches is elsewhere and you'll never realize that it was ever in the log. This is not normally an issue. (Typically logs might rotate, say, once a day. And the tailing process checks every minute or so. But it theoretically can happen, and there is no good solution for it.) Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] daemonizing a perl script
On 2/20/06, Bob Rogers [EMAIL PROTECTED] wrote: From: Ranga Nathan [EMAIL PROTECTED] [...] On the other hand, doing nice -n-1 myscript would run it at a slightly higher-than-default priority, which might allow it to swap in more quickly when the workload picked up. This would work best if the actions were fairly lightweight, as myscript will hog the CPU while it's running. (I haven't tried this recipe myself, though.) The problem with a script swapping out is that it takes I/O to swap it back in. Changing the priority just gives it better access to the CPU, which doesn't help one bit in how quickly I/O lets it get swapped back in. There are a few solutions to getting swapped out. The best is to add RAM. Second, you can guarantee that the script wakes up regularly and does something. Third, if you're using Linux 2.6 or later, play around with the swappiness parameter. You can control this either by echo 60 > /proc/sys/vm/swappiness or by adding vm.swappiness = 60 to /etc/sysctl.conf. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] pre-extending strings for apache/mod-perl application
Didn't we just have this discussion? It is extremely hard for pre-extending strings to result in actual performance improvements, and at best you can get a very small win in return for a lot of work. In fact the extra effort of having to track where you are in the string manually almost certainly *loses* performance. So don't do it. Ben On 1/10/06, Donald Leslie {74279} [EMAIL PROTECTED] wrote: I have an apache/mod-perl application that can result in large xml strings which are then transformed by xslt into html. A database query can result in an xml string with a length greater than 300,000. In a normal perl allocation you can pre-extend the string to prevent repeated new allocations and copies. Does anyone know what happens in a mod-perl application? Does pre-extending have any benefit? Don Leslie ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] More Perl Style
On 12/20/05, Federico Lucifredi [EMAIL PROTECTED] wrote: Hello Guys, More Perl Style lessons for me, if anyone wants to chip in. Following is the script on the chopping block today - in the comments the parts that I did not manage to elegantize as much as I wanted. use Term::ANSIColor qw(:constants); use strict; Where is warnings? #CONFIGURATION my $timelen = 3; my @commands = ( '/bin/netstat -ape | /usr/bin/wc -l', '/usr/bin/lsof | wc -l', '/bin/ps ax -L | wc -l' ); I would indent this like so:

my @commands = (
    '/bin/netstat -ape | /usr/bin/wc -l',
    '/usr/bin/lsof | wc -l',
    '/bin/ps ax -L | wc -l',
);

See Code Complete's advice on indentation for why. my @triggers = qw( 0 0 0); #- for (my $flag = 1; $flag;) #hate truth as a number... Gah. 1. NEVER use flag as the name of a flag. One of the most common programming errors is to accidentally reverse the meaning of a flag. Make the name of the flag a yes/no question that the value of the flag is the answer to. Then you will never again make that mistake. 2. Avoid C-style for loops whenever you have an alternative. You have an alternative here - use a while or until loop. I would write the above as:

my $last_loop;
until ($last_loop) {
    ...

That clearly documents intent. { my $date = `/bin/date -R`; #three lines seem much for this - any more elegant way to chomp things up? chomp $date; print $date . "\t\t"; Is there a reason to not use Perl's built-in localtime() function? If you need specific formatting, you can save a line by chomping while assigning. chomp(my $date = `/bin/date -R`); Be careful about stacking functions too much, you might have fewer lines but you are doing as much. However this combination is so common that I suspect that it mentally chunks. for (my $i = 0; $i < @commands; $i++) Again, avoid C-style for loops. I would write the above as:

for my $i (0..$#commands) {
    ...
{ my $cmd = $commands[$i]; #I don't like these lookups, but I don't see how to foreach this one my $trig = $triggers[$i]; If you really don't like the lookups you can use Tye McQueen's Algorithm::Loops and do a mapcar here. I personally think that the lookups are better. my $result = `$cmd` or die("could not execute command: $cmd\n"); chomp $result; $result == $trig ? print ON_RED, $result, RESET : print $result; print "\t"; $flag = 0 if ($result == $trig); # finish the internal round, terminate the external. Any nicer way to do it? Nope. If you wanted to terminate the inner or both, then you could just use last. But if you want to terminate the outer, you need to keep track of state. However see my previous comments about the name of your flag. } print "\n"; } Hum - the mail client is insisting on wrapping at 80 chars - usually nice but very appropriate to mess up things here :D Code that runs into problems when wrapped at 80 characters needs to be reformatted and/or rewritten. :-P Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
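Pulling those suggestions together, one possible shape for the loop. The command runner is passed in as a coderef - that is my addition, not part of Federico's design, purely so the loop body can be exercised without shelling out - and $max_rounds guards against running forever:

```perl
use strict;
use warnings;
use Term::ANSIColor qw(:constants);

sub monitor {
    my ($run, $commands, $triggers, $max_rounds) = @_;
    my $done;
    my $rounds = 0;
    while (!$done && $rounds < $max_rounds) {
        $rounds++;
        for my $i (0 .. $#$commands) {
            my $result = $run->($commands->[$i]);
            if ($result == $triggers->[$i]) {
                print ON_RED, $result, RESET;
                $done = 1;          # finish this round, then stop
            } else {
                print $result;
            }
            print "\t";
        }
        print "\n";
    }
    return $rounds;
}
```

In the real script the runner would be something like `sub { chomp(my $out = `$_[0]`); $out }`, with the poster's @commands and @triggers passed in.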
Re: [Boston.pm] pre-extending strings for apache/mod-perl application
On 12/13/05, Aaron Sherman [EMAIL PROTECTED] wrote: On Mon, 2005-12-12 at 16:44 -0500, Uri Guttman wrote: DL == Donald Leslie {74279} [EMAIL PROTECTED] writes: DL I have an apache/mod-perl application that can result in large DL xml strings which are then transformed by xslt into html. A DL database query can result in an xml string with a length greater DL than 300,000. In a normal perl allocation you can pre-extend the DL string to prevent repeated new allocations and copies. Does anyone DL know what happens in a mod-perl application? Does pre-extending DL have any benefit? i can't tell you about any mod-perl issues but in general pre-extending in perl doesn't gain you as much as you would think. the reason is that some storage isn't really truly freed to the main pool when its ref count goes to 0. perl will keep it around in anticipation of it being reallocated for this same item in a future call or loop iteration. so it is effectively doing the usual doubling of its size to grow into the first large string, and then it is already pre-extended the rest of the time via reusing the previous buffer. Well, just in one trial, it does look like the data gets moved with every substantial growth. [...] The key word is *substantial*. I am too lazy to look at the source code for the factor for strings, but Perl widely uses a strategy of always allocating a fixed factor more space than has currently been requested for a wide variety of data structures. The result is that if you grow a data structure incrementally, the sum of the costs of moving the data forms a geometric series, which sums up to no more than a constant times the final size of the data structure. This constant is usually small enough that it isn't worth the effort it would take to remove it.
If you really think that it *is* worth the effort that it would take to remove that insignificant overhead, then I'm going to go out on a limb and say that you have an application which shouldn't be written in Perl. OK, I won't look up the source, but I will demonstrate what I am talking about. It seems from this that strings are slightly different than arrays, but the effect is similar:

#!/usr/bin/perl -l
$s = "";
my $old_loc;
for (1..10_000_000) {
    $s .= 1;
    my $loc = loc($s);
    if ($loc != $old_loc) {
        print "Len: " . length($s) . "; loc $loc";
        $old_loc = $loc;
    }
}
sub loc { unpack "I", pack "p", $_[0] }
__END__
Len: 1; loc 135673368
Len: 12; loc 135709104
Len: 36; loc 135708936
Len: 52; loc 135698368
Len: 68; loc 135671296
Len: 172; loc 135702616
Len: 2060; loc 135713768
Len: 134156; loc 3083567112
Len: 135156; loc 3083427848
Len: 274420; loc 3083149320
Len: 552948; loc 3082592264
Len: 1110004; loc 3081478152
Len: 2224116; loc 3079249928
Len: 4452340; loc 3074793480
Len: 8908788; loc 3065880584

In case you're interested, that came out to 1.58 copies/character. If you tried to assign a long string to pre-extend, truncate, then incrementally assign the one that you were really interested in, it would come out to 2 copies/character, and you're losing in your attempt to optimize! Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Pretty Graphs with Perl
On 12/6/05, Alex Brelsfoard [EMAIL PROTECTED] wrote: Does anyone who has/does use GD::Graph know if there's an easy way to embed the output graphs into HTML. Basically I'd like to be able to print a bunch of HTML, then the graph, then some more HTML. I've got the graph coming out all fine and dandy. You have to print out HTML that includes an embedded image, and make the URL for that image be served by your program that prints out the graph. One warning if you shell out in a CGI script: be ABSOLUTELY sure that you send headers before calling the shell. A very common mistake is to print your header, then call the shell, not realizing that your print just saved data in a buffer and then waits to send it until either the buffer is full or your program ends. The result is that in your code you see the header printed before the graph, but Apache receives the graph before the header and gets upset at you. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
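[Editor's note: a minimal sketch of the two-script arrangement described above. The script names (page.cgi, graph.cgi) are hypothetical and the PNG bytes are faked; with GD::Graph the image script would return $graph->plot(\@data)->png instead.]

```perl
use strict;
use warnings;

# page.cgi: prints HTML whose <img> URL points back at the graph script.
sub html_page {
    return <<'HTML';
Content-type: text/html

<html><body>
<p>Some HTML before the graph...</p>
<img src="graph.cgi?report=42">
<p>...and some more HTML after it.</p>
</body></html>
HTML
}

# graph.cgi: serves only the image.  $| = 1 flushes the header at once,
# so even if this script shells out, Apache never sees child output
# arrive before the header does.
sub serve_graph {
    local $| = 1;
    binmode STDOUT;
    print "Content-type: image/png\r\n\r\n";
    # system("some_external_plotter");  # safe now: header already sent
    print fake_png_bytes();
}

# Stand-in for GD::Graph output, so the sketch is self-contained.
sub fake_png_bytes { return "\x89PNG..." }

print html_page();
```

The $| = 1 line is the fix for the buffering trap described above: flushing forces the header onto the wire before any child process can write.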
Re: [Boston.pm] Fwd: parrot now available as a Debian package
On 11/18/05, John Macdonald [EMAIL PROTECTED] wrote: On Fri, Nov 18, 2005 at 04:16:18PM -0500, Uri Guttman wrote: [...] However, as I recall, NT was being developed for the Alpha at one point - I think it was available commercially for a while and not just internal to MS. Not too surprising, actually, since a large chunk of the original NT design team was hired away from DEC (Dave Cutler et al). [...] This is true. It is an amusing irony that one of the initial design goals for NT was to be highly portable to different chip architectures, while Linux was designed to take full advantage of 386-specific features. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Affordable web hosting service that offers mod_perl?
On 10/26/05, Sherm Pendley [EMAIL PROTECTED] wrote: On Oct 26, 2005, at 11:15 AM, Tom Metro wrote: I've often wondered if this greater power in mod_perl has been a hindrance rather than a help to the Perl web development community. Would we have been better off if, in addition to mod_perl (for the rare cases when you do need low-level access), there was a mod_perl_embed, or something like that, that was restricted in ability and focused on the needs of the typical site developer? An admin doesn't have to allow the developer full access to everything mod_perl can do. A common configuration used with mod_perl is to service only static HTML requests on the main server, and proxy requests to mod_perl serviced URLs to a separate server instance that's listening on a high port. This proxy set-up can be set up on a virtual server basis, with each virtual server having its own mod_perl instance. That does require a separate Apache instance, but not a dedicated server. The mod_perl Apache, being a separate instance, would not need to run as nobody or www - it could run with the user's permissions. Or, in a more secure setup, an admin could create two users, joe and joe_perl, and add them both to joe_group. Joe would use joe to log in, and the mod_perl server would run as joe_perl. Joe could then allow the server access to specific files through the use of group permissions. [...] The memory requirements of Apache with mod_perl are such that you can serve far fewer users per machine this way than you can with PHP. Given how competitive the shared hosting business is, it would be hard to do this at a competitive price. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
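[Editor's note: the split Sherm describes can be sketched in configuration. Everything concrete below (port 8042, the /app path, the joe_perl user) is invented for illustration; consult the mod_proxy and mod_perl docs for the real directives on your Apache version.]

```apache
# Front-end Apache: serves static HTML itself, proxies dynamic URLs to
# the per-user mod_perl instance listening on a high port.
ProxyPass        /app/ http://127.0.0.1:8042/app/
ProxyPassReverse /app/ http://127.0.0.1:8042/app/

# Back-end httpd.conf for the mod_perl instance (a separate httpd,
# not a separate machine), running with the user's own permissions:
#
#   Listen 127.0.0.1:8042
#   User   joe_perl
#   Group  joe_group
#   <Location /app>
#       SetHandler perl-script
#       PerlResponseHandler MyApp
#   </Location>
```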
Re: [Boston.pm] Perl style regex in shell command for matching across lines?
On 10/19/05, Ranga Nathan [EMAIL PROTECTED] wrote: I need to check ftp logs (see below) for successful transfer of files. This is a bash script someone else wrote and I need to modify it. I want to use a Perl style regex like /^125.*?baxp\.caed.*?\n250/i in any ?grep or sed or awk whichever can do this. I tried grep and egrep - they seem to match only one line at a time. I am unable to match for \n inside the pattern. They only match one line at a time. That is what they do. What shell utility would do it? I don't want to bring in the perl interpreter just for this! Thanks for the help. I would use Perl. But you can try this: grep -A 2 -i '^125.*baxp\.caed' FILE_HERE | grep -B 2 '^250' It will separate matches with lines containing --. Though if you want to do anything interesting with them, it would be easier to do it in Perl, because EVERY tool in the basic Unix toolset assumes that lines of data are the data of interest, so data that goes across a line boundary is a PITA to handle. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
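[Editor's note: for comparison, the check is short in Perl once you slurp the log. The sample log lines below are invented; the regex is the one from the question.]

```perl
use strict;
use warnings;

# Invented sample of an ftp log; real logs will differ.
my $log = <<'LOG';
125 Sending data set BAXP.CAED.FILE1
250 Transfer completed successfully.
125 Sending data set OTHER.FILE
550 Transfer failed.
LOG

# /m lets ^ anchor at each line start; without /s, .*? cannot cross a
# newline, so the explicit \n is the only way from the 125 line to the
# 250 line.  The list-assignment trick counts the /g matches.
my $count = () = $log =~ /^125.*?baxp\.caed.*?\n250/gim;
print "$count successful transfer(s)\n";
```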
Re: [Boston.pm] threads and sockets
On 10/6/05, Jeremy Muhlich [EMAIL PROTECTED] wrote: Has anyone here written a serious threaded server in perl? I can't seem to find any threads + sockets examples anywhere. I have some stuff working with Thread::Pool but there are problems. (I can elaborate if anyone wants me to...) Why are you trying to write a threaded server in Perl? If you want performance, I would strongly suggest using a pre-fork model, or else use another language. In Perl when you spawn a thread, Perl makes a copy of virtually all data in that thread to avoid race conditions. This is a big, slow step. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] threads and sockets
On 10/6/05, Jeremy Muhlich [EMAIL PROTECTED] wrote: On Thu, 2005-10-06 at 18:36 -0400, Uri Guttman wrote: even the people who wrote the threads code in perl disavow them, so i wouldn't even try to do any heavy threading in perl. instead i recommend an event loop server which is stable, faster and easier to code for in most situations. you can use stem, poe, event.pm or even io::select as the main event loop. The problem is that I'm writing an RPC server that itself needs to make RPC calls. I can't be blocking on new clients connecting or existing clients sending requests while the server-side procedure is making its own RPC call out to somewhere else. I don't think an event loop would help, because a computationally slow procedure or one that makes a further RPC call would still block other clients. You can do this with an event loop and multiple processes. The RPC server doesn't make RPC calls. Instead it sends a message to a child process that makes the RPC call. The child process then sends a message back to the RPC server when it has the answer. The RPC server can now use a select loop to cycle through getting RPC requests, forwarding them to children, getting responses, and writing back to the clients. This is conceptually similar to what you might do with a multi-threaded server - you're just using processes rather than threads. An alternate strategy is to not have a single RPC server. Instead do as Apache's prefork model does, and have multiple processes. Each time you get an RPC call, one process processes the request and other processes remain available to service new requests. Unless overhead is a huge concern, I'd personally use the alternate strategy. I'd then avoid writing all of the multi-process logic by using Apache for that piece, and mod_perl to process requests/send responses. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
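[Editor's note: a minimal sketch of the parent/child split described above, with an invented "RPC" that just doubles a number. The parent waits in select() via IO::Select and never blocks indefinitely; the child is free to make slow, blocking calls.]

```perl
use strict;
use warnings;
use IO::Select;
use IO::Handle;
use Socket;

# One bidirectional pipe between parent and child.
socketpair(my $to_child, my $to_parent, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
$to_child->autoflush(1);
$to_parent->autoflush(1);

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid == 0) {                           # child: the blocking worker
    close $to_child;
    while (defined(my $req = <$to_parent>)) {
        chomp $req;                        # a real child would make a slow,
        print {$to_parent} $req * 2, "\n"; # blocking RPC call right here
    }
    exit 0;
}

close $to_parent;                          # parent: the event-driven side
print {$to_child} "21\n";                  # forward a client's request

my $sel    = IO::Select->new($to_child);
my $answer = '';
if ($sel->can_read(10)) {                  # select() instead of blocking
    $answer = <$to_child>;
    chomp $answer;
}
close $to_child;                           # child sees EOF and exits
waitpid($pid, 0);
print "child answered: $answer\n";
```

A real server would keep a pool of children and a queue of outstanding requests, but the select-plus-pipe skeleton is the same.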
Re: [Boston.pm] threads and sockets
On 10/6/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT On 10/6/05, Jeremy Muhlich [EMAIL PROTECTED] wrote: [...] BT You can do this with an event loop and multiple processes. BT The RPC server doesn't make RPC calls. Instead it sends a message to BT a child process that makes the RPC call. The child process then sends BT a message back to the RPC server when it has the answer. The RPC BT server can now use a select loop to cycle through getting RPC BT requests, forwarding them to children, getting responses, and writing BT back to the clients. you don't even need children to do non-blocking rpc calls. if you do the protocol yourself and it is over a socket (as it should be), you can do async rpc calls. but if you are using a typical library that hardcodes a sync rpc protocol, then you are stuck. this is a major issue i have with most protocol implementations, especially on cpan. they are written with sync i/o and never think about supporting async. what they don't realize is that async can easily emulate sync but it is almost impossible for sync to emulate async. I assumed that he wouldn't want to rewrite things like database drivers, so I was assuming that he'd be stuck. After all, if he was using async protocols, then he'd have never complained about blocking calls. BT An alternate strategy is to not have a single RPC server. Instead do BT as Apache's prefork model does, and have multiple processes. Each BT time you get an RPC call, one process processes the request and other BT processes remain available to service new requests. preforking is just an optimization of a forking server. if you distribute the processes over a farm of machines, then you can make the main server just connect to the processes on demand or in advance. The key feature that I was pointing to was having multiple processes, not the pre-forking optimization. I specified prefork to distinguish it from the threading model that Apache 2 also offers. 
BT Unless overhead is a huge concern, I'd personally use the alternate BT strategy. I'd then avoid writing all of the multi-process logic by BT using Apache for that piece, and mod_perl to process requests/send BT responses. and what if you aren't using http for the clients? what if you want to support a cli or a gui client? i really hate how apache (1 or 2) is being touted as the next great application platform and savior. cramming all those different modules (apache and perl) into the insane mess of apache is asking for trouble. anyone ever heard of config file hell? or colliding modules that are too tightly coupled to clean up? If you're building the system from scratch, you can use any protocol that you want for the clients. It doesn't matter what kind of clients you have. For instance I am using Ubuntu at the moment. It uses an http interface to fetch package information and packages when you want to upgrade. But this http interface is not accessed through a browser. Instead you can access it through the pure CLI interface of apt, through the curses interface of aptitude, or through the GUI interface of the Synaptic Package Manager. The protocol spoken over the wire is separate from the front end. As for your complaints about config file hell, there are plenty of known techniques to keep complex Apache applications cleanly organized. If you are going to have 50 million applications in the same Apache process, then you have a problem. But if you're talking about a single application which you're considering devoting multiple machines to, then the necessary overhead to set up Apache for your needs is less than the overhead to roll your own application server, or to write event loops for everything. Before you disagree with the last, note that I'm counting as overhead having to implement asynchronous wheels because you don't like the blocking that happens with the synchronous one on CPAN... but what do i know? 
just doing event loops for over 20 years on many platforms and in many langs. :) hell, i even integrated c kernel threads into an event loop so i could do blocking ops in the threads and the main code was all event driven. but perl threads just can't cut the mustard like c threads. And this makes anything that I've said any less true? Note that I'm not saying that Apache is a perfect strategy. I'm not saying that it has no drawbacks. I'm just saying that it is a workable strategy in many situations, and it is the strategy that I'd be inclined to use fairly often. If you want perfection, then it is clearly the wrong way to go. But the perfect is the enemy of the good, and it is a pretty good solution. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] threads and sockets
On 10/6/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: you don't even need children to do non-blocking rpc calls. if you do the protocol yourself and it is over a socket (as it should be), you can do async rpc calls. but if you are using a typical library that hardcodes a sync rpc protocol, then you are stuck. this is a major issue i have with most protocol implementations, especially on cpan. they are written with sync i/o and never think about supporting async. what they don't realize is that async can easily emulate sync but it is almost impossible for sync to emulate async. BT I assumed that he wouldn't want to rewrite things like database BT drivers, so I was assuming that he'd be stuck. After all, if he was BT using async protocols, then he'd have never complained about blocking BT calls. true about dbi and other sync modules. but i won't assume anything since the OP has not given a full problem spec (yet). Perspective. For what I do, interesting data is almost all stored in the database, so useful programs are going to want to use DBI. I also, as I noted, saw key phrases in the question that suggested synchronous libraries were anticipated. Finally, even if everything can be made asynchronous now, allowing for synchronous calls in your architecture is a piece of future-proofing - some day a management dictate may come down to integrate with some piece of software that has a convenient synchronous interface already, but no asynchronous one. Some applications, because of what they do, environment, etc, can guarantee that asynchronous will never be an issue. But a great many cannot. preforking is just an optimization of a forking server. if you distribute the processes over a farm of machines, then you can make the main server just connect to the processes on demand or in advance. BT The key feature that I was pointing to was having multiple BT processes, not the pre-forking optimization. 
I specified prefork BT to distinguish it from the threading model that Apache 2 also BT offers. i also like the process farm idea and have used it many times. There are many ways to skin this cat. BT Unless overhead is a huge concern, I'd personally use the alternate BT strategy. I'd then avoid writing all of the multi-process logic by BT using Apache for that piece, and mod_perl to process requests/send BT responses. but what if you already had a tool in perl that let you do all the async communications with no new coding needed? and it can do application servers as well? :) I already indicated reasons to pick a design where synchronous calls are not an issue. An additional reason to avoid cooperative multitasking is scalability - multiple processes allow you to benefit from multiple CPUs (both real and virtual), and make the migration to multiple machines easier. A cooperatively multitasked program can only use one CPU. (But cooperative multitasking benefits from being able to communicate between tasks very directly. However if you do too much of that, it is easy to accidentally introduce race conditions. Particularly if you try to be asynchronous everywhere.) and what if you aren't using http for the clients? what if you want to support a cli or a gui client? i really hate how apache (1 or 2) is being touted as the next great application platform and savior. cramming all those different modules (apache and perl) into the insane mess of apache is asking for trouble. anyone ever heard of config file hell? or colliding modules that are too tightly coupled to clean up? BT If you're building the system from scratch, you can use any protocol BT that you want for the clients. It doesn't matter what kind of clients BT you have. sure. again we have no proper spec so i won't speculate. My point remains. Wanting to support a CLI or GUI client does not prevent you from using http. Your suggesting that it does is a red herring. [...] 
that still doesn't mean using apache and http for app serving is a good idea. http is stateless so it makes for a bad protocol for when you need multiple remote operations on a single session. sure you can work around it (cookies) but that is still a workaround. Stateless is indeed a drawback to http. And working around it is one of the pieces of overhead to this approach that I alluded to when I said, if you don't mind the overhead. BT Before you disagree with the last, note that I'm counting as overhead BT having to implement asynchronous wheels because you don't like the BT blocking that happens with the synchronous one on CPAN... hmm, what if the wheels are there and rounder than the square ones you currently have? What rounder wheel than DBI do you have to offer me? [...] BT And this makes anything that I've said any less true? no, but it shows that threads and events can work together
Re: [Boston.pm] Trying to learn about the problems with eval()
On 8/16/05, Tim King [EMAIL PROTECTED] wrote: Ben Tilly wrote: I agree that using eval here is wrong. But I still don't see action at a distance. You can argue about whether it is action at a distance, but you have tight coupling between the internals of make_generator and the string passed into it that was generated from very far away. Correct. A function that uses eval() does so in its own context. I don't see what makes this a problem with eval(). Rather, it's a concern that affects the software design. Well, this thread started with someone looking for reasons not to use eval. Pointing out that certain ways of trying to use eval affect the software design in negative ways is a good reason not to use eval in some situations. The problem in this example is that make_generator doesn't make a generator. I know full well what the problem is. And the result is that you cannot consider using code generation for a situation like this. Not if you want to make a generator as you have tried, no. My point was that this is not a problem with eval(); it's a problem with your proposed design. You can place the blame wherever you want to. But with a more capable version of eval, you can make the proposed design work. Of course since Perl doesn't *have* a capable enough version of eval, you can't make it work in Perl... I use eval() when I don't know what the code is until runtime, or to execute code generated in one place in a context generated from another. But I can't recall a case like the latter that wasn't also an instance of the former.

sub foo {
    my $x = 7;
    my $generator = sub { ++$x };
    print $generator->() for 1..5;
}

But in general you're now prevented from adding various kinds of syntactic sugar, manipulating the passed-in code before compiling it, etc. If you really need to manipulate the code, you apply the syntactic sugar when generating the code string. Then you invoke eval() in whatever context the code should execute. 
The problem is that the place where you'd like to centralize the code manipulation, calling eval, catching errors etc is in one place, and the code that you'd like to manipulate is in another. You *can* achieve the desired flow of control, but the only way that I can think of doing it in Perl is to require the caller to pass in sub {eval(shift)} (which will do an eval in the caller's context). That's an ugly construct to have to throw in when you were trying to create syntactic sugar. Or alternatively, you can pass to the code-generator an evaluation context that the generated code will use. One might use a technique like this when using Perl as a scripting language within a Perl program, for example. I suspect that you mean something like the sub {eval(shift)} that I mentioned above. Note that $generator... is a true coroutine. It is not a coroutine. It is a closure. It is both a closure and a coroutine. If it is a coroutine, then you should be able to return multiple times and restart the call each time. (Traditionally done with a yield operator.) I don't see that capability there... Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] CGI::Prototype
On 8/15/05, Ricker, William [EMAIL PROTECTED] wrote: [...] CGI::Prototype offers a _different_ way of factoring out the you always had to write this glue code code. Catalyst uses the Perl Attributes annotations to factor out glue-code, which is classy demonstration that attributes are a good idea. CGI::Prototype uses prototypical (instance-based, or nonce-class) inheritance. [...] Question on that. My understanding is that attributes are processed at CHECK time, which means that code using attributes would not work right if loaded after CHECK has run. In web development this could mean that Apache::Reload and attributes would *not* play well together. (Which would be a pretty big drawback for me in developing a mod_perl application.) Is my understanding outdated? Thanks, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Trying to learn about the problems with eval()
On 8/15/05, Kripa Sundar [EMAIL PROTECTED] wrote: Dear Ben, another bad point about eval is that it can access and modify lexicals and globals anywhere in the code. so that can lead to action at a distance and very hard to find bugs. [...] I'm not sure if this is what is referred to, but it applies. If this is dynamic code where the string to be evaled is passed in from elsewhere, then one problem is that you might wind up picking up lexicals in the scope of the eval, and being unable to pick up lexicals in the scope where you tried to create the eval. Closures would get this right. Ruby solves this problem by giving you a number of variations on eval, one of which evals things in the context of your caller. Still not perfectly general, but much more likely to be right in this instance. Do you mean examples like below? --\/BEGIN-\/-- % perl -le 'my $x = 7; my $str = q{print ++$x}; {my $x = 11; eval $str}' 12 % --/\-END--/\-- Close, but I meant more like this:

sub foo {
    my $x = 7;
    my $generator = make_generator(q{++$x});
    print $generator->() for 1..5;
}

sub make_generator {
    my $action = shift;
    my $x = 11;
    eval qq{
        sub {
            # Some interesting code here...
            $action;
        }
    };
}

IMHO the current behaviour is intuitive. And I certainly don't see action at a distance. The person who thinks that the '$x' inside $str is referring to the currently visible $x (value 7) is simply mistaken. Likewise the person who thinks that the inner $x will remain untouched by the eval(). (But maybe this latter is what Uri is referring to as action at a distance.) I agree that Perl's behaviour is logical. However it is inconvenient. And from the point of view of the person who is trying to use make_generator, it causes internal details to matter too much. A workaround, of course, is to tell the person to use global variables. Which works except for the variables that happen to be used internally in make_generator, which the person doing the calling should not need to know about but does. 
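[Editor's note: for contrast, a closure-based make_generator, taking a code ref instead of a string, binds $x where the reader expects, which is why the thread keeps pointing at closures. This sketch reuses the names from the example above but is not code from the thread.]

```perl
use strict;
use warnings;

# Takes a code ref, not a string: no eval, no scope surprises.
sub make_generator {
    my $action = shift;
    return sub { $action->() };
}

sub foo {
    my $x = 7;
    # The block is compiled in foo's scope, so ++$x means *this* $x,
    # not any lexical hiding inside make_generator.
    my $gen = make_generator(sub { ++$x });
    return join ' ', map { $gen->() } 1 .. 5;
}

print foo(), "\n";   # increments foo's own $x: 8 9 10 11 12
```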
Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] count substrings within strings
On 8/9/05, Ronald J Kimball [EMAIL PROTECTED] wrote: On Tue, Aug 09, 2005 at 09:07:16PM -0700, Stephen A. Jarjoura wrote: [...] Did I miss some obvious, and easier method? Basically, you need a loop. s///g allows you to hide the loop, but is less efficient because you're updating the string. You could use m//g with an explicit loop instead: $transfer_count++ while $buffer =~ m/transfer/g; Beware when generalizing the above. How many copies of hihi are in hihihihi? How many copies of .* in .*.*.*? The latter can be fixed with the proper escapes. The former can be fixed either by using index() to do your searching, or by using pos() in the loop to set the match position to just after the start of the match that you last found. (Unless overlapping matches are not allowed for your problem.) Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
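[Editor's note: a sketch of the pos() technique described above. count_overlapping is an invented helper name, and \Q...\E handles the ".*" escaping problem as a side effect.]

```perl
use strict;
use warnings;

# Count occurrences of $needle in $haystack, overlaps included, by
# rewinding pos() to one past the start of each match.  @- holds the
# start offsets of the last match.
sub count_overlapping {
    my ($needle, $haystack) = @_;
    my $count = 0;
    while ($haystack =~ /\Q$needle\E/g) {
        $count++;
        pos($haystack) = $-[0] + 1;   # resume just after the match start
    }
    return $count;
}

# "hihi" occurs 3 times in "hihihihi" when overlaps count (2 otherwise).
print count_overlapping('hihi', 'hihihihi'), "\n";
```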
Re: [Boston.pm] more syntax checking
On 7/18/05, Uri Guttman [EMAIL PROTECTED] wrote: KS == Kripa Sundar [EMAIL PROTECTED] writes: KS Dear Uri, [...] subs can come into existence at any time and be handled by AUTOLOAD and such. so there is no easy compile time way to check that at the moment. [...] KS It is the same principle as for any other stricture. The user is KS asking perl to flag some *legal* usage as being unacceptable for KS her/his purpose. but perl can't divine if subs exist before they are called because some module may do things differently. just look at AUTOLOAD. So sometimes Perl gets it very wrong. That's OK; if people choose to turn on optional warnings, they are choosing the new behaviour. KS For the user who does not want dynamically defined routines, it KS should be trivial for the compiler to honour a suitable pragma KS (say, use strict sub_definitions) and die if it sees that there KS are some subroutine invocations without any compile-time KS definitions. but will that honor modules that are used? that pragma can be lexically scoped but it still is an issue. what about calling something in BEGIN but before it is compiled? perl can't detect that until it tries because something else in the BEGIN block could define the function. So? You choose to turn on a pragma that breaks on valid code. Your choice, and you'll catch the gotchas pretty fast in development. KS An invocation before a declaration is legal Perl, but gets flagged KS under use strict subs. An invocation without a definition should be KS equally easy to flag (although, of course, perl would have to wait KS until the end of compilation to do so). if you invoke with foo() then you don't get that strict problem. the issue is how can you tell a sub will be defined when a call to it is compiled. if you call subs later in the source file, they won't be defined at the time of the call being compiled, so that fails. you can then force predeclaring of subs in that case (like c) but most perl hackers will hate that. 
i know what you want but i don't see any easy way to do it that will satisfy most people. Then most Perl hackers will not use a stricture like this. Though I think that more will than you realize. I happen to like the idea of having this there as an option. Right now it isn't an option, so if you want to catch subroutine typos the best that you can do is declare functions like this: my $foo = sub { ... }; and then call them with: $foo->(@args); which will make everyone else want to kill you. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
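[Editor's note: a tiny illustration of that last workaround; the names are invented. The point is that a misspelled lexical fails at compile time under strict, while a misspelled bareword sub call survives until the line actually runs.]

```perl
use strict;
use warnings;

# The sub lives in a lexical, so every call site names a variable that
# "use strict" checks at compile time.
my $greet = sub { "hello, $_[0]!" };
print $greet->('world'), "\n";

# A typo such as $gret->('world') is a compile-time "Global symbol"
# error; a typo like greeet('world') only blows up when reached.
```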
Re: [Boston.pm] Geo::Coder::US RE: GoogleGeoCoder
On 6/16/05, Joel Gwynn [EMAIL PROTECTED] wrote: [...] When you get right down to it, this Boston neighborhood thing is just confusing. I work in Dorchester but management likes to put Boston on the stationery, which is confusing because there's an identical address in Boston proper, just with a different zip code. Are there any other cities that have similar naming schizophrenia? Q: What do you call a Boston tourist? A: Lost. Similarity is in the eye of the beholder. But Denver is infamous for having many streets whose names are almost the same. You can never know which street someone is talking about until you get to the final St or Ave or etc. For instance in Denver, Colorado you can stand at the intersection of Colorado St and Colorado Ave. However, that notwithstanding, it is nowhere near Boston in being difficult to find your way around. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Image Uploading
On 6/8/05, Alex Brelsfoard [EMAIL PROTECTED] wrote: I know I asked a similar question a while back, but I'm compelled to try again. I have an existing script, not using CGI to take in parameters handed to the script. I would now like to have this script upload a file to the server, but not be forced to convert the entire file to using CGI. For the record, converting the entire file would probably be a Good Thing. This is the code I've used in the past:

use CGI;
# create new instantiation of a form query.
my $query = new CGI;

open (UPLOAD, ">$filepath")
    || Error("Could not open file for writing: $filepath");
my $picture = $query->upload('photo');
if ($picture) {
    while ($bytesread = read($picture, $buffer, 1024)) {
        print UPLOAD $buffer;
    }
} else {
    Error("Picture ($picture) is undefined.<br> File: $photoName<br>"
        . "For some reason I cannot read in this file.");
}
close (UPLOAD);

But as you can see, it's using CGI. Any suggestions of how to upload an image similarly without having to rewrite the rest of my code? Here is a stupid trick. Slurp STDIN into a scalar. Then use IO::Scalar to tie STDIN to that scalar. Open CGI, then you can seek to position 0 in STDIN, and run your (probably worse) form handling code. Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
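[Editor's note: the same trick can be written with 5.8's in-memory filehandles instead of IO::Scalar. The form body below is fake; the point is only that STDIN becomes seekable, so both the legacy parser and CGI.pm can read the same bytes.]

```perl
use strict;
use warnings;

# Pretend this was slurped from the real STDIN with { local $/; <STDIN> }.
my $body = "photo=kitten.png&caption=hello";

# Replace STDIN with an in-memory handle over the saved bytes.
close STDIN;
open STDIN, '<', \$body or die "reopen STDIN: $!";

my $first_pass = do { local $/; <STDIN> };   # legacy form-parsing code
seek STDIN, 0, 0;                            # rewind...
my $second_pass = do { local $/; <STDIN> };  # ...so CGI->new can re-read

print "both passes saw the same body\n" if $first_pass eq $second_pass;
```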
Re: [Boston.pm] Image Uploading
On 6/8/05, Mark J. Dulcey [EMAIL PROTECTED] wrote: Alex Brelsfoard wrote: I know I asked a similar question a while back, but I'm compelled to try again. I have an existing script, not using CGI to take in parameters handed to the script. I would now like to have this script upload a file to the server, but not be forced to convert the entire file to using CGI. Any suggestions of how to upload an image similarly without having to rewrite the rest of my code? You're doing HTTP upload, so you're going to need some sort of module to do that. CGI.pm is certainly the most common one. A quick search of CPAN didn't turn up any obvious candidates to replace it, but maybe someone out there will know of one. CGI::Simple is also out there if you don't want to use the HTML generation. (Usually using the HTML generation is a bad idea, but there are some simple tasks for which it makes sense.) However that would not solve the original problem, which is how to add support for uploads without having to rewrite the parts of the script that does its own form processing. (If past experience is anything to judge from, it probably does its own form processing very badly.) But this brings to mind an alternate solution. Which is to write a few functions that present the interface that the old code expects using CGI under the hood to do them, then remove the old form processing code. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Can you better this?
On 5/11/05, John Tsangaris [EMAIL PROTECTED] wrote: I was asked to provide the 73rd occurrence of a sequence of numbers, with the numbers 12345. Each number can be used only once, and there are a possible 120 combinations. I was called by a client to figure this out for them, since one of their 2nd grade children was required to provide the answer to this question. I only had a couple of minutes so I pulled this code out of my sleeve to get the answer. But, I'm curious to find out if there is a sleeker way to get the answer and full sequence (preferably more advanced than my 2nd grade answer). Do I smell golf? perl -le 'map/(.).*\1/||print,glob"{1,2,3,4,5}"x5' This works on Unix. On Windows you'll have to switch quotes around. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
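[Editor's note: an expanded, non-golfed version of the same idea, with invented variable names: glob's brace expansion generates every 5-digit string over 1..5 in order, the backreference regex rejects any string with a repeated digit, and the 73rd survivor is the answer.]

```perl
use strict;
use warnings;

# glob expands "{1,2,3,4,5}" x 5 into all 5**5 = 3125 strings in
# lexicographic order; /(.).*\1/ matches any string that repeats a
# digit, so the grep keeps exactly the 120 permutations.
my @perms = grep { !/(.).*\1/ } glob '{1,2,3,4,5}' x 5;
print scalar(@perms), " permutations; the 73rd is $perms[72]\n";
```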
Re: [Boston.pm] Simultaneous redirect to STDOUT File?
On 5/9/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT On 5/9/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT Be aware that IO::Tee has limitations. It only works for output that BT goes through Perl's IO system. In particular if your program makes BT a system call, the child process will NOT see the tee. i bet you can work around that by saving STDOUT, reopening it on IO::Tee and having IO::Tee output to the file and the saved STDOUT. i leave implementing this as an exercise to the reader. but using shell to do this is probably the easiest as you can just use tee and all stdout piped to it (from the perl program or its subprocesses) will get teed. as larry says, the shell has to be useful for something! BT You'd lose that bet. i am not sure about that. it might need some hacking to do it. BT IO::Tee is implemented through tying a filehandle inside of Perl. BT The entire mechanism only makes sense from within Perl. A BT launched subprocess (or poorly written XS code) goes through a BT fileno that the operating system knows about. Since the OS does BT not know about Perl's abstractions of I/O, there is no way to get BT the OS to direct output through them. you can then do the STDOUT dup stuff yourself and then bind IO::Tee to that. by closing STDOUT and reopening it to a pipe you create, all the child processes will output to that pipe since they will see it as fd 1. you have to fork and have that child read the other side of the pipe and use IO::Tee in there. like i said, not simple but doable. this is effectively what the shell does when you pipe anyway. This is just a version of the alternate fork and postprocess that I said would work (and you left out of your reply). But if you're going to do that, then IO::Tee is a red herring - it is easier to loop over filehandles yourself. The heavy lifting is being done by the operating system. See the cookbook for a sample implementation. 
another totally different approach is to use one of my perl sayings, print rarely, print late. too much code is written with direct calls to print (with or without explicit handles). when you print late, you just build up all your output in strings with .= and then just return it to the caller. only at the highest level where the actual print decisions are really made do you finally call print. this is also faster as print is very slow as it invokes all manner of stdio/perlio code each time it is called. appending to a buffer is very fast and clean. so if you did it this way, the top level would be like: We're now getting into optimization, so this is platform dependent. First of all, be aware that while .= is fast in Perl, in many other high-level languages the equivalent is slow. For instance try to create the string "hello world\n" x 1_000_000 with a simple appending loop in Perl, JavaScript, Ruby, Java and Python. Using the default string implementation this is very slow in every language but Perl. Making it fast requires jumping through various sets of hoops. How many and which ones depends on the language. Java has a StringBuffer class that does the trick. In JavaScript you can accumulate strings in an array and then join it. Unfortunately if the array gets too big then you run into GC overhead. So then you have to start accumulating into an array and joining parts of the array early. (Ugh.) Secondly, even in Perl I'd expect print to be faster than accumulating with .= and printing once. Let's try it:

$ time perl -e 'print "hello world\n" for 1..1_000_000' > /dev/null

real    0m0.379s
user    0m0.380s
sys     0m0.000s

$ time perl -e '$s .= "hello world\n" for 1..1_000_000; print $s' > /dev/null

real    0m0.752s
user    0m0.600s
sys     0m0.150s

$ perl -v
This is perl, v5.8.4 built for i386-linux-thread-multi
[...]

Why did this happen? Well when you print, most of the time what it does is shove the data on a buffer.
If said buffer passes over some threshold (e.g. 2K) then it actually writes it to the pipe. All of your output has to go through this process, so adding a level of Perl buffering is pure overhead. Having to buffer all of it is more overhead still. (Incidentally in this case, syswrite is slightly faster than print.) Or at least *should* be. In older Perls by default you went through the OS stdio stuff, and the hand-off from Perl to the OS could be slow. Depending on your platform, that is. (Linux was slow IIRC.) So you may have once done a benchmark and made an optimization conclusion and then never noticed that it has now become dated. (This has happened to me plenty of times...)

use File::Slurp;
my $text = do_lots_of_work_and_return_all_the_text();
print $text;
write_file( $tee_file, $text ) if $tee_file;

it makes for a very good api too in all
Re: [Boston.pm] Simultaneous redirect to STDOUT File?
On 5/10/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: [...] BT Maintainability is more important than optimization. I often use BT this strategy for maintenance reasons. Going full-cycle, one way BT to accomplish all of this without changing code is to tie to a BT filehandle that accumulates data and prints it later. but what if you don't want to print it but log it or send it as a message? what if you want a status sub to be useful in many different ways? making it use a handle or printing directly limits your flexibility and control. delaying printing until you are ready also means you can use write_file which is faster than print as it bypasses perlio. With tie you can do all of that. It may involve some hoops, but you can do it. You may need to write your own Tie class though. The key point was without changing code. I should have been more explicit about that. This is a strategy to consider if you have existing code and wish to refactor in a way which is inconsistent with how it was intended to work. I would not normally choose to write new code on that plan. As for performance, again I consider optimization less important than maintainability until proven otherwise. Besides, in my experience the bulk of I/O time tends to be spent waiting for resources (another process, filesystems etc) rather than stdio buffering. Cheers, Ben
Re: [Boston.pm] Simultaneous redirect to STDOUT File?
Be aware that IO::Tee has limitations. It only works for output that goes through Perl's IO system. In particular if your program makes a system call, the child process will NOT see the tee. Cheers, Ben On 5/9/05, Duane Bronson [EMAIL PROTECTED] wrote: If it's Unix-only, you can open("| tee output.log") and write to that. And search.cpan.org tells me there's IO::Tee. Or you could use something like log4perl which I think allows you to configure multiple appenders of which one can be stdout and another can be a log file. That might be overkill, though. Palit, Nilanjan wrote: I want to redirect print output to both stdout and a file at the same time. I can think of writing a sub that executes 2 print statements (one each to stdout and the filehandle), but I was hoping someone has a more elegant solution. Thanks, -Nilanjan -- Sincerely *Duane Bronson* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] http://www.nerdlogic.com/ 453 Washington St. #4A, Boston, MA 02111 617.515.2909
Re: [Boston.pm] Simultaneous redirect to STDOUT File?
On 5/9/05, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT Be aware that IO::Tee has limitations. It only works for output that BT goes through Perl's IO system. In particular if your program makes BT a system call, the child process will NOT see the tee. i bet you can work around that by saving STDOUT, reopening it on IO::Tee and having IO::Tee output to the file and the saved STDOUT. i leave implementing this as an exercise to the reader. but using shell to do this is probably the easiest as you can just use tee and all stdout piped to it (from the perl program or its subprocesses) will get teed. as larry says, the shell has to be useful for something! You'd lose that bet. IO::Tee is implemented through tying a filehandle inside of Perl. The entire mechanism only makes sense from within Perl. A launched subprocess (or poorly written XS code) goes through a fileno that the operating system knows about. Since the OS does not know about Perl's abstractions of I/O, there is no way to get the OS to direct output through them. If you want to avoid the shell, the cookbook has a fork recipe for postprocessing your own output. Cheers, Ben
Re: [Boston.pm] [getting OT] Controlling Windows with Perl?
While you can't uninstall IE, you can reduce its exposure to the web. A friend of mine developed a lockdown approach that included installing Mozilla, removing all visible signs of IE, pointing IE at a proxy server, and creating a login script that continually repoints IE at said proxy server. This proxy server allows through a couple of things that really need it (eg Microsoft Update), and otherwise displays a static page telling you to use a better browser. The proxy server is a necessary step because you can't actually remove IE. Last I heard the lab that he runs (which is used by a bunch of teenagers) had avoided getting any significant virus infections in over a year. He is proud of that fact, but complains that achieving that goal takes a *lot* more work than it should. Cheers, Ben On Tue, 22 Mar 2005 00:20:03 -0500, Anthony R. J. Ball [EMAIL PROTECTED] wrote: Windows cannot really live without IE, too many things embed it. I have just been playing with Macromedia Breeze and it obviously uses embedded IE to talk to the Macromedia site in its powerpoint plugin. Like it or not, the only way to uninstall IE is to uninstall Windows... Hrm... doesn't sound like an awful idea ;) On Mon, Mar 21, 2005 at 09:14:41PM -0800, Ranga Nathan wrote: Accessing the internet when you are logged on as administrator is like inviting AIDS (sorry, this sounds drastic but it is :) ). At home where I don't have too much security, I always log on as a common low-privilege user while on the internet. Using Mozilla is always wise. I can not believe that there is still no way to remove IE from Windows. The worst nightmare is some casino site that attaches to IE like a leech! I even called those folks one day and they refuse to own up to anything! __ Ranga Nathan / CSG Systems Programmer - Specialist; Technical Services; BAX Global Inc.
Irvine-California Tel: 714-442-7591 Fax: 714-442-2840 Bob Rogers [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 03/21/2005 07:03 PM To Ben Tilly [EMAIL PROTECTED] cc boston-pm@pm.org, Ranga Nathan [EMAIL PROTECTED] Subject Re: [Boston.pm] [getting OT] Controlling Windows with Perl? From: Ben Tilly [EMAIL PROTECTED] Date: Mon, 21 Mar 2005 18:21:38 -0800 And now that there is serious venture capital behind adware, some of the more difficult security exploits are getting hit hard. For instance I've heard that internal Windows messages have *no* security infrastructure. Any application can send a message to any other application and there is no way for the recipient to figure out who the message is really from. (To exploit this you have to send the right message to the right application when it is expecting to see a message that can be confused with yours.) That is correct. It is apparently easy to subvert apps such as antivirus that run as Administrator via their GUI, if they are foolish enough to present a GUI on a less-privileged desktop. But if you're using IE as your trojan horse, and you already have enough control over it to send messages to other app windows, then you have full access to the privs of the IE user, so why bother? Odds are it's a home system, and you won't even have to get Administrator privs in order to install adware, spyware, etc. A friend who supports a lot of small businesses is predicting that by the end of this year, Windows will essentially be unusable on the Internet. This seems extreme to me, but I don't keep track of these things, he does, and he has pretty good insight into the industry. It seems extreme to me, too, even if we were just talking about home systems. If I understand correctly, this window message thing is a fundamental design flaw in the older Windows APIs, but there is current technology that addresses the problem. Unfortunately, it is less convenient for users, so the trick will be to get vendors to switch to using it.
But if it threatens to hit MS in their pocketbook, it will happen. But then, I do my best to ignore Windows, and have been largely successful at it, so I'm hardly an expert. -- Bob Rogers http://rgrjr.dyndns.org/ -- www.suave.net - Anthony Ball - [EMAIL PROTECTED] OSB - http://rivendell.suave.net/Beer -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= To find fault is easy; to do better may be difficult. - Plutarch
Re: [Boston.pm] Controlling Windows with Perl?
On Tue, 22 Mar 2005 19:02:51 +, David Cantrell [EMAIL PROTECTED] wrote: On Mon, Mar 21, 2005 at 06:21:38PM -0800, Ben Tilly wrote: [...] A fun issue is popups. Everything works fine and then someday there is an unexpected error and a popup stops everything in its tracks. Sure, you can probably put in some kind of search for popups that makes them go away, but do you dare? Until you see them do you know whether it is safe to ignore this popup? The issue here isn't even gui vs command line, the same problem was the bane of expect scripts. It is fairly simple to teach the computer what happens if everything goes right. But in manipulating someone else's user interface you discover boundary cases the hard way - one by one. The result is very fragile. Whenever your test script meets something it has not been taught to expect that is either a bug in what you are testing, or a bug in the test script. In either case, the only correct thing to do is stop. I almost commented on this in my previous email. What you say is absolutely correct for test scripts. It most emphatically is NOT correct for production jobs. When you have a chain of dependencies, halting the whole process dead in its tracks at the first sign of trouble leads to a process that does not complete. Instead you want to log anything that seems minor, pause and alert someone on anything that seems major (or is unknown), and be able to be told to continue, told to stop, and be able to be restarted from a known clean state without having lost all of your work so far. If you have frequent intermediate states that you can restart from, then it may be OK to always stop and just get restarted from there. Adding that extra logic adds a lot of complexity. Adding it while working remotely through your (possibly incompletely understood) interface adds a lot more complexity. And once written, even minor UI tweaks are likely to cause havoc, so you cannot easily upgrade the program being driven. 
If you're calling a library (by contrast), that extra logic is usually easier to add. (For one thing you don't have to categorize errors - they already come as warnings or fatal.) The API tends to be a lot simpler than the UI was, and API changes from one version to the next are far less likely to cause problems, so you are much more able to upgrade. If your job is testing software, then driving a UI is necessary, there is no other way to test the UI. If your job is maintaining a production system, you really want a programmatic API. Cheers, Ben PS Disclaimer: I've had to run nightly production jobs. I have never been responsible for automated testing. Assume appropriate biases.
Re: [Boston.pm] Controlling Windows with Perl?
On Mon, 21 Mar 2005 08:04:31 -0600 (CST), [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I've seen programs that can monitor your keystrokes and mouse clicks, etc, in order to replay them against the operating system. Does perl have the ability to do something like that? Yes. The purpose of my search is that I want to automate certain responsibilities which necessitate using windows based programs, but not being a Windows programmer, I have no clue on how to do this. I don't know if it's possible, or if perl can do the trick. But I'm hoping someone else does. Danger alert! Danger alert! What you'll find is that you can write the script, and it will mostly work, but there will be constant issues. For instance someone will walk in to look at the batch job, will jiggle the mouse, then everything breaks. Command line functionality is not an option as many of the programs are gui only. Many gui programs can be manipulated through Win32::OLE. Many gui programs have a command-line replacement, or can be rewritten in Perl. Both approaches will be far more reliable than trying to drive a gui programmatically. For an example, let's say I wanted to write a script that would open quickmail on my system, click the new message button, type in some stuff in the window, and then click send... There are Perl modules that allow you to send mail directly. Using them will be far simpler and more reliable. Trust me on this. Am I off in la-la land, or can this be done, and be done with perl? It can. Been there, done that, have the scars. Which is why I'm telling you to only use that as a method of last resort. Cheers, Ben
Re: [Boston.pm] Controlling Windows with Perl?
On Mon, 21 Mar 2005 17:49:45 -0800, Ranga Nathan [EMAIL PROTECTED] wrote: You would use Windows Scripting tool for that. Check out WSH (Windows Scripting Host). There are many macros that do just that and as it was pointed out, this has caused many security exploitations in windows. And now that there is serious venture capital behind adware, some of the more difficult security exploits are getting hit hard. For instance I've heard that internal Windows messages have *no* security infrastructure. Any application can send a message to any other application and there is no way for the recipient to figure out who the message is really from. (To exploit this you have to send the right message to the right application when it is expecting to see a message that can be confused with yours.) A friend who supports a lot of small businesses is predicting that by the end of this year, Windows will essentially be unusable on the Internet. This seems extreme to me, but I don't keep track of these things, he does, and he has pretty good insight into the industry. There is software like Win Runner (Mercury tools I think) and Load Runner that do this kind of thing for repeated testing of Windows applications. You should be able to do this in Perl too. You will be playing keystrokes to get to the buttons, basically like screen-scraping. Sounds like a lot more work than, say, finding the right module to send email directly. A fun issue is popups. Everything works fine and then someday there is an unexpected error and a popup stops everything in its tracks. Sure, you can probably put in some kind of search for popups that makes them go away, but do you dare? Until you see them do you know whether it is safe to ignore this popup? The issue here isn't even gui vs command line, the same problem was the bane of expect scripts. It is fairly simple to teach the computer what happens if everything goes right.
But in manipulating someone else's user interface you discover boundary cases the hard way - one by one. The result is very fragile. Cheers, Ben
Re: [Boston.pm] HTML Renderer
On Tue, 8 Mar 2005 14:02:47 -0500 (EST), Chris Devers [EMAIL PROTECTED] wrote: [...] Similarly -- and this way lies madness, I admit up front -- just run the script on a system that can use AppleScript or COM (or WSH or whatever it is, I'm not a Windows programmer) to just automate interacting with a regular browser like Firefox or Safari, and save the result that way. If you run it on OSX, you can go straight from this to a PDF file for free. I've done this on Windows for web pages that were IE only. It was a small PITA to get running (you have to install a driver to print to PDF files and there were some magic parameters that had to be set by hand in IE so that it would print to a file), but not that hard. What was hard was that it was unreliable, and every so often needed to be kicked. Which was OK since it was a batch process that produced a bunch of them that were stored as files. (I would NOT do this for an interactive web page!) I was very happy when those web pages got cleaned up so that we could switch to html2pdf instead. Cheers, Ben
Re: [Boston.pm] Perl popularity and interview questions
On Mon, 7 Mar 2005 08:22:04 -0800, Palit, Nilanjan [EMAIL PROTECTED] wrote: -Original Message- From: Greg London [mailto:[EMAIL PROTECTED] Sent: Monday, March 07, 2005 11:17 AM As for the triple-plus operator ;) I'd think perl would take x, do a ++ on it, get 2, and then do the +1 on it to get three. But oh well. just won't use that in my code. No. In Perl (or C), $x++ = use then increment, whereas ++$x = increment then use. Thus the expression will use the existing value of x (1) to compute the value of y then increment x itself. And in C++. However, note that there x++ implies an implicit cloning operation - you need the original value to return as well as the incremented one. If your constructor is heavyweight, it can be much better to write ++x, and avoid that clone (which cannot be optimized away by the compiler because there may be nontrivial semantic effects). I consider it very ironic that C++ demonstrates that ++C is a better way to write that idiom. (Also I'd prefer a language that was improved before I used it...) Cheers, Ben
Re: [Boston.pm] Perl Popularity and Interview Questions
On Mon, 07 Mar 2005 08:36:56 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Mon, 2005-03-07 at 01:51 -0500, James Freeman wrote: [...] If you know more trivia than I do (I've yet to see that), then I would hire you on the spot. Let's turn this into, Let's try to stump Aaron! Here are a few tries from me: 1. What is the output of the following, and why? package Foo; our $_; print grep {$_} -5..5; 2. Explain what's special about "0 but true" and why that's never actually needed. 3. Why don't you get a warning from: perl -we 'ignore useless use of a constant' 4. As Larry points out in perlsyn, he has never felt the need to use goto LABEL in Perl. (With good reason, as was first proven in the 70's, any algorithm that can be written with goto can be written, with the same efficiency, with named loop control.) Why, then, did he include the feature? 5. Who made the syntax $somesub->() work, and why? Cheers, Ben
Re: [Boston.pm] why popularity matters
On Thu, 3 Mar 2005 14:46:19 -0500 (EST), Greg London [EMAIL PROTECTED] wrote: Chris Devers said: I think it would be nice if Perl were more popular. I don't think advocacy is a bad thing. I don't think certification, or courses, are unreasonable. But of the ways I can think of to make Perl more popular, I'm not sure that any of these will be more effective than simply writing great software that a lot of people benefit from: setting up a certification program; setting up a marketing campaign; ranting about the matter endlessly on mailing lists; shouting down people who think that having more great software would be a good thing. It seems that the people who want to talk about certification are the ones getting shouted down here. I think that you're seriously misrepresenting what is happening here. --- Hey, maybe we could create certification as a way to advocate perl. Shut up. certification doesn't prove anything. I think responses are more along the lines of, certification introduces a lot of problems, and we don't see how you'll make a certification become accepted. But it might help advocate perl in the business world. Shut up. the business world is an idiot if it doesn't use perl when it should. I think responses are more along the lines of, we don't think that the business world pays as much attention as you think. If you think that it does, then please explain the success, past and present, of C, C++, PHP and Perl. But if they pay attention to certification, why not give it to them? Shut up. Certification only makes money for the certification company. Again, we're dubious that companies pay as much attention as you think that they do, and it comes with costs. But what if O'Reilly could offer free web-based certification? Shut up and do it yourself. I think that several have mentioned the cheap (used to be free) web-based certification at http://www.brainbench.com/.
You have not explained why we should expect another one to have significantly better uptake than that one. I can't do it myself, that's why I'm bringing it to the Perl Monger list. Shut up and advocate then, and stop arguing on the mailing list. Strange how that works. You don't feel that you can tackle the task and so argue on a mailing list, most of the members of whom are in no better position to do it than you are. What do you expect to happen? You then find out that this is a common source of discussion, and a lot of people who are in better positions to do something about it than you are also dubious about it. But you don't seem to be trying to understand why, you're just frustrated that we are not acting on it. Again, would you predict this to be useful? If you really wanted you could say, I'm going to tackle this, I need help, anyone who wants to help me please sign up here. That would be more likely to go somewhere. Better yet, I pointed you at a past discussion which showed you that there are prominent people who agree with you. You could go to one of them (Tim Maher comes to mind) and say, I understand that you're interested in getting Perl certification off of the ground, is there any way that I can help? He has several advantages over you. He runs a training business, he is well-known within the community, he has a better idea than you do of who is likely to help and who isn't. That option sounds depressingly effective, you might actually get somewhere. But no. Instead you're spending time talking about how important this is without actually doing anything about it. And then you're wondering why it is going nowhere. For an excellent overview of why this usually won't work I highly recommend reading _The Logic of Collective Action_. --- The mailing list is a shared channel. I don't shout people down because they're talking about something I don't want them to talk about. If you don't like certification, fine. 
If you have some historical information to give about it, fine. If you have some knowledge about certification attempts, fine. I think that I gave you all of the things that you say are fine. In fact I've given you all of them in this post. But after a certain point, the conversation moved from pointing out all the problems with certification, to attempting to push the conversation off the mailing list completely. There was at least one request to Ron to kill this thread, maybe two, I can't remember. And now most of the resistance is coming from people who have the attitude of Just fucking do it, stop talking about it. Ron's request was to stop personal attacks in this thread. Which is fair, and may likely have been directed at something that I said. As for continued resistance, at some point if a discussion is going nowhere, it makes sense to drop it. Some people seem to have broken out in hives at the mere mention of certification and are foaming at the mouth to the point where they cannot remain silent while a couple people talk about
Re: [Boston.pm] OT: O'Reilly
On Thu, 03 Mar 2005 00:50:34 +, Federico Lucifredi [EMAIL PROTECTED] wrote: Hey Ben, How do you feel when you have a nice process in place through which people are supposed to contact you, and customers keep on persisting in trying to get direct numbers to inside contacts? I tend to get irritated by that, but YMMV. Maybe a random editor will be like me, maybe not. I am not trying to go *around* the process, I am just trying to get some advice from someone more in the know than myself, and someone on the inside is ideal to answer two or three stoopid questions before I send things in through the appropriate official channels. Well from O'Reilly's point of view you certainly are going around the process, they have somewhere that they want you to start with to reach them, and you want to go contact an insider instead. I have no idea how specific employees there will feel about it though. I was asking because, yeah, I can also figure that chromatic and Rael are editors there, but I am, indeed, concerned about bugging them out of the blue. Enough said. You might try contacting any O'Reilly author instead to get feedback/a better idea who might be sympathetic. You must have missed Brian's talk of 'bribing' two weeks ago -- I am not going that far (yet!) =) I definitely did miss that talk. Remember, I'm only possibly going to move to Boston, right now I'm in Santa Monica. PS: given how friendly the ppl at Pearson/AW seem to be, O'Reilly must really be under a deluge of proposals like Uri noted! Well, they certainly are popular. (Authors know that any given title is likely to sell a lot better if it is published by O'Reilly than it will when published by someone else.) Cheers, Ben
Re: [Boston.pm] perl6/pugs
On Tue, 1 Mar 2005 16:02:08 -0500, Adam Turoff [EMAIL PROTECTED] wrote: On Mon, Feb 28, 2005 at 03:39:30PM -0500, Gyepi SAM wrote: It must be: I am using LISP, after a long hiatus, and really liking it. I simply did not appreciate its power upon introduction six years ago. Yep. I never fully understood closures until I used them in Perl. After that, Lisp and Scheme were no big deal. [Except for the Y-Combinator, and ((call/cc call/cc) (call/cc call/cc)); those still make my brain hurt]. The Y-Combinator made my brain hurt until I figured it out. Heavy use of continuations still make my brain hurt, and I'm favourable to the opinion that a continuation is like a goto, only worse. Here's an explanation of the Y-Combinator. It won't work in Perl because Perl doesn't do lexical binding of input parameters. JavaScript does and most should know that, so I'll do it in JavaScript. Our goal is to be able to write a recursive function of 1 variable using only functions of 1 variable and no assignments, defining things by name, etc. (Why this is our goal is another question, let's just take this as the challenge that we're given.) Seems impossible, huh? As an example, let's implement factorial. Well step 1 is to say that we could do this easily if we cheated a little. Using functions of 2 variables and assignment we can at least avoid having to use assignment to set up the recursion.

// Here's the function that we want to recurse.
X = function (recurse, n) {
  if (0 == n)
    return 1;
  else
    return n * recurse(recurse, n - 1);
};

// This will get X to recurse.
Y = function (builder, n) {
  return builder(builder, n);
};

// Here it is in action.
Y( X, 5 );

Now let's see if we can cheat less. Well firstly we're using assignment, but we don't need to. We can just write X and Y inline.

// No assignment this time.
function (builder, n) {
  return builder(builder, n);
}(
  function (recurse, n) {
    if (0 == n)
      return 1;
    else
      return n * recurse(recurse, n - 1);
  },
  5
);

But we're using functions of 2 variables to get a function of 1 variable. Can we fix that? Well a smart guy by the name of Haskell Curry has a neat trick: if you have good higher order functions then you only need functions of 1 variable. The proof is that you can get from functions of 2 (or more in the general case) variables to 1 variable with a purely mechanical text transformation like this:

// Original
F = function (i, j) { ... };
F(i, j);

// Transformed
F = function (i) { return function (j) { ... }};
F(i)(j);

where ... remains exactly the same. (This trick is called currying after its inventor. The language Haskell is also named for Haskell Curry. File that under useless trivia.) Now just apply this transformation everywhere and we get our final version.

// The dreaded Y-combinator in action!
function (builder) { return function (n) {
  return builder(builder)(n);
}}(
  function (recurse) { return function (n) {
    if (0 == n)
      return 1;
    else
      return n * recurse(recurse)(n - 1);
  }})(
  5
);

Feel free to try it. alert() that return, tie it to a button, whatever. That code calculates factorials, recursively, without using assignment, declarations, or functions of 2 variables. (But trying to trace how it works is likely to make your head spin. And handing it over, without the derivation, just slightly reformatted, will result in code that is sure to baffle and confuse.) You can replace the 4 lines that recursively define factorial with any other recursive function that you want. /me wonders how different the world would be if EvilLarry didn't let map and filter^Wgrep slip into Perl... I'm rather more thankful for closures. After all, being list-oriented, I can define map/grep quite easily. But closures I need to be in the language...
Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] RPM building (was: Bottom Up)
On Tue, 1 Mar 2005 18:01:13 -0500, Gyepi SAM [EMAIL PROTECTED] wrote: On Tue, Mar 01, 2005 at 03:16:06PM -0500, Duane Bronson wrote: [...] I don't know of any CPAN distributions. However, if you are on an RPM based system, you might try my ovid program http://search.cpan.org/~gyepi/Ovid-0.06/ovid which recursively converts CPAN modules into rpms by following dependencies. It makes a normally painful and tedious task very easy. It's rpm specific because that's what I usually use, but that needn't be. The Debian equivalent is dh-make-perl. I haven't used it extensively, so I don't know how well it works. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] OT: O'Reilly
On Tue, 01 Mar 2005 23:35:40 +, Federico Lucifredi [EMAIL PROTECTED] wrote: Hello Uri, I have a bookish request: does anybody have an editorial contact at O'Reilly I can exchange a few ideas with? I am cooking a proposal for them and I need a few tips here and there. BT I'd start with http://www.oreilly.com/oreilly/author/intro.html. been there, done that. What I need is someone to talk to *before* I send them the proposal, hence my hope someone might have an editor's email. How do you feel when you have a nice process in place through which people are supposed to contact you, and customers keep on persisting in trying to get direct numbers to inside contacts? I tend to get irritated by that, but YMMV. Maybe a random editor will be like me, maybe not. You could lurk on use.perl.org and figure out that it looks like chromatic and gnat work at O'Reilly. Then contact them and see if they're interested in helping you. You might irritate them, you might get good advice. I don't know. I'm going to guess that they'll tell you to start with http://www.oreilly.com/oreilly/author/intro.html. When your proposal gets there, it doesn't have to be perfect. If they think that it has promise, they'll work with you on it. Note that when I say, it has promise, I mean that it fits into their idea of what they want their catalog to look like. A great idea for something that they have something pretty close to will lose to an average proposal for something that they feel is a hole in their offerings. and contact manning.com as well. they are open to proposals too. if you can't find the contact i should have some info still. I will keep that in mind, but right now I think this is such a fit for ORA that I have a hard time thinking of going to another publisher. ORA may or may not agree. As I noted, the quality of the idea is not the only factor in their decision. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] (also) Perl
On Tue, 1 Mar 2005 14:55:58 -0600 (CST), Alex Brelsfoard [EMAIL PROTECTED] wrote: My impression is that the language which is making the most inroads on traditional Perl areas is PHP. Is that because of the wonderful certifications that PHP has which Perl doesn't? Or is it because PHP is seen as easier to get started with than Perl? Also PHP has the huge advantage that hosting environments allow it to be used in a shared hosting environment, while mod_perl requires dedicated servers. (That is because PHP is less capable, so it is hard for one site to cause problems for other sites running in the same Apache process.) Are you telling me that this DOESN'T keep you up at nights? I know I'm exaggerating, but this is partly what gets me riled up: that simply because something is easier to get started with it's better. Hell the PHP documentation itself explains why it's easier to get started with: it gives su-root permissions on install. So you don't need to configure anything. Just sit down and play; no worries about not being able to do anything, because you basically have root. I'm sorry, but I'm not willing to take on that huge a security hole just to make the setup process easier. To me this just gives me more reason to fight harder to tell managers that Perl is the way to go. If everything that depressed me kept me up at nights, I'd never get any sleep. It sounds to me like you should read the classic essay, Worse is Better: http://www.dreamsongs.com/WIB.html No matter how much I may wish it otherwise, the world will be as it is regardless of what I can do. So I'll try to educate my corner and then survive as best I can. Suppose that we try this and it doesn't work. Does the argument then become that we need to get our certification backed by someone prominent because a certification that nobody has heard of is proving to be useless? We're just trying to find ways to communicate to managers that know nothing about Perl. This is just one idea. 
And I think well worth TRYING. If certification had no potential downsides, then I'd cheer you. But it has potential downsides that concern me, so I won't. Fortunately, unlike worse is better, what I'd like to have happen will happen naturally without any effort on my part. Also I have a different theory. My theory is that the non-savvy manager is going to ask someone he trusts for an opinion, who is either going to be someone whose competence has been proven (less likely), or is going to be someone else of about the same position and abilities (more likely). In neither case does the existence of a certification enter into the process. Well if he is about the same position and abilities then the certification program will be advertising to him/her as well. Here are some questions to ask yourself about this. - How much money do you wish to spend on advertising? - Where do you expect that money to come from? - Would that be a cost-effective use of that money? - Will the people whose money you're expecting to use agree? So now I need to take an endless stream of training from an approved source? Well, as was explained before. Certification is only PART of the hiring process. If you get one certificate and then spend years working with perl you obviously don't need another certificate. Your experience will trump your certificate at that point. I'm trying to see how this certificate does more than being able to put on your resume, I've taken these courses from trainer X. I've seen people say that on their resumes, and I paid attention. I did not necessarily recommend the hire, but you don't need a recognizable certificate to realize value from training. And remember to give the correct answer on a test even when I think it is wrong? (Quick: is our a good thing? Read http://www.perlmonks.org/?node_id=48379 before answering. Yet as cool feature of the day I'm sure that a certification would have required me to talk up how great it was.)
Well, chalk that up to the proper design of the certification program. At this point we're past deciding whether or not to DO the certification. We're at the point of deciding how best to do the certification. If you're going to dream of a certification, why not dream of a perfectly administered one? My point is that existing certifications are notorious for having specific shortcomings. Unless you give me a good reason to believe that this would be different, I'm going to believe that your certification would be as bad as the rest. A certification that has very prominent and vocal opponents within the community is likely to have an uphill battle to acceptance. A certification that didn't have enough support for people to learn what they need to pass it is going to find that the hill is looking more like a cliff. I thought we were discussing this because we were already looking up said hill. And my
Re: [Boston.pm] perl6/pugs
Ruby is easier for Perl people to get into than Haskell. By the same token, learning Ruby will expand your horizons less than Haskell. Which is preferable depends on your point of view. Cheers, Ben On Mon, 28 Feb 2005 13:49:59 -0500, Benjamin Kram [EMAIL PROTECTED] wrote: I just grabbed binary of Haskell. I'm thinking of poking around with that as well, and Ruby... b On Mon, Feb 28, 2005 at 01:40:35PM -0500, Kenneth A Graves wrote: On Mon, 2005-02-28 at 13:32, Aaron Sherman wrote: On Mon, 2005-02-28 at 12:51, Benjamin Kram wrote: Has anyone had a chance to play with pugs? I just svned down a copy and was going to toy with it a bit. Only a little bit. I am, however, sure that the correct way to boost the popularity of your favorite niche language is to write a compiler / interpreter in it for a popular language. Pugs will certainly boost Haskell in this way ;-) I haven't gotten around to playing with Pugs yet, but I did build Haskell this weekend. It's a functional-programming conspiracy. --kag -- it would be horrid to be robbed by the wrong kind of people -archy Don Marquis, the big bad wolf, 1935 ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] advocacy
On Mon, 28 Feb 2005 16:04:34 -0500, Tom Metro [EMAIL PROTECTED] wrote: Sean Quinlan wrote: [...] If Amazon, Yahoo, Ticketmaster, etc. are already using Perl in a big way, then why not put effort into making that more visible? One way is through a silly button campaign. Built with Perl, Powered by Perl, whatever. We've all seen them around for other products. I've even seen them for Perl, though I don't recall there being any standard or effort to encourage them. If such a thing existed, the next step would be getting the big name users of Perl to put them on their sites. It's better that an IT manager notices that Amazon uses Perl when he is shopping for books, than having a page somewhere on perl.com that lists Amazon among the big names. This step is easier said than done. Small companies have an incentive to not talk about their technology very much. As for why, Paul Graham put it well in http://www.paulgraham.com/avg.html: And so, I'm a little embarrassed to say, I never said anything publicly about Lisp while we were working on Viaweb. We never mentioned it to the press, and if you searched for Lisp on our Web site, all you'd find were the titles of two books in my bio. This was no accident. A startup should give its competitors as little information as possible. If they didn't know what language our software was written in, or didn't care, I wanted to keep it that way. When it comes to large companies, that real estate becomes valuable territory and they're not going to donate it for free. The technology you use is an internal decision. It has no relevance to the customer. What is the business case for putting it out there? If you're going to ruin your branding by advertising something, any healthy business is going to advertise something that makes them money. Think about things this way. If you were selling, say, cars, how much would you expect to have to pay to get that advertising on those pages? That's the amount that you're asking them to give away. 
What's your business case for doing so? That it would be great for you? There are lots of people and groups who could say that. What about for them? Another approach would be to get people from these companies to contribute articles to general IT publications. It's great that some of them show up at Perl conferences, but that's preaching to the converted. They do from time to time. However even that is not really in the company's interest. It is very much in the interest of the person who gets published since it looks good on the resume. But it means that this valuable employee either is likely to cost more or may leave. And probably doesn't help your core business. (Unless you're in a business like consulting, in which case you're likely to value the publicity.) Actively discouraging employees from publishing is likely to cause them to leave as well, so smart companies don't discourage. But they don't generally encourage either, and aren't about to start. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] OT: O'Reilly
I'd start with http://www.oreilly.com/oreilly/author/intro.html. Cheers, Ben On Tue, 01 Mar 2005 01:52:29 +, Federico Lucifredi [EMAIL PROTECTED] wrote: Hello fellow Mongers, I have a bookish request: does anybody have an editorial contact at O'Reilly I can exchange a few ideas with? I am cooking a proposal for them and I need a few tips here and there. Best - Federico _ -- 'Problem' is a bleak word for challenge - Richard Fish Muad'Dib of Caladan (Federico L. Lucifredi)- Harvard University BU http://metcs.bu.edu/~lucifred ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] short-listing languages for applications software development
On Fri, 25 Feb 2005 09:04:51 -0500, James Linden Rose, III [EMAIL PROTECTED] wrote: On Friday, February 25, 2005, at 08:28 AM, Tolkin, Steve wrote: I think this is the best point that has been advanced in favor of using perl: Amazon, Google, Yahoo, Morgan Stanley all use Perl in production ... Does anyone have additional details, e.g. the names of the projects, number of servers, number of users, estimated cost, estimated savings by using perl, etc. That kind of additional detail would usually be considered proprietary, and hence is unlikely to become public knowledge. I think it was mentioned in the book eBoys that the guy who founded Ebay (Iranian guy whose name escapes me) wrote Ebay in Perl... and aside from that, I wrote KanjiCafe.com's Ice Mocha in Perl as well (^_^). Pierre Omidyar. The same thing was described in The Perfect Store. But when they needed to scale, they went to C++. So that's not a very good advertisement for Perl. However I've heard rumor that eBay recently acquired Rent.com, which apparently is written in Perl... Other well-known companies in the LA area who are using Perl in a big way include Ticketmaster and City Search. You can find more success stories at http://perl.oreilly.com/news/success_stories.html Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] (also) Perl
On Fri, 25 Feb 2005 15:51:46 -0500, James Linden Rose, III [EMAIL PROTECTED] wrote: On Friday, February 25, 2005, at 03:04 PM, Alex Brelsfoard wrote: I think part of the problem is that it is an open source system that doesn't have a fund for advertising. I think if we simply saw some commercials on tv talking about Perl, or telling about all its success stories. Heck even if they're just like the Intel commercials simply saying Yeah, here we are. We're Perl. We're cool. Yeah, so like us. It wouldn't take many to make a difference. /me thinks of all of the dot coms who had advertising policies that resembled that. All failed of course, because they were wrong... Perl isn't completely without commercial allies. Being the dominant publisher of Perl related texts, it has certainly been in O'Reilly's interest to promote its use. That aside, over the last 10 years, the number of shared CGI scripts written in perl and available to the web developing community is vast. I'm sure it dwarfs all other languages. I'm not sure that what is available in Perl dwarfs what is available in PHP. Furthermore shared CGI scripts tend to be truly awful. (There are, admittedly, some exceptions.) What Perl is really lacking is a widely recognized, widely accessible certification program. When you hire Java programmers they walk in the door with papers proving that somebody said they know what they're doing. Perl is generally practiced outside this whole vetting process. Welcome to the routine debate about whether Perl should have a certification program. You're free to start one, but you'll have a lot of trouble getting prominent people to sign on. That makes less technically experienced bosses woozy with fear. You know you're a genius with Perl, but no 3rd party has printed up a certificate telling your employer this. Actually in my experience the people who are most confident of their abilities tend to be mediocre at best.
Top notch people are generally aware of ways that they can be better. (If you don't spend time painfully aware that improvement is needed, then improvement doesn't tend to happen...) Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] (also) Perl
On Fri, 25 Feb 2005 19:18:54 -0500, Bogart Salzberg [EMAIL PROTECTED] wrote: On Feb 25, 2005, at 6:08 PM, Alex Brelsfoard wrote: Ideas? How about an alliance with Apple? Ditch AppleScript and replace it with Perl, marry Perl to a GUI and turn Mac users into Perl-hacking sysadmins. Does anyone know of a good book on database theory? Really. Joe Celko is well-regarded and has several books aimed at programmers at different levels. Pick one that you feel might be at your level. If you're using Oracle, I'll highly recommend anything that you feel is applicable by Thomas Kyte. (Many of his books are intended for DBAs, you probably don't want those.) Speaking personally, I don't have a ton of book recommends because I did most of my learning about SQL from co-workers. I suspect that many Perl programmers who use databases are in the same boat, which may be why you have been getting so few responses to that request. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: Debian v CPAN RE: [Boston.pm] Install problems
On Tue, 15 Feb 2005 06:24:51 + (GMT), Simon Wilcox [EMAIL PROTECTED] wrote: On Mon, 14 Feb 2005, Ben Tilly wrote: I've also been told that Module::Build doesn't do a good job for people who want to install a module into a personal directory - it tries to install it into the system directory and then fails if you don't have permissions. :-( That's not been my experience. install_base seems to do the right thing as far as I can tell. From the man page: install_base You can also set the whole bunch of installation paths by supplying the install_base parameter to point to a directory on your system. For instance, if you set install_base to /home/ken on a Linux system, you'll install as follows: lib = /home/ken/lib arch= /home/ken/lib/i386-linux script = /home/ken/scripts bin = /home/ken/bin bindoc = /home/ken/man/man1 libdoc = /home/ken/man/man3 Note that this is different from how MakeMaker's PREFIX parameter works. PREFIX tries to create a mini-replica of a site-style installation under the directory you specify, which is not always possible (and the results are not always pretty in this case). install_base just gives you a default layout under the directory you specify, which may have little to do with the installdirs=site layout. The exact layout under the directory you specify may vary by system - we try to do the sensible thing on each platform. So ./Build --install_base=/home/simonw should do what you want. Do you have other experience? Here's my experience. I'm using Module::Build with the compatibility layer in my modules. I had a user attempt to install my module into a personal directory using CPAN. Said user met with abysmal failure. Perhaps there was something obvious to do, but the user didn't find it, given my situation at the time (travelling with only minimal Internet access and no ability to test anything) I couldn't easily debug the problem, and I was therefore left unhappy.
Given that my needs are very simple (test that this plain Perl module with few or no dependencies works, then copy it to the appropriate system directory), I've become very doubtful that Module::Build is buying me anything other than another dependency which sometimes can cause trouble. Once I deal with a failed hard drive at home, I'm planning to switch my modules away from Module::Build. I originally switched to it because I liked the idea of making it easier to do installs on multiple platforms, including Windows. But my impression is that virtually nobody on Windows actually uses Module::Build, they aren't really starting to, it complicates life on Unix, and I don't see at this point that it is buying me anything. (I haven't seen it buying me anything for a very long time, but until I had a frustrated user that I couldn't easily help I had no reason to get rid of it.) I also have disagreements with the Module::Build team, particularly their attitude towards providing backwards compatibility with the rest of the world. You can find my opinions summed up at http://www.perlmonks.org/?node_id=354276. Feel free to disagree with me as much as you want. But please remember that I'm criticizing a social issue, not a technical one. You can have all of the technical choices right, but if you can't get people to adopt it, then you've lost. Yes, it might be great if the world+dog adopted Module::Build. But we'll never get to the promised land unless the necessary social dynamics line up to get world+dog on board. Regards, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Install problems
On Mon, 14 Feb 2005 03:39:21 -0500, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT My recollection says that Debian likes to place the BT modules that it installs in /usr/lib while the ones BT that you install go into /usr/local/lib. Guess which BT one is first on Perl's library path? BT This can cause problems when you've installed the BT needed version of a module but Debian has placed BT an older version somewhere earlier in your path. BT My solution is to configure CPAN to pass the BT install-time argument UNINST=1, which causes CPAN BT to delete the conflicting Debian version of a module BT on install. (If I'm going to CPAN, I know what I want to BT do and Debian doesn't know enough to manage the BT packages for me.) i think another solution would be to just rip out debian's /usr/lib/perl5 and /usr/bin/perl and install perl from source using /usr/local/lib. then all cpan modules will be properly installed there and perl will be in /usr/local/bin. also then you get to build perl the way you want. my suse 9 has perl built with threads which slows all programs down. I think that Perl is part of the Debian core system - they have a lot of system utilities that use Perl, and have been known to trip interesting bugs when they switch versions. Plus were I to take your suggestion then I'd have to do a bunch of research to find out what custom modules they expect to have installed, and install them myself. Furthermore I could get into interesting fun if in a system upgrade Debian decided to reinstall its own version of Perl after all. And I'm suddenly back into the original problem with no idea what broke, and what all I need to fix to get back. If you want to use a /usr/local/bin/perl on Debian, go ahead. But replacing theirs with yours (either in local or not) seems to me to be bad sysadmining. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: Debian v CPAN RE: [Boston.pm] Install problems
On Mon, 14 Feb 2005 10:37:03 -0500, Ricker, William [EMAIL PROTECTED] wrote: doesn't support Module::Build so any modules Ouch iirc even if they have a compatibility Makefile.PL. Double ouch. Maybe it needs a patch. Thanks for the warning, that may put me off adopting Module::Build. I've also been told that Module::Build doesn't do a good job for people who want to install a module into a personal directory - it tries to install it into the system directory and then fails if you don't have permissions. :-( Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] Install problems
On Sat, 12 Feb 2005 14:38:23 -0500, Joel Gwynn [EMAIL PROTECTED] wrote: Thanks for your response. See below [...] 6. what has changed since that installation? I can't think of anything. I may have upgraded something using apt-get My recollection says that Debian likes to place the modules that it installs in /usr/lib while the ones that you install go into /usr/local/lib. Guess which one is first on Perl's library path? This can cause problems when you've installed the needed version of a module but Debian has placed an older version somewhere earlier in your path. 7. do you know about any other problems from others with Geo::Code::US (it has failed before according to CPAN) and Bundle::CPAN (looks okay)? Both failed. 8. do you have all the prerequisites already installed for these two modules? I thought the CPAN module was supposed to handle the dependencies. [...] The CPAN module is supposed to handle dependencies. Debian is supposed to handle dependencies. When the two argue about what to do and how to do it, you lose. And they can't both be in control. My solution is to configure CPAN to pass the install-time argument UNINST=1, which causes CPAN to delete the conflicting Debian version of a module on install. (If I'm going to CPAN, I know what I want to do and Debian doesn't know enough to manage the packages for me.) Good luck locating the conflicting module versions. Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
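For reference, the UNINST=1 setting Ben describes can be made permanent from the CPAN shell. This is a sketch of such a session; `make_install_arg` is a real CPAN.pm configuration key, while `Some::Module` is just a placeholder:

```
$ perl -MCPAN -e shell
cpan> o conf make_install_arg UNINST=1
cpan> o conf commit
cpan> install Some::Module
```

With `make_install_arg` set to `UNINST=1`, `make install` removes shadowing copies of the module found elsewhere in @INC, such as the older Debian-packaged version described above.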
Re: [Boston.pm] Social Meeting Followup
On Thu, 20 Jan 2005 16:16:24 -0500, Ronald J Kimball [EMAIL PROTECTED] wrote: Fifteen Perl Mongers braved the cold last night to attend our Social Meeting at Fire+Ice in Harvard Sq., including our guest of honor, Ben Tilly. We ate lots of food, told bad jokes, and discussed Perl mind share, evolution, the history of Boston Perl Mongers, bioinformatics, the state of our web site, mathematics, and other topics. A few people stayed until the restaurant closed at 11pm (which is when the worst of the bad jokes were told!). And an enjoyable time was had by the guest of honor (aka the primary bad joke teller)! But I cannot speak for my victi^Hjoke recipients... Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] mind share
On Tue, 08 Mar 2005 22:37:29 -0500, William Goedicke [EMAIL PROTECTED] wrote: Dear Tom - I've thought a lot about why perl hasn't gained respect in the deployment/hiring marketplace. Tom == Tom Metro [EMAIL PROTECTED] writes: Tom This reminded me of something I've wondered about for a long Tom time. Why did PHP become as successful and popular as it is, Tom even though it mostly offers a subset of what Perl can Tom do. I think that PHP gained popularity for two reasons. It initially met a need, that is, to embed logic within html. Second, it was simple. And a third. It is so limited that hosting companies have no problem enabling PHP. Therefore if you want to use a $20 host you get the choice: use PHP and be fast or use Perl CGIs and be slow. But you can't use mod_perl unless you run your own server. Tom Similarly, Java, seemingly through the addition of servlets, Tom succeeded at enterprise web development, despite Perl having Tom been there first. It was more than that. There was a successful marketing campaign which portrayed security, deployability and state-of-the-artness. And don't forget that Java had the aura of corporate support before open source had any mindshare. (Sun began marketing Java before open source was even a phrase!) Tom Today mod_perl is only rarely recognized as being an Tom application server. But, among productivity focused programmers mod_perl is recognized as one of the best frameworks to deliver web applications. I'm not sure whether, at this point, there is much in practice to distinguish mod_perl from competitors like mod_python. I'm also not sure how many people have the mindset that mod_* is really an application server that just happens to work over the web really well. I'm also dubious of how well disseminated basic mod_perl best practices are. For instance how many know to use reverse proxies for performance? See http://perl.apache.org/docs/1.0/guide/strategy.html#Adding_a_Proxy_Server_in_http_Accelerator_Mode for details. 
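The reverse-proxy arrangement Ben points to in the mod_perl guide can be sketched in Apache configuration terms. This is a hedged illustration, not the guide's exact config: a lightweight front-end Apache (mod_proxy loaded) serves static files itself and forwards dynamic URLs to a heavyweight mod_perl back-end; the port number and URL prefix are made-up examples.

```
# Front-end httpd.conf fragment (assumed: mod_proxy is compiled in).
# Static content is served directly by this lightweight server;
# /perl/ requests are forwarded to the mod_perl back-end on port 8042.
ProxyPass        /perl/ http://localhost:8042/perl/
ProxyPassReverse /perl/ http://localhost:8042/perl/
```

The payoff is that slow clients tie up a cheap front-end process instead of a memory-heavy mod_perl one, which is the performance point the linked guide section makes.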
Tom More recently, there's Python [...] great success with its Tom own application server, Zope. As a predominantly perl programmer I must say I love zope and bemoan the lack of comparable CMS in perl. Well you can use Zope from within Perl: http://www.zope.org/Wikis/zope-perl/FrontPage Personally I don't like Zope. It made the mistake of pushing you to have code in an opaque repository that cannot be trivially integrated (or at least could not when I last checked, which was a long time ago) with standard revision control systems. They may have fixed this since I looked - they certainly have fixed a lot of potential problems, but from my point of view this is a deal breaker. For the same reason, no matter how tempting it is to have code in a database, don't. Ditto for your basic configuration information. See http://www.perlmonks.org/?node_id=394251 for a slightly longer rant about this. Tom And lastly, C#, which has borrowed ideas from Perl, Java, and Tom C++. Competing with the commercial software world is a whole different animal. We were already discussing Java which is part of the commercial software world. Tom All of these are aspects of the same theme - Perl losing Tom mindshare to other technologies. It started out as a quiet, Tom underground language (telling someone you programmed in Perl Tom back in the late 80's, early 90's just got a blank stare) and Tom is perhaps heading back there (I've noticed it getting Tom dropped off the list of programming languages listed on trade Tom magazine qualification forms). Siiggg.. You're right, of course, but, isn't that issue all about the battle with the commercial world. My impression is that the Perl job scene has been improving in the last couple of years. My other impression is that Perl has an unfortunately high proportion of programmers who have messed with basic CGI but do not understand programming very well.
Having said that, I'm a leader in a consulting firm and I'm struggling to convince my firm that we should develop a LAMP Enabling practice. I see tons of organic LAMP deployment occurring. The idea of my consulting product is that LAMP deployments are immature and that there's value-adding consulting in making LAMP deployments enterprise quality and by aligning them with strategic goals. I love using the phrase enterprise quality and I hate hearing it. Both for the same reason. You can mean anything you want by it, but the listener is likely to give it a very generous interpretation. :-/ Perl's strength, in my mind, is that it has enormous breadth. As an example: I write some app and after the fact realize I need to process barcodes. No problem. This is an important strength, but it becomes less important as projects become more significant. What I mean by that is this: for small projects you can
Re: [Boston.pm] When will the Jan meeting be?
On Fri, 31 Dec 2004 02:26:44 -0500, Sean Quinlan [EMAIL PROTECTED] wrote: On Fri, 2004-12-31 at 02:05, Ben Tilly wrote: Sunday? (Checks quickly.) I'll try to make the next Sun the 19th of Jan in 2014 if I'm around. DOH! Right my calendar is still on Dec. That's my cue I should have been off to bed a while ago! :) I suspected as much... In the meantime I'd love to make Thu the 19th of Jan in 2005 at, say, 7 pm... ;-) How about Wed the 19th of Jan 2005? ;-} Of course, at this point I can't be certain of much obviously, but at least we've gotten started! And to quote Homer, D'oh! My bedtime was also exceeded... Cheers, Ben ___ Boston-pm mailing list Boston-pm@mail.pm.org http://mail.pm.org/mailman/listinfo/boston-pm
Re: [Boston.pm] When will the Jan meeting be?
On Fri, 31 Dec 2004 11:58:09 -0500, Sean Quinlan [EMAIL PROTECTED] wrote: On Fri, 2004-12-31 at 11:43, Ronald J Kimball wrote: Sorry for the inconvenience! If we can quickly select an alternate day for Jan I'll try to get it scheduled ASAP. How about a technical meeting on Tuesday, Jan 25? Meanwhile, we can discuss what day would be best for future meetings. OK, I'll find us a room for the 25th. After some negotiation, we have agreed upon the 19th of January, 2005, being a Wednesday. Now we just need to decide upon a location. I'd favor somewhere we don't usually go to (i.e. not Boston Beer Works or Cambridge Brewing Company). LMAO! Well, I think Uri suggested Fire+Ice in Harvard Sq., which is certainly someplace new. Parking in Harvard is the pits, but access by T is good and parking can be found free or cheap around Davis Sq. or at Alewife if you have to drive and don't want to face Harvard. I think that I'll be near Mass General that week, which I think is near Harvard. (I don't know Boston, it may be on the other side of the city for all I know.) So Harvard is likely to be convenient for me. Non-Harvard alternates in Kenmore Sq. are Bertucci's and Uno's. Cheers, Ben
Re: [Boston.pm] multiple encodingd - utf8
On Wed, 29 Dec 2004 23:16:24 -0500, Jeff Finn [EMAIL PROTECTED] wrote: hey all, I have a group of files in a directory on a Linux box where the file names are either encoded with utf-8 or shift_jis. Unfortunately, not knowing Japanese, I have no idea which is which. Is there a way to go through the directory and determine how the filenames are encoded? Ultimately, I want to put this directory listing on the web, and I want the browser to be able to display the correct names of all files without having to manually toggle the character encoding. I've never used it but the Jcode module exports a getcode function that looks like it will do what you want. The documentation for Jcode suggests that it should be superseded in Perl 5.8 by the Encode module, but I didn't browse its documentation enough to verify that. Cheers, Ben
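Jcode's getcode aside, the core Encode module mentioned above can handle this two-way guess on its own. Here is a minimal sketch, under the assumption (from the original question) that the only two candidates are UTF-8 and Shift_JIS: a byte string that decodes as strict UTF-8 is classified as UTF-8, anything else falls back to Shift_JIS. Pure-ASCII names are valid in both, so they report as UTF-8, which is harmless here.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Guess whether a byte string is UTF-8 or Shift_JIS. Multi-byte
# Shift_JIS sequences are almost never valid UTF-8, so a strict
# UTF-8 decode is a cheap discriminator between the two.
sub guess_encoding {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() can modify its argument
    my $is_utf8 = eval { decode('UTF-8', $copy, FB_CROAK); 1 };
    return $is_utf8 ? 'utf-8' : 'shift_jis';
}
```

The Encode distribution also ships Encode::Guess, which generalizes this to an arbitrary list of candidate encodings.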
Re: [Boston.pm] Max hash key length
On Thu, 30 Dec 2004 18:02:07 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Wed, 2004-12-29 at 18:10, Ben Tilly wrote: Under normal circumstances, to get non-minuscule odds of having a collision somewhere between MD5 keys, you'd need about 2**64 keys. If you have less than, say, a billion keys then you can ignore that possibility for all practical intents and purposes. I understand risk assessment and the idea that nothing is 100% safe, but when you have a situation where you KNOW from day one that some keys will collide, and your data will be corrupted, you don't build that into your system if you have an easy out. Then I recommend that you never use rsync. As for me, I'm sometimes willing to accept the possibility of algorithm failures which are less than the odds of my program going wrong because of cosmic radiation. This is hashing 101. You hash, you bucket based on the hashes, and then you store a list at each bucket with a key and value tuple for a linear search. There are other ways to do it, but this is the classic. Yes, I'm familiar with this, and outlined it in a previous email in this thread. Of course, Perl does this for you. That extra time that I measured is almost certainly the time spent comparing the two strings, which your tie interface will also have to do because of collisions. Want to bet whether Perl spends more time in computing hash values or comparing strings? Cheers, Ben
[Boston.pm] Can anyone lend me a dorm fridge?
I'll be in New England from Jan 3 through Jan 22. It would be extremely nice for me to have a portable fridge so that I can keep my son's food in my hotel room. I'd rather not buy one for just a few weeks' stay. If anyone has a fridge that they can lend me for that period, please get back to me or Linda Julien ([EMAIL PROTECTED]). Thanks, Ben
Re: [Boston.pm] Can anyone lend me a dorm fridge?
On Fri, 31 Dec 2004 01:53:31 -0500, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT I said New England for a reason. I'll be in a number of hotels and a BT number of states. While I'd expect some hotels to work out, I don't BT think that I should plan on always being lucky, hence my desire for a BT portable fridge that I can take with me... <CLUEBAT>i get it!</CLUEBAT> well, calling them all is still possible. lugging around a dorm fridge is a pain unless you must have it. Calling them all means figuring out what they all are in advance. That takes something known as planning, which I've never been known for. Ironically I used to own a fridge that would fit the need, but threw it out because I moved into somewhere that had a real fridge. Cheers, Ben
Re: [Boston.pm] When will the Jan meeting be?
On Fri, 31 Dec 2004 01:30:46 -0500, Sean Quinlan [EMAIL PROTECTED] wrote: [...] In honor of Ben's visit I hereby propose a social event for Sun the 19th of Jan at say 7pm. any interesting new suggestions for a location? Given the season the closer to a T station the better. And of course a good beer selection is required! ;-} Sunday? (Checks quickly.) I'll try to make the next Sun the 19th of Jan in 2014 if I'm around. In the meantime I'd love to make Thu the 19th of Jan in 2005 at, say, 7 pm... ;-) -- Sean Quinlan [EMAIL PROTECTED]
Re: [Boston.pm] Max hash key length
On Wed, 29 Dec 2004 10:49:19 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Tue, 2004-12-28 at 13:46, Ian Langworth wrote: On 28.Dec.2004 01:14AM -0500, Tom Metro wrote: If you are concerned about the performance impact of long keys, and your application fits a write-once, read-many model, then you could always hash the hash keys. Say generate an MD5 digest of the key string, and then use the digest as the hash key. This might make a nice Tie:: module, if there already isn't one. But then again, tie itself is allegedly slow... No, that would defeat the point Or at least that's what I was going to say... I had a whole rationale typed up, but then I went to benchmark my hypothesis and I get this: $ perl -MBenchmark -e 'my $long = "a" x 10_000; my %x; timethis(100_000, sub {$x{$long}++}); print "Final: $x{$long}\n"' timethis 100000: 7 wallclock secs ( 6.54 usr + 0.00 sys = 6.54 CPU) @ 15290.52/s (n=100000) Final: 100000 $ perl -MBenchmark -e 'my $long = "a" x 10_000; my %x; timethis(100_000, sub {my $tmp = unpack("%32C*", $long) % 65535; $x{$tmp}++}); my $tmp = unpack("%32C*", $long) % 65535; print "Final: $x{$tmp}\n"' timethis 100000: 2 wallclock secs ( 2.16 usr + 0.00 sys = 2.16 CPU) @ 46296.30/s (n=100000) Final: 100000 Is there a bug in my code, or is there really that substantial a savings? The savings is somewhere between nothing and the ratio of lengths of string, so it is in the range that I expected. I see no obvious errors in your code. So I'd suspect that you're seeing what the savings looks like. Of course, there's a substantial problem with the above: hashes DO conflict. Your module would have to do the same conflict resolution that Perl's built-in hashing would do, and that's probably where the extra overhead comes in (though I admit I'm not seeing it... perhaps in comparing the long value to the original?) Think about what Perl has to do to do a hash lookup. 1. Compute a hash value. 
This is a calculation that goes character by character, and hence takes an amount of time that is proportional to the length of the key.

2. Figure out which bucket that hash value would go into.

This is a fixed numerical calculation.

3. Walk through the linked list that that bucket points to, checking whether or not you have the right value.

As an optimization Perl does this check by first comparing the hash value and string lengths and only then does it compare the strings for equality. Walking the list is (on average, if the hashing algorithm works well) a fixed-time operation. But testing for equality takes time proportional to the length of the key. So you see that (if the hashing algorithm does a good job of distributing keys to buckets), the time to access a hash element is independent of the number of things in the hash but proportional to the length of the hash key. In a case where collisions wouldn't be a real problem, I guess that's a non-issue, but those are rare cases. If your hashing algorithm does not take time proportional to the length of the thing to be hashed, then it is ignoring possible differences in big chunks of that thing. Cases where you'd be willing to do that are likely few and far between. Cheers, Ben
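The three steps above can be illustrated with a toy bucket-and-linked-list hash table. The hash function here is a simple djb2-style loop, not Perl's actual internal one, but it shows why step 1 and the final string comparison are both proportional to key length while the bucketing and (cheap) hash/length comparisons are constant time:

```perl
use strict;
use warnings;

my $NBUCKETS = 8;
my @buckets = map { [] } 1 .. $NBUCKETS;

sub hashval {                                # step 1: O(length of key)
    my ($key) = @_;
    my $h = 5381;                            # djb2-style, for illustration
    $h = ($h * 33 + ord($_)) % 2**32 for split //, $key;
    return $h;
}

sub store {
    my ($key, $value) = @_;
    my $h = hashval($key);
    push @{ $buckets[$h % $NBUCKETS] }, [$h, length($key), $key, $value];
}

sub fetch {
    my ($key) = @_;
    my $h = hashval($key);
    for my $entry (@{ $buckets[$h % $NBUCKETS] }) {  # steps 2 and 3
        # cheap comparisons (hash value, length) before the O(length) eq
        next unless $entry->[0] == $h and $entry->[1] == length($key);
        return $entry->[3] if $entry->[2] eq $key;
    }
    return undef;
}
```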
Re: [Boston.pm] Max hash key length
On Wed, 29 Dec 2004 13:13:22 -0800, Palit, Nilanjan [EMAIL PROTECTED] wrote: Folks, Thanks for the good ideas in the performance discussion. I'll try out the different suggestions. Now, regarding Tom Metro's original suggestion for using an MD5 Digest: I read that the original MD5 algorithm has known issues with collisions. Any experiences with how well Digest::MD5 does when used with many millions of keys? Do I need to test for collisions myself (at the expense of lost performance), or is it pretty well tested (or proved?) to stand up to an intensive application? FYI, known issues means that we have a known way to produce two files with the same MD5 hash that is faster than just looking for one. Under normal circumstances, to get non-minuscule odds of having a collision somewhere between MD5 keys, you'd need about 2**64 keys. If you have less than, say, a billion keys then you can ignore that possibility for all practical intents and purposes. That said, the suggestion of using MD5 keys is a non-starter for eliminating the performance issue. Calculating an MD5 hash of a string of length n is O(n). In fact _any_ decent hashing algorithm is going to take time proportional to the length of the string because if you try to take less time, then you have to skip parts of the string and then you can't notice changes in the skipped part of the string. Cheers, Ben
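The 2**64 figure is the usual birthday bound: with k random 128-bit values, the probability of any collision is roughly k(k-1)/2 out of 2**128 possible pairs, which only becomes appreciable around k = 2**64. A quick sanity check of the billion-key claim:

```perl
use strict;
use warnings;

# Approximate birthday-collision probability for k random 128-bit
# (MD5-sized) values: about k*(k-1)/2 pairs, each colliding with
# probability 1/2**128.
my $k = 1e9;                            # a billion keys
my $p = $k * ($k - 1) / 2 / 2**128;
printf "collision probability ~ %.3g\n", $p;   # astronomically small
```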
Re: [Boston.pm] Max hash key length
On Wed, 29 Dec 2004 18:54:41 -0500, Tom Metro [EMAIL PROTECTED] wrote: Ben Tilly wrote: That said, the suggestion of using MD5 keys is a non-starter for eliminating the performance issue. Calculating an MD5 hash of a string of length n is O(n). The qualifier I added to my suggestion of using MD5 was that the application be of a write-once, read-many nature, with respect to the keys. Thus once you generate the digest of the long key, you cache it. It isn't a universal solution. Where do you cache it, and given that you've cached it, why not cache the hash value instead? That is, if you're going to write once, read many times, you can do something like this: my $value_ref = \ $hash{$big_key}; # now access $$value_ref lots of times This strikes me as simpler and more efficient than the following: my $short_key = md5($big_key); # now access $hash{$short_key} lots of times The only problem with this scheme is that if $big_key is not in the hash it can be inconvenient. But even so, you may find that two regular hash lookups in Perl are faster than computing one md5 hash. (Or you may not - benchmark it.) Cheers, Ben
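A concrete sketch of that first pattern, with the caveat that taking \$hash{$big_key} autovivifies the slot when the key is absent, which is the inconvenience mentioned above:

```perl
use strict;
use warnings;

my $big_key = 'x' x 1000;
my %hash    = ($big_key => 0);

# Pay the O(length $big_key) hashing cost once...
my $value_ref = \$hash{$big_key};

# ...then every later access through the reference is constant time:
# no re-hashing of the long key and no md5() call required.
$$value_ref += 10 for 1 .. 5;
print "$hash{$big_key}\n";    # the hash sees updates made via the ref
```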
Re: [Boston.pm] When will the Jan meeting be?
On Tue, 28 Dec 2004 13:01:28 -0500, Ronald J Kimball [EMAIL PROTECTED] wrote: On Sun, Dec 19, 2004 at 11:35:45PM -0800, Ben Tilly wrote: As I said before, I'll be in Boston for part of January. January 19 would be particularly convenient for me to meet with Boston.pm people. If it is another time I could try to make it (but probably won't succeed). But I'll need a basic plan soonish because my access to a computer will be spotty through January. I just realized that we don't have a date reserved at BU for the January meeting. Ben, would you prefer meeting people at a social meeting or a technical meeting? I'd prefer meeting people at whichever kind of meeting would result in my meeting more people and having more of an opportunity to talk to them. :-) However if a technical meeting is better for that, don't look at me to present anything. I'm spending January taking care of my son, while my wife goes to residency interviews, and Jan 19 will be my first break from baby care all year. Cheers, Ben
Re: [Boston.pm] Max hash key length
On Mon, 27 Dec 2004 16:36:38 -0800, Palit, Nilanjan [EMAIL PROTECTED] wrote: I wanted to know if there are any limitations to the max key length used for hashes in Perl. Also, what are the performance implications, if any, of using long keys? I have an application that needs key lengths in the range of ~1000, but with relatively limited numbers of keys (few to low tens of thousands). There is no upper limit beyond access to RAM. The performance implication is that computing a hash value takes time proportional to the length of the key. So doing a hash lookup for a key of length 1000 could be up to 100 times slower than doing a hash lookup for a key of length 10. (It won't actually be 100 times as slow because there are other steps which take the same time, but without benchmarking it I don't know what the likely time is.) As always, the average time to access an element in a hash should be independent of the number of keys that you have. Cheers, Ben
[Boston.pm] When will the Jan meeting be?
As I said before, I'll be in Boston for part of January. January 19 would be particularly convenient for me to meet with Boston.pm people. If it is another time I could try to make it (but probably won't succeed). But I'll need a basic plan soonish because my access to a computer will be spotty through January. Thanks, Ben
Re: [Boston.pm] transposing rows and columns in a CSV file
On Sat, 13 Nov 2004 17:43:37 -0500, Uri Guttman [EMAIL PROTECTED] wrote: BT == Ben Tilly [EMAIL PROTECTED] writes: BT How was I confusing issues? What I meant is that calling mmap does BT not use significant amounts of RAM. (The OS needs some to track BT that the mapping exists, but that should be it.) Once you actually use BT the data that you mmapped in, file contents will be swapped in, and BT RAM will be taken, but not until then. mmap uses virtual ram which means physical ram and swap space. so mmap can suck up as much physical ram as you have if you allocate it. We are both right, and we are both wrong. The reality is that any such behaviour on the part of mmap is OS and implementation dependent. And an intelligent OS will make much of it configurable. Let me refer to my local Linux manpage. For this purpose I'd specify MAP_SHARED. In that case edits made to memory are made to the file, and there is no need to reserve RAM or swap. (Not that Linux would reserve RAM anyways, Linux allows overcommitting.) Something close to your described behaviour may happen if you use MAP_PRIVATE and don't specify MAP_NORESERVE. If you specify MAP_NORESERVE you won't use swap. (You might get a SIGSEGV if you make a write when RAM is not available.) Which reinforces the point that specific details about the side-effects of mmap (or any system call) are implementation dependent and should never be assumed. BT As for a 3 GB limit, now that you mention it, I heard something BT about that. But I didn't pay attention since I don't need it right now. BT I've also heard about Intel's large addressing extensions (keep 2GB BT in normal address space, page around the top 2 GB, you get 64 GB BT of addressible memory). I'm curious about how (or if) the two can BT cooperate. eww, that sounds like manual management of extended memory. like the days of overlays or even pdp-11 extended memory (which i used to access 22 bit address (yes, 22 bit PHYSICAL space) from a 16 bit cpu.). 
not something you want to deal with unless you have to. You got it. Worse yet, from the way that I see it, in the next 5 years our industry must decide whether the bad old days are going to return, and I don't know which way it will jump. The problem is that consumer computers will soon use over 4 GB of RAM. The obvious clean solution is to switch to 64 bit CPUs. But as soon as you go to 64-bit pointers, a lot of programs and data structures grow (worst case they double) so there is a big jump in memory needs. (Rather than being a bit over your needs, you are a lot over.) And when data bloats, moving that data around the computer slows down as well. Programmer pain is invisible. That size and performance hit is not. A couple of years ago Intel quietly added an addressing extension to allow for up to 64 GB of RAM, and then pursued a non-consumer 64-bit strategy (which flopped). The way that I read that is that Intel expects the consumer industry to do as it did for a decade after the 16 to 32 bit conversion should have happened - stick with smaller pointers and swallow the addressing difficulty. AMD's strategy was more public. They came out with a 64-bit CPU that solved a bunch of problems with the x86 architecture, meaning that if you switch to 64-bit mode there is a good chance that your code will speed up. Their CPU has been successful enough that Intel has been compelled to issue a copycat CPU. But which way the industry will jump is still unclear to me. To me the first good test is what happens when high-end games start having memory requirements that are too big for straightforward 32-bit access. (A limit which conventionally is placed at 2 GB, but can be stretched to somewhere between 2 GB and 4 GB.) Will they manually manage some big chunks of data, or will they require an AMD Athlon-compatible computer? as for the original problem, i keep saying that mmap will give little help if the input matrix won't fit into physical ram. 
once you start swapping (manually or virtually) all bets are off on speed of any transposition algorithm. you have to start counting disk accesses and once you do, who cares how it was done (mmap or whatever)? mmap with MAP_SHARED may reduce your RAM requirements, and improves your access speed. I agree that the difference is pretty marginal. As for your counting disk accesses, I've already pointed out that disk accesses are not created equal. If you really care about the performance of the application, you need to benchmark, not make simplistic estimates. Because unless you know a lot of detail about how the disk drive works, you can't easily predict what the actual performance will be. [...] BT However the over-committed allocation comment confuses me. BT Why would a single mmap result in over committing memory? you can allocate all the virtual space allowed
Re: [Boston.pm] transposing rows and columns in a CSV file
On Mon, 15 Nov 2004 15:58:11 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Sat, 13 Nov 2004 11:40:25 -0800, Ben Tilly [EMAIL PROTECTED] wrote: On Fri, 12 Nov 2004 23:04:46 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Fri, 2004-11-12 at 13:22 -0800, Ben Tilly wrote: [...] Um, mmap does not (well should not - Windows may vary) use any RAM You are confusing two issues. using RAM is not the same as allocating process address space. How was I confusing issues? Let me demonstrate: What I meant is that calling mmap does not use significant amounts of RAM. Calling mmap uses NO RAM. It doesn't interact with RAM at all. But does allocate (potentially huge) amounts of process address space, and reserves it in such a way that your process can no longer allocate it for uses like libc's memory allocator (which you access through functions like malloc). I feel like we are all talking past each other. Let's go back to basics. When I say RAM I mean the physical RAM on the computer. Whether or not that RAM is currently allocated to your process. So if you do something and that makes something else get paged out, then you've used RAM in my view. Whether that RAM is in pages that are attached to your process, or was used by the kernel I still see that as you using RAM. If you mmap a 3GB file (actually less than 3GB, but I'll use that number as an example for now) on an x86 linux box and then call malloc, you get back a NULL pointer because malloc will fail. This is actually not quite true. That malloc will likely work because it will be allocated from some existing page of address space that malloc's internal page allocator reserved before you called mmap, but that won't work for long. I'm aware of this and wasn't disputing it. (The OS needs some to track that the mapping exists, but that should be it.) Actually, no. 
The place that mmap is tracked is a) in the file descriptor table, which is outside of your 3GB process space in kernel-space and b) in the system page table, which is not in your address space at all, but in hardware. Where it is tracked doesn't concern me. That it is tracked, does. However I realize that I don't know enough about how the memory management is handled. I would think that this would be dynamic in some way - on creating a process the kernel should need to write very little data, but will then write more later. But I don't know enough to verify that one way or the other. Once you actually use the data that you mmapped in, file contents will be swapped in, and RAM will be taken, but not until then. RAM will be taken is a meaningless term here. Ignore RAM for purposes of this conversation. On the one hand you are saying that I'm confused about what I meant by a comment about using RAM, and on the other you are telling me that I am to ignore RAM for the purposes of this conversation, it is meaningless. There is a contradiction there. For discussing what *you* want to talk about it may be meaningless, but for discussing what *I* had been talking about it isn't. And for deciding whether or not I was confused it most definitely isn't. (Perhaps you're confused about what I was talking about?) It appears that you want to discuss what the world looks like to a process. For that I wholeheartedly agree, talking about what is in RAM is generally counterproductive, if the abstraction of virtual memory works, then you should never know or care about what is or is not in RAM. But I was talking about what things lead to resource consumption that could adversely affect a machine which is carrying out a particular computation. For that it matters a great deal whether particular operations are going to cause pages of RAM to be discarded and allocated for something else. Because when it comes to actual performance, the abstraction of virtual memory leaks badly. 
(And what I was saying about resource consumption is that mmap doesn't. Consume in meaningful amounts that is.) As for a 3 GB limit, now that you mention it, I heard something about that. But I didn't pay attention since I don't need it right now. Suffice to say that your process cannot be larger than 3GB under x86 Linux. There are extensions, options and hacks if you want to go larger, but after 3GB it gets very dicey. Are you saying that Linux does not give an user-level API to Intel's addressing extensions? Or that it does but you recommend against using it? I've also heard about Intel's large addressing extensions (keep 2GB in normal address space, page around the top 2 GB, you get 64 GB of addressible memory). I'm curious about how (or if) the two can cooperate. The ability to re-map memory like this is quite common, and the *OS* can take advantage of it, but as long as you're on an x86 and using 32-bit pointers, your one process can still only have 3GB of address space (4GB-1GB for system area). But it could, for example
Re: [Boston.pm] transposing rows and columns in a CSV file
On Mon, 15 Nov 2004 18:46:15 -0500 (EST), Dan Sugalski [EMAIL PROTECTED] wrote: On Mon, 15 Nov 2004, Ben Tilly wrote: On Mon, 15 Nov 2004 15:58:11 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Sat, 13 Nov 2004 11:40:25 -0800, Ben Tilly [EMAIL PROTECTED] wrote: On Fri, 12 Nov 2004 23:04:46 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Fri, 2004-11-12 at 13:22 -0800, Ben Tilly wrote: [Massive snippage of two ships passing in the night] Heh. :-) In Perl I'd expect it to be possible but fragile. If Parrot could make it possible and not fragile, that would be great. In Parrot it's quite robust. Parrot supports buffers as core PMC types. A buffer can refer to any part of memory with any read-only or copy-on-write semantics you like. That would be nice. Incidentally will Parrot also support efficiently building strings incrementally? I like the fact that in Perl 5 it is O($n) to do something like: $string .= "hello" for 1..$n; In most other languages that is quadratic, and I'm wondering what to expect in Perl 6. That's not O(n) in Perl 5, it's just smaller than O(n^2). The same's true for Parrot -- we've got mutable strings and generally over-allocate, so it's not going to be quadratic time. Neither, though, is it going to be linear. Expect somewhere in between. I don't have source-code in front of me, but my memory says that when you have to reassign a string in Perl 5, the amount of extra length that you give is proportional to the length of the string. (If that is not how Perl does it, then Perl darned well should!) In that case it really is O(n). What happens is that the total recopying work can be bounded above by a geometric series that converges to something O(n). Everything else is O(n). So the result is O(n). I gave you more details at http://www.perlmonks.org/?node_id=276051. (Hey, my math background has to be good for something...) Perl uses the same strategy for other data structures, including growing a hash and on various array operations. 
Of course the main reason that I'm familiar with this trick is that my 2 minuscule core contributions to Perl were performance enhancements from using this trick in places where it had not been previously used. (i.e. unshift and map.) Unlike you, I don't have any other performance knowledge to confuse me... Cheers, Ben
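The geometric series argument is easy to check numerically. A minimal simulation of the over-allocation strategy (doubling is used here for simplicity; Perl's actual growth factor differs) counts how many bytes ever get recopied across n single-character appends:

```perl
use strict;
use warnings;

my ($capacity, $length, $copied) = (1, 0, 0);
for (1 .. 100_000) {
    if ($length + 1 > $capacity) {
        $copied += $length;    # growing the buffer recopies its contents
        $capacity *= 2;
    }
    $length++;
}
# Total recopying is bounded by 1 + 2 + 4 + ... < 2n, hence O(n) overall.
print "recopied $copied bytes over $length appends\n";
```

Without over-allocation, every append would recopy the whole string and the total would be quadratic in n.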
Re: [Boston.pm] transposing rows and columns in a CSV file
On Fri, 12 Nov 2004 23:04:46 -0500, Aaron Sherman [EMAIL PROTECTED] wrote: On Fri, 2004-11-12 at 13:22 -0800, Ben Tilly wrote: [...] Um, mmap does not (well should not - Windows may vary) use any RAM You are confusing two issues. using RAM is not the same as allocating process address space. Allocating process address space is, of course, required for mmap (same way you allocate address space when you load a shared library, which is also mmap-based under Unix and Unix-like systems). All systems have to limit address space at some point. Linux does this at 3GB up to 2.6.x where it becomes more configurable and can be as large as 3.5, I think. How was I confusing issues? What I meant is that calling mmap does not use significant amounts of RAM. (The OS needs some to track that the mapping exists, but that should be it.) Once you actually use the data that you mmapped in, file contents will be swapped in, and RAM will be taken, but not until then. As for a 3 GB limit, now that you mention it, I heard something about that. But I didn't pay attention since I don't need it right now. I've also heard about Intel's large addressing extensions (keep 2GB in normal address space, page around the top 2 GB, you get 64 GB of addressible memory). I'm curious about how (or if) the two can cooperate. To be clear, though, if you had 10MB of RAM, you could still mmap a 3GB file, assuming you allowed for over-committed allocation in the kernel (assuming Linux... filthy habit, I know). Exactly what I was referring to. However the over-committed allocation comment confuses me. Why would a single mmap result in over committing memory? mmap should not cause any more or less disk accesses than reading from the file in the same pattern should have. It just lets you do things like use Perl's RE engine directly on the file contents. Actually, no it doesn't as far as I know (unless the copy-on-write code got MUCH better recently). Where does a write happen? 
I was thinking in terms of using the RE engine (with pos) as a tokenizer. I was thinking that you'd use something like Sys::Mmap's mmap call directly so that there is a Perl variable that Perl thinks is a regular variable but which at a C level has its data at an mmapped location. Fragile, I know (because Perl doesn't know that it cannot reallocate the variable), but as long as you are careful to not cause it to be reallocated or copied, there should be no limitations on what you can do. Like I said, you probably won't get the win out of mmap in Perl that you would expect. In Parrot you would, but that's another story. In Perl I'd expect it to be possible but fragile. If Parrot could make it possible and not fragile, that would be great. Cheers, Ben
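A less fragile way to experiment with this today is the core PerlIO :mmap layer, which mmaps the file for I/O while Perl stays in charge of its own scalars (so none of the reallocation fragility above, though the slurped scalar is a copy rather than an alias of the mapping). A sketch of the regex-with-pos tokenizer idea, with the caveat that the :mmap layer is only available on platforms with mmap support:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file to tokenize.
my ($out, $file) = tempfile(UNLINK => 1);
print $out "alpha,beta,gamma\n";
close $out;

# The :mmap layer makes PerlIO read the file via mmap instead of read().
open my $in, '<:mmap', $file or die "can't open $file via :mmap: $!";
my $contents = do { local $/; <$in> };

# Use the regex engine as a tokenizer; /g with \G walks pos() forward.
my @tokens;
push @tokens, $1 while $contents =~ /\G([^,\n]+)[,\n]/gc;
print "@tokens\n";
```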
Re: [Boston.pm] transposing rows and columns in a CSV file
On Fri, 12 Nov 2004 07:38:57 -0500, Gyepi SAM [EMAIL PROTECTED] wrote: On Fri, Nov 12, 2004 at 02:11:37AM -0500, Aaron Sherman wrote: [...] I think mmap would be just as ideal in Perl and a lot less work too. Rather than indexing and parsing a *large* file, you must mmap and parse it. In fact, the CSV code, which was left as an exercise in your pseudo-code, would be the only code required. It depends on your definition of ideal. A Perl string is far more complex than a C string, and translating between the two adds complexity. It requires an external module and adds platform dependencies. I should point out though that mmap has a 2GB limit on systems without 64bit support. Such systems can't store files larger than that anyhow. This is at best 2/3 correct. First you're right that mmap has a 2 GB limit because it maps things into your address space, and so the size of your pointers limits what you can address. It is also correct that there are complications in handling large files on 32 bit systems. Most operating systems didn't handle that case. However today most 32 bit operating systems have support for large files, and Perl added the necessary hooks to take advantage of it several versions ago. So if you have a relatively up to date system, odds are very good that you don't have a 2 GB limit. Certainly not on Windows or Linux. Cheers, Ben
Re: [Boston.pm] transposing rows and columns in a CSV file
On Fri, 12 Nov 2004 10:05:27 -0500, Uri Guttman [EMAIL PROTECTED] wrote: GS == Gyepi SAM [EMAIL PROTECTED] writes: [...] this talk about mmap makes little sense to me. it may save some i/o and even some buffering but you still need the ram and mmap still causes disk accesses. Um, mmap does not (well should not - Windows may vary) use any RAM (other than what the memory manager needs to keep track of the fact that the mapping has happened). Using mmap does not imply any particular algorithm, it is an optimization saying that you're going to leave the ugly details of paging the file to/from your process to the OS. mmap should not cause any more or less disk accesses than reading from the file in the same pattern should have. It just lets you do things like use Perl's RE engine directly on the file contents. if the original file is too big for ram then the algorithm chosen must be one to minimize disk accesses and mmap doesn't save those. this is why disk/tape sorts were invented, to minimize the slow disk and tape accesses. so you would still need my algorithm or something similar regardless of how you actually get the data from disk to ram. and yes i have used mmap on many projects. I'm not sure what you mean by something similar, but yes, you'll need SOME algorithm to solve the problem. Which statement is so general as to be meaningless. I'm sure that there are some possible algorithms that you'd never have thought of. (Mostly because they're bad.) Disk/tape sorts were invented because back in the day there was not enough RAM to do anything useful and so everything had to go to disk. Of course once you're forced to go to disk, why not optimize it...? Of course this problem said to guarantee being able to do the sort, not necessarily to do it most efficiently. Therefore no single criterion - including disk accesses - necessarily MUST dominate your choice. Furthermore disk accesses are not created equal. There are multiple levels of cache between you and disk. 
Accessing data in a way that is friendly to cache will improve performance greatly. In particular, managing to access data sequentially is orders of magnitude faster than jumping around. The key is not how often you access the disk; it is how often your hard drive has to do a seek. When it needs to seek, it reads far more data than it was asked for and puts that in cache. When you read sequentially, most of your accesses come from cache, not disk. That is why databases use merge sort so much: it accesses data in exactly the way that hard drives are designed to be accessed most efficiently. A quicksort has fewer disk accesses, but far more of them cause an unwanted seek.

> when analyzing algorithm efficiency you must work out which is the
> slowest operation that has the steepest growth curve and work on
> minimizing it. since disk access is so much slower than ram access it
> becomes the key element rather than the classic comparison in sorts.

You must, must, must. What is this preoccupation with "must"? As I just pointed out, disk accesses are not all equal. Secondly, in many applications you will *parallelize* the slowest step, not minimize it. For instance, good databases not only like to use mergesort internally, they often distribute the job to several processes or threads that all work at once; that way, if one process is waiting on a disk read, others may be going at the same time. Thirdly, and most importantly, it is more important to make code work than to make it efficient. If a stupid solution will work and a smart one should be faster, code the stupid solution first.

> in a matrix transposition in ram, i would count the matrix accesses
> and/or copies of elements. with a larger matrix, then ram accesses
> would be key. my solution would load as much matrix into ram as
> possible (maybe using mmap but that is not critical anymore) and
> transpose it. then write the section out. that is 2 (large) disk
> accesses per chunk (or 1 per disk block).
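The in-RAM part of that per-chunk scheme is just an ordinary matrix transpose; here is a minimal Perl sketch of it (the chunking and section-file I/O are left out, and the `transpose` helper name is my own):

```perl
use strict;
use warnings;

# Transpose a matrix held in RAM as a list of row arrayrefs:
# element [r][c] of the input becomes element [c][r] of the output.
sub transpose {
    my @m = @_;
    my @t;
    for my $r (0 .. $#m) {
        for my $c (0 .. $#{ $m[$r] }) {
            $t[$c][$r] = $m[$r][$c];
        }
    }
    return @t;
}

# A 2x3 chunk becomes a 3x2 chunk.
my @t = transpose([1, 2, 3], [4, 5, 6]);
# @t is now ([1, 4], [2, 5], [3, 6])
```

In the full scheme each transposed chunk would then be written out as one section file, to be merged later.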
> then you do a merge (assuming you can access all the section files at
> one time) which is another disk access per section (or block). and one
> more to write out the final matrix (in row order). so that is
> O((2 + 2) * section_count) disk accesses which isn't too bad.

You said that you want to assume that we can access all section files at once. Well, suppose that I take a CSV file which is 100 columns by 10 million rows, transpose it, then try to transpose it again. Your assumption just broke. Maybe it would work for the person with the original problem, maybe not.

Here is the outline of a solution that avoids all such assumptions.

1. Run through the CSV file and output a file of lines in the format:

   $column,$row:$field

   You'll need to encode embedded newlines some way, for instance s/\\/\\\\/g; s/\n/\\n/g; - you may also want to pre-pad the columns and rows with some number of 0's so that an ASCII-betical sort does The Right Thing.

2. Sort the intermediate
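Step 1 can be sketched in a few lines of Perl. This assumes naive comma splitting (no quoted fields) and a `cell_lines` helper name of my own; note how the zero-padded keys make a plain ASCII sort come out column-major, which is exactly the transposed order:

```perl
use strict;
use warnings;

# Step 1 of the sort-based transpose: turn row-major CSV text into
# "col,row:field" lines whose ASCII sort order is column-major.
# Zero-padding the indices keeps the textual sort numeric; doubling
# backslashes before encoding newlines keeps the escaping reversible.
sub cell_lines {
    my ($csv_text, $width) = @_;
    my @out;
    my $row = 0;
    for my $line (split /\n/, $csv_text) {
        my $col = 0;
        for my $field (split /,/, $line, -1) {
            $field =~ s/\\/\\\\/g;    # escape backslashes first
            $field =~ s/\n/\\n/g;     # then encode embedded newlines
            push @out, sprintf "%0*d,%0*d:%s",
                $width, $col, $width, $row, $field;
            $col++;
        }
        $row++;
    }
    return @out;
}

# A 2x2 example: sorting the cell lines yields the cells column by column.
my @cells = sort cell_lines("a,b\nc,d", 3);
print "$_\n" for @cells;
```

For a file too big for RAM you would stream the input line by line instead of holding the text in a scalar, and hand the intermediate file to an external merge sort such as the system sort(1).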
[Boston.pm] Passing through
I'll be in town early next year and would like to meet some of the locals. Exact dates have not been nailed down yet, but I should be in Boston from January 14-21 or so. If a boston.pm meeting could happen in that time, I'd be interested in going. If nothing official happens, I'd be up for something unofficial. The best day for me probably will be Wed, Jan 19.

Cheers,
Ben

___
Boston-pm mailing list
[EMAIL PROTECTED]
http://mail.pm.org/mailman/listinfo/boston-pm