Re: fun programming problem
On Tue, Nov 10, 2009 at 6:08 AM, Shawn H Corey shawnhco...@gmail.comwrote: Matthew Sacks wrote: I am unabashedly posting a quiz question I have about regular expressions: Looking for suggestions. I am thinking 1) make the set of regular expressions into one big expression? 2) search the seach strings, somehow, for common substrings. acme.org would be example. Each hit on acme.org would indicate a match on one of the original search strings? comments invited. You have 100,000 strings which are regex patterns intended to match URLs (including hostname and possibly a URI). When you receive a message, you need to see if it matches at least one of these patterns. The naive approach would be to go through your list of patterns linearly and attempt a regex match on each one. Suggest an alternative that would be more efficient. Trade memory for speed. Below is my solution: create a hash of hashes of hashes, etc. Each level has one letter in the pattern or a wildcard marker or a successful match marker. For each string, break it into characters and do a breath-wide search down through the tree. If at the end of the string, there is a match amrker, the string is matched. Known bugs: foo***bar will fail if the number of characters between foo and bar is less than 3. To do: record the pattern in the hash so if a match occurs, you know what pattern matched. #!/usr/bin/perl use strict; use warnings; use Data::Dumper; # Make Data::Dumper pretty $Data::Dumper::Sortkeys = 1; $Data::Dumper::Indent = 1; # Set maximum depth for Data::Dumper, zero means unlimited $Data::Dumper::Maxdepth = 0; binmode STDOUT, ':utf8'; my %patterns = (); print \nPatterns\n; while( DATA ){ last if m{ __DATA__ }msx; print; chomp; my $p = \%patterns; while( m{ ( \\? . ) }gmsx ){ my $c = $1; $c = '-wildcard' if $c eq '*'; $c = $1 if $c =~ m{ \\ (.) }msx; $p-{$c} = {} unless $p-{$c}; $p = $p-{$c}; # print \x{ab}$c\x{bb} ; } $p-{-matches} = 1; # print \n; } # print 'patterns = ', Dumper \%patterns; $Data::Dumper::Maxdepth = 3; print \nStrings\n; while( DATA ){ chomp; my @patterns = ( [ \%patterns, 0 ] ); for my $c ( split // ){ # print \x{ab}$c\x{bb} , Dumper \...@patterns; STDIN; my @next_patterns = (); for my $set ( @patterns ){ push @next_patterns, $set if $set-[1]; my $p = $set-[0]; if( exists $p-{$c} ){ push @next_patterns, [ $p-{$c}, 0 ]; } if( exists $p-{-wildcard} ){ push @next_patterns, [ $p-{-wildcard}, 1 ]; } } @patterns = @next_patterns; } # print \n; my $matches = 0; for my $p ( @patterns ){ # print 'checking ', Dumper $p; STDIN; if( exists $p-[0]{-matches} || exists $p-[0]{-wildcard}{-matches} ){ print $_ matches\n; $matches = 1; last; } } print $_ no match\n unless $matches; } __DATA__ *amazon.com* *craigslist.* acme.org*sub=7* a1.vcl.com* ebay.com/sports* foo\*bar foo food foo\*blah __DATA__ www.amazon.com?x=123 books.amazon.com?y=123 acme.org acme.org/stuff?sub=7 acme.org?w=1sub=7 vcl.com/suba/subb?qs=abc a1.vcl.com?w=3 a2.vcl.com/xyz?w=1 -- Just my 0.0002 million dollars worth, Shawn Programming is as much about organization and communication as it is about coding. I like Perl; it's the only language where you can bless your thingy. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/ Hi Matthew, I have to agree with Shawn, as you follow the tree branch by branch you get to the final match, the total number of checks required should on average be lower then it is when checking every single regex, which you would have to do for every string that does not match anything else. However due to a URI now or very soon (lost track of that a bit) being able to consist of any international character the list will be a lot longer then just 'a' to 'Z' and '0 to '9' so checking one character at a time could easily be way more work then one would initially think. It might therefore be better to match the first two, three or how ever many characters to reduce the initial number of regular expressions one has to go over for every single string. Then on a next step again it might pay to match a set of characters rather then just one at a time as this would reduce the number of required checks to decide if a match is possible or not, and to go down the next branch or toss the string back with a does not match response. I would say that depending on how often you are going to do the checking and how dynamic your list of regular expressions is you might want to spend a lot more time on finding the fastest programmatic way to order this hash of hashes then you will have to care about the resulting time needed to do the matching of any number of strings using this
RE: Monitoring multiple child processes
On some systems, waitpid may return something rather than a child pid or -1. This would happen when the wait was interrupted by something other than a child death. Most likely, it would be zero. The return status, $?, is a 16-bit word of three packed values. See `perldoc perlvar` and search for /\$\?/. It should be set to a non-zero value by the child if the child does not terminate successfully. Most UNIX commands do this; most user-written scripts do not. Also, Windows ignores the return value and always returns zero. $? should not change unless a child process died. Ahhh, I see. I think I understand now. Thankyou for your help and explanations. Andy Capgemini is a trading name used by the Capgemini Group of companies which includes Capgemini UK plc, a company registered in England and Wales (number 943935) whose registered office is at No. 1 Forge End, Woking, Surrey, GU21 6DB. This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: fun programming problem
Rob Coops wrote: I have to agree with Shawn, as you follow the tree branch by branch you get to the final match, the total number of checks required should on average be lower then it is when checking every single regex, which you would have to do for every string that does not match anything else. However due to a URI now or very soon (lost track of that a bit) being able to consist of any international character the list will be a lot longer then just 'a' to 'Z' and '0 to '9' so checking one character at a time could easily be way more work then one would initially think. It might therefore be better to match the first two, three or how ever many characters to reduce the initial number of regular expressions one has to go over for every single string. Then on a next step again it might pay to match a set of characters rather then just one at a time as this would reduce the number of required checks to decide if a match is possible or not, and to go down the next branch or toss the string back with a does not match response. I would say that depending on how often you are going to do the checking and how dynamic your list of regular expressions is you might want to spend a lot more time on finding the fastest programmatic way to order this hash of hashes then you will have to care about the resulting time needed to do the matching of any number of strings using this resulting hash. Well, if you really want speed, use C. Use Perl just to work out the algorithm. Which is why I chose to do it one character at a time. In C, comparing strings is only slightly faster than doing one character at a time. Also, I would put all my nodes in one vast array. That way I can use indexes rather than pointers to link them. I can then store the whole thing in a file without serializing and de-serializing it every time I write and read it. And I would add a caching mechanism to it so I could pretend the entire thing was in memory all along. -- Just my 0.0002 million dollars worth, Shawn Programming is as much about organization and communication as it is about coding. I like Perl; it's the only language where you can bless your thingy. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: fun programming problem
On Mon, Nov 9, 2009 at 9:22 PM, Matthew Sacks matthewsacks1...@yahoo.com wrote: I am unabashedly posting a quiz question I have about regular expressions: Looking for suggestions. I am thinking 1) make the set of regular expressions into one big expression? 2) search the seach strings, somehow, for common substrings. acme.org would be example. Each hit on acme.org would indicate a match on one of the original search strings? comments invited. You have 100,000 strings which are regex patterns intended to match URLs (including hostname and possibly a URI). When you receive a message, you need to see if it matches at least one of these patterns. The naive approach would be to go through your list of patterns linearly and attempt a regex match on each one. Suggest an alternative that would be more efficient. For me, the question of optimization hinges on two questions: 1) is the goal to optimize a single match, or to optimize the *process*, and 2) will a single long-running program (deamon, even) process many messages, or will the program be launched for each incoming message? The branching approach is effective for a program that will execute separately for each incoming message, but assuming that the program will process many messages, and that the goal is to optimize the average, I would suggest the following strategy: 1) Compile all the regex (se the qr// operator) 2) Store the regex as a linked list indexed with an integer 3) Begin with the naive approach, recording hits (either in a separate hash or another layer nested in the list) 4) Periodically reorder the list pointers based on the recorded hits, so that you are always matching against the most likely targets first. Further optimizations: You may want to introduce aging in a long-running filter so that you are sure the currently-trending patterns always occupy the front of the list. You might introduce Shawn's branching tree to reduce start-up time for the first few (hundred? thousand?) matches until you fell you have a large enough sample to be predictive. It might also be worth it--depending on the particular use case--to store match everything against only the top few (dozen? hundred?) hits, and then go then branch. Branching optimization: If you go the tree route, as either a start-up, or as a complete solution, consider breaking out the branches of the tree for simultaneous processing (multi-threading, forking). If you don't multi-thread, consider dynamically reordering the branches based on predictive data. HTH, -- jay -- This email and attachment(s): [ ] blogable; [ x ] ask first; [ ] private and confidential daggerquill [at] gmail [dot] com http://www.tuaw.com http://www.downloadsquad.com http://www.engatiki.org values of β will give rise to dom! -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
verilog-perl-3.221 usage troubles
I try the latest version 3.221 and found some difference between manual. The Verilog::Netlist::Net object seem not recognize its member function -data_type is there any advise? -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Monitoring multiple child processes
On Nov 9, 3:31 am, shawnhco...@gmail.com (Shawn H Corey) wrote: Taylor, Andrew (ASPIRE) wrote: Hello I have a script that, much like the Little Old Lady who lived in a shoe, has so many children it's not sure what to do. Basically, the script does some stuff, then kicks off multiple copies of an external process that all run in parallel. It then has to wait until all of them have completed (successfully) then carry on with the next bit. What I've cobbled together so far (i.e. swiped of the web then modified...) use strict; use warnings; # Do some stuff here for (my $i=1; $i =$num_processes; $i++) for my $i ( 1 .. $num_processes ) { defined(my $pid = fork) or die( Cannot fork : $!\n); # Put all the pids into an aray if ($pid) { push @pids, $pid; } else { exec external giggery pokery } } foreach my $child_pid (@pids) { waitpid($child_pid, 0); my $kid = waitpid($child_pid, 0); unless( 0==$? ) { die (Oh No! An external process has failed!!\n); } redo if $kid != $child_pid; Hm, I'm not sure why you'd want to 'redo' in the case of a blocking wait. As I read the docs, a blocking wait on a specific pid returns only that pid if reaped normally or -1 if there's no such pid (or that pid's been already reaped). In either case, I'd think you'd just want to continue processing the pid loop. In contrast (at least on POSIX compliant OS's), a non-blocking blocking wait with -1 rather than a specific pid could potentially reap other pid's: use POSIX :sys_wait_h; #... do { $kid = waitpid(-1, WNOHANG); } while $kid 0; This might reap an extraneous process such as one launched via backticks maybe in an forked pid for instance. On some systems, waitpid may return for any interrupt, not just when the child terminates. Also the POSIX non-blocking wait has the advantage in that case by providing macro's that can reveal some additional exit/termination status: use POSIX :sys_wait_h; ... $kid = waitpid( -1, WNOHANG); print $kid: signal termination if WIFSIGNALED($?); -- Charles DeRykus -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Net::FTP fails when invoked from browser
Dermot, I did set domain and user environemt variables, no change. Uri, thanks for the tips. I've implemented localtime() and next if, but still see same results. Mr. Scott, I do believe you are onto something there. I do see the browser being updated after each file is FTPd, but the process simply stalls after about 150 files. Have tried different file sets and different destination computers, all same result. Output to the browser is after each file is FTPd. Takes less than 1 second per file, entire ftp step (150+ files) takes a little less than 2 minutes. However, I did notice that when running the script from the command line, I see a lot of Net::FTP GLOB messages. These are not being output to the browser. Once I noticed that, I set debug from 1 to 0. That seems to have fixed the issue. Now the entire set of files get FTPd without the process stalling. So maybe the FTP GLOB messages were confusing Apache. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: verilog-perl-3.221 usage troubles
On 11/9/09 Mon Nov 9, 2009 7:48 PM, cute cutew...@wildmail.com scribbled: I try the latest version 3.221 and found some difference between manual. The Verilog::Netlist::Net object seem not recognize its member function -data_type is there any advise? Yes. Post a short program that illustrates the problem you are having. Since the Verilog::Netlist::Net documentation clearly includes a description of the data_type() method, the problem must lie somewhere before your call to that method. For example, are you sure you have a valid Verilog::Netlist::Net object instance? -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
check a string against a array
Hi, I'm an oracle user. One of my implementation needs a little bit of perl usage. i've an array(@hexa_tableau) which contains restricted hexadecimal characters and a string which is converted into hexadecimal. while ((@carac) = $sel-fetchrow_array) { push(@hexa_tableau, $carac[0]); } my (@chars) = unpack(H* = $line); my (@hexed) = map {$_} @chars; my $hexed = uc join(' ' = @hexed); I just wanted to check that any of the hexadecimal code present in array (@hexa_tableau) appears in the string($hexed) For example if my @hexa_tableau = (00,01,02); and $hexed = '6100636465414353206664736161647361640A' if (check the presence here) { print exists; } Thanks in advance -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Monitoring multiple child processes
C.DeRykus wrote: On Nov 9, 3:31 am, shawnhco...@gmail.com (Shawn H Corey) wrote: redo if $kid != $child_pid; Hm, I'm not sure why you'd want to 'redo' in the case of a blocking wait. As I read the docs, a blocking wait on a specific pid returns only that pid if reaped normally or -1 if there's no such pid (or that pid's been already reaped). In either case, I'd think you'd just want to continue processing the pid loop. From `perldoc -f waitpid`: On some systems, a value of 0 indicates that there are processes still running. In other words, waitpid can return a value which is not the pid of the child you requested. Therefore redo, since it's a foreach loop. This, of course, is only true on some systems; it may not be true on yours. -- Just my 0.0002 million dollars worth, Shawn Programming is as much about organization and communication as it is about coding. I like Perl; it's the only language where you can bless your thingy. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: check a string against a array
rithu wrote: Hi, Hello, I'm an oracle user. One of my implementation needs a little bit of perl usage. i've an array(@hexa_tableau) which contains restricted hexadecimal characters and a string which is converted into hexadecimal. while ((@carac) = $sel-fetchrow_array) { push(@hexa_tableau, $carac[0]); } Since you are not using $carac[1] through $carac[$#carac] there is no need to capture them as well. Just use a list with a single scalar: while ( my ( $carac ) = $sel-fetchrow_array ) { push @hexa_tableau, $carac; } my (@chars) = unpack(H* = $line); my (@hexed) = map {$_} @chars; That is the same as saying: my @hexed = @chars; What were you intending to use the map() for? my $hexed = uc join(' ' = @hexed); I just wanted to check that any of the hexadecimal code present in array (@hexa_tableau) appears in the string($hexed) For example if my @hexa_tableau = (00,01,02); and $hexed = '6100636465414353206664736161647361640A' You are using join() with a space character but there are no spaces in that string. Does @hexed only contain one element? if (check the presence here) { print exists; } my $string = pack 'H*', $hexed; my $pattern = '[' . join( '', map chr hex, @hexa_tableau ) . ']'; if ( $string =~ /$pattern/ ) { print exists\n; } John -- The programmer is fighting against the two most destructive forces in the universe: entropy and human stupidity. -- Damian Conway -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
On map
I have a curiosity maybe someone here can help with. This code: @a=(1,2); map { $_ = 3 } @a; print join(,, @a), \n; ... prints 3,3. That map is changing the @a array as it goes through it. Good. Now this: %a=(1,2); map { $_ = 3 } keys %a; print join(,, keys(%a)), \n; I expected this to print 3, but it doesn't -- it prints 1. If map sets $_ as an alias to the value, why isn't it changing the keys? Thanks! - Bryan -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: On map
On 11/10/09 Tue Nov 10, 2009 1:22 PM, Bryan R Harris bryan_r_har...@raytheon.com scribbled: I have a curiosity maybe someone here can help with. This code: @a=(1,2); map { $_ = 3 } @a; print join(,, @a), \n; ... prints 3,3. That map is changing the @a array as it goes through it. Good. Now this: %a=(1,2); map { $_ = 3 } keys %a; print join(,, keys(%a)), \n; I expected this to print 3, but it doesn't -- it prints 1. If map sets $_ as an alias to the value, why isn't it changing the keys? Because the keys() subroutine generates a list of keys that are not aliases to the keys in the hash. It is not a good idea to change the values of keys in a hash, because finding the (key,value) pair for a specific key depends upon a key value being located in a specific location (bucket) within the hash. If you change the key value in place, the hash access algorithm will no longer be able to find the (key,value) pair. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: On map
BRH == Bryan R Harris bryan_r_har...@raytheon.com writes: BRH I have a curiosity maybe someone here can help with. BRH This code: BRH@a=(1,2); BRHmap { $_ = 3 } @a; BRHprint join(,, @a), \n; BRH ... prints 3,3. That map is changing the @a array as it goes through it. BRH Good. bad. as jim points out, map aliases $_ to its input values. it does this more for efficiency (it doesn't need to copy the values) than for convenience to the coder. note that foreach modifier does the same thing (and also for loops). this aliasing can be useful as in: s/X/Y/g for @values ; but side effects done in map are a bad thing since map's purpose is to generate a new list from its values. it has the aliasing left in for speed and for compatibility with the other loops that alias. BRH Now this: BRH%a=(1,2); BRHmap { $_ = 3 } keys %a; BRHprint join(,, keys(%a)), \n; BRH I expected this to print 3, but it doesn't -- it prints 1. BRH If map sets $_ as an alias to the value, why isn't it changing BRH the keys? why would you expect that? hash keys are not variables, values or anything but fixed strings. they are effectively readonly. it would make no sense to be able to modify them as that is really doing this: $new_key = mung( $old_key ) ; $hash{ $new_key } = delete $hash{ $old_key } that just moves the value from one key to another. if you set all the keys to 3 then you would end up with only one key/value which is 3 and its value would be a random value of all the hashes values (based on iterator ordering). now on the other hand aliasing the values of a hash is very cool. use the for modifier and this is very nice: s/foo/bar/ for values %hash ; we just edited all the values with that s/// op. you can do anything to $_ in there and that will change the value entries in the hash. uri -- Uri Guttman -- u...@stemsystems.com http://www.sysarch.com -- - Perl Code Review , Architecture, Development, Training, Support -- - Gourmet Hot Cocoa Mix http://bestfriendscocoa.com - -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
use module on perl -e
Hello John list, We could use a module in one-liner perl command as: perl -MPOSIX -e ''; But if I want to use something like: use POSIX qw/strftime/; how to do it with perl command line? Thanks. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Perl require and PERL5OPT
Hi, I am trying to use a perl script like the following : main.pl: require abc.pl; print In parent script\n; abc.pl: print In abc.pl script\n; I have some perl options set in PERL5OPT, which work fine for main.pl but not for abc.pl. This is how my Perl5OPT looks: PERL5OPT=-Ilib_path_to_Devel::Cover -MDevel::Cover=-db,cover_db,+ignore,.*5\.10\.0.*,-merge,on,-coverage,statement,-silent,on I have compared the output of perl -V($^X -V) in both these cases, and they are exactly identical. I am not sure what is missing here. I don't have a control over changing the require to use, as it is prodduct code written by someone else, and I am only trying to run them with the PERL5OPT set by me. Really appreciate any help in this regard. Thanks Regards, Vaishnavi. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: use module on perl -e
Orchid Fairy (兰花仙子) wrote: Hello John list, We could use a module in one-liner perl command as: perl -MPOSIX -e ''; But if I want to use something like: use POSIX qw/strftime/; how to do it with perl command line? Thanks. perl -MPOSIX=strftime -e 'print strftime(%H:%M:%S\n,localtime)' -- Just my 0.0002 million dollars worth, Shawn Programming is as much about organization and communication as it is about coding. I like Perl; it's the only language where you can bless your thingy. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/