Re: toy list processing problem: collect similar terms

2010-09-26 Dr.Ruud

On 2010-09-26 06:05, Xah Lee wrote:


> I have a list of lists, where each sublist is labelled by
> a number. I need to collect together the contents of all sublists
> sharing the same label. So if I have the list
>
> ((0 a b) (1 c d) (2 e f) (3 g h) (1 i j) (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))
>
> where the first element of each sublist is the label, I need to
> produce:
>
> output:
> ((a b) (c d i j) (e f k l o p) (g h) (m n q r) (s t))


The input is a string on STDIN,
and the output is a string on STDOUT?


Use a hash:

perl -MData::Dumper -wle '$Data::Dumper::Sortkeys = 1;
  my $t = "((0 a b) (1 c d) (2 e f) (3 g h) (1 i j)"
. " (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))";

  push @{ $h{ $1 } }, $2 while $t =~ /(\w+)([^)]*)/g;  # gist

  print Dumper \%h;
'
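A minimal sketch of the same hash approach as a standalone script under `use strict`, printing the requested list form instead of a Dumper dump. It assumes the labels are integers (the keys are sorted numerically); `%group` and `@parts` are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $t = "((0 a b) (1 c d) (2 e f) (3 g h) (1 i j)"
      . " (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))";

# Collect the items of each "(label item ...)" sublist under its label.
my %group;
while ( $t =~ /\((\w+)\s+([^()]*)\)/g ) {
    my ( $label, $items ) = ( $1, $2 );
    push @{ $group{$label} }, split ' ', $items;
}

# Assumes integer labels, so the keys are sorted numerically.
my @parts = map { join ' ', @{ $group{$_} } } sort { $a <=> $b } keys %group;
print '((', join( ') (', @parts ), "))\n";
```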

or an array:

perl -wle '
  my $t = "((0 a b) (1 c d) (2 e f) (3 g h) (1 i j)"
. " (2 k l) (4 m n) (2 o p) (4 q r) (5 s t))";

  push @{$a[$1]},$2 while $t =~ /(\w+)\s+([^)]*)/g; # gist.1
  print "((".join(") (",map join(" ",@$_),@a )."))";  # gist.2
'


Or if the list is not just a string, but a real data structure in the 
script:


perl -wle'
  my $t = [ [qw/0 a b/], [qw/1 c d/], [qw/2 e f/], [qw/3 g h/],
[qw/1 i j/], [qw/2 k l/], [qw/4 m n/], [qw/2 o p/],
[qw/4 q r/], [qw/5 s t/] ];

  push @{ $a[ $_->[0] ] }, [ @$_[ 1, 2 ] ] for @$t;  # AoAoA

  printf "((%s))\n", join ") (",
   map join( " ",
 map join( " ", @$_ ), @$_
   ), @a;
'
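If the labels are not small integers, or the groups should come out in order of first appearance, a hash plus an order list avoids the sparse array. This is a sketch under those assumptions, and it also accepts sublists with any number of items; `%items` and `@order` are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $t = [ [qw/0 a b/], [qw/1 c d/], [qw/2 e f/], [qw/3 g h/],
          [qw/1 i j/], [qw/2 k l/], [qw/4 m n/], [qw/2 o p/],
          [qw/4 q r/], [qw/5 s t/] ];

my ( %items, @order );
for my $row (@$t) {
    my ( $label, @rest ) = @$row;                      # first element is the label
    push @order, $label unless exists $items{$label};  # remember first appearance
    push @{ $items{$label} }, @rest;                   # any number of items per sublist
}

# Array of array refs, one per label, in order of first appearance.
my @groups = map { $items{$_} } @order;
printf "((%s))\n", join ') (', map { join ' ', @$_ } @groups;
```

With the labels 0..5 above this prints the same list form as the array version.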

Etc.

--
Ruud

--
http://mail.python.org/mailman/listinfo/python-list


Re: regular expression negate a word (not character)

2008-02-01 Dr.Ruud
Greg Bacon wrote:
> Dr.Ruud:

>> I negated the test, to make the regex simpler: [...]
>
> Yes, your approach is simpler. I assumed from the "need it all
> in one pattern" constraint that the OP is feeding the regular
> expression to some other program that is looking for matches.

Yes, I assumed about the same, but thought it would be a nice
alternative anyway.
Happy Perling!

-- 
Affijn, Ruud

"Gewoon is een tijger."



Re: regular expression negate a word (not character)

2008-01-28 Dr.Ruud
Greg Bacon wrote:

> #! /usr/bin/perl
>
> use warnings;
> use strict;
>
> use constant {
>   MATCH    => 1,
>   NO_MATCH => 0,
> };
>
> my @tests = (
>   [ "winter tire"                => MATCH ],
>   [ "tire"                       => MATCH ],
>   [ "retire"                     => MATCH ],
>   [ "tired"                      => MATCH ],
>   [ "snowbird tire"              => MATCH ],
>   [ "tired on a snow day"        => MATCH ],
>   [ "snow tire and regular tire" => MATCH ],
>   [ " tire"                      => MATCH ],
>   [ "snow tire"                  => NO_MATCH ],
>   [ "snow   tire"                => NO_MATCH ],
>   [ "some snowtires"             => NO_MATCH ],
> );
> [...]

I negated the test, to make the regex simpler:

my $snow_tire = qr/
 snow [[:blank:]]* tire (?!.*tire)
/x;

my $fail;
for (@tests) {
  my($str,$want) = @$_;
  my $got = $str !~ /$snow_tire/;
  my $pass = !!$want == !!$got;

  print "$str: ", ($pass ? "PASS" : "FAIL"), "\n";

  ++$fail unless $pass;
}

print "\n", (!$fail ? "PASS" : "FAIL"), "\n";

__END__
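For a quick check, here is a self-contained sketch that combines the quoted test table with the negated regex. MATCH and NO_MATCH are written as 1 and 0 to keep it short; this is not Greg's full script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Negated test: a string passes (MATCH) when it does NOT contain
# "snow", optional blanks, "tire" with no later "tire" after it.
my $snow_tire = qr/ snow [[:blank:]]* tire (?!.*tire) /x;

my @tests = (                        # 1 = MATCH, 0 = NO_MATCH
    [ "winter tire"                => 1 ],
    [ "tire"                       => 1 ],
    [ "retire"                     => 1 ],
    [ "tired"                      => 1 ],
    [ "snowbird tire"              => 1 ],
    [ "tired on a snow day"        => 1 ],
    [ "snow tire and regular tire" => 1 ],
    [ " tire"                      => 1 ],
    [ "snow tire"                  => 0 ],
    [ "snow   tire"                => 0 ],
    [ "some snowtires"             => 0 ],
);

my $fail = 0;
for my $test (@tests) {
    my ( $str, $want ) = @$test;
    my $got  = $str !~ $snow_tire;
    my $pass = !!$want == !!$got;
    printf "%-28s %s\n", "$str:", $pass ? "PASS" : "FAIL";
    $fail++ unless $pass;
}
print "\n", ( $fail ? "FAIL" : "PASS" ), "\n";
```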

-- 
Affijn, Ruud

"Gewoon is een tijger."


Re: I am giving up perl because of assholes on clpm -- switching to Python

2007-08-11 Dr.Ruud
RedGrittyBrick wrote:

> treasure the saints, tolerate the irritable and
> ignore the whiners.

*You are what you read.* What is "irritating" to some is "to the point"
to others.

That should say enough, but some people just cannot stand short
replies; they cannot help reading all kinds of negative things into
them. Too little attention maybe? (I am just making the same mistake to
show how it works.)

Only rarely does someone mistake the "to the point" for the "irritating"
without acknowledging the mistake within the next reply or two.
Newbies with the wrong focus, under pressure, probably suffering from
lack of sleep? (I am just making things worse.)

-- 
Affijn, Ruud

"Gewoon is een tijger."



Re: I am giving up perl because of assholes on clpm -- switching to Python

2007-08-11 Dr.Ruud
grocery_stocker wrote:

> In the beginning there was Mathematics
> And all was good
> Then one day God said "Let there be the Lambda Calculus"
> And hence the Lambda Calculus was born.
> However, God felt the Lambda Calculus needed a mate
> So god said "Let there be Lisp"
> And thus, Lisp was born.
> 
> As the years went on, god became depressed by how impure the Lisp had
> become.
> For from the Lisp, came Emacs Lisp, Java, Perl, Ruby, and Python.

http://xkcd.com/224/ 

-- 
Affijn, Ruud

"Gewoon is een tijger."


Re: I am giving up perl because of assholes on clpm -- switching to Python

2007-08-11 Dr.Ruud
Paul Boddie wrote:

> let us
> avoid comp.lang.python becoming some kind of linux-kernel ego trip
> where anyone who has stuck around has an interest in perpetuating a
> hostile atmosphere.

"When did you stop beating your wife?" 

-- 
Affijn, Ruud

"Gewoon is een tijger."


Re: Portable general timestamp format, not 2038-limited

2007-07-03 Dr.Ruud
Peter J. Holzer wrote:

> Since a day with a leap second has 86401 seconds (or 86399, but that
> hasn't happened yet)

Many systems allow a seconds value in the range 0..61, so minutes
(actually months) with two leap seconds are provided for.

A leap second may be introduced at the end of any month; the preferred
dates are the end of June and the end of December.

At the estimated rate of decrease, the Earth would lose about 1/2 day
after 4,000 years, and about two leap seconds a month would be needed
to keep UTC in step with Earth time, UT1.

(source:
<http://www.allanstime.com/Publications/DWA/Science_Timekeeping/TheScienceOfTimekeeping.pdf>)
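The arithmetic can be sketched in a few lines. `day_length` and `valid_seconds_field` are hypothetical helpers, not from any library; the upper bound of 61 mirrors the 0..61 range that many systems allow.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Length of a UTC day in SI seconds: 86400, plus one for an inserted
# leap second (+1) or minus one for a removed leap second (-1).
sub day_length {
    my ($leap) = @_;
    return 86_400 + $leap;
}

# A seconds field is accepted in the range 0..61, which leaves room
# for up to two leap seconds (60 and 61) in one minute.
sub valid_seconds_field {
    my ($s) = @_;
    return $s =~ /^\d+\z/ && $s <= 61;
}

printf "%d %d %d\n", day_length(1), day_length(0), day_length(-1);
print join( ' ', map { valid_seconds_field($_) ? 'ok' : 'bad' } 59, 60, 61, 62 ), "\n";
```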

-- 
Affijn, Ruud

"Gewoon is een tijger."



Re: what are the most frequently used functions?

2006-10-28 Dr.Ruud
robert wrote:

> read more of the context and answer to the OP

That OP is invisible in most relevant contexts.

-- 
Affijn, Ruud

"Gewoon is een tijger."


Re: A Sort Optimization Technique: decorate-sort-dedecorate

2006-08-28 Dr.Ruud
Jim Gibson wrote:

> The problem addressed by what is known in Perl as the 'Schwartzian
> Transform' is that the compare operation can be an expensive one,
> regardless of whether the comparison uses multiple keys. Since in
> comparison sorts, the compare operation will be executed N log N
> times, it is more efficient to pre-compute a set of keys, one for
> each object to be sorted. That need be done only N times. The sort
> can then use these pre-computed keys to sort the objects.

Basically, it first builds an index, then sorts it.

The pre-computed (multi-)keys can often be optimized further; see Uri's
Sort::Maker <http://search.cpan.org/search?query=Sort::Maker>
for facilities.
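A minimal sketch of that decorate-sort-undecorate idea in Perl (the Schwartzian Transform). `expensive_key` is a hypothetical stand-in for a costly key computation; the counter shows that it runs once per element, not once per comparison.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical "expensive" key: string length stands in for any
# costly computation (stat, parsing, ...).
my $calls = 0;
sub expensive_key { $calls++; return length $_[0] }

my @words = qw(pear fig banana kiwi apple);

my @sorted =
    map  { $_->[1] }                                     # undecorate
    sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }  # compare cached keys
    map  { [ expensive_key($_), $_ ] }                   # decorate: N key calls
    @words;

print "@sorted\n";
print "expensive_key called $calls times\n";
```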

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: What is a type error?

2006-07-16 Dr.Ruud
Chris F Clark wrote:

> If you have a fixed database, and you do two selects which specify the
> same sets of fields to be selected and the same keys to select records
> with, one expects the two selects to return the same values.

When your "fixed" means read-only, or (fully) locked, then yes.

Modern databases also have modes without locks, so without special
attention (like maybe your "fixed") the two selects do not necessarily
return the same set of records.


> Further, if you
> do an update, you expect certain fields of certain records to change
> (and be reflected in subsequent selects).

Doesn't your "fixed" block updates?


> However, therein lies the rub, if you do a select on some records, and
> then an update that changes those records, the records you have from
> the select have either changed or show outdated values.

Not necessarily outdated: values can be fixed in time for a purpose.
Example: calculating interest is done once per day, but the whole
process takes more than a day.

Some systems implement a lock plus requery just before the update, to
check for unacceptable changes in the stored data; this avoids having
to hold locks while waiting.
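The lock-plus-requery idea can be sketched with a hypothetical in-memory record that carries a version number: read without locking, then, just before writing, re-check the stored version and refuse the update if it moved. The record layout and sub names are assumptions for illustration; a real database would do the check and the write under a short lock, or as one conditional UPDATE.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stored record with a version number that every
# successful update increments.
my %account = ( balance => 100, version => 1 );

# Read without taking any lock: just copy the current values.
sub read_snapshot { return +{%account} }

# Just before writing, re-check the stored record against the snapshot
# and refuse the update if it changed in the meantime.  (A real system
# would hold a short lock around this check-and-write.)
sub update_if_unchanged {
    my ( $snapshot, $new_balance ) = @_;
    return 0 if $account{version} != $snapshot->{version};
    $account{balance} = $new_balance;
    $account{version}++;
    return 1;
}

my $mine  = read_snapshot();
my $other = read_snapshot();

print update_if_unchanged( $other, 150 ) ? "other: updated\n" : "other: conflict\n";
print update_if_unchanged( $mine, 90 )   ? "mine: updated\n"  : "mine: conflict\n";
print "balance=$account{balance} version=$account{version}\n";
```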


> If there is
> some way to refer to the records you first selected before the update,
> then you have an aliasing problem, but maybe one can't do that.  One
> could also have an aliasing problem, if one were allowed to do two
> updates simultaneously, so that one update could change records in
> the middle of the other changing the records.

Some databases allow you to travel back in time: run this query on the
data of 1 year ago. All previous values are kept "behind" the current
value.

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: languages with full unicode support

2006-07-01 Dr.Ruud
Chris Uppal wrote:

> Since the interpretation of characters which are yet to be added to
> Unicode is undefined (will they be digits, "letters", operators,
> symbol, punctuation ?), there doesn't seem to be any sane way
> that a language could allow an unrestricted choice of Unicode in
> identifiers.

The Perl code below prints:

xdigit
    22 /194522 =  0.011%  (lower:     6, upper:     6)
ascii
   128 /194522 =  0.066%  (lower:    26, upper:    26)
\d
   268 /194522 =  0.138%
digit
   268 /194522 =  0.138%
IsNumber
   612 /194522 =  0.315%
alpha
 91183 /194522 = 46.875%  (lower:  1380, upper:  1160)
alnum
 91451 /194522 = 47.013%  (lower:  1380, upper:  1160)
word
 91801 /194522 = 47.193%  (lower:  1380, upper:  1160)
graph
102330 /194522 = 52.606%  (lower:  1380, upper:  1160)
print
102349 /194522 = 52.616%  (lower:  1380, upper:  1160)
blank
    18 /194522 =  0.009%
space
    24 /194522 =  0.012%
punct
   374 /194522 =  0.192%
cntrl
  6473 /194522 =  3.328%


Especially look at 'word', the same as \w, which for ASCII is
[0-9A-Za-z_].


==8<===
#!/usr/bin/perl
# Program-Id: unicount.pl
# Subject: show Unicode statistics

  use strict ;
  use warnings ;

  use Data::Alias ;

  binmode STDOUT, ':utf8' ;

  my @table =
  # +--Name-----+---qRegexp--------+-C-+-L-+-U-+
  (
    [ 'xdigit'   , qr/[[:xdigit:]]/ , 0 , 0 , 0 ] ,
    [ 'ascii'    , qr/[[:ascii:]]/  , 0 , 0 , 0 ] ,
    [ '\\d'      , qr/\d/           , 0 , 0 , 0 ] ,
    [ 'digit'    , qr/[[:digit:]]/  , 0 , 0 , 0 ] ,
    [ 'IsNumber' , qr/\p{IsNumber}/ , 0 , 0 , 0 ] ,
    [ 'alpha'    , qr/[[:alpha:]]/  , 0 , 0 , 0 ] ,
    [ 'alnum'    , qr/[[:alnum:]]/  , 0 , 0 , 0 ] ,
    [ 'word'     , qr/[[:word:]]/   , 0 , 0 , 0 ] ,
    [ 'graph'    , qr/[[:graph:]]/  , 0 , 0 , 0 ] ,
    [ 'print'    , qr/[[:print:]]/  , 0 , 0 , 0 ] ,
    [ 'blank'    , qr/[[:blank:]]/  , 0 , 0 , 0 ] ,
    [ 'space'    , qr/[[:space:]]/  , 0 , 0 , 0 ] ,
    [ 'punct'    , qr/[[:punct:]]/  , 0 , 0 , 0 ] ,
    [ 'cntrl'    , qr/[[:cntrl:]]/  , 0 , 0 , 0 ] ,
  ) ;

  # 194522 code points in total: the surrogates and noncharacters are skipped.
  my @codepoints =
  (
     0x0000 ..  0xD7FF,   # BMP below the surrogates
     0xE000 ..  0xFDCF,   # up to the noncharacters U+FDD0..U+FDEF
     0xFDF0 ..  0xFFFD,   # rest of the BMP, without U+FFFE and U+FFFF
    0x10000 .. 0x1FFFD,   # plane 1, without its last two code points
    0x20000 .. 0x2FFFD,   # plane 2, likewise
   #0x30000 .. 0x3FFFD,   # etc.
  ) ;

  for my $row ( @table )
  {
    alias my ($name, $qrx, $count, $lower, $upper) = @$row ;

    printf "\n%s\n", $name ;

    my $n = 0 ;

    for ( @codepoints )
    {
      local $_ = chr ;  # int-2-char conversion
      $n++ ;

      if ( /$qrx/ )
      {
        $count++ ;
        $lower++ if / [[:lower:]] /x ;
        $upper++ if / [[:upper:]] /x ;
      }
    }

    my $show_lower_upper =
      ($lower || $upper)
      ? sprintf( "  (lower:%6d, upper:%6d)"
               , $lower
               , $upper
               )
      : '' ;

    printf "%6d /%6d =%7.3f%%%s\n"
         , $count
         , $n
         , 100 * $count / $n
         , $show_lower_upper ;
  }
__END__

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: What is Expressiveness in a Computer Language

2006-06-27 Dr.Ruud
Chris Smith wrote:

> So it seems to me that we have this ideal point at which it is
> possible to write all correct or interesting programs, and impossible
> to write buggy programs.

I think that is a misconception. Even at the most ideal point it will
be possible (and easy) to write buggy programs. Gödel!

-- 
Affijn, Ruud

"Gewoon is een tijger."



Re: What is Expressiveness in a Computer Language

2006-06-23 Dr.Ruud
Marshall wrote:
> Rob Thorpe:

>> Can I make a type in C that can only have values between 1 and 10?
>> How about a variable that can only hold odd numbers, or, to make it
>> more difficult, say fibonacci numbers?
>
> Well, of course you can't in *C*; you can barely zip your pants with C.
> But I believe you can do the above in C++, can't you?

You can write self-modifying code in C, so I don't see how you can not
do that in C.
;)

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: What is Expressiveness in a Computer Language

2006-06-23 Dr.Ruud
Chris Smith wrote:

> Static types are not fuzzy

Static types can be fuzzy as well. For example, a language can define
that extra precision (more bits) may be used when evaluating a
calculation such as d = a * b / c; often only some minimum is
guaranteed.


> I see it as quite reasonable when there's an effort by several
> participants in this thread to either imply or say outright that
> static type systems and dynamic type systems are variations of
> something generally called a "type system", and given that static
> type systems are quite formally defined, that we'd want to see a
> formal definition for a dynamic type system before accepting the
> proposition that they are of a kind with each other.

The 'dynamic type' is just another type.

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: What is Expressiveness in a Computer Language

2006-06-23 Dr.Ruud
Rob Thorpe wrote:

> I would suggest that at least assembly should be referred to as
> "untyped".

There are many different languages under the umbrella of "assembly", so
your suggestion is bound to be false.

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: What is Expressiveness in a Computer Language

2006-06-23 Dr.Ruud
Marshall wrote:

> It seems we have languages:
> with or without static analysis
> with or without runtime type information (RTTI or "tags")
> with or without (runtime) safety
> with or without explicit type annotations
> with or without type inference
>
> Wow. And I don't think that's a complete list, either.

Right. And don't forget that some languages are hybrids in any of those
areas.

-- 
Affijn, Ruud

"Gewoon is een tijger."




Typing (was: Re: What is Expressiveness in a Computer Language)

2006-06-22 Dr.Ruud
Timo Stamm wrote:

> This is actually one of the most interesting threads I have read in a
> long time. If you ignore the evangelism, there is a lot of
> high-quality information and first-hand experience you couldn't find
> in a dozen books.

Much of what is being discussed is covered in these articles (and their links):
http://www.mindview.net/WebLog/log-0066
http://en.wikipedia.org/wiki/Dynamic_typing

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: What is Expressiveness in a Computer Language

2006-06-21 Dr.Ruud
Rob Thorpe wrote:
> Dr.Ruud:
>> Marshall:

>>> "dynamic types." I don't have a firm definition for
>>> that term, but my working model is runtime type tags. In which
>>> case, I would say that among statically typed languages,
>>> Java does have dynamic types, but C does not. C++ is
>>> somewhere in the middle.
>>
>> C has union.
>
> That's not the same thing.

That is your opinion. In the context of this discussion I don't see any
problem to put C's union under "dynamic types".


> The value of a union in C can be any of a
> set of specified types.  But the program cannot find out which, and
> the language doesn't know either.
>
> With C++ and Java dynamic types the program can test to find the type.

When a program that uses a union needs such a test, it has one (typically
a tag kept alongside the union).

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: What is Expressiveness in a Computer Language

2006-06-21 Dr.Ruud
Marshall wrote:

> "dynamic types." I don't have a firm definition for
> that term, but my working model is runtime type tags. In which
> case, I would say that among statically typed languages,
> Java does have dynamic types, but C does not. C++ is
> somewhere in the middle.

C has union.

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: What is Expressiveness in a Computer Language

2006-06-16 Dr.Ruud
Torben Ægidius Mogensen wrote:

> Bugs that in dynamically typed languages would
> require testing to find are found by the compiler in a statically
> typed language.  So while it may take longer to get a program that
> gets past the compiler, it takes less time to get a program that
> works.

If it were that simple, I would say: compile-time type inference is the
only way to go.

-- 
Affijn, Ruud

"Gewoon is een tijger."



Re: A critic of Guido's blog on Python's lambda

2006-05-07 Dr.Ruud
Paul Rubin wrote:

> a cryptographic PRNG seeded with good entropy is supposed to be
> computationally indistinguishable from physical randomness

Doesn't your "good entropy" include "physical randomness"?

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re:

2006-04-29 Dr.Ruud
Tagore Smith wrote:


> [addressing John Bokma]
> your objection seems to be less about the
> crossposting, and more about the content.

Why do you think that?

-- 
Affijn, Ruud

"Gewoon is een tijger."




Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-23 Dr.Ruud
[EMAIL PROTECTED] wrote:

> The solution that would have the most utility would be one where the
> elements are generated one-by-one, loop-like, so that they can be used
> in the body of a loop, and to avoid the fact that even with exclusion
> the cardinality of the target set EX^n could be in the millions even
> with a full list of wc's, that is, a list containing at least one wc
> of every length in 2..(n-1). I don't know enough Lisp, Haskell or
> Qi/Prolog to know if the solutions so far can be modified to do this.
> The Python program is too slow for large sets.

Use a bitmap encoding; see also
  news:[EMAIL PROTECTED]

Detect the exclusions with a bitwise AND.
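One way to read the bitmapping suggestion, sketched under assumptions that are not from the thread: patterns have the same length as the words and use '?' for any single symbol (the challenge's own wildcards may be more general). Each (position, symbol) pair gets one bit, a pattern's mask holds the bits of its fixed positions, and a word is excluded when `($bits & $mask) == $mask`. The patterns `a??` and `?bb` are hypothetical, and the words come from an iterator one at a time, as the quoted post asks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumptions: words of fixed length $n over a small alphabet; each
# (position, symbol) pair gets one bit, so $n * @alpha bits must fit
# in a native integer.
my @alpha = qw(a b c);
my $n     = 3;
my %index;
@index{@alpha} = 0 .. $#alpha;

# Bit mask of the fixed (non-'?') positions of a pattern or word.
sub mask_of {
    my @sym  = split //, shift;
    my $mask = 0;
    for my $pos ( 0 .. $#sym ) {
        next if $sym[$pos] eq '?';
        $mask |= 1 << ( $pos * @alpha + $index{ $sym[$pos] } );
    }
    return $mask;
}

my @excl = map { mask_of($_) } qw(a?? ?bb);    # hypothetical patterns

# Iterator over the cartesian product: one kept word per call, then
# undef, so the whole set is never held in memory.
sub make_iter {
    my @digit = (0) x $n;    # odometer of symbol indices
    my $done  = 0;
    return sub {
        while ( !$done ) {
            my $word = join '', map { $alpha[$_] } @digit;
            my $bits = mask_of($word);

            # Advance the odometer; wrapping past the first digit ends it.
            my $i = $n - 1;
            while ( $i >= 0 && ++$digit[$i] == @alpha ) { $digit[ $i-- ] = 0 }
            $done = 1 if $i < 0;

            # Excluded when the word has every bit of some pattern mask.
            next if grep { ( $bits & $_ ) == $_ } @excl;
            return $word;
        }
        return;
    };
}

my $next = make_iter();
my @kept;
while ( defined( my $w = $next->() ) ) { push @kept, $w }
print scalar(@kept), " words kept: @kept\n";
```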

-- 
Affijn, Ruud

"Gewoon is een tijger."
echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'



Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-16 Dr.Ruud
[EMAIL PROTECTED] wrote:

> There are many sites
> dedicated to reasonably objective comparisons between languages. Here
> are two examples:
> 
> http://www.smallscript.org/Language%20Comparison%20Chart.asp
> http://www.jvoegele.com/software/langcomp.html

  http://shootout.alioth.debian.org/ 

-- 
Affijn, Ruud

"Gewoon is een tijger."
echo 014C8A26C5DB87DBE85A93DBF |perl -pe 'tr/0-9A-F/JunkshoP cartel,/'


Re: Perl-Python-a-Day: split a file full path

2005-10-17 Dr.Ruud
Xah Lee:

> In Perl, splitting a full path into parts is done like this:

What follows is Perl code that only handles an optional .html
"extension", similar to the code in the File::Basename description:
http://www.perl.com/doc/manual/html/lib/File/Basename.html


It is best practice to derive and store the normalized (or absolute)
path, because relative paths can come loose, and so eventually will.


Consider this:

  $myPath = './example/basename.ext';


and this:

  $myPath = './example/filename.1.23.45-beta';


and this:

  $myPath = 'x:.\example\basename.ext';


(some platforms have a working directory per device)
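A sketch of splitting with File::Basename's `fileparse` and a suffix pattern, plus deriving an absolute path with `File::Spec->rel2abs`, for the first two example paths. Treating only the last dot as the start of the suffix is an assumption; for 'filename.1.23.45-beta' it yields '.45-beta', which may not be what you want. The drive-letter path would need `fileparse_set_fstype` for a DOS-like platform and is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

for my $myPath ( './example/basename.ext', './example/filename.1.23.45-beta' ) {
    # Suffix = last "." and everything after it (an assumption).
    my ( $name, $dir, $suffix ) = fileparse( $myPath, qr/\.[^.]*/ );

    # Derive the absolute path once, against the current working
    # directory at this moment, and keep that instead of $myPath.
    my $abs = File::Spec->rel2abs($myPath);
    print "name=$name dir=$dir suffix=$suffix\n  abs=$abs\n";
}
```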


-- 
Affijn, Ruud

"Gewoon is een tijger."



Re: Jargons of Info Tech industry

2005-10-09 Dr.Ruud
Roedy Green:

> (Note that the most
> common spam is the Nigerian con and variants which comes as a
> non-formatted message.)

I don't think that is true for everybody; for example, not for people
who are behind central filters that already deal with common spam.

-- 
Affijn, Ruud

"Gewoon is een tijger."



Re: Jargons of Info Tech industry

2005-10-09 Dr.Ruud
Mike Meyer:

> Try qmail - it may solve the problem with a lot less work.

I checked my .procmailrc, and saw that mail with qmail anywhere in the
headers goes to a spambox here.

;)

-- 
Affijn, Ruud

"Gewoon is een tijger."



Re: Jargons of Info Tech industry

2005-10-09 Dr.Ruud
Mike Meyer:
> Paul Rubin:

>> I read mail over an ssh connection to a Unix shell.  I have no easy
>> way to read html email with a graphics browser.
>
> You don't need a graphics browser - you just need a browser. I read
> mail in emacs, and use emacs-w3m to view html in the mailer. Works for
> most things, and doesn't have the nasty side effect of letting the
> sender know I read it by fetching images from their web site.
>
>> I occasionally get html email that I want to read.  I save it in a
>> file and read it with lynx, which so far works perfectly well.  I
>> find html email to be a PITA and as someone else said, html in email
>> is an almost sure sign that it's a message that I want to trash
>> without reading it.
>
> Unfortunately, I've found that HTML email comes in two flavors: That
> which sets content-type to text/html in the headers, and that which
> sets it to some form of multipart in the headers. I used to bounce all
> mail of either form. Then I discovered that the AOL client - used by
> my relatives - could *not* be set to not send HTML email. At least it
> sends text/plain as well. On investigation, most legit email does
> send multipart/mixed, so I only reject mail whose sole content is
> text/html.

Let procmail make all those decisions and transformations for you.

I have a maildir called 'raw' where I keep a copy of all non-spammish
mail.

Copies of the same messages are also delivered to the right mailboxes
by procmail.
A message that contains only HTML is piped through lynx -dump -stdin.
A message containing both an HTML part and a text/plain part is
de-MIMEd, leaving only the text/plain part (unless that part contains
only a silly remark).
Footers and long signatures are shortened or even deleted. Etc., etc.
(I like my mail cooked.)

One of the reasons I started with Perl is that I want to rewrite
procmail in Perl.

-- 
Affijn, Ruud

"Gewoon is een tijger."
