Re: new module: Time::Seconds::GroupedBy

2004-07-14 Thread A. Pagaltzis
* Dave Rolsky [EMAIL PROTECTED] [2004-07-14 03:25]:
  Ah, so you reinvented DateTime::Format::Duration.
 
 Actually, I think he reinvented Time::Seconds, which is part of
 the Time::Piece distro.

Well, both, I guess. Goes to show how many, *many* people have
written this sort of thing before in various forms and shapes.

Regards,
-- 
Aristotle
If you can't laugh at yourself, you don't take life seriously enough.


Re: new module: Time::Seconds::GroupedBy

2004-07-14 Thread Bruno Negrão
 
  Actually, I think he reinvented Time::Seconds, which is part of
  the Time::Piece distro.
No, guys, Time::Seconds doesn't give the same answer my module does. Time::Seconds
converts the seconds entirely into minutes, or hours, or days, etc.
For example, it says that 7341 seconds are:
2,03916 hours
122,35 minutes
0,08 days
etc.
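
To make the difference concrete, here is a minimal sketch in plain Perl. It does not call Time::Seconds or the proposed module; the two helper names (`as_fractions`, `grouped`) are made up for illustration. The first expresses the whole duration in a single unit, the second breaks it into whole hours, minutes and seconds.

```perl
use strict;
use warnings;

# Express a number of seconds entirely in one unit (the fractional style
# described above). Hypothetical helper, not part of any CPAN module.
sub as_fractions {
    my ($secs) = @_;
    return (
        minutes => $secs / 60,
        hours   => $secs / 3600,
        days    => $secs / 86400,
    );
}

# Break the same number of seconds into whole hours, minutes and seconds.
# Also a hypothetical helper.
sub grouped {
    my ($secs) = @_;
    my $hours = int($secs / 3600);
    $secs -= $hours * 3600;
    my $mins = int($secs / 60);
    $secs -= $mins * 60;
    return ($hours, $mins, $secs);
}

my %f = as_fractions(7341);
printf "%.2f minutes, %.5f hours, %.2f days\n", @f{qw(minutes hours days)};

my ($h, $m, $s) = grouped(7341);
print "$h hours, $m minutes, $s seconds\n";   # 2 hours, 2 minutes, 21 seconds
```
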

I really think I should extend Time::Seconds instead of publishing a new module, but I
couldn't contact the authors of that module.

Time::Duration addresses the same calculation I'm doing, but the way it presents the
answer is not that good.

I think I'm going to talk with the DateTime group a little bit; they are more involved in
this area.

thank you all,
bruno negrao.



Re: New Module: Time::Seconds::GroupedBy

2004-07-14 Thread Bruno Negrão
 
 Enh, sorta.  Most of the work of Time::Duration is figuring out how to 
 whittle down a multiple-units expression of a time to a particular degree 
 of concision.   It also doesn't have the concept of month.  You're 
 probably better off just starting over, since stuff like $mins =
 int($secs / 60); $secs -= $mins * 60; is neither complex nor error-prone.
Hi Sean, I couldn't follow what you mean (it is too colloquial for my poor English
understanding...)
What do you mean? Should I keep going with this project or give it up?

bruno.




Re: new module: Time::Seconds::GroupedBy

2004-07-14 Thread Mark Stosberg
On Tue, Jul 13, 2004 at 09:01:30PM -0300, Bruno Negrão wrote:

  Ah, so you reinvented DateTime::Format::Duration.
  
  use DateTime::Format::Duration;
  my $fmt = DateTime::Format::Duration->new(
      pattern   => '%H hours, %M minutes, %S seconds',
      normalize => 1,
  );
  print $fmt->format_duration_from_deltas(
      seconds => 7341,
  ), "\n";
  
 Oh, what a sadness. Indeed i never saw the DateTime project before.
 But still my module is far easier to use than DateTime::Format::Duration.
 Do you believe it is worth to publish it in Time::Seconds::GroupBy?

I would rather see more standardization on the use of the DateTime
project, in much the same way that people think of DBI when they think
of accessing databases through Perl.

In this case, perhaps some clear documentation and examples (just like
the one above) would be the best solution. I think if such a solution
was easy to find on Google and clearly documented, people would use it,
especially once there is more awareness of DateTime as a comprehensive
date & time solution for Perl.

Mark

--
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer  
   [EMAIL PROTECTED] Summersault, LLC 
   765-939-9301 ext 202 database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .


Re: new module: Time::Seconds::GroupedBy

2004-07-14 Thread Bruno Negrão
 I would rather see more standardization on the use of the DateTime
 project, in much the same way that people think of DBI when they think
 of accessing databases through Perl.

 In this case, perhaps some clear documentation and examples (just like
 the one above) would be the best solution. I think if such a solution
 was easy to find on Google and clearly documented, people would use it,
 especially once there is more awareness of DateTime as a comprehensive
 date & time solution for Perl.
I agree, Mark. I've posted my module on the DateTime mailing list. Let's see what they
say about it.

But I think the DateTime project is not getting fair promotion, since their modules do
not even appear on the main Module List
on the CPAN site at http://www.cpan.org/modules/00modlist.long.html.

If people should concentrate their effort on making this framework the solution for
date- and time-related problems, the DateTime
namespace should at least appear on the Module List, right?

Regards,
bruno.



Finding prior art Perl modules (was: new module: Time::Seconds::GroupedBy)

2004-07-14 Thread Mark Stosberg
On Wed, Jul 14, 2004 at 01:24:43PM -0300, Bruno Negrão wrote:

 I agree Mark, i've posted my module on the DateTime mailing list. Let's see what 
 they say about it.
 
 But i think the DateTime project is not gaining fair promotion once their modules 
 are not even appearing on the main Module List
 in the cpan's site at http://www.cpan.org/modules/00modlist.long.html.
 
 If people should concentrate effort in making this framework the solution for Dates 
 and times related problems, the DateTime
 namespace should at least appear on the Module List, right?

I think there is a separate more general issue that the module list is
losing relevance. I think a lot of people (like myself), use
http://search.cpan.org as a primary method for finding useful modules.
As a CPAN user, I don't consult the list when looking for modules. As 
a module writer, I have abandoned caring if my modules appear on the
list, because I have the perception it's not used much anymore.

So I would say a more important issue is that the DateTime modules don't
show up in the first 100 results for Date on that website:

http://search.cpan.org/search?m=all&q=date&s=1&n=100

I think part of the solution to fix that is to have more contributions to the
CPAN ratings system, and consider the ratings in the search results. 

Mark



Re: Perl's Sacrifice Stone

2004-07-14 Thread khemir nadim

Andrew savige [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 khemir nadim wrote:
  I'd love to review the second module that is offered for sacrifice ;-)
  Anything to offer?

 How about Apache::MVC? It was posted for review on Simon's code review
 ladder mailing list in February but didn't get any response AFAICT.
 What has been harming this list, IMHO, is selfish people asking for their
 module to be reviewed -- and then disappearing without reviewing someone
 else's module in return. Maybe there should be a convention that you
 should first review someone else's before posting yours (?).
This is a good idea, but to do that you must have a list of modules to pick
from, and to have your module added to the list you would have to have tested a
module that comes from the list.

:-) It's a good idea anyhow.

About Apache::MVC, I'm unfortunately the wrong guy for that one. It would
take me ages to get anything that has to do with the web to work (yeah, I'm that
good). I'm more on the system module side. But I'll find something to grind
soon, I promise.


 BTW, I think the code review ladder mailing list is exactly what you are
 looking for. Unless you are prepared to create an account and start by
 posting on PerlMonks Discussion seeking feedback re your Perl Sacrifice
 Stone idea, I don't think Perl Monks will work for you. There is nothing
 stopping you posting a review of your favourite module/s on Perl Monks --
 doing that may give you more credibility at that site, so your ideas may
 then be better received.

 /-\






Re: Finding prior art Perl modules (was: new module: Time::Seconds::GroupedBy)

2004-07-14 Thread Simon Cozens
[EMAIL PROTECTED] (Mark Stosberg) writes:
 I think part of the solution to fix that is to have more contributions to the
 CPAN ratings system, and consider the ratings in the search results. 

The searching in search.cpan.org is, unfortunately, pretty awful. At some
point I plan to sit down and try using Plucene as a search engine for
module data.

This would, of course, be easier if the search.cpan.org code was more
widely available. *cough*

-- 
Death Damned electrons get into everything.
Death I found them in my BUTTERDISH just the other day.


Re: Finding prior art Perl modules (was: new module: Time::Seconds::GroupedBy)

2004-07-14 Thread Bruno Negrão
 I think there is a separate more general issue that the module list is
 losing relevance. I think a lot of people (like myself), use
 http://search.cpan.org as a primary method for finding useful modules.
 As a CPAN user, I don't consult the list when looking for modules. As
 a module writer, I have abandoned caring if my modules appear on the
 list, because I have the perception it's not used much anymore.
Until I heard this from you, I had always had the idea that the modules in
the Module List were the mainstream modules and that we should consider them
before the other ones on search.cpan.org.

Hmm, I think everybody who is new to Perl thinks the same way I was thinking,
and it takes a long time to realize that the Module List is not the main
source for modules, or the main source of inspiration for namespaces.

No, I really do think that the most important (or popular) modules and/or
namespaces should appear in the Module List.

bruno.



Re: Finding prior art Perl modules (was: new module: Time::Seconds::GroupedBy)

2004-07-14 Thread Leon Brocard
Simon Cozens sent the following bits through the ether:

 The searching in search.cpan.org is, unfortunately, pretty awful. At some
 point I plan to sit down and try using Plucene as a search engine for
 module data.

I thought that would be a good idea too, so I tried it. It works
*fairly* well.

  http://search.cpan.org/dist/CPAN-IndexPod/

Leon
-- 
Leon Brocard.http://www.astray.com/
scribot.http://www.scribot.com/

... Stupid is a boundless concept.


Re: new module: Time::Seconds::GroupedBy

2004-07-14 Thread Dave Rolsky
On Wed, 14 Jul 2004, Bruno Negrão wrote:

  I would rather see more standardization on the use of the DateTime
  project, in much the same way that people think of DBI when they think
  of accessing databases through Perl.
 
  In this case, perhaps some clear documentation and examples (just like
  the one above) would be the best solution. I think if such a solution
  was easy to find on Google and clearly documented, people would use it,
  especially once there is more awareness of DateTime as a comprehensive
  date & time solution for Perl.
 I agree Mark, i've posted my module on the DateTime mailing list. Let's see what 
 they say about it.

 But i think the DateTime project is not gaining fair promotion once
 their modules are not even appearing on the main Module List in the
 cpan's site at http://www.cpan.org/modules/00modlist.long.html.

Some of them are, but not all.  Frankly, I don't think most people really
look at the list much, nor do most people consider it authoritative.
What'd help more would be some articles on the project.  I've been wanting
to write one for a while, but I'm always short on time.

 If people should concentrate effort in making this framework the
 solution for Dates and times related problems, the DateTime namespace
 should at least appear on the Module List, right?

Some of them _are_ registered, but that document you're referring to
hasn't been regenerated since 2002/08/27!  I wish the CPAN folks would
just remove it if it won't be generated regularly.


-dave

/*===
House Absolute Consulting
www.houseabsolute.com
===*/


Re: Finding prior art Perl modules (was: new module: Time::Seconds::GroupedBy)

2004-07-14 Thread Fergal Daly
On Wed, Jul 14, 2004 at 06:08:16PM +0100, Leon Brocard wrote:
 Simon Cozens sent the following bits through the ether:
 
  The searching in search.cpan.org is, unfortunately, pretty awful. At some
  point I plan to sit down and try using Plucene as a search engine for
  module data.
 
 I thought that would be a good idea too, so I tried it. It works
 *fairly* well.
 
   http://search.cpan.org/dist/CPAN-IndexPod/

Does META.yml have a place for keywords? It would be nice if it did and if
search.cpan.org indexed it. That would mean that it would be no longer
necessary to name your module along the lines of

XML::HTTP::Network::Daemon::TextProcessing::Business::Papersize::GIS

so that people can find it,

F



Re: new module: Time::Seconds::GroupedBy

2004-07-14 Thread A. Pagaltzis
* Dave Rolsky [EMAIL PROTECTED] [2004-07-14 19:26]:
 Some of them _are_ registered, but that document you're
 referring to hasn't been regenerated since 2002/08/27!  I wish
 the CPAN folks would just remove it if it won't be generated
 regularly.

Does anyone else here think that the list should probably just be
done away with entirely?

Regards,
-- 
Aristotle
If you can't laugh at yourself, you don't take life seriously enough.


Re: new module: Time::Seconds::GroupedBy

2004-07-14 Thread Dave Rolsky
On Wed, 14 Jul 2004, A. Pagaltzis wrote:

 * Dave Rolsky [EMAIL PROTECTED] [2004-07-14 19:26]:
  Some of them _are_ registered, but that document you're
  referring to hasn't been regenerated since 2002/08/27!  I wish
  the CPAN folks would just remove it if it won't be generated
  regularly.

 Does anyone else here think that the list should probably just be
 done away with entirely?

Given the fact that most authors seem to not register their stuff, the
[EMAIL PROTECTED] list is slow as heck, and that the web pages never get
regenerated, yes.


-dave

/*===
House Absolute Consulting
www.houseabsolute.com
===*/


Re: Finding prior art Perl modules (was: new module: Time::Seconds::GroupedBy)

2004-07-14 Thread Simon Cozens
[EMAIL PROTECTED] (Scott W Gifford) writes:
 It would be interesting to calculate the importance of a module by
 how many other modules link to it, either via a use statement or by
 reference in the POD, much like Google does with Web page links.

Someone's already done this for CPAN, but I can't find it at the moment.
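
The counting idea quoted above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the module names and sources in `%source` are invented, only `use` statements are scanned (not POD references), and the result is a raw inbound count rather than PageRank. The helper name `inbound_counts` is hypothetical.

```perl
use strict;
use warnings;

# Toy corpus: module name => source text. The names and contents are
# made up for illustration; a real run would read files from a CPAN mirror.
my %source = (
    'My::App'    => "use strict;\nuse DateTime;\nuse My::Util;\n",
    'My::Util'   => "use strict;\nuse DateTime;\n",
    'My::Report' => "use My::Util;\nuse DateTime;\nuse DateTime;\n",
);

# Count, for each module, how many distinct other modules load it.
# Lowercase names (pragmas such as strict) are skipped by the pattern.
sub inbound_counts {
    my (%src) = @_;
    my %count;
    for my $user (keys %src) {
        my $text = $src{$user};
        my %seen;
        while ($text =~ /^\s*use\s+([A-Z]\w*(?:::\w+)*)/mg) {
            my $used = $1;
            next if $used eq $user || $seen{$used}++;
            $count{$used}++;
        }
    }
    return %count;
}

my %count = inbound_counts(%source);
for my $mod (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$mod: $count{$mod}\n";
}
# DateTime: 3
# My::Util: 2
```
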

 There's a project called Nutch that has abstracted out much of
 PageRank and that sort of thing that would be useful, if anybody is
 interested.

Algorithm::PageRank has also abstracted out much of PageRank... :)

-- 
Oh dear. I've just realised that my fvwm config lasted longer than my
marriage, in that case.
- Anonymous


Re: new module: Time::Seconds::GroupedBy

2004-07-14 Thread A. Pagaltzis
* Dave Rolsky [EMAIL PROTECTED] [2004-07-14 19:41]:
  Does anyone else here think that the list should probably
  just be done away with entirely?
 
 Given the fact that most authors seem to not register their
 stuff, the [EMAIL PROTECTED] list is slow as heck, and that the
 web pages never get regenerated, yes.

Now the question is: how would we make that happen?

Regards,
-- 
Aristotle
If you can't laugh at yourself, you don't take life seriously enough.


Re: Finding prior art Perl modules (was: new module: Time::Seconds::GroupedBy)

2004-07-14 Thread A. Pagaltzis
* Scott W Gifford [EMAIL PROTECTED] [2004-07-14 19:38]:
 It would be interesting to calculate the importance of a
 module by how many other modules link to it, either via a use
 statement or by reference in the POD, much like Google does
 with Web page links.

I was thinking the same thing, and I remember that someone
actually posted results from working code for something like that
a while back. I don't have the time to dig through the archives
right now, or I'd shake it up.

 There's a project called Nutch that has abstracted out much of
 PageRank and that sort of thing that would be useful, if
 anybody is interested.  Nutch is written in Java, unfortunately
 for we Perl folks, but isn't too hard to work with.  :)

And as Lucene/Plucene show, it doesn't have to be difficult to
reimplement good libraries in Perl, either. :-)

Regards,
-- 
Aristotle
If you can't laugh at yourself, you don't take life seriously enough.


META.yml keywords (was: Re: Finding prior art Perl modules)

2004-07-14 Thread Randy W. Sims
Fergal Daly wrote:
 Does META.yml have a place for keywords?

The spec doesn't currently provide for keywords. I do think it would be 
a good idea, BUT I think it needs to be done in a way to avoid abuse. 
I'd hate to see META.yml files grow by the kb as authors add every 
conceivable keyword they can think of and try to manipulate the search. 
As limiting and as clumsy as it seems, I think that if keywords are 
added then it should be from a limited set of keywords, i.e. more of a 
classification scheme, really, where modules can appear under multiple 
classifications.

Randy.


Re: META.yml keywords (was: Re: Finding prior art Perl modules)

2004-07-14 Thread Matthew Sachs
On Jul 14, 2004, at 12:11, Randy W. Sims wrote:
 Fergal Daly wrote:
  Does META.yml have a place for keywords?
 As limiting and as clumsy as it seems, I think that if keywords are 
 added then it should be from a limited set of keywords, i.e. more of a 
 classification scheme, really, where modules can appear under multiple 
 classifications.

Keywords are necessarily specific to the domain of the module, so I 
don't think that any global entity can designate an appropriate fixed 
set.  For instance, my module Net::OSCAR implements the protocol used 
by AOL Instant Messenger, so I'd give it keywords ["OSCAR", "AIM", 
"IM", "AOL Instant Messenger", "instant messenger", "instant 
messaging", "chat"].


Re: META.yml keywords (was: Re: Finding prior art Perl modules)

2004-07-14 Thread darren chamberlain
* Randy W. Sims ml-perl at thepierianspring.org [2004/07/14 15:11]:
 Fergal Daly wrote:
 
 Does META.yaml have a place for keyowrds?
 
 The spec doesn't currently provide for keywords.

Is anyone generating META.yaml files by hand?  I thought they were all
generated (and regenerated) by Module::Build/MakeMaker?  How would that
work in the case of keywords?

(darren)

-- 
I interpret advertising as damage and route around it.




Re: META.yml keywords (was: Re: Finding prior art Perl modules)

2004-07-14 Thread Mark Stosberg
On Wed, Jul 14, 2004 at 03:11:11PM -0400, Randy W. Sims wrote:
 Fergal Daly wrote:
 
 Does META.yaml have a place for keyowrds?
 
 The spec doesn't currently provide for keywords. I do think it would be 
 a good idea, BUT I think it needs to be done in a way to avoid abuse. 
 I'd hate to see META.yml files grow by the kb as authors add every 
 conceivable keyword they can think of and try to manipulate the search. 

The search algorithm could pay attention to the first X keywords and
ignore the rest. Or at least, it could heavily weight the first few.

I think this is part of how search engines prevent the same kind of
abuse of the meta-tag keyword system. You can put as many keywords as
you want into the list, but I think the search engines only really care
about the first few.
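
A rough sketch of that weighting in Perl follows. It assumes a hypothetical keyword list per module (no such field exists in the spec at this point); the cutoff of five and the linear weights are arbitrary choices, and `keyword_score` is an invented name.

```perl
use strict;
use warnings;

# Score a module's keyword list against a query term. Only the first
# $limit keywords count, and earlier positions weigh more. Both the cutoff
# and the weights are arbitrary choices for this sketch.
sub keyword_score {
    my ($query, $keywords, $limit) = @_;
    $limit = 5 unless defined $limit;
    my $last = $#$keywords < $limit - 1 ? $#$keywords : $limit - 1;
    for my $i (0 .. $last) {
        return $limit - $i if lc($keywords->[$i]) eq lc($query);
    }
    return 0;    # not found, or listed past the cutoff
}

my @honest  = ('OSCAR', 'AIM', 'chat');
my @stuffed = ('XML', 'HTTP', 'GIS', 'Papersize', 'Daemon', 'chat', 'AIM');

print keyword_score('chat', \@honest),  "\n";   # 3: third keyword
print keyword_score('chat', \@stuffed), "\n";   # 0: sixth keyword, past the cutoff
```

With this scheme, padding the list with extra keywords buys nothing, because anything after the cutoff is ignored.
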

I would prefer something like this over the idea of choosing from a fixed
list.

Having free-form tags is a feature I like on: http://del.icio.us/
It allows new classifications to spontaneously appear.

Mark



Re: META.yml keywords

2004-07-14 Thread Randy W. Sims
Matthew Sachs wrote:
 On Jul 14, 2004, at 12:11, Randy W. Sims wrote:
  Fergal Daly wrote:
   Does META.yml have a place for keywords?

  As limiting and as clumsy as it seems, I think that if keywords are 
  added then it should be from a limited set of keywords, i.e. more of a 
  classification scheme, really, where modules can appear under multiple 
  classifications.

 Keywords are necessarily specific to the domain of the module, so I 
 don't think that any global entity can designate an appropriate fixed 
 set.  For instance, my module Net::OSCAR implements the protocol used by 
 AOL Instant Messenger, so I'd give it keywords ["OSCAR", "AIM", "IM", 
 "AOL Instant Messenger", "instant messenger", "instant messaging", "chat"].

Classification for a module would probably be something like:

  Net :: Protocol
  Communications :: Chat :: AOL Instant Messenger

(That last comes from sf.net's topic system.)

With the classification above AND a good one-line synopsis of the module 
(which is already part of META.yml) most, if not all, of your keywords 
are covered.

Randy.


Re: META.yml keywords (was: Re: Finding prior art Perl modules)

2004-07-14 Thread Scott W Gifford
Mark Stosberg [EMAIL PROTECTED] writes:

[...]

 The search algorithm could pay attention to the first X keywords and
 ignore the rest. Or at least, it could heavily weight the first few.
 
 I think this is part of how search engines prevent the same kind of
 abuse of the meta-tag keyword system. You can put as many keywords as
 you want into the list, but I think the search engines only really care
 about the first few.

My understanding is that nowadays, most search engines ignore keywords
altogether, because they were so badly abused they became worthless.

ScottG.


New module: Regexp::Trie

2004-07-14 Thread David Landgren
Hello,
I gave a talk at the French Perl Workshop in June about some work I was 
doing to produce really large (i.e. length($re)  5) regular 
expressions for Postfix access maps. (Postfix can be compiled with the 
PCRE library). A number of people expressed interest in the approach and 
wondered if and when it would be available as a module on CPAN.

The idea is that sometimes you have a large set of regular expressions 
(e.g. 2000), and you want to test whether a string matches any of them. 
You don't particularly care *which* expression matches, the fact that 
one matches is sufficient. Brute forcing it with a loop is not very 
efficient. Concatenating them with | is not efficient either, if a large 
subset of the expressions share a common prefix.

I know about Jarkko's Regex::PreSuf and another module whose name 
escapes me this instant, but they both suffer from the limitation of not 
being metacharacter-aware. For instance:

  use Regex::PreSuf;
  print presuf(qw(a\d+foo a\D+123));

produces:

  a\\(?:D\+123|d\+foo)

The module I'm developing works with variable length tokens, and thus 
deals with the above correctly:

  use Regexp::Trie;
  my $rt = Regexp::Trie->new;
  $rt->add( qw/a \\d+ f o o/ );
  $rt->add( qw/a \\D+ 1 2 3/ );
  print $rt->re;

produces:

  a(?:\D+123|\d+foo)

(modulo me getting the backslashes escaped correctly here -- the 
algorithm does the right thing).
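
For readers who want to see the idea in motion, here is a minimal sketch of a token trie that is serialized back into a single pattern. It is not the module's implementation: the function names `trie_add` and `trie_re` are invented, the nested-hash layout is an assumption, and alternatives are simply sorted as strings.

```perl
use strict;
use warnings;

# Insert one pattern, already split into tokens, into a nested-hash trie.
# The empty-string key marks "a pattern may end here".
sub trie_add {
    my ($trie, @tokens) = @_;
    my $node = $trie;
    $node = $node->{$_} ||= {} for @tokens;
    $node->{''} = 1;
    return $trie;
}

# Turn the trie back into a pattern, sharing common leading tokens.
sub trie_re {
    my ($node) = @_;
    my @alts;
    for my $tok (sort grep { $_ ne '' } keys %$node) {
        push @alts, $tok . trie_re($node->{$tok});
    }
    return '' unless @alts;
    my $optional = exists $node->{''};
    return $alts[0] if @alts == 1 && !$optional;
    my $re = '(?:' . join('|', @alts) . ')';
    $re .= '?' if $optional;    # a shorter pattern ends at this node
    return $re;
}

my %trie;
trie_add(\%trie, 'a', '\d+', 'f', 'o', 'o');
trie_add(\%trie, 'a', '\D+', '1', '2', '3');
my $re = trie_re(\%trie);
print "$re\n";    # a(?:\D+123|\d+foo)
```

Because each token is kept whole, `\d+` and `\D+` are never split at the backslash, which is the behaviour described above.
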

The above example contains (IMO) too much make-work code, so I'm 
planning on distributing a number of helper packages as well, which will 
take care of the general cases. I'm thinking of something like

  my $rt = Regexp::Trie->new(
      Regexp::Trie::Lex::simple(qw( a\\d+foo a\\D+123 ))
  );
  print $rt->re;

I.e., the Regexp::Trie::Lex::* namespace, whose packages simply have to 
return an object that contains an 'add' method. The Regexp::Trie::new 
constructor will simply take the returned object and call its 'add' 
method until exhaustion, and fill up its own internal structure.

As another example, I have a set of regexps that should never contain a 
bare . (dot) metachar. To do so is an error. Writing a separate lexer 
package for this allows such error-checking to take place.
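
As a sketch of what such lexers might look like (the real interface is not written yet, so everything here is an assumption): `lex_simple` splits a pattern into escape-or-character tokens with an optional `+`, `*` or `?`, and `lex_no_bare_dot` wraps it to reject an unescaped dot. Both names are hypothetical, and character classes, groups and `{n,m}` quantifiers are deliberately ignored.

```perl
use strict;
use warnings;

# Split a pattern into tokens: an escape sequence (\d, \., ...) or a single
# character, each with an optional quantifier (+, * or ?).
sub lex_simple {
    my ($pattern) = @_;
    my @tokens;
    while ($pattern =~ /\G((?:\\.|[^\\])[+*?]?)/gcs) {
        push @tokens, $1;
    }
    # Anything left over (e.g. a trailing lone backslash) is an error.
    die "cannot tokenize '$pattern'\n"
        if (pos($pattern) || 0) < length($pattern);
    return @tokens;
}

# Same tokens, but an unescaped dot is treated as an error.
sub lex_no_bare_dot {
    my ($pattern) = @_;
    my @tokens = lex_simple($pattern);
    for my $tok (@tokens) {
        die "bare '.' in '$pattern'\n" if $tok =~ /^\.[+*?]?$/;
    }
    return @tokens;
}

print join(' ', lex_simple('a\d+foo')), "\n";    # a \d+ f o o

my $ok = eval { lex_no_bare_dot('mail.example'); 1 };
print "rejected: $@" unless $ok;
```
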

I spoke with Nicholas Clark about the item in the latest perltodo, 
specifically:

<quote>
=head2 common suffices/prefices in regexps (trie optimization)

Currently, the user has to optimize C<foo|far> and C<foo|goo> into
C<f(?:oo|ar)> and C<[fg]oo> by hand; this could be done automatically.
</quote>

This is apparently a non-trivial undertaking to do in core and he 
suggested I pursue the release of this module regardless (I'm targeting 
5.005_03 as a baseline anyway).

If I haven't put you to sleep by now, I have the following questions:
1. Has this been done before (i.e. shoot me now and put me out of my 
misery).

2. Is Regexp::Trie a good name? (I fall into the "regexp is spelt with a 
p" camp, but if Regex is preferred that's fine by me. I can never 
remember which, if either, is deprecated).

3. Is the lexer namespace a good idea? Or is there a better way to do 
this? I'm open to any design suggestions on this issue since nothing is 
written yet.

Thanks for reading this far,
David Landgren


Re: Finding prior art Perl modules (was: new module: Time::Seconds::GroupedBy)

2004-07-14 Thread Tim Bunce
On Wed, Jul 14, 2004 at 06:30:59PM +0100, Fergal Daly wrote:
 On Wed, Jul 14, 2004 at 06:08:16PM +0100, Leon Brocard wrote:
  Simon Cozens sent the following bits through the ether:
  
   The searching in search.cpan.org is, unfortunately, pretty awful. At some
   point I plan to sit down and try using Plucene as a search engine for
   module data.
  
  I thought that would be a good idea too, so I tried it. It works
  *fairly* well.
  
http://search.cpan.org/dist/CPAN-IndexPod/
 
 Does META.yaml have a place for keyowrds? It would be nice if it did and if
 search.cpan.org indexed it. That would mean that it would be no longer
 necessary to name your module along the lines of
 
 XML::HTTP::Network::Daemon::TextProcessing::Business::Papersize::GIS
 
 so that people can find it,

That's what the Description field is for.

Tim.


Re: New module: Regexp::Trie

2004-07-14 Thread Scott W Gifford
Sounds like cool stuff!

David Landgren [EMAIL PROTECTED] writes:

[...]

 2. Is Regexp::Trie a good name? (I fall into the regexp is spelt with
 a p camp, but if Regex is preferred that's fine by me. I can never
 remember which, if either, is deprecated).

There are 688 modules with Regex in their name, and 582 with Regexp,
so I don't think a clear consensus has emerged.  :)

I personally think ::Trie is a bit technical; a user who needs this
module isn't likely to search for 'Trie', or find this by browsing
among the other 582 modules with Regexp in their name.  I would think
Regexp::Optimize{,r,d} would be better names, but that's just MHO.

 3. Is the lexer namespace a good idea? Or is there a better way do to
 this? I'm open to any design suggestions on this issue since nothing
 is written yet.

*shrug* It depends on whether you foresee a large number of Lex
modules being required, or updating the Lex portions of your module
independently of the Regexp::Trie portions.  If either of these seem
likely, it's a good idea; otherwise it's just an implementation
decision.

-ScottG.


Re: New module: Regexp::Trie

2004-07-14 Thread Randy W. Sims
On 7/14/2004 5:29 PM, David Landgren wrote:
 [...]

 If I haven't put you to sleep by now, I have the following questions:
 1. Has this been done before (i.e. shoot me now and put me out of my 
 misery).

I haven't seen anything, but I haven't really looked either. I do see it 
as being a useful module though.

 2. Is Regexp::Trie a good name? (I fall into the "regexp is spelt with a 
 p" camp, but if Regex is preferred that's fine by me. I can never 
 remember which, if either, is deprecated).

Definitely in the Regexp::* namespace. Trie?

 3. Is the lexer namespace a good idea? Or is there a better way to do 
 this? I'm open to any design suggestions on this issue since nothing is 
 written yet.

What about Japhy's new Regexp::Parser?

 Thanks for reading this far,
 David Landgren



Future of the Module List

2004-07-14 Thread Tim Bunce
On Wed, Jul 14, 2004 at 12:40:03PM -0500, Dave Rolsky wrote:
 On Wed, 14 Jul 2004, A. Pagaltzis wrote:
 
  * Dave Rolsky [EMAIL PROTECTED] [2004-07-14 19:26]:
   Some of them _are_ registered, but that document you're
   referring to hasn't been regenerated since 2002/08/27!  I wish
   the CPAN folks would just remove it if it won't be generated
   regularly.
 
  Does anyone else here think that the list should probably just be
  done away with entirely?

The _file_ should go, yes. The concept of registering modules is different.

 Given the fact that most authors seem to not register their stuff, the
 [EMAIL PROTECTED] list is slow as heck, and that the web pages never get
 regenerated, yes.

Those are all fixable. Volunteers?

The real issues are bigger and deeper. I've appended a couple of emails.

Tim.


On Mon, Feb 16, 2004 at 10:37:12AM +1300, Sam Vilain wrote:
 On Mon, 16 Feb 2004 01:32, Tim Bunce wrote;
 
I'd like to see a summary of what those needs of the community
are.  (Maybe I missed it as I've not been following as closely as
I'd have liked. In which case a link to an archived summary would
be great.)
It's very important to be clear about what the problems actually
are.
 
 I don't really want to argue this side of things, I think that the
 problems pretty much speak for themselves.  But I hate unspoken
 consensus, so let me suggest a few from my perspective; this applies
 to the combined Perl 5 modules list / using search.cpan.org:

I'll play devil's advocate here and point out some alternative remedies
for the problems. By doing so I'm _not_ trying to detract from your
suggestion, which I like, I'm just trying to show how existing mechanisms
could be improved incrementally.

   a) searching for modules for a particular task takes a long time and
  unless you get your key words right, you might not find it at
  all.  Refer the recent Mail::SendEasy thread.

Calls for a richer set of categories and cross-links of some kind.
(Editorial content alone is basically just more words to a search engine.)

   b) it is very difficult to find good reviews weighing the pros and
  cons of similar modules; they exist, but are scattered.
 
  CPAN ratings was a nice idea, but has too many "First Post!"
  style reviews to be useful in its current form IMHO.

Argues for moderation of reviews and a minimum review length.
A "was this review helpful?" mechanism would also help to bring
better reviews to the top.  Also, search.cpan.org should not
just show the overall rating, it should show the underlying three
individual ratings (docs, interface, ease of use).

   c) it is nearly impossible to tell which modules are the wisest
  choices from a community size point of view; using modules that
  are more likely to fall out of maintenance is easy to do.

Argues for more stats. I think useful *relative* download stats
could be extracted from a sample of CPAN sites. Also search.cpan.org
could provide relative page-*view* stats for modules.

   d) some great modules are not registered (I am referring of course
  to such masterpieces as Pixie, Heritable::Types, Maptastic :),
  Spiffy, Autodia, Want ... and those are just the ones missing
  in my bag of tricks)

Argues for fixing the registration process.


Originally the Module List had two goals:
 1: to help people find perl modules for a particular task.
 2: to provide a second-tier of modules above the 'anarchy' of 
people uploading half-baked ideas with half-baked names.
 
 Honourable goals, which it solved adequately for a period of time, and
 full credit where it is due.
 
 But now let's look at where we are.  We've got masses of modules,
 truckloads of categories and thousands of contributors.  This task
 cannot be left in the hands of a handful of hackers, no matter how
 much awe they inspire, they probably still have lives and day jobs ;)

The registration process can, and should, be automatic for any modules
for which no one objects. You'd apply and RT would automatically
register if no one commented on the application.


 I will maintain that the current format, or even simply adding some
 more fields to the database is *not* enough information to give
 uninformed people looking for a module the information to make an
 informed decision.

 It is my gut feeling that only editorial content, managed by people
 who are experts in the field, will truly perform this task - and that
 to gain maximum support, that it should be included in the content
 mirrored along with the rest of cpan.org.

I agree that comparative editorial reviews would be very valuable
for Goal 1 above. It wouldn't address Goal 2 effectively at all.


 I think we're mature enough as a community to be able to produce this
 content without it dissolving into flamewars or being too one-sided.
 
 In particular, I really think that as little red tape should be
 applied to this system as possible.  Let's just set up a few 

Re: New module: Regexp::Trie

2004-07-14 Thread David Landgren
Randy W. Sims wrote:
[...]
3. Is the lexer namespace a good idea? Or is there a better way to do
this? I'm open to any design suggestions on this issue since nothing
is written yet.

What about Japhy's new Regexp::Parser?

Hmm. Yes, I know about it, and even downloaded it to play with it
this weekend. But wrote tests instead :)

I'm not sure how fine-grained it is, and/or how much fine-grainedness I 
need. For instance, given '(?:foo|bar| (?:cat|dog))',  I don't care 
about isolating the (?:cat|dog) part. It's up to you to feed it the 
tokens at the granularity you want to deal with.
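
To make the granularity point concrete, here is a rough sketch of feeding pre-split tokens into a trie and reading a pattern back out. Nothing is written yet, so `insert_tokens` and `trie_to_regex` are hypothetical names, not the module's interface. Tokens are assumed to be valid regex fragments already, so nothing is escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: add one pattern, already split into tokens, to a
# nested-hash trie. The empty-string key marks the end of a pattern.
sub insert_tokens {
    my ($trie, @tokens) = @_;
    my $node = $trie;
    $node = $node->{$_} ||= {} for @tokens;
    $node->{''} = 1;
    return $trie;
}

# Hypothetical helper: walk the trie and build a pattern string.
# Shared prefixes are emitted once; branches become alternations.
sub trie_to_regex {
    my ($node) = @_;
    my $optional = exists $node->{''};    # a pattern may end at this node
    my @alts;
    for my $tok (sort grep { $_ ne '' } keys %$node) {
        my $child    = $node->{$tok};
        my $only_end = keys(%$child) == 1 && exists $child->{''};
        push @alts, $only_end ? $tok : $tok . trie_to_regex($child);
    }
    return '' unless @alts;
    my $body = (@alts == 1 && !$optional)
        ? $alts[0]
        : '(?:' . join('|', @alts) . ')';
    $body .= '?' if $optional;
    return $body;
}

# The caller decides the granularity: here (?:cat|dog) is one opaque token.
my $trie = {};
insert_tokens($trie, 'x', '(?:cat|dog)', 'y');
insert_tokens($trie, 'x', '(?:cat|dog)', 'z');
print trie_to_regex($trie), "\n";    # prints x(?:cat|dog)(?:y|z)
```

Passing single characters instead of whole sub-patterns works the same way; only the caller's tokenizing changes.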

Thanks for the pointer.
David


Re: New module: Regexp::Trie

2004-07-14 Thread A. Pagaltzis
* David Landgren [EMAIL PROTECTED] [2004-07-14 23:30]:
 1. Has this been done before (i.e. shoot me now and put me out
 of my misery).

I haven't particularly looked for such a thing, but nor have I
heard of it. And I've been around mentions of ::PreSuf so often
that I think it reasonable to believe I would have seen a side
note about such a thing somewhere.

 2. Is Regexp::Trie a good name? (I fall into the "regexp is
 spelt with a p" camp, but if Regex is preferred that's fine by
 me. I can never remember which, if either, is deprecated).

The big name modules are all in Regexp::. AFAIK the story is
that no one really likes regexp, but it became the official
namespace long ago and now holds that position.

 3. Is the lexer namespace a good idea? Or is there a better way
 to do this? I'm open to any design suggestions on this issue
 since nothing is written yet.

I'm not sure a separately visible lexer is even necessary. Does
the regex syntax change that often? Is it useful to expose the
lexer output? I think the answers are no and maybe, so I can't
say.

Regards,
-- 
Aristotle
If you can't laugh at yourself, you don't take life seriously enough.


Re: Future of the Module List

2004-07-14 Thread Randy W. Sims
On 7/14/2004 5:51 PM, Tim Bunce wrote:
On Wed, Jul 14, 2004 at 12:40:03PM -0500, Dave Rolsky wrote:
On Wed, 14 Jul 2004, A. Pagaltzis wrote:

* Dave Rolsky [EMAIL PROTECTED] [2004-07-14 19:26]:
Some of them _are_ registered, but that document you're
referring to hasn't been regenerated since 2002/08/27!  I wish
the CPAN folks would just remove it if it won't be generated
regularly.
Does anyone else here think that the list should probably just be
done away with entirely?

The _file_ should go, yes. The concept of registering modules is different.

Given the fact that most authors seem to not register their stuff, the
[EMAIL PROTECTED] list is slow as heck, and that the web pages never get
regenerated, yes.

Those are all fixable. Volunteers?

The real issues are bigger and deeper. I've appended a couple of emails.

Tim.

On Mon, Feb 16, 2004 at 10:37:12AM +1300, Sam Vilain wrote:
On Mon, 16 Feb 2004 01:32, Tim Bunce wrote;
  I'd like to see a summary of what those needs of the community
  are.  (Maybe I missed it as I've not been following as closely as
  I'd have liked. In which case a link to an archived summary would
  be great.)
  It's very important to be clear about what the problems actually
  are.
I don't really want to argue this side of things, I think that the
problems pretty much speak for themselves.  But I hate unspoken
consensus, so let me suggest a few from my perspective; this applies
to the combined Perl 5 modules list / using search.cpan.org:

I'll play devil's advocate here and point out some alternative remedies
for the problems. By doing so I'm _not_ trying to detract from your
suggestion, which I like, I'm just trying to show how existing mechanisms
could be improved incrementally.

 a) searching for modules for a particular task takes a long time and
unless you get your key words right, you might not find it at
all.  Refer to the recent Mail::SendEasy thread.

Calls for a richer set of categories and cross-links of some kind.
(Editorial content alone is basically just more words to a search engine.)

Are we talking about the same thing: perl.module-authors:2601?

 b) it is very difficult to find good reviews weighing the pros and
cons of similar modules; they exist, but are scattered.
CPAN ratings was a nice idea, but has too many "First Post!"
style reviews to be useful in its current form IMHO.

Argues for moderation of reviews and a minimum review length.
A "was this review helpful?" mechanism would also help to bring
better reviews to the top.  Also, search.cpan.org should not
just show the overall rating, it should show the underlying three
individual ratings (docs, interface, ease of use).

This is definitely a trouble area. Not long ago I was exploring the
cpanratings site and discovered the unhelpful rampage by one
particular reviewer http://cpanratings.perl.org/a/181. Maybe breaking
the reviews into categories would be helpful? Rate: installation,
interface, robustness, overall, etc.

 c) it is nearly impossible to tell which modules are the wisest
choices from a community size point of view; using modules that
are more likely to fall out of maintenance is easy to do.

Argues for more stats. I think useful *relative* download stats
could be extracted from a sample of CPAN sites. Also search.cpan.org
could provide relative page-*view* stats for modules.
Narrow the interface for CPAN such that all viewing takes place on a 
single server where it can be monitored, and all download requests are 
distributed to mirror sites (à la sf.net).

As for the best of the best, I still believe there is a lot of merit in
the list built from dependencies idea.

 d) some great modules are not registered (I am referring of course
to such masterpieces as Pixie, Heritable::Types, Maptastic :),
Spiffy, Autodia, Want ... and those are just the ones missing
in my bag of tricks)

Argues for fixing the registration process.

This is why I am mailing you to ask: what's going on?  Why is such
an outdated module list being published in an authoritative location,
and where can I get an up-to-date list?

The Module List *document* was maintained by hand.  When management of
the Module List *data* was automated there was a desire to automate
maintenance of the document but the document had a slightly richer
structure than the data. That small hurdle meant automation never
happened and the document was left unmaintained.

Around the same time search.cpan.org became functional so the
document had less relevance and busy people had other things to do.

I'll happily concede that the *document* isn't important these days.
But I feel strongly that the *principle* (of moderated naming and
categorization) is.

The main pieces currently missing are:
1. Automated handling of module registration. [Where has that got to?]
2. Better integration of registration data into search.cpan.org
   So registration details are included in search results, for example.
3. A 'fast path' process to 

Re: Finding prior art Perl modules (was: new module: Time::Seconds::GroupedBy)

2004-07-14 Thread Fergal Daly
On Wed, Jul 14, 2004 at 10:34:08PM +0100, Tim Bunce wrote:
 On Wed, Jul 14, 2004 at 06:30:59PM +0100, Fergal Daly wrote:
  XML::HTTP::Network::Daemon::TextProcessing::Business::Papersize::GIS
  
  so that people can find it,
 
 That's what the Description field is for.

There's a Description field? I accept responsibility for not knowing about
this, I've never made an effort to see what is available. However, if
search.cpan.org had allowed me to search by Description field I probably
would have included one in all of my modules.

F



META.yml keywords (was: Re: Finding prior art Perl modules)

2004-07-14 Thread Randy W. Sims
Fergal Daly wrote:
Does META.yml have a place for keywords?

The spec doesn't currently provide for keywords. I do think it would be
a good idea, BUT I think it needs to be done in a way to avoid abuse. 
I'd hate to see META.yml files grow by the kb as authors add every 
conceivable keyword they can think of and try to manipulate the search. 
As limiting and as clumsy as it seems, I think that if keywords are 
added then it should be from a limited set of keywords, i.e. more of a 
classification scheme, really, where modules can appear under multiple 
classifications.
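
A minimal sketch of what checking against a limited set might look like. The spec has no keywords field today, so everything here is an assumption for illustration: the vocabulary in `%ALLOWED`, the cap in `$MAX_KEYWORDS`, and the `check_keywords` helper are all made up.

```perl
use strict;
use warnings;

# Hypothetical controlled vocabulary; a real one would be agreed centrally.
my %ALLOWED = map { $_ => 1 } qw(datetime email regex templating testing xml);

# Assumed cap on classifications per distribution.
my $MAX_KEYWORDS = 5;

# Split an author's declared keywords into accepted and rejected lists.
# Keywords are lower-cased and duplicates are ignored.
sub check_keywords {
    my (@declared) = @_;
    my (%seen, @ok, @rejected);
    for my $kw (map { lc $_ } @declared) {
        next if $seen{$kw}++;
        if ($ALLOWED{$kw} && @ok < $MAX_KEYWORDS) {
            push @ok, $kw;
        }
        else {
            push @rejected, $kw;
        }
    }
    return (\@ok, \@rejected);
}

my ($ok, $rejected) = check_keywords(qw(Regex regex seo XML));
print "accepted: @$ok\n";
print "rejected: @$rejected\n";
```

A module can still appear under several classifications; it just cannot invent new ones to game the search.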

Randy.


Re: New module: Regexp::Trie

2004-07-14 Thread Jeff 'japhy' Pinyan
On Jul 14, David Landgren said:

Randy W. Sims wrote:

 What about Japhy's new Regexp::Parser?

Hmm. Yes, I know about it, and even downloaded it to play with it
this weekend. But wrote tests instead :)

I uploaded v0.10 the other day.  It's got a much better hierarchy system
(read: one that works).  You might be able to just make your module a
subclass that returns the optimized form on ->visual or ->qr.

-- 
Jeff japhy Pinyan %  How can we ever be the sold short or
RPI Acacia Brother #734 %  the cheated, we who for every service
http://japhy.perlmonk.org/  %  have long ago been overpaid?
http://www.perlmonks.org/   %-- Meister Eckhart



Re: Future of the Module List

2004-07-14 Thread Dave Rolsky
On Wed, 14 Jul 2004, Randy W. Sims wrote:

 As for the best of the best, I still believe there is a lot of merit in
 the list built from dependencies idea.

Only in some areas.  For example, the top templating modules are probably
TT, HTML::Template, and Mason.  How many modules depend on any of those?
Darn few, because they are kind of at the end of the module food chain.

OTOH, many modules in the DateTime suite depend on DateTime.pm.  This
doesn't make it best of breed (though I think it is for other reasons ;)
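
To illustrate the skew, here is a rough sketch of a dependents count. The prerequisite data is entirely made up (the `Hypothetical-*` distributions do not exist) and `count_dependents` is just an illustrative helper; a real list would have to be built from each distribution's declared prerequisites.

```perl
use strict;
use warnings;

# Made-up data: distribution => modules it declares as prerequisites.
my %requires = (
    'Hypothetical-DT-Format'   => ['DateTime'],
    'Hypothetical-DT-Event'    => ['DateTime', 'DateTime::Set'],
    'Hypothetical-DT-Calendar' => ['DateTime'],
    'Hypothetical-Web-App'     => ['HTML::Template'],
);

# Count how many distributions depend on each module.
# A module listed twice by the same distribution is counted once.
sub count_dependents {
    my ($requires) = @_;
    my %count;
    for my $dist (keys %$requires) {
        my %uniq = map { $_ => 1 } @{ $requires->{$dist} };
        $count{$_}++ for keys %uniq;
    }
    return \%count;
}

my $count = count_dependents(\%requires);
printf "%-16s %d\n", $_, $count->{$_}
    for sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
```

With data shaped like this, a base module at the bottom of a suite ranks high simply because the suite exists, while an end-of-the-food-chain module ranks low however widely applications use it.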


-dave

/*===
House Absolute Consulting
www.houseabsolute.com
===*/