Re: Name advice: check license of dependencies

2005-11-02 Thread Christopher Hicks

On Tue, 1 Nov 2005, Chris Dolan wrote:
> Thanks for the feedback Mark and Sam.  I chose Module::License::Report
> and posted my implementation to CPAN this afternoon.


Bravo and thanks.

> Further feedback would be VERY welcome, as the module uses a few sketchy
> heuristics to guess at the license.  To mitigate the uncertainty, I made
> it report a confidence number so the user can set a sketchiness
> threshold.  The confidence is just a guess, but it's better than
> nothing.  I'm thinking the next version should use multiple algorithms
> to guess the license and report higher confidence if they all agree.


That sounds even cooler.  Perhaps a flag or different function names
could be used to get the results either as one license with one
confidence value, or as multiple licenses per component, each with a
separate confidence.  Each would have its place.
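
A minimal sketch of how the two result styles could fit together. Everything here is hypothetical: the guess records, the function names combine_guesses and best_guess, and the choice to treat the heuristics as independent are assumptions for illustration, not how Module::License::Report works.

```perl
use strict;
use warnings;

# Hypothetical input: one guess per heuristic, each with a confidence in 0..1.
my @guesses = (
    { source => 'META.yml',     license => 'perl', confidence => 0.9 },
    { source => 'POD section',  license => 'perl', confidence => 0.6 },
    { source => 'LICENSE file', license => 'gpl',  confidence => 0.4 },
);

# Multi-result style: every candidate license with a combined confidence.
# Assumes the heuristics are independent, so agreement raises the score.
sub combine_guesses {
    my @g = @_;
    my %miss;    # license => probability that every heuristic naming it is wrong
    for my $guess (@g) {
        my $lic = $guess->{license};
        $miss{$lic} = 1 unless exists $miss{$lic};
        $miss{$lic} *= 1 - $guess->{confidence};
    }
    my %combined = map { $_ => 1 - $miss{$_} } keys %miss;
    return \%combined;
}

# Single-result style: only the best license and its confidence.
sub best_guess {
    my $combined = combine_guesses(@_);
    my ($best) = sort { $combined->{$b} <=> $combined->{$a} || $a cmp $b }
        keys %$combined;
    return ($best, $combined->{$best});
}

my ($license, $confidence) = best_guess(@guesses);
printf "%s (%.2f)\n", $license, $confidence;    # prints "perl (0.96)"
```

Two heuristics agreeing on 'perl' at 0.9 and 0.6 combine to 0.96, while the lone 'gpl' guess stays at 0.4.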


> On a side note, I discovered that using Data::Dumper on my output object
> causes memory use to go through the roof.  I think Data::Dumper is
> chasing into the CPANPLUS data structures and thrashing my machine.  Is
> anyone familiar enough with CPANPLUS internals to know whether
> Data::Dumper problems are well known, or if I've stumbled on some new
> bug?


Data::Dumper is only good for looking at small chunks of stuff.  It's
very, very inefficient, there have been cases in the past where
Data::Dumper failed to produce something that could be eval'd back in,
and eval'ing the result is inefficient too.  Storable for the win here.
Storable does well everything that Data::Dumper does poorly, and here we
don't care about presenting the data visually.  So use Storable for
storing.  Data::Dumper is just for glancing.
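
A small sketch of that split, assuming a made-up report structure (the hash below is hypothetical, not the module's real output). Storable's nfreeze and thaw handle the round trip; Data::Dumper is pointed only at one small piece.

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use Data::Dumper;

# Hypothetical report structure, standing in for real module output.
my $report = {
    module     => 'CAM::PDF',
    license    => 'perl',
    confidence => 0.9,
    depends    => [ { module => 'Text::PDF', license => 'artistic' } ],
};

# Storable: compact binary serialization, meant for saving and restoring.
my $frozen = nfreeze($report);
my $copy   = thaw($frozen);

# Data::Dumper: human-readable text, kept to a small piece of the structure.
local $Data::Dumper::Sortkeys = 1;
print Dumper($copy->{depends}[0]);
```

To write to disk instead of a scalar, Storable's nstore($report, $file) and retrieve($file) do the same job.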


--
/chris

John Lundin once shaped the electrons thusly:

Ah. Okay, on the nice to do list. (Which in practice translates to
won't do unless it becomes inconvenient or is fixed upstream.)


RE: Name advice: check license of dependencies

2005-11-02 Thread Orton, Yves
> On a side note, I discovered that using Data::Dumper on my output
> object causes memory use to go through the roof. I think
> Data::Dumper is chasing into the CPANPLUS data structures and
> thrashing my machine. Is anyone familiar enough with CPANPLUS
> internals to know whether Data::Dumper problems are well
> known, or if I've stumbled on some new bug?


Assuming you are on Win32, then yes, this is definitely a well-known bug. The main problem is that under normal Win32 builds perl uses the OS's malloc/realloc, which doesn't seem to be smart enough to just expand the previously allocated buffer when possible. This means that every time DD appends part of the data structure it has to copy the entire output built so far. A second problem is that DD needs to catalog every single SV that it encounters in order to detect reference cycles; if there are many SVs involved, this can be a lot of metadata.

It's worth noting that on Win32 it often helps to set $Data::Dumper::Useqq=1; (or in later versions $Data::Dumper::Useperl=1;), which forces DD not to use the XS implementation. The pure-Perl code doesn't seem to suffer from this performance degradation as badly, so often a dump that would overflow your available memory in XS will finish in a reasonable time in pure Perl.
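
As a sketch, the setting can be scoped to a single dump with local so that the rest of the program is unaffected. The $data structure here is a small hypothetical stand-in, not a real CPANPLUS object.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for a large, heavily cross-linked structure.
my $data = { name => 'example', items => [ 1 .. 5 ] };
$data->{self} = $data;    # a reference cycle, which Data::Dumper must track

my $text = do {
    # Force the pure-Perl implementation for this dump only.
    local $Data::Dumper::Useperl = 1;
    local $Data::Dumper::Indent  = 1;
    Dumper($data);
};
print $text;
```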

Another option is to try using Data::Dump::Streamer instead. DDS takes longer to dump on average but never degrades like DD does, as it doesn't build its output in memory before outputting unless specifically asked to do so. The fact that it's easier to read and much more accurate and correct than DD is another reason to consider it. (It can dump closures properly, including enclosed vars!)

BTW, there is a last case where DD has real problems. It relates to pseudo hashes and a rather insidious bug:


my @hash_list=({foo=>[]});
my $x=(\@hash_list)->{foo};   # array ref used as a hash ref: pseudo-hash lookup


This will cause perl to use the address of [] as the index into @hash_list for the pseudo-hash lookup, which can result in the array being extended to a huge size. It's possible that perl will be OK with this, but when DD goes to build an in-memory string with several million undefs in it, it gets really unhappy for obvious reasons. DDS, on the other hand, doesn't suffer from this problem, as several million undefs in an array are emitted as a list constructor like (undef) x $count, so while the dump will take a long time, the memory usage will be low and the program will terminate without exhausting available RAM.

Yves





Re: Name advice: check license of dependencies

2005-11-01 Thread Chris Dolan
Thanks for the feedback Mark and Sam.  I chose  
Module::License::Report and posted my implementation to CPAN this  
afternoon.


Further feedback would be VERY welcome, as the module uses a few  
sketchy heuristics to guess at the license.  To mitigate the  
uncertainty, I made it report a confidence number so the user can set  
a sketchiness threshold.  The confidence is just a guess, but it's  
better than nothing.  I'm thinking the next version should use  
multiple algorithms to guess the license and report higher confidence  
if they all agree.


On a side note, I discovered that using Data::Dumper on my output  
object causes memory use to go through the roof.  I think  
Data::Dumper is chasing into the CPANPLUS data structures and  
thrashing my machine.  Is anyone familiar enough with CPANPLUS  
internals to know whether Data::Dumper problems are well known, or if  
I've stumbled on some new bug?


Thanks all,
Chris

On Oct 31, 2005, at 2:51 PM, Sam Vilain wrote:


> On Mon, 2005-10-31 at 11:20 -0500, Mark Stosberg wrote:
> > > I don't like any of the names I've come up with so far.  It seems
> > > clear that it should be in the Module:: namespace, but beyond that
> > > I'm unsure.  Possibilities:
> > >    Module::GuessLicense
> > >    Module::License
> > >    Module::LicenseChain
> > >    Module::DistributionRights
> >
> > From your description, this is as much about a module's dependencies
> > as it is about a specific module. So I'll suggest:
> >
> > Module::Depends::LicenseReport
> >
> > Including Report signifies that the module has a read-only purpose.
>
> On the other hand, a license might be seen to implicitly imply
> dependency on all of the things that it depends on (esp. if any are
> GPL).
>
> I think your point is very valid, and perhaps the logic for figuring a
> module's dependencies recursively should be available independently
> from the logic to show the license for an individual module.
>
> So, perhaps
>
>    Module::Depends
>
> and
>
>    Module::License
>    Module::License::(Report|Chain|...)
>
> Sam.



--
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703

Clotho Advanced Media, Inc. - Creators of MediaLandscape Software  
(http://www.media-landscape.com/) and partners in the revolutionary  
Croquet project (http://www.opencroquet.org/)




Name advice: check license of dependencies

2005-10-31 Thread Chris Dolan
I'm toying with starting a new module and would like some naming  
advice.  My module will accept the name of another module and, using  
CPAN metadata and/or package contents, determine the license of that  
module's package and the license of all non-core packages that it in  
turn depends on.  This module would be useful for determining  
redistribution rights for aggregations of code, like PAR files.  It  
will probably employ CPANPLUS, YAML, Module::Depends,  
Module::Corelist and a bunch of heuristics to make its determination.


For example, my module CAM::PDF is Artistic/GPL but it depends on  
Text::PDF which is just Artistic.  This new module would help me to  
discover that fact.
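
A rough sketch of the idea, under loud assumptions: the %license_of and %depends_on tables are hard-coded placeholders for what would really come from CPAN metadata, and license_report is a hypothetical name. Only Module::CoreList->first_release is a real call, used here to skip modules that ship with perl.

```perl
use strict;
use warnings;
use Module::CoreList;

# Hypothetical data: a real tool would read these from CPAN metadata.
my %license_of = (
    'CAM::PDF'  => 'Artistic/GPL',
    'Text::PDF' => 'Artistic',
);
my %depends_on = (
    'CAM::PDF'  => [ 'Text::PDF', 'File::Spec' ],
    'Text::PDF' => [],
);

# Walk the dependency tree, skipping modules that ship with perl itself.
sub license_report {
    my ($module, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$module}++;    # already visited
    return () if defined Module::CoreList->first_release($module);
    my @report = ([ $module, $license_of{$module} || 'unknown' ]);
    push @report, license_report($_, $seen) for @{ $depends_on{$module} || [] };
    return @report;
}

printf "%-10s %s\n", @$_ for license_report('CAM::PDF');
```

File::Spec is dropped because it is a core module, leaving CAM::PDF and Text::PDF with their differing licenses side by side.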


I don't like any of the names I've come up with so far.  It seems  
clear that it should be in the Module:: namespace, but beyond that  
I'm unsure.  Possibilities:

   Module::GuessLicense
   Module::License
   Module::LicenseChain
   Module::DistributionRights

Thanks,
Chris




Re: Name advice: check license of dependencies

2005-10-31 Thread Mark Stosberg
On Mon, Oct 31, 2005 at 10:08:53AM -0600, Chris Dolan wrote:
> I'm toying with starting a new module and would like some naming
> advice.  My module will accept the name of another module and, using
> CPAN metadata and/or package contents, determine the license of that
> module's package and the license of all non-core packages that it in
> turn depends on.  This module would be useful for determining
> redistribution rights for aggregations of code, like PAR files.  It
> will probably employ CPANPLUS, YAML, Module::Depends,
> Module::Corelist and a bunch of heuristics to make its determination.
>
> For example, my module CAM::PDF is Artistic/GPL but it depends on
> Text::PDF which is just Artistic.  This new module would help me to
> discover that fact.
>
> I don't like any of the names I've come up with so far.  It seems
> clear that it should be in the Module:: namespace, but beyond that
> I'm unsure.  Possibilities:
>    Module::GuessLicense
>    Module::License
>    Module::LicenseChain
>    Module::DistributionRights

From your description, this is as much about a module's dependencies as
it is about a specific module. So I'll suggest:

Module::Depends::LicenseReport

Including Report signifies that the module has a read-only purpose. 

Mark


Re: Name advice: check license of dependencies

2005-10-31 Thread Sam Vilain
On Mon, 2005-10-31 at 11:20 -0500, Mark Stosberg wrote:
> > I don't like any of the names I've come up with so far.  It seems
> > clear that it should be in the Module:: namespace, but beyond that
> > I'm unsure.  Possibilities:
> >    Module::GuessLicense
> >    Module::License
> >    Module::LicenseChain
> >    Module::DistributionRights
>
> From your description, this is as much about a module's dependencies as
> it is about a specific module. So I'll suggest:
>
> Module::Depends::LicenseReport
>
> Including Report signifies that the module has a read-only purpose.

On the other hand, a license might be seen to implicitly imply
dependency on all of the things that it depends on (esp. if any are
GPL).

I think your point is very valid, and perhaps the logic for figuring a
module's dependencies recursively should be available independently from
the logic to show the license for an individual module.

So, perhaps

   Module::Depends

and

   Module::License
   Module::License::(Report|Chain|...)

Sam.