Re: Name advice: check license of dependencies
On Tue, 1 Nov 2005, Chris Dolan wrote:

> Thanks for the feedback Mark and Sam. I chose Module::License::Report and posted my implementation to CPAN this afternoon.

Bravo and thanks.

> Further feedback would be VERY welcome, as the module uses a few sketchy heuristics to guess at the license. To mitigate the uncertainty, I made it report a confidence number so the user can set a sketchiness threshold. The confidence is just a guess, but it's better than nothing. I'm thinking the next version should use multiple algorithms to guess the license and report higher confidence if they all agree.

That sounds even cooler. Perhaps a flag or different function names could be used to get results either as one license with one confidence value, or as multiple licenses per component, each with a separate confidence. Each would have its place.

> On a side note, I discovered that using Data::Dumper on my output object causes memory use to go through the roof. I think Data::Dumper is chasing into the CPANPLUS data structures and thrashing my machine. Is anyone familiar enough with CPANPLUS internals to know whether Data::Dumper problems are well known, or if I've stumbled on some new bug?

Data::Dumper is only good for looking at small chunks of stuff. It's very, very, very inefficient, and there were once cases where Data::Dumper failed to produce something that could be eval'd back in. Eval'ing the result is inefficient too. Storable for the win here. Storable does well everything that Data::Dumper does poorly, and here we don't care about presenting the data visually. So use Storable for storing. Data::Dumper is just for glancing.

-- 
/chris

John Lundin once shaped the electrons thusly:
Ah. Okay, on the nice to do list. (Which in practice translates to won't do unless it becomes inconvenient or is fixed upstream.)
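As a rough illustration of the "Storable for storing, Data::Dumper for glancing" advice, here is a minimal sketch. The report structure and its field names are invented for the example and are not the actual output format of Module::License::Report. It uses Storable's freeze/thaw for an in-memory round trip; store and retrieve do the same thing against a file.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use Data::Dumper;

# Hypothetical report structure; the field names are illustrative only.
my $report = {
    module     => 'CAM::PDF',
    license    => 'Artistic/GPL',
    confidence => 0.9,
    depends    => [ { module => 'Text::PDF', license => 'Artistic' } ],
};

# Storable: compact binary serialization that round-trips without eval.
my $frozen = freeze($report);
my $copy   = thaw($frozen);

# Data::Dumper: used only to glance at one small piece of the structure.
$Data::Dumper::Sortkeys = 1;
print Dumper( $copy->{depends} );
```

The point of the split is that the full structure is never turned into Perl source text; only the small slice a human wants to look at goes through Data::Dumper.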
RE: Name advice: check license of dependencies
> On a side note, I discovered that using Data::Dumper on my output object causes memory use to go through the roof. I think Data::Dumper is chasing into the CPANPLUS data structures and thrashing my machine. Is anyone familiar enough with CPANPLUS internals to know whether Data::Dumper problems are well known, or if I've stumbled on some new bug?

Assuming you are on Win32, then yes, this is definitely a well-known bug. The main problem is that under normal Win32 builds perl uses the OS's malloc/realloc, which doesn't seem to be smart enough to just expand the previously allocated buffer when possible. This means that every time DD appends part of the data structure it has to copy the entire existing output. A second problem is that DD needs to catalog every single SV that it encounters in order to detect reference cycles; if there are many SVs involved this can be a lot of metadata.

It's worth noting that on Win32 it often helps to set $Data::Dumper::Useqq=1; (or in later versions $Data::Dumper::Useperl=1;), which forces DD not to use the XS implementation. The pure-Perl code doesn't seem to suffer from this performance degradation as badly, so a dump that would overflow your available memory in XS will often finish in a reasonable time in pure Perl.

Another option is to try using Data::Dump::Streamer instead. DDS takes longer to dump on average but never degrades like DD does, as it doesn't build its output in memory before outputting unless specifically asked to do so. The fact that it's easier to read and much more accurate and correct than DD is another reason to consider it. (It can dump closures properly, including enclosed vars!)

BTW, there is one last case where DD has real problems. It relates to pseudo-hashes and a rather insidious bug:

    my $hash_list=[{foo=>[]}];
    my $x=$hash_list->{foo};

This will cause perl to use the address of [] as the index into the array when it does the pseudo-hash lookup, which can result in the array being extended to a huge size. It's possible that perl will be OK with this, but when DD goes to build an in-memory string with several million undefs in it, it gets really unhappy for obvious reasons. DDS, otoh, doesn't suffer from this problem, as several million undefs in an array are emitted as a list constructor like (undef) x $count. So while the dump will take a long time, the memory usage will be low and the program will terminate without exhausting available RAM.

Yves
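A small sketch of the Data::Dumper settings discussed above. $Data::Dumper::Useperl is the setting named in the message; $Data::Dumper::Maxdepth is an additional, documented Data::Dumper option brought in here as one possible way to stop the dump from chasing into a large backend structure. The shallow_dump helper and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical result object that holds a reference into a large structure.
my $backend = { cache => [ map { +{ id => $_ } } 1 .. 1000 ] };
my $result  = {
    module  => 'CAM::PDF',
    license => 'Artistic/GPL',
    backend => $backend,
};

# Hypothetical helper: dump only the top $depth levels of a structure.
sub shallow_dump {
    my ($data, $depth) = @_;
    local $Data::Dumper::Useperl  = 1;         # pure-Perl implementation, not XS
    local $Data::Dumper::Maxdepth = $depth;    # do not descend below this depth
    local $Data::Dumper::Sortkeys = 1;         # stable key order
    return Dumper($data);
}

print shallow_dump($result, 1);
```

With a depth of 1, nested references are printed in their stringified form rather than expanded, so the thousand cache entries never reach the output string.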
Re: Name advice: check license of dependencies
Thanks for the feedback Mark and Sam. I chose Module::License::Report and posted my implementation to CPAN this afternoon.

Further feedback would be VERY welcome, as the module uses a few sketchy heuristics to guess at the license. To mitigate the uncertainty, I made it report a confidence number so the user can set a sketchiness threshold. The confidence is just a guess, but it's better than nothing. I'm thinking the next version should use multiple algorithms to guess the license and report higher confidence if they all agree.

On a side note, I discovered that using Data::Dumper on my output object causes memory use to go through the roof. I think Data::Dumper is chasing into the CPANPLUS data structures and thrashing my machine. Is anyone familiar enough with CPANPLUS internals to know whether Data::Dumper problems are well known, or if I've stumbled on some new bug?

Thanks all,
Chris

On Oct 31, 2005, at 2:51 PM, Sam Vilain wrote:

> On Mon, 2005-10-31 at 11:20 -0500, Mark Stosberg wrote:
> > > I don't like any of the names I've come up with so far. It seems clear that it should be in the Module:: namespace, but beyond that I'm unsure. Possibilities:
> > >   Module::GuessLicense
> > >   Module::License
> > >   Module::LicenseChain
> > >   Module::DistributionRights
> >
> > From your description, this is as much about a module's dependencies as it is about a specific module. So I'll suggest:
> >
> >   Module::Depends::LicenseReport
> >
> > Including Report signifies that the module has a read-only purpose.
>
> On the other hand, a license might be seen to imply dependency on all of the things that the module depends on (esp. if any are GPL).
>
> I think your point is very valid, and perhaps the logic for figuring a module's dependencies recursively should be available independently from the logic to show the license for an individual module.
>
> So, perhaps
>
>   Module::Depends
>
> and
>
>   Module::License
>   Module::License::(Report|Chain|...)
>
> Sam.

-- 
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/)
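One possible way to "report higher confidence if they all agree" is sketched below. This is not how Module::License::Report works; the combining rule (a noisy-OR, which treats the heuristics as independent), the function names, and the confidence numbers are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Each guess is [license, confidence in 0..1] from one hypothetical heuristic.
# Guesses for the same license reinforce each other:
#   combined = 1 - product of (1 - confidence)
sub combine_guesses {
    my @guesses = @_;
    my %miss;    # license => probability that every heuristic naming it is wrong
    for my $guess (@guesses) {
        my ($license, $confidence) = @$guess;
        $miss{$license} = 1 unless defined $miss{$license};
        $miss{$license} *= 1 - $confidence;
    }
    my %combined = map { $_ => 1 - $miss{$_} } keys %miss;
    return \%combined;
}

# Pick the single most likely license and its combined confidence.
sub best_guess {
    my ($combined) = @_;
    my ($best) = sort { $combined->{$b} <=> $combined->{$a} || $a cmp $b }
                 keys %$combined;
    return ($best, $combined->{$best});
}

my $combined = combine_guesses(
    [ 'Artistic/GPL', 0.6 ],    # e.g. from a metadata file (made-up number)
    [ 'Artistic/GPL', 0.5 ],    # e.g. from the POD text (made-up number)
    [ 'GPL',          0.3 ],    # e.g. from a license file (made-up number)
);
my ($license, $confidence) = best_guess($combined);
printf "%s %.2f\n", $license, $confidence;    # prints "Artistic/GPL 0.80"
```

Two heuristics that agree at 0.6 and 0.5 end up above either one alone, while the dissenting guess keeps its own, lower score. Returning the whole hash alongside the best guess leaves room for callers who want every candidate license rather than a single answer.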
Name advice: check license of dependencies
I'm toying with starting a new module and would like some naming advice.

My module will accept the name of another module and, using CPAN metadata and/or package contents, determine the license of that module's package and the license of all non-core packages that it in turn depends on. This module would be useful for determining redistribution rights for aggregations of code, like PAR files. It will probably employ CPANPLUS, YAML, Module::Depends, Module::CoreList and a bunch of heuristics to make its determination.

For example, my module CAM::PDF is Artistic/GPL but it depends on Text::PDF which is just Artistic. This new module would help me to discover that fact.

I don't like any of the names I've come up with so far. It seems clear that it should be in the Module:: namespace, but beyond that I'm unsure. Possibilities:

  Module::GuessLicense
  Module::License
  Module::LicenseChain
  Module::DistributionRights

Thanks,
Chris

-- 
Chris Dolan, Software Developer, Clotho Advanced Media Inc.
608-294-7900, fax 294-7025, 1435 E Main St, Madison WI 53703
Clotho Advanced Media, Inc. - Creators of MediaLandscape Software (http://www.media-landscape.com/) and partners in the revolutionary Croquet project (http://www.opencroquet.org/)
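The recursive walk described above could look roughly like the following sketch. To stay self-contained it uses a hard-coded metadata table and core list instead of CPANPLUS, Module::Depends or Module::CoreList. The CAM::PDF and Text::PDF licenses come from the example in the message; the Carp dependency, the table layout and the license_chain function are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for CPAN metadata lookups.
my %META = (
    'CAM::PDF'  => { license => 'Artistic/GPL', requires => [ 'Text::PDF', 'Carp' ] },
    'Text::PDF' => { license => 'Artistic',     requires => [] },
);

# Hypothetical stand-in for a core-module check.
my %CORE = map { $_ => 1 } qw(Carp strict warnings);

# Return [module, license] pairs for a module and its non-core dependencies.
sub license_chain {
    my ($module, $seen) = @_;
    $seen ||= {};
    # Skip core modules and anything already visited (guards against cycles).
    return () if $CORE{$module} || $seen->{$module}++;
    my $meta = $META{$module} or return ( [ $module, 'unknown' ] );
    return (
        [ $module, $meta->{license} ],
        map { license_chain($_, $seen) } @{ $meta->{requires} },
    );
}

printf "%-10s %s\n", @$_ for license_chain('CAM::PDF');
```

Run against this table, the walk reports CAM::PDF as Artistic/GPL and Text::PDF as Artistic, which is exactly the mismatch the proposed module is meant to surface.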
Re: Name advice: check license of dependencies
On Mon, Oct 31, 2005 at 10:08:53AM -0600, Chris Dolan wrote:

> I'm toying with starting a new module and would like some naming advice.
>
> My module will accept the name of another module and, using CPAN metadata and/or package contents, determine the license of that module's package and the license of all non-core packages that it in turn depends on. This module would be useful for determining redistribution rights for aggregations of code, like PAR files. It will probably employ CPANPLUS, YAML, Module::Depends, Module::CoreList and a bunch of heuristics to make its determination.
>
> For example, my module CAM::PDF is Artistic/GPL but it depends on Text::PDF which is just Artistic. This new module would help me to discover that fact.
>
> I don't like any of the names I've come up with so far. It seems clear that it should be in the Module:: namespace, but beyond that I'm unsure. Possibilities:
>   Module::GuessLicense
>   Module::License
>   Module::LicenseChain
>   Module::DistributionRights

From your description, this is as much about a module's dependencies as it is about a specific module. So I'll suggest:

  Module::Depends::LicenseReport

Including Report signifies that the module has a read-only purpose.

Mark
Re: Name advice: check license of dependencies
On Mon, 2005-10-31 at 11:20 -0500, Mark Stosberg wrote:

> > I don't like any of the names I've come up with so far. It seems clear that it should be in the Module:: namespace, but beyond that I'm unsure. Possibilities:
> >   Module::GuessLicense
> >   Module::License
> >   Module::LicenseChain
> >   Module::DistributionRights
>
> From your description, this is as much about a module's dependencies as it is about a specific module. So I'll suggest:
>
>   Module::Depends::LicenseReport
>
> Including Report signifies that the module has a read-only purpose.

On the other hand, a license might be seen to imply dependency on all of the things that the module depends on (esp. if any are GPL).

I think your point is very valid, and perhaps the logic for figuring a module's dependencies recursively should be available independently from the logic to show the license for an individual module.

So, perhaps

  Module::Depends

and

  Module::License
  Module::License::(Report|Chain|...)

Sam.