I need a better name than File_Shred
Hi folks, I've written a module which implements Eric Raymond's 'shred' program, which is pretty well described here: http://www.arstechnica.com/archive/news/1063140308.html
File::Shred is my working name. My version has a different focus, namely to find duplicate code chunks, write macros for them, and invoke those macros. So maybe a different name is appropriate:
File::Macrofy
File::Macroize::C
A Perl version would create a string-eval equivalent:
my $MACRO_NNN = q{ };
eval $MACRO_NNN;
Given its basis in MD5, it only matches chunks that are identical; two chunks that differ even trivially are not detected as duplicates.
FWIW - I've applied it to bleadperl source code, and am getting what look like reasonable results, though I haven't tried compiling yet (I will before the .01 release). OK, I did, and it breaks; I have to avoid chunks with unbalanced #ifdefs. Anyway:
[EMAIL PROTECTED] bleadperl]$ ls *.c.new |wc
35 35 415
[EMAIL PROTECTED] bleadperl]$ more *.c.macros |wc
29879814 68087
[EMAIL PROTECTED] bleadperl]$ grep MACRO *.c.macros |wc
2821128 14004
[EMAIL PROTECTED] bleadperl]$ grep MACRO *.c.new |wc
428 856 15234
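To make the idea concrete, here is a minimal sketch of the duplicate-detection step. It assumes the shredder hashes every window of N consecutive lines with MD5 and treats identical digests as duplicate chunks. The function name find_duplicate_chunks and the fixed window size are illustrative assumptions, not the module's actual interface.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: hash every window of $size consecutive lines and
# return a hash ref mapping each digest that occurs more than once to the
# list of 0-based line numbers where that chunk starts.
sub find_duplicate_chunks {
    my ($lines, $size) = @_;
    my %seen;
    for my $start (0 .. @$lines - $size) {
        my $chunk = join "\n", @{$lines}[ $start .. $start + $size - 1 ];
        push @{ $seen{ md5_hex($chunk) } }, $start;
    }
    my %dups;
    for my $digest (keys %seen) {
        $dups{$digest} = $seen{$digest} if @{ $seen{$digest} } > 1;
    }
    return \%dups;
}

# Made-up input: the three-line chunk a/b/c appears twice.
my @lines = ('a();', 'b();', 'c();', 'x();', 'a();', 'b();', 'c();');
my $dups  = find_duplicate_chunks(\@lines, 3);
for my $digest (sort keys %$dups) {
    print "$digest starts at lines @{ $dups->{$digest} }\n";
}
```

Because the digest covers the exact text of the window, changing a single character gives a different digest, which is why only identical chunks are reported.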
Re: Author's namespace
* at 14/11 10:25 + Fergal Daly said: But what about code that is shared by several CPAN modules but which I don't consider to be worth getting up to standard for general use. It's not that the code is trash, it's fine, I just can't see anyone else wanting to use it, even if it was fully documented. I suppose I'll just have to upload Class::OhGodNotAnotherMethodMaker, I really don't see the value of adding this sort of thing to CPAN. If code's going to go on CPAN as its own distribution then I think it should be properly documented and so on. If a distribution needs a module then either the module should be released to CPAN as a proper distribution or the module should be included as part of the relevant distribution. And if you're including the code in several CPAN modules then shouldn't the code be up to standard for general use? Just because you can't see anyone wanting to use it doesn't mean it shouldn't be documented. Anyone using one of those CPAN modules shouldn't have to ferret around in source code to realise what your convenience methods are there for. cheers Struan
RE: Author's namespace
* at 14/11 10:25 + Fergal Daly said: But what about code that is shared by several CPAN modules but which I don't consider to be worth getting up to standard for general use. It's not that the code is trash, it's fine, I just can't see anyone else wanting to use it, even if it was fully documented. I suppose I'll just have to upload Class::OhGodNotAnotherMethodMaker, I really don't see the value of adding this sort of thing to CPAN. If code's going to go on CPAN as its own distribution then I think it should be properly documented and so on. If a distribution needs a module then either the module should be released to CPAN as a proper distribution or the module should be included as part of the relevant distribution. I thought that CPAN historically carried stuff like this. Isn't that what the scripts directory is? yves
Re: Author's namespace
Original Message: - From: Struan Donald [EMAIL PROTECTED] And if you're including the code in several CPAN modules then shouldn't the code be up to standard for general use? Just because you can't see anyone wanting to use it doesn't mean it shouldn't be documented. The code is fine, it's quite simple and doesn't really need docs, however I don't really want anyone else using it because then it becomes a responsibility. There are plenty of similar modules contained within existing distributions. They are not polished, have no pod etc. They are only to be used from within the distribution itself and only need to be understood by people changing the distribution in question. I don't think this bothers people too much. My module is like these, it has previously shipped inside another distro, undocumented, unexposed. I want to use it with several other modules but I don't want to cut and paste. As it happens, it looks like the original Class::MethodMaker has an undocumented way to do what I want, so for this module it may not be an issue, but everyone has their own file slurping routine and various other bits and bobs that they do their own way; rather than copying them into lots of modules, a personal namespace of utility stuff could be useful. Somewhere to put things which are under review is also useful and seems to have been lost in the methodmaker discussion. Anyone using one of those CPAN modules shouldn't have to ferret around in source code to realise what your convenience methods are there for. Ideally, anyone using one of my CPAN modules shouldn't have to ferret around in any of my code, documented or not, but if they are then chances are that documenting these particular bits would make no difference. F
Re: Author's namespace
* Fergal Daly [EMAIL PROTECTED] [2003-11-14 13:10]: But what about code that is shared by several CPAN modules but which I don't consider to be worth getting up to standard for general use. It's not that the code is trash, it's fine I just can't see anyone else wanting to use it, even if it was fully documented. I wasn't saying the code was trash - but a carelessly chosen name and no documentation do make it clutter.. How about putting the module under the *same* name in all your distributions that use it? This doesn't avoid duplication on CPAN, granted - but does avoid it on the user end. Instead of calling it Test::Deep:MM, Foo::Bar::MM, Baz::Quux::MM etc depending on the distro, just stick it in all distros under the same name, maybe something like Class::MyMethodMk. -- Regards, Aristotle If you can't laugh at yourself, you don't take life seriously enough.
Re: I need a better name than File_Shred
* Jim Cromie [EMAIL PROTECTED] [2003-11-14 09:35]: My version has a different focus, namely to find duplicate code chunks, write macros for them, and invoke those macros. So maybe a different name is appropriate. So it is aimed at processing C sources? Then File:: is the wrong TLNS for it, although off hand I'm at a loss about which one it should be in. I'm not sure Parse::C:: is fitting here? I think the fact that it compares shreds is to be ignored as an implementation detail for the name at least. The fact that it generates macros is important of course.. so is the fact that it does so for common code, though. That should probably be in the name somewhere. Parse::C::CommonToMacros? Awkward and not truly descriptive I think.. Hmm.. -- Regards, Aristotle If you can't laugh at yourself, you don't take life seriously enough.
Re: Author's namespace
From: A. Pagaltzis [EMAIL PROTECTED] I wasn't saying the code was trash - but a carelessly chosen name and no documentation do make it clutter.. I agree it's clutter, that's why I'd like it not to be included when people search. The name is chosen for my convenience and mine only. As Mark mentioned in his mail, it's more of a personal style thing, and as Mark mentioned it could be a bad idea as people start building repositories of their own secret modules rather than making the effort to release them properly. How about putting the module under the *same* name in all your distributions that use it? This doesn't avoid duplication on CPAN, granted - but does avoid it on the user end. Instead of calling it Test::Deep:MM, Foo::Bar::MM, Baz::Quux::MM etc depending on the distro, just stick it in all distros under the same name, maybe something like Class::MyMethodMk. But that will clash with Blah-Blah's Class::MyMethodMk. Anyway, I'm not too stressed about the whole thing. I'm more interested in a related Proposed:: namespace which came up in passing. As in Proposed::FDALY::Hey::ModuleAuthors::IsThis::A::GoodName::For::ThisModule F
Re: Author's namespace
The following was supposedly scribed by Mark Stosberg on Friday 14 November 2003 09:00 am: I think I have a similar concern. Here's my own case: I use a custom sub-class of CGI::Application that I base most of my web-applications on. Eventually, I would like to distribute some of these on CPAN, with several of them referring to the same custom sub-class itself. However, I don't think the sub-class module itself would be especially interesting to others-- it might-- but it mostly seems like a set of personal style choices about how I like to design web-applications. If it didn't go under an Authors:: namespace, it seems like it would get some other un-descriptive name like CGI::Application::MarksSubClass. If you are releasing a module which uses these functions, it seems that you have only a few choices. You could re-write your module to use only standard helper modules (not usually an appealing option, but you shouldn't rule it out.) You could release your helper module without full documentation, and just explain that it is a matter of coding style (e.g. none of the algorithms are really anything new and it just makes some default choices for you and calls functions from other modules.) You could fully-document the helper module (and maybe make it more configurable?) I like this one the best, and maybe others who work in the same manner could benefit from it. Do you think it is possible to boil-down the you-specific parts of your module into a config file in your home directory? It would be interesting to see how this would work. You could inline all of the helper module functions at the end of your regular module (maybe a dist target in your makefile can automate this for you.) --Eric
Algorithm::Shred
* On 14-Nov-2003 at 11:03AM PST, [EMAIL PROTECTED] said: And because 'shred' is open-source, and part of the Linux vs SCO drama, it serves as something of a touchstone - By understanding the algorithm, you know its advantages/disadvantages; fast but naive compared to parsing to an ASN. Good point. Algorithm::Shred? It's also applicable to any line-oriented text, not just programs, hence the File:: Again, Algorithm::Shred sounds more like it. Yes, by analogy with Algorithm::Diff, I think that makes a lot of sense... SDE
Re: Author's namespace
On Fri, Nov 14, 2003 at 01:33:01PM -0600, Eric Wilhelm wrote: I think I have a similar concern. Here's my own case: I use a custom sub-class of CGI::Application that I base most of my web-applications on. Eventually, I would like to distribute some of these on CPAN, with several of them referring to the same custom sub-class itself. However, I don't think the sub-class module itself would be especially interesting to others-- it might-- but it mostly seems like a set of personal style choices about how I like to design web-applications. If it didn't go under an Authors:: namespace, it seems like it would get some other un-descriptive name like CGI::Application::MarksSubClass. You could fully-document the helper module (and maybe make it more configurable?) I like this one the best, and maybe others who work in the same manner could benefit from it. Do you think it is possible to boil-down the you-specific parts of your module into a config file in your home directory? It would be interesting to see how this would work. You could inline all of the helper module functions at the end of your regular module (maybe a dist target in your makefile can automate this for you.) I think some other people would probably find some of my personalizations useful as well. I'm open to cleaning it up some as you suggest. Still that leaves the issue of naming it. It's still best described as a module for building CGI applications Mark's way. I could give it some generic name like CGI::Application::TurboCharge, but that seems to be of limited usefulness. What's a good way to name these kinds of personalization modules? It's these kinds of cases that make Authors:: begin to make sense. Mark -- Mark Stosberg, Principal Developer [EMAIL PROTECTED] Summersault, LLC 765-939-9301 ext 202 database driven websites http://www.summersault.com/
Re: Author's namespace
The following was supposedly scribed by Mark Stosberg on Friday 14 November 2003 02:02 pm: Still that leaves the issue of naming it. It's still best described as a module for building CGI applications Mark's way. I could give it some generic name like CGI::Application::TurboCharge, but that seems to be of limited usefulness. If your way isn't the best way, there must be something wrong with it or your ego:) Maybe you should name it according to how you would describe your style, so that others with a similar style could find it more easily. CGI::Application::Terse ? As I said before, if it is mostly about wrapping a few functions into one and choosing some reasonable defaults with some options for over-riding these, I'd really like to see it be able to source a file out of the user's home directory (or a machine-wide /etc/ file) that would over-ride these defaults at compile time. I really don't think that it is appropriate for a distributed module to require something out of an Authors:: tree. If the helper functions are not something that others would want to use, it seems that this would make it more difficult to contribute to or subclass your front-end module. Isn't it possible to distribute it under the front-end module? For example, I'm currently working on CAD::Drawing, which will require CAD::Drawing::Manipulate, CAD::Drawing::Defined, and CAD::Drawing::IO to name a few. Everything below CAD::Drawing is rather helpless without it and it is helpless without them, so I had planned to pack them all into a CAD-Drawing-0.01.tar.gz and upload that (with the exception of the various CAD::Drawing::IO::backend modules to be installed as options.) If this setup is possible, then maybe your helper functions should be in CGI::YourModule::helpers and packed-up with it for distribution at least until they can find a home of their own. 
If you are duplicating the helpers multiple times (in each distributed module under totally separate namespaces) to do this, just note that in the sparse documentation and plan to do better tomorrow. --Eric
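A minimal sketch of the config-file idea discussed in this message: built-in defaults that a machine-wide file and then a per-user file may override. The file paths, the key names, and the simple "key = value" format are all hypothetical, chosen only to illustrate the layering; nothing here is taken from an existing module.

```perl
use strict;
use warnings;

# Hypothetical loader: start from the built-in defaults, then let each
# readable file override them in order (later files win).
sub load_defaults {
    my ($builtin, @files) = @_;
    my %conf = %$builtin;
    for my $file (@files) {
        next unless defined $file && -r $file;
        open my $fh, '<', $file or next;
        while (my $line = <$fh>) {
            next if $line =~ /^\s*#/ || $line !~ /\S/;   # comments, blanks
            if ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
                $conf{$1} = $2;
            }
        }
        close $fh;
    }
    return \%conf;
}

my @files = ('/etc/cgiapp-terse.conf');                  # assumed path
push @files, "$ENV{HOME}/.cgiapp-terse" if defined $ENV{HOME};

my $conf = load_defaults(
    { template_dir => 'templates', charset => 'utf-8' }, # made-up defaults
    @files,
);
print "$_ = $conf->{$_}\n" for sort keys %$conf;
```

Wrapping the call in a BEGIN block would apply the overrides at compile time, as suggested above; the sketch leaves that out to stay short.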
Re: Author's namespace
On Fri, Nov 14, 2003 at 02:19:44PM -0600, Eric Wilhelm wrote: The following was supposedly scribed by Mark Stosberg on Friday 14 November 2003 02:02 pm: Still that leaves the issue of naming it. It's still best described as a module for building CGI applications Mark's way. I could give it some generic name like CGI::Application::TurboCharge, but that seems to be of limited usefulness. If your way isn't the best way, there must be something wrong with it or your ego:) Maybe you should name it according to how you would describe your style, so that others with a similar style could find it more easily. CGI::Application::Terse ? This seems like the best option in my case. I'll start discussing specifics on the CGI::Application list and see where it goes. Thanks for the nudge. Isn't it possible to distribute it under the front-end module? This would be OK if there was only one front-end. The code is re-usable enough that it is probably supporting a dozen or so projects already. I would hope that over time I would release at least two projects to CPAN that require it. Mark
Re: I need a better name than File_Shred
A. Pagaltzis wrote: * Jim Cromie [EMAIL PROTECTED] [2003-11-14 17:43]: It's not 'particular' to C, except in reduce(), the last step, which acts on the detected redundancies. As I outlined, a Perl version could move chunks into strings, then eval that in the many places it's needed. Then maybe it should be split into two modules, one making use of the other. Heh - that creates 2 naming problems :-O Joking aside, that might solve the problem of not fitting well in either domain - I just have to ponder that split, and what names make sense for the 2 halves. Me: And because 'shred' is open-source, and part of the Linux vs SCO drama, it serves as something of a touchstone - By understanding the algorithm, you know its advantages/disadvantages; fast but naive compared to parsing to an ASN. Good point. Algorithm::Shred? Hmm - that might solve other problems, such as - What is a File::Shred? The wad of shreddings? Or the shredder that made them?! I'll look at some of the other Algorithm::* stuff to see if the object models there have a workable similarity. FWIW - In File_Shred::*, I have File_Shred::Shreddings, File_Shred::Chunk, File_Shred::Ribbon, and File_Shred::Comparison. I'm not entirely happy with this partitioning, but it's a start. Curiously, this shred and knit algorithm has some similarities with Gene Sequencing. There, they chop up DNA such that the 2 sides of the helix fragments are ragged - i.e. loose tails of 1 side of the helix dangle from both ends. Then they put it in a bath of nucleic acid, and those tails regrow; A-T, C-G. Now the soup has fragments which overlap with the fragments that were split off at either end. Then they send the fragments through the sequencer, and knit the sequences together. (BTW - this has inaccuracies - IANAB) It's also applicable to any line-oriented text, not just programs, hence the File:: Again, Algorithm::Shred sounds more like it.
The C-specific part would then be provided in another module which would have to be named independently, probably leaving C as the last part of the name so there is ::C, ::Perl etc. Yes - I'd say it fits, and Schuyler Erle seems to agree. Now there's the question of the other half - is it warranted, and if so, what's appropriate. I'll mull on that over the weekend. thx.
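As an illustration of what a Perl-flavoured reduce step might emit, here is a small sketch of the string-eval approach mentioned in the thread: a chunk that occurred in two subs is moved into a string once and evaluated at each site. The sub names and the chunk itself are made up, and this is only one way such output could look, not the module's actual result.

```perl
use strict;
use warnings;

# A chunk that was duplicated in both subs below, lifted into a string once.
# The name follows the MACRO_NNN pattern from the thread.
my $MACRO_001 = q{
    $total += $n;
    $count++;
};

sub sum_evens {
    my ($total, $count) = (0, 0);
    for my $n (grep { $_ % 2 == 0 } @_) {
        eval $MACRO_001;      # string eval sees $total, $count and $n here
        die $@ if $@;
    }
    return ($total, $count);
}

sub sum_odds {
    my ($total, $count) = (0, 0);
    for my $n (grep { $_ % 2 } @_) {
        eval $MACRO_001;
        die $@ if $@;
    }
    return ($total, $count);
}

my ($even_total, $even_count) = sum_evens(1 .. 6);
my ($odd_total,  $odd_count)  = sum_odds(1 .. 6);
print "evens: $even_total ($even_count items)\n";
print "odds: $odd_total ($odd_count items)\n";
```

Unlike a C preprocessor macro, the string is compiled each time it is evaluated, so this trades run-time cost for the reduced duplication.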