I need a better name than File_Shred

2003-11-14 Thread Jim Cromie
Hi folks,

I've written a module which implements Eric Raymond's 'shred'
program, which is pretty well described here:
http://www.arstechnica.com/archive/news/1063140308.html

File::Shred is my working name

My version has a different focus, namely to find duplicate code chunks,
write macros for them, and invoke those macros.  So maybe a different name
is appropriate.
File::Macrofy
File::Macroize::C
A Perl version would create a string-eval equivalent:
my $MACRO_NNN = q{  };
   eval $MACRO_NNN;
Given its basis in MD5, it cannot match 2 chunks that differ even trivially.
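
As an illustration only, the duplicate-finding idea described above can be sketched as hashing every fixed-size window of lines with MD5 and grouping windows by digest. This is not the module's actual code: the function names `shred` and `duplicates`, the window size, and the choice to hash the raw lines without any normalization are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def shred(lines, size=3):
    """Map the MD5 digest of every `size`-line window to its start indices."""
    seen = defaultdict(list)
    for start in range(len(lines) - size + 1):
        chunk = "\n".join(lines[start:start + size])
        digest = hashlib.md5(chunk.encode("utf-8")).hexdigest()
        seen[digest].append(start)
    return seen


def duplicates(lines, size=3):
    """Return the start-index lists of windows that occur more than once."""
    return sorted(s for s in shred(lines, size).values() if len(s) > 1)


if __name__ == "__main__":
    source = [
        "a = 1;", "b = 2;", "c = a + b;",
        "print(c);",
        "a = 1;", "b = 2;", "c = a + b;",
        "a = 1;", "b = 2;", "c = a+b;",  # differs only in spacing
    ]
    print(duplicates(source))
```

Because only digests are compared, the last window in the demo (which differs by two spaces) is not reported as a duplicate, which is the limitation mentioned above.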

FWIW - I've applied it to bleadperl source code, and am getting what look like
reasonable results, though I haven't tried compiling yet.  (I will b4 the .01 release)
Ok, I did, it breaks, have to avoid chunks with unbalanced #ifdefs..

anyway,

[EMAIL PROTECTED] bleadperl]$ ls *.c.new |wc
35  35 415
[EMAIL PROTECTED] bleadperl]$ more *.c.macros |wc
   2987   9814   68087
[EMAIL PROTECTED] bleadperl]$ grep MACRO *.c.macros |wc
    282   1128   14004
[EMAIL PROTECTED] bleadperl]$ grep MACRO *.c.new |wc
   428 856   15234




Re: Author's namespace

2003-11-14 Thread Struan Donald
* at 14/11 10:25 + Fergal Daly said:
 But what about code that is shared by several CPAN modules but which I
 don't consider to be worth getting up to standard for general use.
 It's not that the code is trash, it's fine I just can't see anyone
 else wanting to use it, even if it was fully documented.
 
 I suppose I'll just have to upload Class::OhGodNotAnotherMethodMaker,

I really don't see the value of adding this sort of thing to CPAN. If
code's going to go on CPAN as its own distribution then I think it
should be properly documented and so on. If a distribution needs a
module then either the module should be released to CPAN as a proper
distribution or the module should be included as part of the relevant
distribution.

And if you're including the code in several CPAN modules then
shouldn't the code be up to standard for general use? Just because you
can't see anyone wanting to use it doesn't mean it shouldn't be
documented. Anyone using one of those CPAN modules shouldn't have to
ferret around in source code to realise what your convenience methods
are there for.

cheers

Struan


RE: Author's namespace

2003-11-14 Thread Orton, Yves

 * at 14/11 10:25 + Fergal Daly said:
  But what about code that is shared by several CPAN modules but which I
  don't consider to be worth getting up to standard for general use.
  It's not that the code is trash, it's fine I just can't see anyone
  else wanting to use it, even if it was fully documented.
  
  I suppose I'll just have to upload Class::OhGodNotAnotherMethodMaker,
 
 I really don't see the value of adding this sort of thing to CPAN. If
 code's going to go on CPAN as its own distribution then I think it
 should be properly documented and so on. If a distribution needs a
 module then either the module should be released to CPAN as a proper
 distribution or the module should be included as part of the relevant
 distribution.



I thought that CPAN historically carried stuff like this. Isn't that what the scripts directory is?


yves





Re: Author's namespace

2003-11-14 Thread [EMAIL PROTECTED]


Original Message:
-
From: Struan Donald [EMAIL PROTECTED]

 And if you're including the code in several CPAN modules then
 shouldn't the code be up to standard for general use? Just because you
 can't see anyone wanting to use it doesn't mean it shouldn't be
 documented.

The code is fine, it's quite simple and doesn't really need docs. However, I
don't really want anyone else using it because then it becomes a
responsibility. There are plenty of similar modules contained within
existing distributions. They are not polished, have no pod etc. They are
only to be used from within the distribution itself and only need to be
understood by people changing the distribution in question. I don't think
this bothers people too much. My module is like these, it has previously
shipped inside another distro, undocumented, unexposed. I want to use it
with several other modules but I don't want to cut and paste.

As it happens, it looks like the original Class::MethodMaker has an
undocumented way to do what I want, so for this module it may not be an
issue. But everyone has their own file slurping routine and various other
bits and bobs that they do their own way. Rather than copying them into
lots of modules, a personal namespace of utility stuff could be useful.

Somewhere to put things which are under review would also be useful, and
that seems to have been lost in the methodmaker discussion.

 Anyone using one of those CPAN modules shouldn't have to
 ferret around in source code to realise what your convenience methods
 are there for.

Ideally, anyone using one of my CPAN modules shouldn't have to ferret
around in any of my code, documented or not, but if they are then chances are
that documenting these particular bits would make no difference.

F




mail2web - Check your email from the web at
http://mail2web.com/ .




Re: Author's namespace

2003-11-14 Thread A. Pagaltzis
* Fergal Daly [EMAIL PROTECTED] [2003-11-14 13:10]:
 But what about code that is shared by several CPAN modules but
 which I don't consider to be worth getting up to standard for
 general use. It's not that the code is trash, it's fine I
 just can't see anyone else wanting to use it, even if it was
 fully documented.

I wasn't saying the code was trash - but a carelessly chosen name
and no documentation do make it clutter..

How about putting the module under the *same* name in all your
distributions that use it? This doesn't avoid duplication on
CPAN, granted - but does avoid it on the user end. Instead of
calling it Test::Deep::MM, Foo::Bar::MM, Baz::Quux::MM etc
depending on the distro, just stick it in all distros under the
same name, maybe something like Class::MyMethodMk.

-- 
Regards,
Aristotle
 
If you can't laugh at yourself, you don't take life seriously enough.


Re: I need a better name than File_Shred

2003-11-14 Thread A. Pagaltzis
* Jim Cromie [EMAIL PROTECTED] [2003-11-14 09:35]:
 My version has a different focus, namely to find duplicate code
 chunks, write macros for them, and invoke those macros.  So
 maybe a different name is appropriate.

So it is aimed at processing C sources? Then File:: is the wrong
TLNS for it, although off hand I'm at a loss about which one it
should be in. I'm not sure Parse::C:: is fitting here?

I think the fact that it compares shreds is to be ignored as an
implementation detail for the name at least.

The fact that it generates macros is important of course.. so is
the fact that it does so for common code, though. That should
probably be in the name somewhere.

Parse::C::CommonToMacros? Awkward and not truly descriptive I
think..

Hmm..

-- 
Regards,
Aristotle
 
If you can't laugh at yourself, you don't take life seriously enough.


Re: Author's namespace

2003-11-14 Thread [EMAIL PROTECTED]
From: A. Pagaltzis [EMAIL PROTECTED]
 I wasn't saying the code was trash - but a carelessly chosen name
 and no documentation do make it clutter..

I agree it's clutter, that's why I'd like it not to be included when people
search. The name is chosen for my convenience and mine only. As Mark
mentioned in his mail, it's more of a personal style thing, and as he also
mentioned, it could be a bad idea if people start building repositories of
their own secret modules rather than making the effort to release them
properly.

 How about putting the module under the *same* name in all your
 distributions that use it? This doesn't avoid duplication on
 CPAN, granted - but does avoid it on the user end. Instead of
 calling it Test::Deep::MM, Foo::Bar::MM, Baz::Quux::MM etc
 depending on the distro, just stick it in all distros under the
 same name, maybe something like Class::MyMethodMk.

But that will clash with Blah-Blah's Class::MyMethodMk.

Anyway, I'm not too stressed about the whole thing.

I'm more interested in a related Proposed:: namespace which came up in
passing. As in

Proposed::FDALY::Hey::ModuleAuthors::IsThis::A::GoodName::For::ThisModule

F







Re: Author's namespace

2003-11-14 Thread Eric Wilhelm
 The following was supposedly scribed by
 Mark Stosberg
 on Friday 14 November 2003 09:00 am:

I think I have a similar concern. Here's my own case: I use a custom
sub-class of CGI::Application that I base most of my web-applications
on. Eventually, I would like to distribute some of these on CPAN, with
several of them referring to the same custom sub-class itself.

However, I don't think the sub-class module itself would be especially
interesting to others-- it might-- but it mostly seems like a set of
personal style choices about how I like to design web-applications.
If it didn't go under an Authors:: namespace, it seems like it would get
some other un-descriptive name like CGI::Application::MarksSubClass.

If you are releasing a module which uses these functions, it seems that you 
have only a few choices.

You could re-write your module to use only standard helper modules (not 
usually an appealing option, but you shouldn't rule it out.)

You could release your helper module without full documentation, and just 
explain that it is a matter of coding style (e.g. none of the algorithms are 
really anything new and it just makes some default choices for you and calls 
functions from other modules.)

You could fully-document the helper module (and maybe make it more 
configurable?)  I like this one the best, and maybe others who work in the 
same manner could benefit from it.  Do you think it is possible to boil-down 
the you-specific parts of your module into a config file in your home 
directory?  It would be interesting to see how this would work.

You could inline all of the helper module functions at the end of your regular 
module (maybe a dist target in your makefile can automate this for you.)

--Eric



Algorithm::Shred

2003-11-14 Thread Schuyler Erle
* On 14-Nov-2003 at 11:03AM PST, [EMAIL PROTECTED] said:
 
  And because 'shred' is open-source, and part of the Linux vs
  SCO drama, it serves as something of a touchstone - By
  understanding the algorithm, you know its
  advantages/disadvantages; fast but naive compared to parsing to
  an ASN.
 
 Good point. Algorithm::Shred?
 
  Its also applicable to any line-oriented text, not just
  programs, hence the File::
 
 Again, Algorithm::Shred sounds more like it.

Yes, by analogy with Algorithm::Diff, I think that makes a lot of
sense...

SDE


Re: Author's namespace

2003-11-14 Thread Mark Stosberg
On Fri, Nov 14, 2003 at 01:33:01PM -0600, Eric Wilhelm wrote:
 
 I think I have a similar concern. Here's my own case: I use a custom
 sub-class of CGI::Application that I base most of my web-applications
 on. Eventually, I would like to distribute some of these on CPAN, with
 several of them referring to the same custom sub-class itself.
 
 However, I don't think the sub-class module itself would be especially
 interesting to others-- it might-- but it mostly seems like a set of
 personal style choices about how I like to design web-applications.
 If it didn't go under an Authors:: namespace, it seems like it would get
 some other un-descriptive name like CGI::Application::MarksSubClass.
 
 You could fully-document the helper module (and maybe make it more 
 configurable?)  I like this one the best, and maybe others who work in the 
 same manner could benefit from it.  Do you think it is possible to boil-down 
 the you-specific parts of your module into a config file in your home 
 directory?  It would be interesting to see how this would work.
 
 You could inline all of the helper module functions at the end of your regular 
 module (maybe a dist target in your makefile can automate this for you.)

I think some other people would probably find some of my
personalizations useful as well. I'm open to cleaning it up some as
you suggest. 

Still that leaves the issue of naming it. It's still best described as
a module for building CGI applications Mark's way.  I could give it
some generic name like CGI::Application::TurboCharge, but that seems
to be of limited usefulness.

What's a good way to name these kinds of personalization modules? It's
these kinds of cases that make Authors:: begin to make sense.

Mark

--
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer
   [EMAIL PROTECTED]        Summersault, LLC
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .


Re: Author's namespace

2003-11-14 Thread Eric Wilhelm
 The following was supposedly scribed by
 Mark Stosberg
 on Friday 14 November 2003 02:02 pm:

Still that leaves the issue of naming it. It's still best described as
a module for building CGI applications Mark's way.  I could give it
some generic name like CGI::Application::TurboCharge, but that seems
to be of limited usefulness.

If your way isn't the best way, there must be something wrong with it or your 
ego:)

Maybe you should name it according to how you would describe your style, so 
that others with a similar style could find it more easily.  
CGI::Application::Terse ?

As I said before, if it is mostly about wrapping a few functions into one and 
choosing some reasonable defaults with some options for over-riding these, 
I'd really like to see it be able to source a file out of the user's home 
directory (or a machine-wide /etc/ file) that would over-ride these defaults 
at compile time.
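
A minimal sketch of the override idea above, assuming a plain `key = value` file format: built-in defaults are applied first, then a machine-wide file, then a file in the user's home directory, so the most specific file wins. The file names, the setting names, and the functions `parse_overrides` and `load_settings` are hypothetical, and a Python module would read these files at import time rather than at compile time.

```python
import os

# Hypothetical locations; the thread names no actual file.
SYSTEM_FILE = "/etc/cgiapp-style.conf"
USER_FILE = os.path.expanduser("~/.cgiapp-style.conf")

# Hypothetical defaults standing in for the author's style choices.
DEFAULTS = {"template_dir": "templates", "error_mode": "die"}


def parse_overrides(text):
    """Parse 'key = value' lines, skipping blanks, '#' comments and junk."""
    overrides = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        overrides[key.strip()] = value.strip()
    return overrides


def load_settings(defaults, paths):
    """Start from `defaults`, then let each existing file in `paths` override."""
    settings = dict(defaults)
    for path in paths:
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as fh:
                settings.update(parse_overrides(fh.read()))
    return settings


if __name__ == "__main__":
    print(load_settings(DEFAULTS, [SYSTEM_FILE, USER_FILE]))
```

Listing the machine-wide file before the home-directory file is a design choice in this sketch: later files override earlier ones, so a user can always undo a site-wide setting.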

I really don't think that it is appropriate for a distributed module to 
require something out of an Authors:: tree.  If the helper functions are not 
something that others would want to use, it seems that this would make it 
more difficult to contribute to or subclass your front-end module.

Isn't it possible to distribute it under the front-end module?  For example, 
I'm currently working on CAD::Drawing, which will require 
CAD::Drawing::Manipulate, CAD::Drawing::Defined, and CAD::Drawing::IO to name 
a few.  Everything below CAD::Drawing is rather helpless without it and it is 
helpless without them, so I had planned to pack them all into a 
CAD-Drawing-0.01.tar.gz and upload that (with the exception of the various 
CAD::Drawing::IO::backend modules to be installed as options.)

If this setup is possible, then maybe your helper functions should be in 
CGI::YourModule::helpers and packed-up with it for distribution at least 
until they can find a home of their own.  If you are duplicating the helpers 
multiple times (in each distributed module under totally separate namespaces) 
to do this, just note that in the sparse documentation and plan to do better 
tomorrow.

--Eric



Re: Author's namespace

2003-11-14 Thread Mark Stosberg
On Fri, Nov 14, 2003 at 02:19:44PM -0600, Eric Wilhelm wrote:
  The following was supposedly scribed by
  Mark Stosberg
  on Friday 14 November 2003 02:02 pm:
 
 Still that leaves the issue of naming it. It's still best described as
 a module for building CGI applications Mark's way.  I could give it
 some generic name like CGI::Application::TurboCharge, but that seems
 to be of limited usefulness.
 
 If your way isn't the best way, there must be something wrong with it or your 
 ego:)
 
 Maybe you should name it according to how you would describe your style, so 
 that others with a similar style could find it more easily.  
 CGI::Application::Terse ?

This seems like the best option in my case. I'll start discussing specifics
on the CGI::Application list and see where it goes.  

Thanks for the nudge.

 Isn't it possible to distribute it under the front-end module?

This would be OK if there was only one front-end. The code is re-usable
enough that it is probably supporting a dozen or so projects already. I
would hope that over time I would release at least two projects to CPAN
that require it.

Mark


--
 . . . . . . . . . . . . . . . . . . . . . . . . . . . 
   Mark Stosberg            Principal Developer
   [EMAIL PROTECTED]        Summersault, LLC
   765-939-9301 ext 202     database driven websites
 . . . . . http://www.summersault.com/ . . . . . . . .


Re: I need a better name than File_Shred

2003-11-14 Thread Jim Cromie
A. Pagaltzis wrote:

* Jim Cromie [EMAIL PROTECTED] [2003-11-14 17:43]:
 

It's not 'particular' to C, except in reduce(), the last step,
which acts on the detected redundancies.  As I outlined, a Perl
version could move chunks into strings, then eval that in the
many places it's needed.
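
For illustration, a C-flavoured reduce step might look roughly like the sketch below: given the start positions of one repeated chunk, it emits a multi-line `#define` and replaces each occurrence with the macro name. The function name `reduce_chunk`, its arguments, and the output layout are assumptions for the example, not the module's real interface, and the occurrences are assumed not to overlap.

```python
def reduce_chunk(lines, starts, size, name):
    """Replace each `size`-line chunk beginning at an index in `starts`
    with the macro `name`.

    Returns (macro_lines, new_lines). Assumes the occurrences are
    identical and do not overlap.
    """
    chunk = lines[starts[0]:starts[0] + size]
    # Every line of a multi-line C macro except the last needs a
    # trailing backslash to continue the definition.
    macro = [f"#define {name} \\"]
    macro += [line + " \\" for line in chunk[:-1]]
    macro.append(chunk[-1])

    begin = set(starts)
    new_lines, skip_until = [], 0
    for i, line in enumerate(lines):
        if i < skip_until:
            continue  # still inside a chunk that was already replaced
        if i in begin:
            new_lines.append(name)
            skip_until = i + size
        else:
            new_lines.append(line)
    return macro, new_lines


if __name__ == "__main__":
    src = ["x = 1;", "y = 2;", "f(x, y);", "x = 1;", "y = 2;"]
    macro, new = reduce_chunk(src, [0, 3], 2, "MACRO_001")
    print("\n".join(macro))
    print("\n".join(new))
```

A chunk that contains preprocessor directives cannot simply be moved into a macro body like this, which fits the earlier observation that chunks with unbalanced #ifdefs have to be avoided.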
   

Then maybe it should be split into two modules, one making use of
the other.
 

Heh - that creates 2 naming problems :-O  Joking aside, that might solve the
problem of not fitting well in either domain - I just have to ponder that split,
and what names make sense for the 2 halves.

Me:

And because 'shred' is open-source, and part of the Linux vs
SCO drama, it serves as something of a touchstone - By
understanding the algorithm, you know its
advantages/disadvantages; fast but naive compared to parsing to
an ASN.
   

Good point. Algorithm::Shred?
 

Hmm - that might solve other problems, such as - What is a File::Shred ?
the wad of shreddings ? or the shredder that made them ?!
I'll look at some of the other Algorithm::* stuff to see if the object models
there have a workable similarity.

FWIW - In File_Shred::*, I have File_Shred::Shreddings, File_Shred::Chunk,
File_Shred::Ribbon, and File_Shred::Comparison.  I'm not entirely happy with this
partitioning, but it's a start.

Curiously, this shred and knit algorithm has some similarities with Gene Sequencing.
There, they chop up DNA such that the 2 sides of the helix fragments are ragged -
ie loose tails of 1 side of the helix dangle from both ends.  Then they put it in a bath
of nucleic acid, and those tails regrow; A-T, C-G.   Now the soup has fragments which
overlap with the fragments that were split off at either end.   Then they send the
fragments thru the sequencer, and knit the sequences together.

(BTW - this has inaccuracies - IANAB)
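
The knitting half of that analogy can be sketched as a greedy merge of overlapping fragments: repeatedly join the two pieces whose ends share the longest overlap until nothing overlaps any more. This is a toy illustration of the idea only, with hypothetical function names `overlap` and `knit`; it is not how the module or a real sequencer works.

```python
def overlap(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def knit(fragments):
    """Greedily merge the pair with the largest overlap until none overlap.

    Fragments may be strings or lists of lines; each result is a list.
    """
    pieces = [list(f) for f in fragments]
    while len(pieces) > 1:
        best = (0, None, None)
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    n = overlap(a, b)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left that overlaps
        merged = pieces[i] + pieces[j][n:]
        pieces = [p for k, p in enumerate(pieces) if k not in (i, j)]
        pieces.append(merged)
    return pieces


if __name__ == "__main__":
    print(["".join(p) for p in knit(["ACGT", "GTTA", "TACC"])])
```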

 

Its also applicable to any line-oriented text, not just
programs, hence the File::
   

Again, Algorithm::Shred sounds more like it.

The C-specific part would then be provided in another module
which would have to be named independently, probably leaving C as
the last part of the name so there is ::C, ::Perl etc.
 

Yes - I'd say it fits, and Schuyler Erle seems to agree.
Now there's the question of the other half - is it warranted, and if so, what's
appropriate.

I'll mull on that over the weekend.
thx.