I've written some scripts to harvest (web scrape) metadata from Digitalarkivet (DA). Since the task is formidable, I've split it into stages and use several scripts for each stage. Common "stuff" is put into 2 scripts for reuse, to keep the other scripts cleaner and more readable. These 2 scripts are always "included" in my scripts and are candidates for a module; I'm thinking of turning them into 1 or 2 modules. The concept works for the first 2 stages (I just need to write the code for the rest).
I mainly have five questions and would appreciate advice on them.

1) One or two modules?

Is it a good idea to split database operations into a separate module? I use DBI and try to avoid non-standard SQL. Most of the SQL is basic SELECT or INSERT / LOAD DATA; more advanced SQL is placed in stored procedures, which I call when more intricate tasks are needed. So far I've got 4 subs in "DA.pl" and 20 subs in "DA-DBI.pl". There will be more when I write the code for the next 2 stages. I always need both scripts; I can't really run without a database in the back end (although I often opt to store to file, mainly for temporary storage, speed, or debugging/informational purposes). Or should I just put everything into 1 module, since the config file can change the database (from MySQL to anything else supported by DBI; some minor things are MySQL dependent and could be moved to stored procedures instead)?

2) Should it be a module at all?

Since I depend heavily on a database back end, should it be a module of its own? I need to reuse code for many tasks (different scripts) in order to web scrape metadata from the site. Is it more of an App?

3) Namespace

I'm not quite sure whether I'm going to release all the code needed to scrape the site. I've put code in several scripts, which may or may not be included alongside my module(s). There are 2 main reasons. First, it took me 4 days to scrape the site the first time, and I don't want everyone to scrape the whole site just for fun. Second, I'm not completely confident that everyone would respect my licence. I'm happy to share on a non-commercial basis, but would like something in return if the code is used commercially.

If it's released as an app (working code for everyone), then the App:: namespace should be used, if I understood "pause_namingmodules" correctly. Otherwise, depending on whether it ends up as one or two modules, I've been thinking of DIS::DA and DIS::DA::DBI. (DIS is the acronym of the genealogy society I'm a member of and am writing the code for. DA is a known acronym for Digitalarkivet, the Digital Archive of Norway.)
If it becomes one module, might DigitalArkivet.pm be the best choice?

4) Best practice for POD?

As a "newbie" to POD, I've put the POD in between the code, which reduces the need for (extra) header comments on subs. The POD documents each sub and sits directly above it, as a header. Most POD I've seen is placed at the end of the file. (Both can be done, but is the latter highly recommended / best practice?) I find it easier to write POD when I can see what is going on, and it also forces me to write the POD at once. I could move everything to the end of the file before "release", but then I feel I'd have to (re)write header documentation for each sub.

5) To CPAN or not to - Licence

My first thought is to license it as something like this:

DA-DBI.pl by Rolf B. Holte is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at http://dev.perl.org/licenses/artistic.html.

Why? I'd like to share the code, but not for commercial use. Would that be OK, or do I have to use the Perl / Artistic license to put it on CPAN? Can I prohibit commercial use?

-- rbh
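
To make the POD question concrete, here is a minimal sketch of the interleaved style described above, with the POD sitting directly above the sub it documents. The package name DIS::DA is only the candidate name from the namespace question, and trim_field is a hypothetical helper invented for illustration; neither is taken from the actual scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package DIS::DA;    # assumption: candidate name from the namespace question

=head1 NAME

DIS::DA - sketch of a helper module with interleaved POD

=head1 FUNCTIONS

=head2 trim_field($text)

Returns C<$text> with leading and trailing whitespace removed.
(Hypothetical helper, shown only to illustrate POD placement.)

=cut

sub trim_field {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;    # strip whitespace at both ends
    return $text;
}

package main;

# perl skips everything between a POD command and =cut when running the file
print DIS::DA::trim_field("  Digitalarkivet  "), "\n";    # prints "Digitalarkivet"
```

Running perldoc on such a file shows only the documentation, while perl ignores the POD when executing, so the same text serves as both the header comment for the sub and the user-facing documentation.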