I've written some scripts to harvest (web scrape) metadata from Digitalarkivet
(DA). Since the task is formidable, I've split it into stages and use
several scripts for each stage. Common "stuff" is put into 2 scripts for
reuse, to keep the other scripts cleaner and more readable. These 2 scripts
are always "included" in my scripts and are candidates for a module. I'm
thinking of making them into 1 or 2 modules. The concept works for the first
2 stages (I just need to code the rest).

I mainly have five questions and would appreciate advice on them:

1) One or two modules?

Is it a good idea to split database operations into a separate module? I
use DBI and try to avoid non-standard SQL; most of my SQL is basic SELECT or
INSERT/LOAD DATA (more advanced SQL is placed in stored procedures, which I
call when more intricate tasks are needed). So far I've got 4 subs in
"DA.pl" and 20 subs in "DA-DBI.pl". There will be more subs when I code
the next 2 stages. I always need both scripts for my use. I can't really run
without a database in the back end (although I often opt to store to file,
mainly for temporary storage, speed, or debugging/informational purposes).

or

should I just put everything into 1 module, since the config file can switch
the database (from MySQL to anything else supported by DBI; some minor things
are MySQL-dependent and could instead be moved to stored procedures)?
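
To make the two-module option concrete, here is a minimal sketch of how the
split might look. All names in it (build_dsn, connect_db, clean_text, the
config keys) are hypothetical and are not the actual subs in "DA.pl" or
"DA-DBI.pl"; the DSN layout shown is the one DBD::mysql accepts, and other
drivers use different DSN syntax.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical database layer; in a real layout this would be lib/DIS/DA/DBI.pm.
package DIS::DA::DBI;

# Build a DBI data source name from config values.
# The config keys (driver, database, host) are assumptions for this sketch.
sub build_dsn {
    my (%cfg) = @_;
    my $driver = $cfg{driver} // 'mysql';
    return "dbi:$driver:database=$cfg{database};host=$cfg{host}";
}

# Connect using the config. DBI is loaded only when a connection is
# actually requested, so the rest of the code can run without it.
sub connect_db {
    my (%cfg) = @_;
    require DBI;
    return DBI->connect(build_dsn(%cfg), $cfg{user}, $cfg{password},
        { RaiseError => 1, AutoCommit => 1 });
}

# Hypothetical scraping helpers; in a real layout this would be lib/DIS/DA.pm.
package DIS::DA;

# Collapse runs of whitespace in a scraped text fragment and trim the ends.
sub clean_text {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

package main;

my %cfg = (driver => 'mysql', database => 'da', host => 'localhost');
print DIS::DA::DBI::build_dsn(%cfg), "\n";
print DIS::DA::clean_text("  Digitalarkivet \n metadata "), "\n";
```

With this layout the scraping helpers have no database dependency of their
own, while the one-module option would simply merge both packages into one.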

2) Should it be a module at all?

Since I depend heavily on a database back end, should it be a module of its
own? I need to reuse code for many tasks (different scripts) in order to
web scrape metadata from the site. Is it more of an App?


3) Namespace

I'm not quite sure whether I'm going to release all the code to scrape the
site. I've put code in several scripts, which may or may not be included
alongside my module(s). There are 2 main reasons. First, it took me 4 days
to scrape the site the first time, and I don't want everyone to scrape the
whole site just for fun. Second, I'm not completely confident that everyone
would respect my licence. I'm happy to share on a non-commercial basis, but
would like something in return if it's used commercially. If it's released
as an app (working code for everyone), then the App:: namespace should be
used, if I understood "pause_namingmodules" correctly. Otherwise, depending
on whether it's one or two modules, I've been thinking of DIS::DA &
DIS::DA::DBI (DIS is the acronym for the genealogy society I'm a member of
and am writing the code for; DA is a known acronym for Digitalarkivet, the
Digital Archive of Norway). If it's one module, DigitalArkivet.pm might be
the best choice?


4) Best practice for POD?

As a "newbie" at POD, I've interleaved the POD with the code, which reduces
the need for (extra) header comments on subs. The POD documents each sub and
serves as its header. Most POD I've seen is placed at the end of the file.
(Both can be done, but is the latter highly recommended / BEST practice?) I
find it easier to write POD when I can see what is going on, and it also
forces me to write the POD right away. I could move everything to the end
of the file before "release", but then I feel I'd have to (re)write header
comments for each sub.
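
For illustration, the interleaved style described above looks roughly like
this. It is only a sketch: the sub page_url and the example.org address are
made up for the example and are not part of my scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

da_example - hypothetical script showing POD interleaved with code

=head1 FUNCTIONS

=head2 page_url($base, $page)

Returns the address of result page C<$page> below C<$base>.

=cut

# The POD block above doubles as the header comment for this sub.
sub page_url {
    my ($base, $page) = @_;
    return "$base?page=$page";
}

=head2 page_range($first, $last)

Returns the list of page numbers from C<$first> to C<$last>, inclusive.

=cut

sub page_range {
    my ($first, $last) = @_;
    return ($first .. $last);
}

# perl skips everything between a POD command and =cut when running the file.
print page_url('https://example.org/list', $_), "\n" for page_range(1, 3);
```

Running perldoc or podchecker on such a file handles the POD the same way
whether the blocks are interleaved or collected at the end.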

5) To CPAN or not to - Licence

My first thought is to license it with something like this:

DA-DBI.pl by Rolf B. Holte is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions
beyond the scope of this license may be available at
http://dev.perl.org/licenses/artistic.html.

Why? I'd like to share the code, but not for commercial use.

Would that be OK, or do I have to use the Perl/Artistic license to put it on
CPAN? Can I prohibit commercial use?
-- 
rbh
