I don't know what the general feeling is, but I've always felt that there
should be an ETL top-level module namespace (if you don't count the
Practical Extraction and Report Language :). The issue is that there doesn't
appear to be much community consensus on best practices for ETL
behavior or methods. I suspect the variation in that namespace early on
might be distracting? Or maybe if you build it, they will come?

I notice that you have Extract and Load covered in your proposal.  Do you
also have transform and logging on the way?

Best Regards,

Jed (JANDREW <https://metacpan.org/author/JANDREW>)

On Tue, May 3, 2016 at 2:23 AM, Nelson Ferraz <nfer...@gmail.com> wrote:

> I'm the maintainer of the DataWarehouse::* modules.
>
> Let me know if you would like to use the DataWarehouse::ETL namespace.
>
>
>
> On Tue, May 3, 2016 at 10:36 AM, Smylers <smyl...@stripey.com> wrote:
>
>> Robert Wohlfarth writes:
>>
>> > I am looking to release a collection of modules for converting data.
>> > The modules read data from a source, convert the data, then add it
>> > into an SQL database.
>> >
>> > The modules are named like this...
>> > * Data::ETL
>> > * Data::ETL::Extract
>> > * Data::ETL::Extract::Excel
>> > * Data::ETL::Extract::DelimitedText
>> > * Data::ETL::Extract::XML
>> > * Data::ETL::Load
>> > * Data::ETL::MSAccess
>> >
>> > In my mind, ETL means "Extract-Transform-Load".
>>
>> That wouldn't've occurred to me, but the Wikipedia page for ‘Extract,
>> transform, load’ is the top link when searching DuckDuckGo for “ETL”, so
>> it seems reasonable to use it in a module name if your target audience
>> is people already working in the field and familiar with its jargon.
>>
>> > Is "Data" an appropriate place?
>>
>> Yes ... and no. Data:: is appropriate for pretty much every module on
>> CPAN, in that an awful lot of code does stuff with data. That makes it a
>> suboptimal namespace, because it doesn't define what's specific about
>> this particular module.
>>
>> In particular, it didn't to me suggest databases, or even data
>> warehousing (which the ETL Wikipedia page suggests is the main use of
>> ETL). It'd be good for the name to indicate that field in some way.
>>
>> > Thoughts on the naming convention "Data::ETL"?
>>
>> The combination of a very broad namespace and an acronym makes it hard
>> to guess at the area of the module — for instance that would be an
>> equally good name for a module that processes data searching for
>> extra-terrestrial life ...
>>
>> If the database-loading part uses DBI connections then the DBIx::
>> namespace would be good for indicating that.
>>
>> Unfortunately for you, DataWarehouse::ETL is already used by another
>> module. Ideally you'd mention that module in your docs, explaining to
>> new users the difference between them. If your name can help to indicate
>> the distinctive feature of yours, so much the better — but often that
>> isn't possible if they are simply different approaches to the same
>> problem.
>>
>> One possibility for a suite of connected modules that only really work
>> together is to concoct a ‘fanciful’ brand name for the framework, like
>> Moose or Catalyst, and put all your modules under either $Brand:: or
>> something like DataWarehouse::$Brand::.
>>
>> A framework name works well if, say, your $whatever::Extract::Excel
>> module is only intended to be used with other modules in your framework
>> and doesn't really make sense as a standalone module for somebody just
>> wanting to extract data from an Excel spreadsheet (and get back a Perl
>> data structure they can do what they want with). The brand name
>> indicates that it's part of the framework and to be used with that.
>>
>> Hope that helps.
>>
>> Smylers
>> --
>> http://twitter.com/Smylers2
>>
>
>
>
> --
> Nelson Ferraz
>
