Re: [Perldl] Let us Kvetch! (was: PDL book checking)

David Mertens Mon, 23 Jan 2012 05:37:10 -0800

On Sun, Jan 22, 2012 at 11:06 PM, Clifford Sobchuk <
[email protected]> wrote:

> Here's my opinion. It is really nice to have PDL as a distribution. With
> better documentation, a lot of the dependencies would be better highlighted
> and easier to understand.
>

With the current monolithic PDL, it is easy for this sort of documentation
to slip through the cracks. If we were to create PDL::Slatec as a separate
distribution (for example), it would be abundantly clear whether or not the
external dependency was properly noted in the docs. It would be much easier
to specify and query about these dependencies during the build process and
give meaningful error messages.

> I have seen the emails on perlbrew, alien and local::lib - but I have no
> idea what they are.
>

perlbrew is a Linux/Mac tool (hopefully someday ported to Windows as well)
that makes installing and managing multiple versions of Perl about as easy
as checking your directory listings (i.e. "dir" or "ls").

Alien is the conceptual namespace given to managing (or at least querying)
external dependencies via a module that is installed from CPAN. Combined
with local::lib or perlbrew, this allows users to install many external
dependencies without needing administrative control of their system.

local::lib provides a simple cross-platform (Windows, Linux, Mac, probably
others) means for installing libraries under your user account so that you
don't mess with the system perl, and you don't need admin privileges to
install modules. I have used this recently to get around the fact that at
my new job, I don't have admin access. This way I get CPAN without having
to bug my admin.

> When it comes to people who use bits and pieces of perl to get their work
> done and are not perl experts, PDL is becoming more useful. I have been
> able to get two other people in my group to install it and start playing
> with it.

This is terrific, but what changed over the last couple of years in PDL
that made this possible?

> I am going to guess that when it comes to commercial applications, unless
> the person performing the analysis already has a good background in perl,
> they will not know anything about its culture, or its mechanisms.

You mean in-house applications, right? If that's the case, then I can't
imagine anybody coming to PDL without either (1) already knowing about perl
or (2) having somebody who knows about Perl and is at least aware of PDL.
At any rate, if they don't know about CPAN, then why not be consistent with
the rest of CPAN? What do we lose by being consistent?

> In our extended group there are probably ~40 people that use perl. All of
> us use ActiveState and several use cygwin, because they have installers
> that work and when they need a new package it is easy for them to find.
> Only about 25% of them even know what CPAN is, never mind knowing how to
> use it or cpanm.
>

Right now, you can easily install PDL for ActiveState and cygwin without
having to care about things. It just works. I would like to see this happen
just as easily with CPAN installs as well, and the first step in this is
streamlining the PDL build process. Splitting PDL into multiple pieces will
not affect PDL's installation reliability for you, by the way, because the
package managers you mention will have no trouble pulling multiple
dependent packages.

> So from my point of view, having an installable package that provides 2d
> and 3d interactive graphics is great.

Alas, ActiveState and cygwin have lulled you into complacency: PDL does not
support 2d graphics out of the box. 2.4.11 will address this, but right now
this is one of PDL's greatest shortcomings. Also see my notes below about
cluster computing.

> I find myself using it more than R or SciLab now, although there are still
> things that I find easier to do in both of those applications, and
> work will always be like that. One application will be better for some type
> of analysis than another.

This is good news! One of R's greatest strengths, to my knowledge, is CRAN.
It is *precisely* why one of my coworkers used R for a regression analysis
last month: our boss said "There's already a package in R for
handling this." Wouldn't it be awesome if the same were true with PDL?

> Right now I think that PDL is becoming a very good application
>

I would really like to know how PDL has improved over the last few years in
such a way that you say it is "becoming" a very good library. Aside from
John Cerney's work making pThreading automatic, the core hasn't changed for
years. If you think it has improved, that is *great* and we should give
kudos to whoever has implemented the changes that make your life easier. We
should also be aware of them so that future work does not change them.

> - more than a perl distribution and much more than a bunch of loosely
> connected perl packages -

To say that PDL is "more than a [CPAN] distribution" really frustrates me.
Moose is more than a CPAN distribution. BioPerl is more than a CPAN
distribution. DBI is more than a CPAN distribution. They are whole
categories of modules that allow people to get work done, and they have
entire communities surrounding them. PDL *is* a CPAN distribution. We have
not made it easy for related modules to spring into existence.

> that don't always work and are not properly supported.

You might think that most of PDL works because the interactions between the
different components are well understood and there is a good test suite,
but this is far from true. The test coverage on the core is abysmal (I
added the first tests of PDL::PP a few months ago!) and most stuff works
because the original implementer worked hard to make it work, without
passing along his institutional knowledge. The interactions that modules
have between one another are not well documented. In short, PDL is brittle.
If we don't touch it, it will continue to perform brilliantly. But it won't
get cool new stuff, either. I have tested the few changes I've made to
PDL::PP with the reasoning, "Well, if the core still compiles, I guess I
didn't break anything." That is not the right approach, IMHO.

> Recently I noticed that one of the packages I used for date-time handling
> was providing inconsistent results. I then found that the author no longer
> supports the package. Once a distribution becomes fractured, you will run
> into these types of issues, as well as problems with integration quality.
>

Yep, people move on. The exact same thing has happened with a number of
modules in PDL itself. However, as it's all wrapped up into PDL, it's
harder to notice. Our current distribution is already fractured, as
evidenced by the fact that PDL::Fit::Linfit installs even when PDL::Slatec
does not. I pointed this out a couple years ago but nobody has taken it
upon themselves to fix it, in part because the original author is gone, and
in part because it's not a trivial fix with the current PDL build system.

That having been said, just because I propose that we split PDL into
multiple distributions does not mean that the PDL Porters will no longer
claim responsibility for them. In fact, I would like to see a way for the
PDL Porters to accommodate even more. I do not understand, for example, why
PDL::Stats is not part of PDL.

> I completely agree that Quality Assurance is number 1. Documentation is part of
> Quality Assurance of a product.

Yes, I have found that writing docs for my modules and writing the test
suite often cause me to rewrite parts of my code or change my API. To
borrow something from brian d foy, if it's hard to document, or if it's
hard to test, that is a signal that it's probably poor programming.

> Possibly an architecture of plugins is what you would like to see around a
> PDL core.

Well, yes, though I would call them "modules" or "module distributions"
instead of plugins. :-)

> The definition of the core, though, will be key. Matlab and Scilab have
> somewhat defined a core that would include 2d and 3d interactive graphics,
> and then pluggable modules for specialized analysis - such as signal
> processing, thermodynamics, etc.

I disagree. I want a very lean core. I want something that I could
reasonably request for a computing cluster. Fourier transforms? Yes. Matrix
operations? Preferably, though I could be convinced that they should be in
a separate module. 2D and 3D plotting? Unnecessary for a computing cluster.
However, we could very easily create a Bundle or a Task module that
incorporates different combinations of modules so instead of telling your
coworkers to install PDL, you would tell them to install "Task::PDL::Cliff"
or some such. And no, I'm not joking about that name. See
p3rl.org/Task for details about the Task namespace.

> In R, on the other hand, almost everything is a module, and it drives me crazy
> sometimes to get what I want out of it.
>

Why? What is the most irritating part? What should PDL hope to avoid? I can
tell you that personally I don't like it when I need some Perl module which
has a long dependency chain because installing it in the middle of writing
a script can lead to a five minute interruption. Migration to a new machine
can also be a headache if I have to reinstall all those modules. However,
this is Perl and there are solutions to these sorts of issues.

> My 2 cents.
>
> CLIFF SOBCHUK
> Core RF Engineering
> Phone 613-667-1974   ecn: 8109-71974
> mobile 403-819-9233
> yahoo: sobchuk
> www.ericsson.com
>
>
> -----Original Message-----
> From: chm [mailto:[email protected]]
> Sent: Sunday, January 22, 2012 8:14 PM
> To: David Mertens
> Cc: [email protected]
> Subject: Re: [Perldl] Let us Kvetch! (was: PDL book checking)
>
> On 1/22/2012 9:45 PM, David Mertens wrote:
> > To all -
> >
> > I've changed the original subject. I hope this doesn't bother anybody
> > too much.
>
> Subject change is ok, but you dropped the thread so users don't see the
> earlier part of this discussion to which this appears to be a response.
>
> The first:
> > -------- Original Message --------
> > Subject: Re: PDL book checking
> > Date: Sun, 22 Jan 2012 16:18:32 -0500
> > From: chm <[email protected]>
> > To: Matthew Kenworthy <[email protected]>
> > CC: [email protected] <[email protected]>
> >
> > On 1/22/2012 3:57 PM, Matthew Kenworthy wrote:
> >>>
> >>> I forgot to point out that =ff is just what is needed to put page
> >>> breaks at the start of each chapter...
> >>>
> >>>
> >> Ah! Good to know :)
> >>
> >>> I'm confused.  Yes, the plan was to have a PDL::Book distribution,
> >>> which, by definition, would include the PDL::Book.
> >>>
> >>>
> >> I thought the ultimate idea was to put PDL::Book into the PDL-2.4.10
> >> tarball, but the discussion about the sizes of the included images
> >> nixed that idea. You can revive the idea by having PDL::Book contain
> >> only text and image-generating scripts. I think that your point is to
> >> keep PDL::Book a separate distribution entirely, which is where our
> >> confusion comes in.
> >
> > OK.  There seems to be an enormous amount of interest in "putting" the
> > PDL Book into the PDL distribution.
> >
> > While it _seems_ simple to just add it into the current "kitchen sink"
> > PDL has, the reality is that if PDL were split into a core
> > distribution and a number of other, separate, distributions
> > corresponding to the external dependencies, we would be *much* better
> > off:
> >
> > (1) The core would already be 100% ported since
> >     it is mostly the external libraries and programs
> >     that are difficult to get working consistently
> >     across all platforms.
> >
> >     For example, building a win32 PDL still takes
> >     significant guru expertise.  I *still*
> >     can't do it.  Although, if I took the time,
> >     I could follow Rob's instructions and build
> >     it eventually...
> >
> >     We work around that through Rob's generosity
> >     to build and make available up-to-date PPD
> >     versions of PDL CPAN releases, including the
> >     latest developers release.
> >
> > (2) Code improvement in PDL modules could happen
> >     faster without having to wait for the entire
> >     PDL distribution.  By releasing frequent git
> >     snapshots as developers releases, I've been
> >     able to reduce some of the impact of this.
> >
> >     However, the developers releases are even
> >     farther from 1-click installs than the official
> >     CPAN releases.
> >
> > (3) The full on, kitchen sink version of PDL
> >     could still be bundled up and distributed
> >     as a single distribution rather than the
> >     possibly dicey use of cpan or cpanm to
> >     build all the dependencies correctly.
> >
> > (4) For similar reasons, having the PDL-Book-0.0.1
> >     distribution works better: more frequent or
> >     needed updates can be made as required, issues
> >     of format generation and image generation will
> >     continue to be worked out, a book isn't the
> >     same thing as on-line help or documentation
> >     (although they could be viewed with the same
> >     utilities),...
> >
> > Cheers,
> > Chris
> >
> >> And, I should add, at this point, this is a Good Idea.
> >>
> >>> The issue of generating the figures occurred to
> >>> me when I saw that the full size image looked fine but that the
> >>> scaled html image had lines that were too thin and hard to see.  It
> >>> would be better to have a separate NxN for HTML and 800x800 for PDF
> >>> output.
> >>>
> >>>
> >> Hmm, I think that good displayable single-source images are possible
> >> with HardLW=>5 and HardCH=>2 for illustrations. But that's something
> >> for the release after this upcoming one!
> >>
> >> Matt
>
> And the second, additional points:
> > -------- Original Message --------
> > Subject: Re: [Perldl] PDL book checking
> > Date: Mon, 23 Jan 2012 00:10:11 +0100
> > From: Henning Glawe <[email protected]>
> > To: [email protected]
> >
> > On Sun, Jan 22, 2012 at 04:18:32PM -0500, chm wrote:
> >> While it _seems_ simple to just add it into the current "kitchen
> >> sink" PDL has, the reality is that if PDL were split into a core
> >> distribution and a number of other, separate, distributions
> >> corresponding to the external dependencies, we would be *much* better
> >> off:
> >>
> >> (1) The core would already be 100% ported since [ ... ]
> >> (4) For similar reasons, having the PDL-Book-0.0.1
> >
> > With my Debian Developer's hat on (those points mainly refer to the
> > 'bleeding edge' of debian development, i.e.
> > testing/unstable):
> >
> > (5) a problem with a single dependency would not kick all of pdl
> >     and all packages it depends on from our testing branch,
> >     which has happened recently due to portability problems with plplot.
> > (6) Fewer problems with SONAME transitions, as only the relevant
> >     interface module packages would need to be updated.
> > (7) Easier/more reliable way to automatically create package
> >     dependency lists (each interface module depends on the
> >     corresponding library packages). As mentioned recently on this
> >     list, the dependency list of debian's pdl package is a bit
> >     long; I have to do the splitting into depends, suggests
> >     and recommends manually, that's maybe why a bit too much
> >     slipped through... this would be a lot easier if we had
> >     a 'core' with minimal external dependencies and interface
> >     distributions.
> >
> > --
> > c u
> > henning
>
> And this reply (in context):
> > This is a well-worn discussion. The last time this was thoroughly
> > discussed was on the porters list: see the porters' archives starting
> > from October 31, 2009, and running into November. These are from the
> > days when I spent a lot of effort stirring the pot, and a bit less on
> > actually writing code.
> > [Note: November 2009 has one of the largest collections of messages in the
> > archives, and my pot stirring has been bested by no less than the
> > great Daniel Carrera, whose ability to stir a pot (and get docs and
> > code written) still impresses me.] I have since repented my lack of
> > code writing and I've tried hard to focus more on writing code and
> > less on stirring pots. If you don't believe me, just wait for
> > PDL::Graphics::Prima. :-)
> >
> > Back in late 2009 when we last discussed this, nearly everybody was in
> > favor of splitting PDL into multiple pieces. Judd Taylor voiced a
> > dissenting opinion, stating that PDL needs to have a large collection
> > of numerical capabilities built-in so that it appeals to new folks. If
> > people have to install lots of modules to get what they want, they'll
> > just walk away. He also claimed that the lack of an install-everywhere
> > 2D plotting library was a big issue. Read his email and others'
> > responses to get a fuller picture:
> > http://mailman.jach.hawaii.edu/pipermail//pdl-porters/2009-November/001617.html
> >
> > These days, I stand by my original statement, that PDL would be best
> > served split into many smaller pieces. BioPerl underwent a similar
> > transition a few years ago, and many major frameworks (Test comes to
> > mind) were built like this from the ground up, providing a simple core
> > upon which others can build. We are a Perl technology, and I believe
> > we would do well to embrace the current trend in Perl modules to
> > provide simpler distributions that target specific goals.
> >
> > I see two issues:
> >
> > 1) Quality Assurance. Whenever somebody makes a change to the core,
> > they run the *whole* test suite. If we split the core, changes made in
> > one component will not be easily tested against other components.
> > Solutions include (1) CPAN testers, which should be able to pick out
> > bad interactions within a few days to a week, and (2) a continuous
> > integration server specifically for PDL. For the latter, jitterbug, a
> > Perl-based continuous integration system comes to mind. It would be
> > amazing, it would take time to set up, and it would cost $$ to host
> > the server unless somebody out there has a box sitting idle on a
> > static IP. (Lately I have been thinking about purchasing a $7/month
> > VPS for this very idea, and for hosting the IRC bot.)
> >
> > 2) Knowing where to find things. If we split things up, we must have
> > documentation about where to find information about different PDL
> > capabilities. This is all the more important if users are installing
> > PDL, and need to know what to install. Installation itself is becoming
> > much easier with local::lib, perlbrew, and the Alien packages (shout
> > out to Joel for his recent work on Alien::Base). The current docs are
> > not well tied together and may not give the user an idea of where to
> > look (in monolithic PDL), but they do have an Index document that
> > knows about all the installed modules. The solution to this is simple,
> > but hard: write better docs that make thorough references to what's out
> > there.
> >
> > I believe that the benefits greatly outweigh the costs, but the
> > greatest missing piece is commitment by PDL porters and users to make it
> > happen.
> > Back then, I began working on Module::Build::PDL, but I lost steam
> > when I was told that M::B::PDL would have to build the entire PDL
> > distribution. I have since figured out that what I had accomplished
> > with M::B::PDL can be as easily achieved using Module::Build with well
> > crafted .pm.PL files. My opinion is now this: if we can't split PDL
> > into pieces that M::B can handle, then the pieces are still too big.
> >
> > So, here's what I say: let us kvetch! We will move forward with 2.4.10
> > and the close follow-up of 2.4.11, but everybody should lay out their
> > thoughts about splitting up PDL (or not). After the dust has settled,
> > as 2.4.11 is taking form, those of us who are truly interested can
> > re-read the discussion and decide how to move forward. In particular,
> > I would *love* to send out another survey, this time asking about what
> > people want and *how many hours people are willing to commit to make
> > it happen.*
> >
> > David
> >
> >
> >
> >
> > _______________________________________________
> > Perldl mailing list
> > [email protected]
> > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>
>
>

This notion that PDL should have everything everybody wants in it already
rubs me the wrong way. It insulates PDL from Perl and CPAN. CPAN is an
amazing resource. Why should we insulate our users from it?

Furthermore, PDL is not only monolithic; its source file structure differs
substantially from the layout of the vast majority of Perl modules on CPAN.
I know of plenty of low-hanging fruit that a hacker with the right skills
could easily pick for us, but it is buried in a source tree that would
be hard for newcomers to grok.

In short, there are solutions to the user-level issues that you raise. But
I would like to make it easier to attract *developers* to PDL, and
splitting PDL into well-defined modules is a very important first step. If
nothing else, it signals to the Perl community, "Hey, we're alive and well,
and we're trying to make it easier for you to hack on it."

David

P.S. QA is a big deal for any major next steps. Do you think you might be
able to convince your company to spare some server time, at night perhaps,
to run smoke tests and/or continuous integration tests?
