Re: [Perldl] Let us Kvetch! (was: PDL book checking)

Chris Marshall Mon, 23 Jan 2012 09:00:34 -0800

On Mon, Jan 23, 2012 at 8:26 AM, David Mertens <[email protected]> wrote:
> On Sun, Jan 22, 2012 at 11:06 PM, Clifford Sobchuk
> <[email protected]> wrote:
>>
>> Here's my opinion. It is really nice to have PDL as a distribution. With
>> better documentation, a lot of the dependencies would be better highlighted
>> and easier to understand.
>
>
> With the current monolithic PDL, it is easy for this sort of documentation
> to slip through the cracks. If we were to create PDL::Slatec as a separate
> distribution (for example), it would be abundantly clear whether or not the
> external dependency was properly noted in the docs. It would be much easier
> to specify and query about these dependencies during the build process and
> give meaningful error messages.
>
>>
>> I have seen the emails on perlbrew, alien and local::lib - but I have no
>> idea what they are.
>
>
> perlbrew is a Linux/Mac thing (hopefully someday ported also to Windows)
> that makes installing and managing multiple versions of Perl about as easy
> as checking your directory listings (i.e. "dir" or "ls").
>
> Alien is the conceptual namespace given to managing (or at least querying)
> external dependencies via a module that is installed from CPAN. Combined
> with local::lib or perlbrew, this allows users to install many external
> dependencies without needing administrative control of their system.
>
> local::lib provides a simple cross-platform (Windows, Linux, Mac, probably
> others) means for installing libraries under your user account so that you
> don't mess with the system perl, and you don't need admin privileges to
> install modules. I have used this recently to get around the fact that at my
> new job, I don't have admin access. This way I get CPAN without having to
> bug my admin.
>
>>
>> When it comes to people who use bits and pieces of perl to get their work
>> done and are not perl experts, PDL is becoming more useful. I have been able
>> to get two other people in my group to install it and start playing with it.
>
>
> This is terrific, but what changed over the last couple of years in PDL that
> made this possible?


What has changed is that PDL now builds and compiles on
many platforms with a much more complete set of features,
which has resulted in wider availability of PDL for users
who are not, and don't plan to be, perl/C/... gurus just to get
their job done.

>> I am going to guess that when it comes to commercial applications, unless
>> the person performing the analysis already has a good background in perl,
>> they will not know anything about its culture, or its mechanisms.
>
>
> You mean in-house applications, right? If that's the case, then I can't
> imagine anybody coming to PDL without either (1) already knowing about perl
> or (2) having somebody who knows about Perl and is at least aware of PDL. At
> any rate, if they don't know about CPAN, then why not be consistent with the
> rest of CPAN? What do we lose by being consistent?

I think this is confusing the issues of PDL implementation
and distribution (which is a *developer* issue) and the
general availability and usability of PDL as a numeric
tool to solve problems and get work done.

I love CPAN, but you often have to jump through hoops
to get things installed.  Not necessarily big hoops, but
not something someone should *have* to do to use PDL.
Try 'cpan Devel::REPL' on a clean perl and let me know
how that works out for you...

>> In our extended group there are probably ~40 people that use perl, all of
>> us use ActiveState and several use cygwin, because they have installers that
>> work, and when they need a new package, it is easy for them to find. Only about 25%
>> of them even know what CPAN is, never mind how to use it or cpanm.
>
>
> Right now, you can easily install PDL for ActiveState and cygwin without
> having to care about things. It just works. I would like to see this happen
> just as easily with CPAN installs as well, and the first step in this is
> streamlining the PDL build process. Splitting PDL into multiple pieces will
> not affect PDL's installation reliability for you, by the way, because the
> package managers you mention will have no trouble pulling multiple dependent
> packages.

We're still a long way away from 1-click installs but we have
been improving.  Part of our job in refactoring PDL and perhaps
changing the CPAN module strategy is to *ensure* that
this does *not* affect the PDL install process for end users.

>> So from my point of view, having an installable package that provides 2d
>> and 3d interactive graphics is great.
>
> Alas, ActiveState and cygwin have lulled you into complacency: PDL does not
> support 2d graphics out of the box. 2.4.11 will address this, but right now
> this is one of PDL's greatest shortcomings. Also see my notes below about
> cluster computing.

Actually, it does support 2D and 3D graphics out of the box
for many systems with the proper packages available---at
least as much as any OS/platform.  However, the support
is not uniformly available nor consistent in implementation.
There is _definitely not_ a single, always available, 2D
graphics option.  The PDL-2.4.11 release is planned to
address this.

>> I find myself using it more than R or SciLab now, although there are still
>> things that I find easier to do in both of those applications, and work
>> will always be like that: one application will be better for some type
>> of analysis than another.
>
> This is good news! One of R's greatest strengths, to my knowledge, is CRAN.
> It is *precisely* why one of my coworkers used R for a regression analysis
> last month, because our boss said "There's already a package in R for
> handling this." Wouldn't it be awesome if the same were true with PDL?

There is a difference between PDL users and perl users.
Maybe better support for PDL-specific code would help?

>> Right now I think that PDL is becoming a very good application
>
> I would really like to know how PDL has improved over the last few years in
> such a way that you say it is "becoming" a very good library. Aside from
> John Cerney's work making pThreading automatic, the core hasn't changed for
> years. If you think it has improved, that is *great* and we should give
> kudos to whoever has implemented the changes that make your life easier. We
> should also be aware of them so that future work does not change them.

Bug fixes, portability, better buildability, more documentation,
more uniform support across all platforms and not just a
favored few/two/one.  Part of the goal of this work has been
to try to make PDL a package that could generate a user
community.

A *huge* obstacle to this has been the poor to non-existent
support for PDL on win32 platforms, largely because the
original developers worked on unix systems.
(N.B.  Without Rob's support for PDL on win32 we *still*
wouldn't have a real PDL available for win32 systems.
As is, the PPD availability makes it better in some ways
than other systems with "real" package managers and
development environments.)

People have even stated on the mailing list that scientists
don't use windows, despite the evidence that 90% of PCs in
existence run windows and that companies and orgs often
run only windows, so their scientists and engineers must be
running windows.  I would like them to have PDL as an _easy_
option.


>> - more than a perl distribution and much more than a bunch of loosely
>> connected perl packages -
>
>
> To say that PDL is "more than a [CPAN] distribution" really frustrates me.
> Moose is more than a CPAN distribution. BioPerl is more than a CPAN
> distribution. DBI is more than a CPAN distribution. They are whole
> categories of modules that allow people to get work done, and they have
> entire communities surrounding them. PDL *is* a CPAN distribution. We have
> not made it easy for related modules to spring into existence.

I can't speak to all of this but Devel::REPL is a nightmare
to install because of the embedded Moose dependencies.
It is a nice framework but *not* 1-click at the moment.

>> that don't always work nor are properly supported.
>
> You might think that most of PDL works because the interactions between the
> different components are well understood and there is a good test suite, but
> this is far from true. The test coverage on the core is abysmal (I added the
> first tests of PDL::PP a few months ago!) and most stuff works because the
> original implementer worked hard to make it work, without passing along his
> institutional knowledge. The interactions that modules have between one
> another are not well documented. In short, PDL is brittle. If we don't touch
> it, it will continue to perform brilliantly. But it won't get cool new
> stuff, either. The few fiddlings I have done with PDL::PP I have tested
> with, "Well, if the core still compiles, I guess I didn't break anything."
> That is not the right approach, IMHO.

Improving the software engineering and development standards
for PDL is important.  I'm glad you figured out that "If it compiles
it is ok" doesn't work as a testing strategy.  ;-)

>> Recently I noticed that one of the packages I used for date-time was
>> providing inconsistent results. I then found that the author no longer
>> supports the package. Once a distribution becomes fractured, you will run into
>> these types of issues, as well as integration-quality problems.
>
> Yep, people move on. The exact same thing has happened with a number of
> modules in PDL itself. However, as it's all wrapped up into PDL, it's harder
> to notice. Our current distribution is already fractured, as evidenced by
> the fact that PDL::Fit::Linfit installs even when PDL::Slatec does not. I
> pointed this out a couple years ago but nobody has taken it upon themselves
> to fix it, in part because the original author is gone, and in part because
> it's not a trivial fix with the current PDL build system.
>
> That having been said, just because I propose that we split PDL into
> multiple distributions does not mean that the PDL Porters will no longer
> claim responsibility for them. In fact, I would like to see a way for the
> PDL Porters to accommodate even more. I do not understand, for example, why
> PDL::Stats is not part of PDL.

Adding more entities to the PDL distribution is going in
the wrong direction.  Making a crufty software system
bigger never fixes anything.  In fact you have been presenting
a very strong argument *against* this.

Hint: There is a reason why we should be more considerate
about what goes into PDL.  For now, the #1 criterion should
be: It must support *all* PDL platforms: unix/linux, macosx,
and win32.  It may happen that non-core PDL modules are
not written portably but we need to be removing those
limits from the PDL core distribution.

BTW, with the right infrastructure to support outside modules
there would be no real reason for the "kitchen sink" PDL
we have today....but wait, we already said that....

>> I completely agree Quality Assurance is number 1. Documentation is part of
>> Quality Assurance of a product.
>
>
> Yes, I have found that writing docs for my modules and writing the test
> suite often cause me to rewrite parts of my code or change my API. To borrow
> something from brian d foy, if it's hard to document, or if it's hard to
> test, that is a signal that it's probably poor programming.
>
>> Possibly an architecture of plugins is what you would like to see around a
>> PDL core.
>
>
> Well, yes, though I would call them "modules" or "module distributions"
> instead of plugins. :-)
>
>> The definition of the core, though, will be key. Matlab and Scilab have
>> more or less defined a core that includes 2d and 3d interactive graphics,
>> with pluggable modules for specialized analysis such as signal
>> processing, thermodynamics, etc.
>
>
> I disagree. I want a very lean core. I want something that I could
> reasonably request for a computing cluster. Fourier transforms? Yes. Matrix
> operations? Preferably, though I could be convinced that they should be in a
> separate module. 2D and 3D plotting? Unnecessary for a computing cluster.
> However, we could very easily create a Bundle or a Task module that
> incorporates different combinations of modules so instead of telling your
> coworkers to install PDL, you would tell them to install "Task::PDL::Cliff"
> or some such. And no, I'm not joking about that name. See p3rl.org/Task for
> details about the Task namespace.
>
>>
>> In R, on the other hand, almost everything is a module, and it drives me crazy
>> sometimes to get what I want out of it.
>
>
> Why? What is the most irritating part? What should PDL hope to avoid? I can
> tell you that personally I don't like it when I need some Perl module which
> has a long dependency chain because installing it in the middle of writing a
> script can lead to a five minute interruption. Migration to a new machine
> can also be a headache if I have to reinstall all those modules. However,
> this is Perl and there are solutions to these sorts of issues.

It needs to be simple and easy.  Witness Puneet's ongoing trials to
use PDL.  Not what I would call friendly for a new user to try...

Cheers,
Chris

>> My 2 cents.
>>
>> CLIFF SOBCHUK
>> Core RF Engineering
>> Phone 613-667-1974   ecn: 8109-71974
>> mobile 403-819-9233
>> yahoo: sobchuk
>> www.ericsson.com
>>
>> "The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), who
>> is solely responsible for this email and its contents. All inquiries
>> regarding this email should be addressed to Ericsson. The web site for
>> Ericsson is www.ericsson.com."
>>
>> This Communication is Confidential. We only send and receive email on the
>> basis of the terms set out at www.ericsson.com/email_disclaimer
>>
>>
>> -----Original Message-----
>> From: chm [mailto:[email protected]]
>> Sent: Sunday, January 22, 2012 8:14 PM
>> To: David Mertens
>> Cc: [email protected]
>> Subject: Re: [Perldl] Let us Kvetch! (was: PDL book checking)
>>
>> On 1/22/2012 9:45 PM, David Mertens wrote:
>> > To all -
>> >
>> > I've changed the original subject. I hope this doesn't bother anybody
>> > too much.
>>
>> Subject change is ok, but you dropped the thread so users don't see the
>> earlier part of this discussion to which this appears to be a response.
>>
>> The first:
>> > -------- Original Message --------
>> > Subject: Re: PDL book checking
>> > Date: Sun, 22 Jan 2012 16:18:32 -0500
>> > From: chm <[email protected]>
>> > To: Matthew Kenworthy <[email protected]>
>> > CC: [email protected] <[email protected]>
>> >
>> > On 1/22/2012 3:57 PM, Matthew Kenworthy wrote:
>> >>>
>> >>> I forgot to point out that =ff is just what is needed to put page
>> >>> breaks at the start of each chapter...
>> >>>
>> >>>
>> >> Ah! Good to know :)
>> >>
>> >>> I'm confused.  Yes, the plan was to have a PDL::Book distribution,
>> >>> which, by definition, would include the PDL::Book.
>> >>>
>> >>>
>> >> I thought the ultimate idea was to put PDL::Book into the PDL-2.4.10
>> >> tarball, but the discussion about the sizes of the included images
>> >> nixed that idea. You can revive the idea by having PDL::Book only
>> >> have text and image-generating scripts. I think that your point is to
>> >> keep PDL::Book a separate distribution entirely, which is where our
>> >> confusion comes in.
>> >
>> > OK.  There seems to be an enormous amount of interest in "putting" the
>> > PDL Book into the PDL distribution.
>> >
>> > While it _seems_ simple to just add it into the current "kitchen sink"
>> > PDL has, the reality is that if PDL were split into a core
>> > distribution and a number of other, separate, distributions
>> > corresponding to the external dependencies, we would be *much* better
>> > off:
>> >
>> > (1) The core would already be 100% ported since
>> >     it is mostly the external libraries and programs
>> >     that are difficult to get working consistently
>> >     across all platforms.
>> >
>> >     For example, a win32 PDL still takes
>> >     significant guru expertise to do.  I *still*
>> >     can't do it.  Although, if I took the time,
>> >     I could follow Rob's instructions and build
>> >     it eventually...
>> >
>> >     We work around that through Rob's generosity
>> >     to build and make available up-to-date PPD
>> >     versions of PDL CPAN releases, including the
>> >     latest developers release.
>> >
>> > (2) Code improvement in PDL modules could happen
>> >     faster without having to wait for the entire
>> >     PDL distribution.  By releasing frequent git
>> >     snapshots as developers releases, I've been
>> >     able to reduce some of the impact of this.
>> >
>> >     However, the developers releases are even
>> >     farther from 1-click installs than the CPAN
>> >     official releases.
>> >
>> > (3) The full on, kitchen sink version of PDL
>> >     could still be bundled up and distributed
>> >     as a single distribution rather than the
>> >     possibly dicey use of cpan or cpanm to
>> >     build all the dependencies correctly.
>> >
>> > (4) For similar reasons, having the PDL-Book-0.0.1
>> >     distribution works better: more frequent or
>> >     needed updates can be made as required, issues
>> >     of format generation and image generation will
>> >     continue to be worked out, a book isn't the
>> >     same thing as on-line help or documentation
>> >     (although they could be viewed with the same
>> >     utilities),...
>> >
>> > Cheers,
>> > Chris
>> >
>> >> And, I should add, at this point, this is a Good Idea.
>> >>
>> >>> The issue of generating the figures occurred to
>> >>> me when I saw that the full size image looked fine but that the
>> >>> scaled html image had lines that were too thin and hard to see.  It
>> >>> would be better to have a separate NxN for HTML and 800x800 for PDF
>> >>> output.
>> >>>
>> >>>
>> >> Hmm, I think that good displayable single source images are possible
>> >> with HardLW=>5 and HardCH=>2 for illustrations. But that's something for
>> >> the release after this upcoming one!
>> >>
>> >> Matt
>>
>> And the second, additional points:
>> > -------- Original Message --------
>> > Subject: Re: [Perldl] PDL book checking
>> > Date: Mon, 23 Jan 2012 00:10:11 +0100
>> > From: Henning Glawe <[email protected]>
>> > To: [email protected]
>> >
>> > On Sun, Jan 22, 2012 at 04:18:32PM -0500, chm wrote:
>> >> While it _seems_ simple to just add it into the current "kitchen
>> >> sink" PDL has, the reality is that if PDL were split into a core
>> >> distribution and a number of other, separate, distributions
>> >> corresponding to the external dependencies, we would be *much* better
>> >> off:
>> >>
>> >> (1) The core would already be 100% ported since [ ... ]
>> >> (4) For similar reasons, having the PDL-Book-0.0.1
>> >
>> > With my Debian Developer's hat on (those points mainly refer to the
>> > 'bleeding edge' of debian development, i.e.
>> > testing/unstable):
>> >
>> > (5) a problem with a single dependency would not kick all of pdl
>> >     and all packages it depends on from our testing branch,
>> >     which has happened recently due to portability problems with plplot.
>> > (6) Fewer problems with SONAME transitions, as only the relevant
>> >     interface module packages would need to be updated.
>> > (7) Easier/more reliable way to automatically create package
>> >     dependency lists (each interface module depends on the
>> >     corresponding library packages). As mentioned recently on this
>> >     list, the dependency list of debian's pdl package is a bit
>> >     long; I have to do the splitting into depends, suggests
>> >     and recommends manually, that's maybe why a bit too much
>> >     slipped through... this would be a lot easier if we had
>> >     a 'core' with minimal external dependencies and interface
>> >     distributions.
>> >
>> > --
>> > c u
>> > henning
>>
>> And this reply (in context):
>> > This is a well-worn discussion. The last time this was thoroughly
>> > discussed was on the porters list: see the porters' archives starting
>> > from October 31, 2009, and running into November. These are from the
>> > days when I spent a lot of effort stirring the pot, and a bit less at
>> > actually writing code.
>> > [Note: November 2009 has one of the largest collections of messages in the
>> > archives, and my pot stirring has been bested by no less than the
>> > great Daniel Carrera, whose ability to stir a pot (and get docs and
>> > code written) still impresses me.] I have since repented my lack of
>> > code writing and I've tried hard to focus more on writing code and
>> > less on stirring pots. If you don't believe me, just wait for
>> > PDL::Graphics::Prima. :-)
>> >
>> > Back in late 2009 when we last discussed this, nearly everybody was in
>> > favor of splitting PDL into multiple pieces. Judd Taylor voiced a
>> > dissenting opinion, stating that PDL needs to have a large collection
>> > of numerical capabilities built-in so that it appeals to new folks. If
>> > people have to install lots of modules to get what they want, they'll
>> > just walk away. He also claimed that the lack of an install-everywhere
>> > 2D plotting library was a big issue. Read his email and others'
>> > responses to get a fuller picture:
>> > http://mailman.jach.hawaii.edu/pipermail//pdl-porters/2009-November/001617.html
>> >
>> > These days, I stand by my original statement, that PDL would be best
>> > served split into many smaller pieces. BioPerl underwent a similar
>> > transition a few years ago, and many major frameworks (Test comes to
>> > mind) were built like this from the ground up, providing a simple core
>> > upon which others can build. We are a Perl technology, and I believe
>> > we would do well to embrace the current trend in Perl modules to
>> > provide simpler distributions that target specific goals.
>> >
>> > I see two issues:
>> >
>> > 1) Quality Assurance. Whenever somebody makes a change to the core,
>> > they run the *whole* test suite. If we split the core, changes made in
>> > one component will not be easily tested against other components.
>> > Solutions include (1) CPAN testers, which should be able to pick out
>> > bad interactions within a few days to a week, and (2) a continuous
>> > integration server specifically for PDL. For the latter, jitterbug, a
>> > Perl-based continuous integration system comes to mind. It would be
>> > amazing, it would take time to set up, and it would cost $$ to host
>> > the server unless somebody out there has a box sitting idle on a
>> > static IP. (Lately I have been thinking about purchasing a $7/month
>> > VPS for this very idea, and for hosting the IRC
>> > bot.)
>> >
>> > 2) Knowing where to find things. If we split things up, we must have
>> > documentation about where to find information about different PDL
>> > capabilities. This is all the more important if users are installing
>> > PDL, and need to know what to install. Installation itself is becoming
>> > much easier with local::lib, perlbrew, and the Alien packages (shout
>> > out to Joel for his recent work on Alien::Base). The current docs are
>> > not very tied together and may not give the user an idea of where to
>> > look (in monolithic PDL), but they do have an Index document that
>> > knows about all the installed modules. The solution to this is simple,
>> > but hard: write better docs that make thorough references to what's out
>> > there.
>> >
>> > I believe that the benefits greatly outweigh the costs, but the
>> > greatest missing piece is commitment by PDL porters and users to make it
>> > happen.
>> > Back then, I began working on Module::Build::PDL, but I lost steam
>> > when I was told that M::B::PDL would have to build the entire PDL
>> > distribution. I have since figured out that what I had accomplished
>> > with M::B::PDL can be as easily achieved using Module::Build with well
>> > crafted .pm.PL files. My opinion is now this: if we can't achieve our
>> > split of PDL into pieces that M::B can handle, then the pieces are still
>> > too big.
>> >
>> > So, here's what I say: let us kvetch! We will move forward with 2.4.10
>> > and the close follow-up of 2.4.11, but everybody should lay out their
>> > thoughts about splitting up PDL (or not). After the dust has settled,
>> > as 2.4.11 is taking form, those of us who are truly interested can
>> > re-read the discussion and decide how to move forward. In particular,
>> > I would *love* to send out another survey, this time asking about what
>> > people want and *how many hours people are willing to commit to make
>> > it happen.*
>> >
>> > David
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Perldl mailing list
>> > [email protected]
>> > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>>
>>
>
>
> This notion that PDL should have everything everybody wants in it already
> rubs me the wrong way. It insulates PDL from Perl and CPAN. CPAN is an
> amazing resource. Why should we insulate our users from it?
>
> Furthermore, PDL is not only monolithic, its source file structure differs
> substantially from the layout of the vast majority of Perl modules on CPAN.
> I know of plenty of low-hanging fruit that a hacker with the right skills
> could easily pick for us, but it is buried in a source tree that would
> be hard for newcomers to grok.
>
> In short, there are solutions to the user-level issues that you raise. But I
> would like to make it easier to attract *developers* to PDL, and splitting
> PDL into well-defined modules is a very important first step. If nothing
> else, it signals to the Perl community, "Hey, we're alive and well, and
> we're trying to make it easier for you to hack on it."
>
> David
>
> P.S. QA is a big deal for any major next steps. Do you think you might be
> able to convince your company to spare some server time, at night perhaps,
> to run smoke tests and/or continuous integration tests?
