Re: [Perldl] Let us Kvetch! (was: PDL book checking)

Joel Berger Mon, 23 Jan 2012 05:59:44 -0800

I would highly support a modularized PDL, with a lean core and a
Task::PDL which pulls in everything we are used to getting from
`cpanm PDL`, if only for testing purposes.


Joel


On Mon, Jan 23, 2012 at 7:26 AM, David Mertens <[email protected]> wrote:
> On Sun, Jan 22, 2012 at 11:06 PM, Clifford Sobchuk
> <[email protected]> wrote:
>>
>> Here's my opinion. It is really nice to have PDL as a distribution. With
>> better documentation a lot of the dependencies would be better highlighted
>> and easier to understand.
>
>
> With the current monolithic PDL, it is easy for this sort of documentation
> to slip through the cracks. If we were to create PDL::Slatec as a separate
> distribution (for example), it would be abundantly clear whether or not the
> external dependency was properly noted in the docs. It would be much easier
> to specify and query about these dependencies during the build process and
> give meaningful error messages.
>
>>
>> I have seen the emails on perlbrew, alien and local::lib - but I have no
>> idea what they are.
>
>
> perlbrew is a Linux/Mac thing (hopefully someday ported also to Windows)
> that makes installing and managing multiple versions of Perl about as easy
> as checking your directory listings (i.e. "dir" or "ls").
>
> Alien is the conceptual namespace given to managing (or at least querying)
> external dependencies via a module that is installed from CPAN. Combined
> with local::lib or perlbrew, this allows users to install many external
> dependencies without needing administrative control of their system.
>
> local::lib provides a simple cross-platform (Windows, Linux, Mac, probably
> others) means for installing libraries under your user account so that you
> don't mess with the system perl, and you don't need admin privileges to
> install modules. I have used this recently to get around the fact that at my
> new job, I don't have admin access. This way I get CPAN without having to
> bug my admin.
>
>>
>> When it comes to people who use bits and pieces of perl to get their work
>> done and are not perl experts, PDL is becoming more useful. I have been able
>> to get two other people in my group to install it and start playing with it.
>
>
> This is terrific, but what changed over the last couple of years in PDL that
> made this possible?
>
>> I am going to guess that when it comes to commercial applications, unless
>> the person performing the analysis already has a good background in perl,
>> they will not know anything about its culture, or its mechanisms.
>
>
> You mean in-house applications, right? If that's the case, then I can't
> imagine anybody coming to PDL without either (1) already knowing about perl
> or (2) having somebody who knows about Perl and is at least aware of PDL. At
> any rate, if they don't know about CPAN, then why not be consistent with the
> rest of CPAN? What do we lose by being consistent?
>
>>
>> In our extended group there are probably ~40 people that use perl, all of
>> us use ActiveState and several use cygwin, because they have installers that
>> work and when they need a new package it is easy for them to find. Only
>> about 25% of them even know what CPAN is, never mind how to use it or cpanm.
>
>
> Right now, you can easily install PDL for ActiveState and cygwin without
> having to care about things. It just works. I would like to see this happen
> just as easily with CPAN installs as well, and the first step in this is
> streamlining the PDL build process. Splitting PDL into multiple pieces will
> not affect PDL's installation reliability for you, by the way, because the
> package managers you mention will have no trouble pulling multiple dependent
> packages.
>
>> So from my point of view, having an installable package that provides 2d
>> and 3d interactive graphics is great.
>
>
> Alas, ActiveState and cygwin have lulled you into complacency: PDL does not
> support 2d graphics out of the box. 2.4.11 will address this, but right now
> this is one of PDL's greatest shortcomings. Also see my notes below about
> cluster computing.
>
>> I find myself using it more than R or SciLab now. Although there are still
>> things that I find easier to do in both of those applications as well, and
>> work will always be like that. One application will be better for some type
>> of analysis than another.
>
>
> This is good news! One of R's greatest strengths, to my knowledge, is CRAN.
> It is *precisely* why one of my coworkers used R for a regression analysis
> last month, because our boss said "There's already a package in R for
> handling this." Wouldn't it be awesome if the same were true with PDL?
>
>>
>> Right now I think that PDL is becoming a very good application
>
>
> I would really like to know how PDL has improved over the last few years in
> such a way that you say it is "becoming" a very good library. Aside from
> John Cerney's work making pThreading automatic, the core hasn't changed for
> years. If you think it has improved, that is *great* and we should give
> kudos to whoever has implemented the changes that make your life easier. We
> should also be aware of them so that future work does not change them.
>
>
>> - more than a perl distribution and much more than a bunch of loosely
>> connected perl packages -
>
>
> To say that PDL is "more than a [CPAN] distribution" really frustrates me.
> Moose is more than a CPAN distribution. BioPerl is more than a CPAN
> distribution. DBI is more than a CPAN distribution. They are whole
> categories of modules that allow people to get work done, and they have
> entire communities surrounding them. PDL *is* a CPAN distribution. We have
> not made it easy for related modules to spring into existence.
>
>> that don't always work nor are properly supported.
>
>
> You might think that most of PDL works because the interactions between the
> different components are well understood and there is a good test suite, but
> this is far from true. The test coverage on the core is abysmal (I added the
> first tests of PDL::PP a few months ago!) and most stuff works because the
> original implementer worked hard to make it work, without passing along his
> institutional knowledge. The interactions that modules have between one
> another are not well documented. In short, PDL is brittle. If we don't touch
> it, it will continue to perform brilliantly. But it won't get cool new
> stuff, either. The few fiddlings I have done with PDL::PP I have tested
> with, "Well, if the core still compiles, I guess I didn't break anything."
> That is not the right approach, IMHO.
>
>> Recently I noticed that one of the packages that I used for date-time was
>> providing inconsistent results. I then found that the person no longer
>> supports the package. Once a distribution becomes fractured, you will run
>> into these types of issues as well as problems with integration quality.
>
>
> Yep, people move on. The exact same thing has happened with a number of
> modules in PDL itself. However, as it's all wrapped up into PDL, it's harder
> to notice. Our current distribution is already fractured, as evidenced by
> the fact that PDL::Fit::Linfit installs even when PDL::Slatec does not. I
> pointed this out a couple years ago but nobody has taken it upon themselves
> to fix it, in part because the original author is gone, and in part because
> it's not a trivial fix with the current PDL build system.
>
> That having been said, just because I propose that we split PDL into
> multiple distributions does not mean that the PDL Porters will no longer
> claim responsibility for them. In fact, I would like to see a way for the
> PDL Porters to accommodate even more. I do not understand, for example, why
> PDL::Stats is not part of PDL.
>
>> I completely agree Quality Assurance is number 1. Documentation is part of
>> Quality Assurance of a product.
>
>
> Yes, I have found that writing docs for my modules and writing the test
> suite often cause me to rewrite parts of my code or change my API. To borrow
> something from brian d foy, if it's hard to document, or if it's hard to
> test, that is a signal that it's probably poor programming.
>
>> Possibly an architecture of plugins is what you would like to see around a
>> PDL core.
>
>
> Well, yes, though I would call them "modules" or "module distributions"
> instead of plugins. :-)
>
>> The definition of the core though will be key. Matlab, Scilab, have
>> somewhat defined a core that would include 2d and 3d interactive graphics,
>> and then pluggable modules for specialized analysis - such as signal
>> processing, thermodynamics, etc.
>
>
> I disagree. I want a very lean core. I want something that I could
> reasonably request for a computing cluster. Fourier transforms? Yes. Matrix
> operations? Preferably, though I could be convinced that they should be in a
> separate module. 2D and 3D plotting? Unnecessary for a computing cluster.
> However, we could very easily create a Bundle or a Task module that
> incorporates different combinations of modules so instead of telling your
> coworkers to install PDL, you would tell them to install "Task::PDL::Cliff"
> or some such. And no, I'm not joking about that name. See p3rl.org/Task for
> details about the Task namespace.
>
>>
>> R on the other hand, almost everything is a module, and it drives me crazy
>> sometimes to get what I want out of it.
>
>
> Why? What is the most irritating part? What should PDL hope to avoid? I can
> tell you that personally I don't like it when I need some Perl module which
> has a long dependency chain because installing it in the middle of writing a
> script can lead to a five minute interruption. Migration to a new machine
> can also be a headache if I have to reinstall all those modules. However,
> this is Perl and there are solutions to these sorts of issues.
>
>> My 2 cents.
>>
>> CLIFF SOBCHUK
>> Core RF Engineering
>> Phone 613-667-1974   ecn: 8109-71974
>> mobile 403-819-9233
>> yahoo: sobchuk
>> www.ericsson.com
>>
>> "The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), who
>> is solely responsible for this email and its contents. All inquiries
>> regarding this email should be addressed to Ericsson. The web site for
>> Ericsson is www.ericsson.com."
>>
>> This Communication is Confidential. We only send and receive email on the
>> basis of the terms set out at www.ericsson.com/email_disclaimer
>>
>>
>> -----Original Message-----
>> From: chm [mailto:[email protected]]
>> Sent: Sunday, January 22, 2012 8:14 PM
>> To: David Mertens
>> Cc: [email protected]
>> Subject: Re: [Perldl] Let us Kvetch! (was: PDL book checking)
>>
>> On 1/22/2012 9:45 PM, David Mertens wrote:
>> > To all -
>> >
>> > I've changed the original subject. I hope this doesn't bother anybody
>> > too much.
>>
>> Subject change is ok, but you dropped the thread so users don't see the
>> earlier part of this discussion to which this appears to be a response.
>>
>> The first:
>> > -------- Original Message --------
>> > Subject: Re: PDL book checking
>> > Date: Sun, 22 Jan 2012 16:18:32 -0500
>> > From: chm <[email protected]>
>> > To: Matthew Kenworthy <[email protected]>
>> > CC: [email protected] <[email protected]>
>> >
>> > On 1/22/2012 3:57 PM, Matthew Kenworthy wrote:
>> >>>
>> >>> I forgot to point out that =ff is just what is needed to put page
>> >>> breaks at the start of each chapter...
>> >>>
>> >>>
>> >> Ah! Good to know :)
>> >>
>> >>> I'm confused.  Yes, the plan was to have a PDL::Book distribution,
>> >>> which, by definition, would include the PDL::Book.
>> >>>
>> >>>
>> >> I thought the ultimate idea was to put PDL::Book into the PDL-2.4.10
>> >> tarball, but the discussion about the sizes of the included images
>> >> nixed that idea. You can revive the idea by having PDL::Book only
>> >> have text and image generating scripts. I think that your point is to
>> >> keep PDL::Book a separate distribution entirely, which is where our
>> >> confusion comes in.
>> >
>> > OK.  There seems to be an enormous amount of interest in "putting" the
>> > PDL Book into the PDL distribution.
>> >
>> > While it _seems_ simple to just add it into the current "kitchen sink"
>> > PDL has, the reality is that if PDL were split into a core
>> > distribution and a number of other, separate, distributions
>> > corresponding to the external dependencies, we would be *much* better
>> > off:
>> >
>> > (1) The core would already be 100% ported since
>> >     it is mostly the external libraries and programs
>> >     that are difficult to get working consistently
>> >     across all platforms.
>> >
>> >     For example, a win32 PDL still takes
>> >     significant guru expertise to do.  I *still*
>> >     can't do it.  Although, if I took the time,
>> >     I could follow Rob's instructions and build
>> >     it eventually...
>> >
>> >     We work around that through Rob's generosity
>> >     to build and make available up-to-date PPD
>> >     versions of PDL CPAN releases, including the
>> >     latest developers release.
>> >
>> > (2) Code improvement in PDL modules could happen
>> >     faster without having to wait for the entire
>> >     PDL distribution.  By releasing frequent git
>> >     snapshots as developers releases, I've been
>> >     able to reduce some of the impact of this.
>> >
>> >     However, the developers releases are even
>> >     farther from 1-click installs than the CPAN
>> >     official releases.
>> >
>> > (3) The full on, kitchen sink version of PDL
>> >     could still be bundled up and distributed
>> >     as a single distribution rather than the
>> >     possibly dicey use of cpan or cpanm to
>> >     build all the dependencies correctly.
>> >
>> > (4) For similar reasons, having the PDL-Book-0.0.1
>> >     distribution works better: more frequent or
>> >     needed updates can be made as required, issues
>> >     of format generation and image generation will
>> >     continue to be worked out, a book isn't the
>> >     same thing as on-line help or documentation
>> >     (although they could be viewed with the same
>> >     utilities),...
>> >
>> > Cheers,
>> > Chris
>> >
>> >> And, I should add, at this point, this is a Good Idea.
>> >>
>> >> The issue of generating the figures occurred to
>> >>> me when I saw that the full size image looked fine but that the
>> >>> scaled html image had lines that were too thin and hard to see.  It
>> >>> would be better to have a separate NxN for HTML and 800x800 for PDF
>> >>> output.
>> >>>
>> >>>
>> >> Hmm, I think that good displayable single source images are possible
>> >> with
>> >> HardLW=>5 and HardCH=>2 for illustrations. But that's something for
>> >> the release after this upcoming one!
>> >>
>> >> Matt
>>
>> And the second, additional points:
>> > -------- Original Message --------
>> > Subject: Re: [Perldl] PDL book checking
>> > Date: Mon, 23 Jan 2012 00:10:11 +0100
>> > From: Henning Glawe <[email protected]>
>> > To: [email protected]
>> >
>> > On Sun, Jan 22, 2012 at 04:18:32PM -0500, chm wrote:
>> >> While it _seems_ simple to just add it into the current "kitchen
>> >> sink" PDL has, the reality is that if PDL were split into a core
>> >> distribution and a number of other, separate, distributions
>> >> corresponding to the external dependencies, we would be *much* better
>> >> off:
>> >>
>> >> (1) The core would already be 100% ported since [ ... ]
>> >> (4) For similar reasons, having the PDL-Book-0.0.1
>> >
>> > With my Debian Developer's hat on (those points mainly refer to the
>> > 'bleeding edge' of debian development, i.e.
>> > testing/unstable):
>> >
>> > (5) a problem with a single dependency would not kick all of pdl
>> >     and all packages it depends on from our testing branch,
>> >     which has happened recently due to portability problems with plplot.
>> > (6) Fewer problems with SONAME transitions, as only the relevant
>> >     interface module packages would need to be updated.
>> > (7) Easier/more reliable way to automatically create package
>> >     dependency lists (each interface module depends on the
>> >     corresponding library packages). As mentioned recently on this
>> >     list, the dependency list of debian's pdl package is a bit
>> >     long; I have to do the splitting into depends, suggests
>> >     and recommends manually, that's maybe why a bit too much
>> >     slipped through... this would be a lot easier if we had
>> >     a 'core' with minimal external dependencies and interface
>> >     distributions.
>> >
>> > --
>> > c u
>> > henning
>>
>> And this reply (in context):
>> > This is a well-worn discussion. The last time this was thoroughly
>> > discussed was on the porters list: see the porters' archives starting
>> > from October 31, 2009, and running into November. These are from the
>> > days when I spent a lot of effort stirring the pot, and a bit less at
>> > actually writing code.
>> > [Note: November 2009 has one of the largest collections of messages in the
>> > archives, and my pot stirring has been bested by no less than the
>> > great Daniel Carrera, whose ability to stir a pot (and get docs and
>> > code written) still impresses me.] I have since repented my lack of
>> > code writing and I've tried hard to focus more on writing code and
>> > less on stirring pots. If you don't believe me, just wait for
>> > PDL::Graphics::Prima. :-)
>> >
>> > Back in late 2009 when we last discussed this, nearly everybody was in
>> > favor of splitting PDL into multiple pieces. Judd Taylor voiced a
>> > dissenting opinion, stating that PDL needs to have a large collection
>> > of numerical capabilities built-in so that it appeals to new folks. If
>> > people have to install lots of modules to get what they want, they'll
>> > just walk away. He also claimed that the lack of an install-everywhere
>> > 2D plotting library was a big issue. Read his email and others'
>> > responses to get a fuller picture:
>> > http://mailman.jach.hawaii.edu/pipermail//pdl-porters/2009-November/001617.html
>> >
>> > These days, I stand by my original statement, that PDL would be best
>> > served split into many smaller pieces. BioPerl underwent a similar
>> > transition a few years ago, and many major frameworks (Test comes to
>> > mind) were built like this from the ground up, providing a simple core
>> > upon which others can build. We are a Perl technology, and I believe
>> > we would do well to embrace the current trend in Perl modules to
>> > provide simpler distributions that target specific goals.
>> >
>> > I see two issues:
>> >
>> > 1) Quality Assurance. Whenever somebody makes a change to the core,
>> > they run the *whole* test suite. If we split the core, changes made in
>> > one component will not be easily tested against other components.
>> > Solutions include (1) CPAN testers, which should be able to pick out
>> > bad interactions within a few days to a week, and (2) a continuous
>> > integration server specifically for PDL. For the latter, jitterbug, a
>> > Perl-based continuous integration system, comes to mind. It would be
>> > amazing, it would take time to set up, and it would cost $$ to host
>> > the server unless somebody out there has a box sitting idle on a
>> > static IP. (Lately I have been thinking about purchasing a $7/month
>> > VPS for this very idea, and for hosting the IRC
>> > bot.)
>> >
>> > 2) Knowing where to find things. If we split things up, we must have
>> > documentation about where to find information about different PDL
>> > capabilities. This is all the more important if users are installing
>> > PDL, and need to know what to install. Installation itself is becoming
>> > much easier with local::lib, perlbrew, and the Alien packages (shout
>> > out to Joel for his recent work on Alien::Base). The current docs are
>> > not very tied together and may not give the user an idea of where (in
>> > monolithic PDL) to look, but they do have an Index document that
>> > knows about all the installed modules. The solution to this is simple,
>> > but hard: write better docs that make thorough references to what's out
>> > there.
>> >
>> > I believe that the benefits greatly outweigh the costs, but the
>> > greatest missing piece is commitment by PDL porters and users to make it
>> > happen.
>> > Back then, I began working on Module::Build::PDL, but I lost steam
>> > when I was told that M::B::PDL would have to build the entire PDL
>> > distribution. I have since figured out that what I had accomplished
>> > with M::B::PDL can be as easily achieved using Module::Build with well
>> > crafted .pm.PL files. My opinion is now this: if we can't achieve our
>> > split of PDL into pieces that M::B can handle, then the pieces are still
>> > too big.
>> >
>> > So, here's what I say: let us kvetch! We will move forward with 2.4.10
>> > and the close follow-up of 2.4.11, but everybody should lay out their
>> > thoughts about splitting up PDL (or not). After the dust has settled,
>> > as 2.4.11 is taking form, those of us who are truly interested can
>> > re-read the discussion and decide how to move forward. In particular,
>> > I would *love* to send out another survey, this time asking about what
>> > people want and *how many hours people are willing to commit to make
>> > it happen.*
>> >
>> > David
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > Perldl mailing list
>> > [email protected]
>> > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>>
>>
>
>
> This notion that PDL should have everything everybody wants in it already
> rubs me the wrong way. It insulates PDL from Perl and CPAN. CPAN is an
> amazing resource. Why should we insulate our users from it?
>
> Furthermore, PDL is not only monolithic, its source file structure differs
> substantially from the layout of the vast majority of Perl modules on CPAN.
> I know about so much low-hanging fruit that a hacker with the right skills
> could easily solve for us, but they are buried in a source tree that would
> be hard for newcomers to grok.
>
> In short, there are solutions to the user-level issues that you raise. But I
> would like to make it easier to attract *developers* to PDL, and splitting
> PDL into well-defined modules is a very important first step. If nothing
> else, it signals to the Perl community, "Hey, we're alive and well, and
> we're trying to make it easier for you to hack on it."
>
> David
>
> P.S. QA is a big deal for any major next steps. Do you think you might be
> able to convince your company to spare some server time, at night perhaps,
> to run smoke tests and/or continuous integration tests?
>
>

