I would strongly support a modularized PDL, with a lean core and a Task::PDL that pulls in everything we are used to getting from `cpanm PDL`, if only for testing purposes.
Joel

On Mon, Jan 23, 2012 at 7:26 AM, David Mertens <[email protected]> wrote:
> On Sun, Jan 22, 2012 at 11:06 PM, Clifford Sobchuk <[email protected]> wrote:
>>
>> Here's my opinion. It is really nice to have PDL as a distribution. With better documentation a lot of the dependencies would be better highlighted and easier to understand.
>
> With the current monolithic PDL, it is easy for this sort of documentation to slip through the cracks. If we were to create PDL::Slatec as a separate distribution (for example), it would be abundantly clear whether or not the external dependency was properly noted in the docs. It would be much easier to specify and query about these dependencies during the build process and give meaningful error messages.
>
>> I have seen the emails on perlbrew, alien and local::lib - but I have no idea what they are.
>
> perlbrew is a Linux/Mac thing (hopefully someday ported also to Windows) that makes installing and managing multiple versions of Perl about as easy as checking your directory listings (i.e. "dir" or "ls").
>
> Alien is the conceptual namespace given to managing (or at least querying) external dependencies via a module that is installed from CPAN. Combined with local::lib or perlbrew, this allows users to install many external dependencies without needing administrative control of their system.
>
> local::lib provides a simple cross-platform (Windows, Linux, Mac, probably others) means for installing libraries under your user account so that you don't mess with the system perl, and you don't need admin privileges to install modules. I have used this recently to get around the fact that at my new job, I don't have admin access. This way I get CPAN without having to bug my admin.
>
>> When it comes to people who use bits and pieces of perl to get their work done and are not perl experts, PDL is becoming more useful.
>> I have been able to get two other people in my group to install it and start playing with it.
>
> This is terrific, but what changed over the last couple of years in PDL that made this possible?
>
>> I am going to guess that when it comes to commercial applications, unless the person performing the analysis already has a good background in perl, they will not know anything about its culture, or its mechanisms.
>
> You mean in-house applications, right? If that's the case, then I can't imagine anybody coming to PDL without either (1) already knowing about perl or (2) having somebody who knows about Perl and is at least aware of PDL. At any rate, if they don't know about CPAN, then why not be consistent with the rest of CPAN? What do we lose by being consistent?
>
>> In our extended group there are probably ~40 people that use perl. All of us use ActiveState and several use cygwin, because they have installers that work, and when they need a new package it is easy for them to find. Only about 25% of them even know what CPAN is, never mind knowing how to use it or cpanm.
>
> Right now, you can easily install PDL for ActiveState and cygwin without having to care about things. It just works. I would like to see this happen just as easily with CPAN installs as well, and the first step in this is streamlining the PDL build process. Splitting PDL into multiple pieces will not affect PDL's installation reliability for you, by the way, because the package managers you mention will have no trouble pulling multiple dependent packages.
>
>> So from my point of view, having an installable package that provides 2d and 3d interactive graphics is great.
>
> Alas, ActiveState and cygwin have lulled you into complacency: PDL does not support 2d graphics out of the box. 2.4.11 will address this, but right now this is one of PDL's greatest shortcomings. Also see my notes below about cluster computing.
>
>> I find myself using it more than R or SciLab now. There are still things that I find easier to do in both of those applications, though, and work will always be like that. One application will be better for some type of analysis than another.
>
> This is good news! One of R's greatest strengths, to my knowledge, is CRAN. It is *precisely* why one of my coworkers used R for a regression analysis last month, because our boss said "There's already a package in R for handling this." Wouldn't it be awesome if the same were true with PDL?
>
>> Right now I think that PDL is becoming a very good application
>
> I would really like to know how PDL has improved over the last few years in such a way that you say it is "becoming" a very good library. Aside from John Cerney's work making pThreading automatic, the core hasn't changed for years. If you think it has improved, that is *great* and we should give kudos to whoever has implemented the changes that make your life easier. We should also be aware of them so that future work does not change them.
>
>> - more than a perl distribution and much more than a bunch of loosely connected perl packages -
>
> To say that PDL is "more than a [CPAN] distribution" really frustrates me. Moose is more than a CPAN distribution. BioPerl is more than a CPAN distribution. DBI is more than a CPAN distribution. They are whole categories of modules that allow people to get work done, and they have entire communities surrounding them. PDL *is* a CPAN distribution. We have not made it easy for related modules to spring into existence.
>
>> that don't always work nor are properly supported.
>
> You might think that most of PDL works because the interactions between the different components are well understood and there is a good test suite, but this is far from true. The test coverage on the core is abysmal (I added the first tests of PDL::PP a few months ago!)
> and most stuff works because the original implementer worked hard to make it work, without passing along his institutional knowledge. The interactions that modules have between one another are not well documented. In short, PDL is brittle. If we don't touch it, it will continue to perform brilliantly. But it won't get cool new stuff, either. The few fiddlings I have done with PDL::PP I have tested with, "Well, if the core still compiles, I guess I didn't break anything." That is not the right approach, IMHO.
>
>> Recently I noticed that one of the packages that I used for date-time was providing inconsistent results. I then found that the person no longer supports the package. Once a distribution becomes fractured, you will run into these types of issues, as well as problems with integration quality.
>
> Yep, people move on. The exact same thing has happened with a number of modules in PDL itself. However, as it's all wrapped up into PDL, it's harder to notice. Our current distribution is already fractured, as evidenced by the fact that PDL::Fit::Linfit installs even when PDL::Slatec does not. I pointed this out a couple years ago but nobody has taken it upon themselves to fix it, in part because the original author is gone, and in part because it's not a trivial fix with the current PDL build system.
>
> That having been said, just because I propose that we split PDL into multiple distributions does not mean that the PDL Porters will no longer claim responsibility for them. In fact, I would like to see a way for the PDL Porters to accommodate even more. I do not understand, for example, why PDL::Stats is not part of PDL.
>
>> I completely agree Quality Assurance is number 1. Documentation is part of Quality Assurance of a product.
>
> Yes, I have found that writing docs for my modules and writing the test suite often cause me to rewrite parts of my code or change my API.
> To borrow something from brian d foy, if it's hard to document, or if it's hard to test, that is a signal that it's probably poor programming.
>
>> Possibly an architecture of plugins is what you would like to see around a PDL core.
>
> Well, yes, though I would call them "modules" or "module distributions" instead of plugins. :-)
>
>> The definition of the core though will be key. Matlab, Scilab, have somewhat defined a core that would include 2d and 3d interactive graphics, and then pluggable modules for specialized analysis - such as signal processing, thermodynamics, etc.
>
> I disagree. I want a very lean core. I want something that I could reasonably request for a computing cluster. Fourier transforms? Yes. Matrix operations? Preferably, though I could be convinced that they should be in a separate module. 2D and 3D plotting? Unnecessary for a computing cluster. However, we could very easily create a Bundle or a Task module that incorporates different combinations of modules so instead of telling your coworkers to install PDL, you would tell them to install "Task::PDL::Cliff" or some such. And no, I'm not joking about that name. See p3rl.org/Task for details about the Task namespace.
>
>> R on the other hand, almost everything is a module, and it drives me crazy sometimes to get what I want out of it.
>
> Why? What is the most irritating part? What should PDL hope to avoid? I can tell you that personally I don't like it when I need some Perl module which has a long dependency chain because installing it in the middle of writing a script can lead to a five minute interruption. Migration to a new machine can also be a headache if I have to reinstall all those modules. However, this is Perl and there are solutions to these sorts of issues.
>
>> My 2 cents.
>>
>> CLIFF SOBCHUK
>> Core RF Engineering
>> Phone 613-667-1974 ecn: 8109-71974
>> mobile 403-819-9233
>> yahoo: sobchuk
>> www.ericsson.com
>>
>> "The author works for Telefonaktiebolaget L M Ericsson ("Ericsson"), who is solely responsible for this email and its contents. All inquiries regarding this email should be addressed to Ericsson. The web site for Ericsson is www.ericsson.com."
>>
>> This Communication is Confidential. We only send and receive email on the basis of the terms set out at www.ericsson.com/email_disclaimer
>>
>>
>> -----Original Message-----
>> From: chm [mailto:[email protected]]
>> Sent: Sunday, January 22, 2012 8:14 PM
>> To: David Mertens
>> Cc: [email protected]
>> Subject: Re: [Perldl] Let us Kvetch! (was: PDL book checking)
>>
>> On 1/22/2012 9:45 PM, David Mertens wrote:
>> > To all -
>> >
>> > I've changed the original subject. I hope this doesn't bother anybody too much.
>>
>> Subject change is ok, but you dropped the thread so users don't see the earlier part of this discussion to which this appears to be a response.
>>
>> The first:
>> > -------- Original Message --------
>> > Subject: Re: PDL book checking
>> > Date: Sun, 22 Jan 2012 16:18:32 -0500
>> > From: chm <[email protected]>
>> > To: Matthew Kenworthy <[email protected]>
>> > CC: [email protected] <[email protected]>
>> >
>> > On 1/22/2012 3:57 PM, Matthew Kenworthy wrote:
>> >>>
>> >>> I forgot to point out that =ff is just what is needed to put page breaks at the start of each chapter...
>> >>
>> >> Ah! Good to know :)
>> >>
>> >>> I'm confused. Yes, the plan was to have a PDL::Book distribution, which, by definition, would include the PDL::Book.
>> >>
>> >> I thought the ultimate idea was to put PDL::Book into the PDL-2.4.10 tarball, but the discussion about the sizes of the included images nixed that idea. You can revive the idea by having PDL::Book only have text and image generating scripts.
>> >> I think that your point is to keep PDL::Book a separate distribution entirely, which is where our confusion comes in.
>> >
>> > OK. There seems to be an enormous amount of interest in "putting" the PDL Book into the PDL distribution.
>> >
>> > While it _seems_ simple to just add it into the current "kitchen sink" PDL has, the reality is that if PDL were split into a core distribution and a number of other, separate, distributions corresponding to the external dependencies, we would be *much* better off:
>> >
>> > (1) The core would already be 100% ported since it is mostly the external libraries and programs that are difficult to get working consistently across all platforms.
>> >
>> >     For example, building a win32 PDL still takes significant guru expertise. I *still* can't do it. Although, if I took the time, I could follow Rob's instructions and build it eventually...
>> >
>> >     We work around that through Rob's generosity to build and make available up-to-date PPD versions of PDL CPAN releases, including the latest developers release.
>> >
>> > (2) Code improvement in PDL modules could happen faster without having to wait for the entire PDL distribution. By releasing frequent git snapshots as developers releases, I've been able to reduce some of the impact of this.
>> >
>> >     However, the developers releases are even farther from 1-click installs than the CPAN official releases.
>> >
>> > (3) The full on, kitchen sink version of PDL could still be bundled up and distributed as a single distribution rather than the possibly dicey use of cpan or cpanm to build all the dependencies correctly.
>> >
>> > (4) For similar reasons, having the PDL-Book-0.0.1 distribution works better: more frequent or needed updates can be made as required, issues of format generation and image generation will continue to be worked out, a book isn't the same thing as on-line help or documentation (although they could be viewed with the same utilities),...
>> >
>> > Cheers,
>> > Chris
>> >
>> >> And, I should add, at this point, this is a Good Idea.
>> >>
>> >>> The issue of generating the figures occurred to me when I saw that the full size image looked fine but that the scaled html image had lines that were too thin and hard to see. It would be better to have a separate NxN for HTML and 800x800 for PDF output.
>> >>
>> >> Hmm, I think that good displayable single source images are possible with HardLW=>5 and HardCH=>2 for illustrations. But that's something for the release after this upcoming one!
>> >>
>> >> Matt
>>
>> And the second, additional points:
>> > -------- Original Message --------
>> > Subject: Re: [Perldl] PDL book checking
>> > Date: Mon, 23 Jan 2012 00:10:11 +0100
>> > From: Henning Glawe <[email protected]>
>> > To: [email protected]
>> >
>> > On Sun, Jan 22, 2012 at 04:18:32PM -0500, chm wrote:
>> >> While it _seems_ simple to just add it into the current "kitchen sink" PDL has, the reality is that if PDL were split into a core distribution and a number of other, separate, distributions corresponding to the external dependencies, we would be *much* better off:
>> >>
>> >> (1) The core would already be 100% ported since [ ... ]
>> >> (4) For similar reasons, having the PDL-Book-0.0.1
>> >
>> > With my Debian Developer's hat on (those points mainly refer to the 'bleeding edge' of debian development, i.e.
>> > testing/unstable):
>> >
>> > (5) a problem with a single dependency would not kick all of pdl and all packages it depends on from our testing branch, which has happened recently due to portability problems with plplot.
>> > (6) Fewer problems with SONAME transitions, as only the relevant interface module packages would need to be updated.
>> > (7) Easier/more reliable way to automatically create package dependency lists (each interface module depends on the corresponding library packages). As mentioned recently on this list, the dependency list of debian's pdl package is a bit long; I have to do the splitting into depends, suggests and recommends manually, that's maybe why a bit too much slipped through... this would be a lot easier if we had a 'core' with minimal external dependencies and interface distributions.
>> >
>> > --
>> > c u
>> > henning
>>
>> And this reply (in context):
>> > This is a well-worn discussion. The last time this was thoroughly discussed was on the porters list: see the porters' archives starting from October 31, 2009, and running into November. These are from the days when I spent a lot of effort stirring the pot, and a bit less at actually writing code. [Note: November 2009 has one of the largest collections of messages in the archives, and my pot stirring has been bested by no less than the great Daniel Carrera, whose ability to stir a pot (and get docs and code written) still impresses me.] I have since repented my lack of code writing and I've tried hard to focus more on writing code and less on stirring pots. If you don't believe me, just wait for PDL::Graphics::Prima. :-)
>> >
>> > Back in late 2009 when we last discussed this, nearly everybody was in favor of splitting PDL into multiple pieces.
>> > Judd Taylor voiced a dissenting opinion, stating that PDL needs to have a large collection of numerical capabilities built-in so that it appeals to new folks. If people have to install lots of modules to get what they want, they'll just walk away. He also claimed that the lack of an install-everywhere 2D plotting library was a big issue. Read his email and others' responses to get a fuller picture:
>> > http://mailman.jach.hawaii.edu/pipermail//pdl-porters/2009-November/001617.html
>> >
>> > These days, I stand by my original statement, that PDL would be best served split into many smaller pieces. BioPerl underwent a similar transition a few years ago, and many major frameworks (Test comes to mind) were built like this from the ground up, providing a simple core upon which others can build. We are a Perl technology, and I believe we would do well to embrace the current trend in Perl modules to provide simpler distributions that target specific goals.
>> >
>> > I see two issues:
>> >
>> > 1) Quality Assurance. Whenever somebody makes a change to the core, they run the *whole* test suite. If we split the core, changes made in one component will not be easily tested against other components. Solutions include (1) CPAN testers, which should be able to pick out bad interactions within a few days to a week, and (2) a continuous integration server specifically for PDL. For the latter, jitterbug, a Perl-based continuous integration system comes to mind. It would be amazing, it would take time to set up, and it would cost $$ to host the server unless somebody out there has a box sitting idle on a static IP. (Lately I have been thinking about purchasing a $7/month VPS for this very idea, and for hosting the IRC bot.)
>> >
>> > 2) Knowing where to find things.
>> > If we split things up, we must have documentation about where to find information about different PDL capabilities. This is all the more important if users are installing PDL, and need to know what to install. Installation itself is becoming much easier with local::lib, perlbrew, and the Alien packages (shout out to Joel for his recent work on Alien::Base). The current docs are not very tied together and may not give the user an idea of where (in monolithic PDL) to look, but they do have an Index document that knows about all the installed modules. The solution to this is simple, but hard: write better docs that make thorough references to what's out there.
>> >
>> > I believe that the benefits greatly outweigh the costs, but the greatest missing piece is commitment by PDL porters and users to make it happen. Back then, I began working on Module::Build::PDL, but I lost steam when I was told that M::B::PDL would have to build the entire PDL distribution. I have since figured out that what I had accomplished with M::B::PDL can be as easily achieved using Module::Build with well crafted .pm.PL files. My opinion is now this: if we can't achieve our split of PDL into pieces that M::B can handle, then the pieces are still too big.
>> >
>> > So, here's what I say: let us kvetch! We will move forward with 2.4.10 and the close follow-up of 2.4.11, but everybody should lay out their thoughts about splitting up PDL (or not). After the dust has settled, as 2.4.11 is taking form, those of us who are truly interested can re-read the discussion and decide how to move forward.
>> > In particular, I would *love* to send out another survey, this time asking about what people want and *how many hours people are willing to commit to make it happen.*
>> >
>> > David
>> >
>> > _______________________________________________
>> > Perldl mailing list
>> > [email protected]
>> > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
>
> This notion that PDL should have everything everybody wants in it already rubs me the wrong way. It insulates PDL from Perl and CPAN. CPAN is an amazing resource. Why should we insulate our users from it?
>
> Furthermore, PDL is not only monolithic, its source file structure differs substantially from the layout of the vast majority of Perl modules on CPAN. I know about so much low-hanging fruit that a hacker with the right skills could easily solve for us, but they are buried in a source tree that would be hard for newcomers to grok.
>
> In short, there are solutions to the user-level issues that you raise. But I would like to make it easier to attract *developers* to PDL, and splitting PDL into well-defined modules is a very important first step. If nothing else, it signals to the Perl community, "Hey, we're alive and well, and we're trying to make it easier for you to hack on it."
>
> David
>
> P.S. QA is a big deal for any major next steps. Do you think you might be able to convince your company to spare some server time, at night perhaps, to run smoke tests and/or continuous integration tests?
