Re: [Pdl-devel] Faster PDL Development Cycle---But How?

kmx Thu, 03 Sep 2015 04:54:54 -0700

My view:

I agree with PDLA and I support basically all the goals that were mentioned. On the other hand, I also see a couple of possible gotchas.

The first is (potential) user confusion. With the ongoing work on breaking PDL(A) into smaller pieces, there will sooner or later be more PDLA::* modules than PDL::* on CPAN. Maybe all PDLA modules should have something like this:


...
=head1 NAME

(experimental PDL fork)  PDL interface to something cool.

=head1 SYNOPSIS
...

Then it will be clear from the search result page on metacpan.org or search.cpan.org that the module is what it is.

Next: although PDLA is intended to be a place for agile development, I think it should still have some milestones set (like: a/ unbundling jumbo-PDL, b/ reshaping makefiles, c/ rewriting pdl-pp parser, d/ new core, e/ new object model, ...). Considering how many developers will be actively contributing, the milestones should perhaps be sorted and worked on sequentially. These milestones, once finished, might be a good opportunity for thorough peer review from those PDL devs/users who will not be actively participating in PDLA.

And the last: considering the high demand for stability (see other posts in this thread), I am not quite sure that the idea of "in the end the PDLA repository will replace PDL" will work. Maybe the finished/polished/discussed/agreed/reviewed/tested changes and ideas from PDLA should be brought step-by-step (milestone-by-milestone) to mainstream PDL so that in the end PDL == PDLA (at some point along the way the major version will bump from 2 to 3).


--
kmx


On 25.8.2015 19:42, Zakariyya Mughal wrote:

On 2015-08-24 at 23:48:51 +0000, Chris Marshall wrote:

PDL Developers-

      With the addition of two active and highly motivated PDL developers
(Zakariyya Mughal and Guggle "Ed" Worth) we've made significant progress
in cleaning up the PDL distribution itself and the development process
itself.  PDL is now run through test builds automatically on git commit
via the Travis-CI framework of github.  Many perl platforms and PDL
configuration options are exercised.  PDL-2.013 was the best tested
pre-release release ever.

      The current process we've been working toward is to make
PDL development faster and more responsive by breaking up the current
monolithic PDL distribution into a lean core (roughly the current
PDL::Core, PDL::PP, and PDL::Slices) and spinning off the other modules
for IO, Graphics, and Library interfaces as their own CPAN releases.
This would enable the separate module/distributions to have a faster
development-test-release cycle since that process would not be held up by
the testing of the full PDL distribution with all its subcomponents,
even if they are completely independent/unrelated to the separate module
changes being made.

      We're ready to make the split, but there is a catch...  How can we
have the rapid agile development needed to make the next generation
PDL3 possible _without_ losing the "PDL just works" that has been one of
the primary focuses of PDL-2.x development since I volunteered as release
manager circa PDL-2.4.3 [sic]?

      There has been some discussion, largely on #pdl, about how to best
proceed.  One idea is to move to a constant release mode which could be
expedited by adding co-maints to PDL.  I've not acted on that largely
because I feel that PDL just working, easy to get and start to use, is
essential to survive as a minority numeric computation engine (compared
with R, NumPy, Octave/MATLAB).  How can we grow market share if it takes
a perl expert to start using PDL?

      That said, I think the "big split" is the best way forward for PDL
to grow and thrive.  The ideas for the PDL3 core engine show great
promise for the kind of dynamic development as occurred when Karl first
conceived and implemented the idea that would become PDL.
Unfortunately, my experience with rapid sequential releases is a sort of
"churn" where it is difficult to know if you'll be able to get a working
module at any given release.  So what to do...

      One idea I had is to change the stable PDL release distribution into a
PDL bundle.  That would be the "stable PDL" that would be easy to get
and install.  The sub-modules would then be able to have independent
development forming the "experimental PDL" track.  Another way, a bit
more crude, would be to make a fixed "stable PDL" release that would be
the one to install.  Maybe we could use specific version information to
work with cpan, cpanm,...

      Here's where we need your input for discussion and consensus.
Please feel free to comment on any of the above, or to offer your own
thoughts.  The goal is to select the preferred approach for modern PDL
development and move out on it.  I would like to complete this discussion
process within the next two weeks.  At that point we should be able to
make a specific plan for any final comments with the agile development
to begin shortly after.

Let the discussions begin!

Hello Chris,

First off, thank you for starting this conversation.

Ed and I have been working on and off as time permits on preparing for
the split. The work we've been doing hasn't really generated much
traffic on the pdl-devel mailing list, but #pdl and the PDLPorters
GitHub organisation show a very different story. There is a lot going
on there every few days. The discussion on those two mediums is a little
more agile than the mailing list or SourceForge and helps with formulating

I highly recommend joining both by watching the repositories in
PDLPorters and following the IRC by either joining in a client or
tracking the backlog with <http://irclog.perlgeek.de/pdl/>.

I'd like to summarise some of what we came up with on GitHub/IRC:

  1. A split is necessary to not only make releases easier, but also
     development. We have worked on reducing the time required to build
     PDL across multiple environments down to a little over 1 hour.

     This is still too long when you have perhaps 1.5 hours of tuits that
     day. So the work inevitably gets spread out over weeks.

     A split would help decrease this friction.

  2. Making `cpanm PDL` always work has always been part of the plan.
     Improving the PDL devops has helped with that. The plan is to
     continue doing that.

     But large refactors such as this split can be quite daunting. We
     can't be sure we will stick the landing right the first time. But
     the job needs to move forward or it will fail via analysis paralysis
     even before it has begun.

  3. Ed and I have been thinking about releasing a more agile, friendly
     fork of PDL under the PDLA namespace (for PDL Agile). The
     repositories will continue to live under the PDLPorters GitHub
     organisation.

     We will start by applying the split. This will be followed by
     improving code coverage, fixes to the 64-bit indexing, formalising
     the badvalue semantics for more functions, and bug-fixes.

     We plan on making sure that libraries such as PDL-Stats, PDL-IO-CSV,
     etc. remain compatible with this library. I believe there is a way
     to do this without making changes to the original code (via a subref
     in @INC).

  4. The modules that come from the split will each be improved so that
     they are easy to install on their own. We already have plans to
     write Alien::Base modules for all of them.

  5. In parallel with this, we will begin reaching out to distribution
     packagers. PDL has not been updated on many of them (some of which
     are on 2.4.x). This is already on the wishlist at
     <https://github.com/PDLPorters/pdl/issues/139>.

  6. The current PDL distribution will remain as it is. Bugfixes will
     continue on PDL and they will be backported from PDLA. This approach
     has worked well for IPython/Jupyter (which underwent a split earlier
     this summer)[^jupyter-split]. Back porting fixes was a large part
     of what they had to go through.

  7. Eventually, after we are sure that PDLA has maintained
     compatibility with PDL, the changes of PDLA will replace the
     current PDL repository.
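
To make the "subref in @INC" idea from point 3 a bit more concrete, here is a
rough sketch only. It is not how PDLA does (or will do) it: the module name
PDLA::ShimDemo and the contents of the generated shim are made up for the
example, and a real compatibility layer would also have to deal with exported
functions and XS code. The hook is pushed onto the end of @INC, so it only
fires as a fallback when no real PDL::* file is found on disk.

```perl
use strict;
use warnings;

# Stand-in for an installed PDLA module (hypothetical name, defined inline
# so the sketch runs without PDL or PDLA installed).
package PDLA::ShimDemo {
    sub answer { 42 }
}
$INC{'PDLA/ShimDemo.pm'} = __FILE__;    # mark it as already loaded

package main;

# @INC hook: perl calls it with (the coderef itself, "PDL/ShimDemo.pm").
# Returning a filehandle makes perl compile the module from that handle.
push @INC, sub {
    my (undef, $file) = @_;
    return unless $file =~ m{\APDL/(.+)\.pm\z};   # only handle PDL::* requests
    (my $suffix = $1) =~ s{/}{::}g;               # "IO/Foo" -> "IO::Foo"

    # Assumed shim: a PDL::* package that simply inherits from PDLA::*.
    my $src = <<"SHIM";
package PDL::${suffix};
require PDLA::${suffix};
our \@ISA = ('PDLA::${suffix}');
1;
SHIM
    open my $fh, '<', \$src or return;            # in-memory filehandle
    return $fh;
};

# Unmodified "downstream" code that still asks for the PDL:: name:
require PDL::ShimDemo;
print PDL::ShimDemo->answer, "\n";
```

The point of the sketch is only that code written against PDL::* names could
keep working unchanged while the implementation lives under PDLA::*.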

Finally, I also have some ideas for PDL3 that I will post in about a
month's time. One of the top priorities on the feature list of PDL3's C
API needs to be the ability to do optimisations such as loop fusion. I
need to ponder on how to combine this with the Moo-like metaprogramming
that we envision. The Julia developers seem to be working on this, but
there are still big unresolved questions on the issue tracker.

By the way, I think it might be better to avoid putting a number in the
name of this next major version of PDL. It's a personal opinion that
stems from marketing issues that are similar to what happened with
Osborne 1 <https://en.wikipedia.org/wiki/Osborne_effect> and somewhat
with Perl 6. This isn't a strongly held opinion, but I feel that it is
worth bringing up.

[^jupyter-split]: http://blog.jupyter.org/2015/04/15/the-big-split/

Cheers,
- Zaki Mughal

--Chris


