My view:
I agree with PDLA and I support basically all the goals that were mentioned.
On the other hand, I also see a couple of possible gotchas.
First is (potential) user confusion. With the ongoing work on breaking PDL(A)
into smaller pieces there will sooner or later be more PDLA::* modules than
PDL::* on CPAN. Maybe all PDLA modules should have something like this:
...
=head1 NAME
(experimental PDL fork) PDL interface to something cool.
=head1 SYNOPSIS
...
Then it will be clear from the search results page on metacpan.org or
search.cpan.org exactly what the module is.
Next: although PDLA is intended to be a place for agile development, I think
it should still have some milestones set (like: a/ unbundling jumbo-PDL,
b/ reshaping makefiles, c/ rewriting pdl-pp parser, d/ new core, e/ new
object model, ...). Considering how many developers will be actively
contributing, the milestones should perhaps be sorted and worked on
sequentially. These milestones, once finished, might be a good opportunity
for thorough peer review by those PDL devs/users who will not be
actively participating in PDLA.
And the last: considering the high demand for stability (see other posts in
this thread) I am not quite sure that the idea of "in the end PDLA
repository will replace PDL" will work. Maybe the
finished/polished/discussed/agreed/reviewed/tested changes and ideas from
PDLA should be brought to mainstream PDL step by step (milestone by
milestone) so that in the end PDL == PDLA (at some point along the way the
major version will bump from 2 to 3).
--
kmx
On 25.8.2015 19:42, Zakariyya Mughal wrote:
On 2015-08-24 at 23:48:51 +0000, Chris Marshall wrote:
PDL Developers-
With the addition of two active and highly motivated PDL developers
(Zakariyya Mughal and Guggle "Ed" Worth) we've made significant progress
in cleaning up both the PDL distribution and the development process
itself. PDL is now run through test builds automatically on git commit
via the Travis-CI framework of github. Many perl platforms and PDL
configuration options are exercised. PDL-2.013 was the best tested
pre-release release ever.
The current process we've been working toward is to make
PDL development faster and more responsive by breaking up the current
monolithic PDL distribution into a lean core (roughly the current
PDL::Core, PDL::PP, and PDL::Slices) and spinning off the other modules
for IO, Graphics, and Library interfaces as their own CPAN releases.
This would enable the separate module/distributions to have a faster
development-test-release cycle since that process would not be held up by
the testing of the full PDL distribution with all its subcomponents,
even if they are completely independent/unrelated to the separate module
changes being made.
We're ready to make the split, but there is a catch... How can we
have the rapid agile development needed to make the next-generation
PDL3 possible _without_ losing the "PDL just works" that has been one of
the primary focuses of PDL-2.x development since I volunteered as release
manager circa PDL-2.4.3 [sic]?
There has been some discussion, largely on #pdl, about how to best
proceed. One idea is to move to a constant release mode which could be
expedited by adding co-maints to PDL. I've not acted on that largely
because I feel that PDL just working, easy to get and start to use, is
essential to survive as a minority numeric computation engine (compared
with R, NumPy, Octave/MATLAB). How can we grow market share if it takes
a perl expert to start using PDL?
That said, I think the "big split" is the best way forward for PDL
to grow and thrive. The ideas for the PDL3 core engine show great
promise for the kind of dynamic development as occurred when Karl first
conceived and implemented the idea that would become PDL.
Unfortunately, my experience with rapid sequential releases is a sort of
"churn" where it is difficult to know if you'll be able to get a working
module at any given release. So what to do...
One idea I had is to change the stable PDL release distribution into a
PDL bundle. That would be the "stable PDL" that would be easy to get
and install. The sub-modules would then be able to have independent
development forming the "experimental PDL" track. Another way, a bit
more crude, would be to make a fixed "stable PDL" release that would be
the one to install. Maybe we could use specific version information to
work with cpan, cpanm,...
Here's where we need your input for discussion and consensus.
Please feel free to comment on any of the above, or to offer your own
thoughts. The goal is to select the preferred approach for modern PDL
development and move out on it. I would like to complete this discussion
process within the next two weeks. At that point we should be able to
make a specific plan for any final comments with the agile development
to begin shortly after.
Let the discussions begin!
Hello Chris,
First off, thank you for starting this conversation.
Ed and I have been working on and off as time permits on preparing for
the split. The work we've been doing hasn't really generated much
traffic on the pdl-devel mailing list, but the #pdl channel and the PDLPorters
GitHub organisation tell a very different story. There is a lot going
on there every few days. The discussion on those two mediums is a little
more agile than the mailing list or SourceForge and helps with formulating
I highly recommend joining both by watching the repositories in
PDLPorters and following the IRC by either joining in a client or
tracking the backlog with <http://irclog.perlgeek.de/pdl/>.
I'd like to summarise some of what we came up with on GitHub/IRC:
1. A split is necessary to not only make releases easier, but also
development. We have worked on reducing the time required to build
PDL across multiple environments down to a little over 1 hour.
This is still too long when you have perhaps 1.5 hours of tuits that
day. So the work inevitably gets spread out over weeks.
A split would help decrease this friction.
2. Making `cpanm PDL` always work has always been part of the plan.
Improving the PDL devops has helped with that. The plan is to
continue doing that.
But large refactors such as this split can be quite daunting. We
can't be sure we will stick the landing right the first time. But
the job needs to move forward or it will fail via analysis paralysis
even before it has begun.
3. Ed and I have been thinking about releasing a more agile, friendly
fork of PDL under the PDLA namespace (for PDL Agile). The
repositories will continue to live under the PDLPorters GitHub
organisation.
We will start by applying the split. This will be followed by
improving code coverage, fixes to the 64-bit indexing, formalising
the badvalue semantics for more functions, and bug-fixes.
We plan on making sure that libraries such as PDL-Stats, PDL-IO-CSV,
etc. remain compatible with this library. I believe there is a way
to do this without making changes to the original code (via a subref
in @INC).
4. The modules that come from the split will each be improved so that
they are easy to install on their own. We already have plans to
write Alien::Base modules for all of them.
5. In parallel with this, we will begin reaching out to distribution
packagers. PDL has not been updated on many of them (some of which
are on 2.4.x). This is already on the wishlist
at <https://github.com/PDLPorters/pdl/issues/139>.
6. The current PDL distribution will remain as it is. Bugfixes will
continue on PDL and they will be backported from PDLA. This approach
has worked well for IPython/Jupyter (which underwent a split earlier
this summer)[^jupyter-split]. Backporting fixes was a large part
of what they had to go through.
7. Eventually, after we are sure that PDLA has maintained
compatibility with PDL, the changes of PDLA will replace the
current PDL repository.
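
The "subref in @INC" mentioned in item 3 is not spelled out in this thread, so the following is only a minimal sketch of one way such a hook could work, not the actual PDLA mechanism. It assumes a hypothetical fork module `PDLA::Demo` (defined inline so the example runs without PDL installed) and a hypothetical hook name `pdl_to_pdla_hook`. When unchanged code asks for `PDL::Demo`, the hook hands `require` a generated shim that loads `PDLA::Demo` and inherits from it.

```perl
use strict;
use warnings;

# @INC hook (hypothetical name). Perl calls it as $hook->($hook, $file),
# e.g. with 'PDL/Demo.pm', when it reaches this entry while searching @INC.
sub pdl_to_pdla_hook {
    my ($self, $file) = @_;
    return unless $file =~ m{\A PDL/ (.+) \.pm \z}x;   # ignore everything else
    my $rest = $1;
    (my $suffix = $rest) =~ s{/}{::}g;                 # Foo/Bar -> Foo::Bar

    # Generated shim: load the fork's module and inherit from it.
    # Inheritance only covers method calls; this is a simplifying assumption.
    my $src = "package PDL::$suffix;\n"
            . "require PDLA::$suffix;\n"
            . "our \@ISA = ('PDLA::$suffix');\n"
            . "1;\n";

    # require accepts a filehandle from a hook; here it reads from memory.
    open my $fh, '<', \$src or die "cannot open in-memory shim: $!";
    return $fh;
}
unshift @INC, \&pdl_to_pdla_hook;

# Stand-in for a module shipped by the fork (hypothetical, for illustration).
package PDLA::Demo {
    sub new     { my ($class) = @_; return bless {}, $class }
    sub backend { return 'PDLA::Demo' }
}
# Mark it as already loaded so `require PDLA::Demo` needs no file on disk.
$INC{'PDLA/Demo.pm'} = __FILE__;

# "Original" code keeps using the PDL::* name, unchanged.
require PDL::Demo;
my $obj = PDL::Demo->new;
print ref($obj), " is backed by ", $obj->backend, "\n";
```

Because the hook is placed at the front of @INC, it would also shadow a real PDL installation; a production version would need to decide when to defer to files on disk, and plain function calls or exported symbols would need more than an `@ISA` shim.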
Finally, I also have some ideas for PDL3 that I will post in about a
month's time. One of the top priorities on the feature list of PDL3's C
API needs to be the ability to do optimisations such as loop fusion. I
need to ponder on how to combine this with the Moo-like metaprogramming
that we envision. The Julia developers seem to be working on this, but
there are still big unresolved questions on the issue tracker.
By the way, I think it might be better to avoid putting a number in the
name of this next major version of PDL. It's a personal opinion that
stems from marketing issues that are similar to what happened with
Osborne 1 <https://en.wikipedia.org/wiki/Osborne_effect> and somewhat
with Perl 6. This isn't a strongly held opinion, but I feel that it is
worth bringing up.
[^jupyter-split]: http://blog.jupyter.org/2015/04/15/the-big-split/
Cheers,
- Zaki Mughal
--Chris
_______________________________________________
pdl-devel mailing list
pdl-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pdl-devel