Drifting off topic but what the heck...

> Data flow is not a new idea, it's a subset of the REAL idea:  category
> theory.

OK, I'll have to read up on that...

> Build systems must be driven bottom up.  They're intrinsically
> imperative NOT functional.

Would you elaborate on the difference?  Do you mean the difference
between declarative (what) and functional (how)?

> When you change a source file, that should trigger rebuilding the
> system based on what depends on the source file you changed.  This is
> completely the reverse of target driven building.

Yes -- and no -- at least as I think of it.  Viewed as an acyclic
graph, a build system is a static (at any one point in time) set of
relationships between sources (files that have no arrows into them
within the build system, even if derived, say, from a version control
system) and constructed targets (some of which are deliverable, others
of which are intermediate, but that doesn't matter here).  Given some
form of specification of these relationships -- sources, targets,
rules, dependencies/conditions -- then any time a source changes, all
of its dependents must at least be revisited, if NOT
updated/reconstructed, whether you consider this to be
targets-backwards or sources-forward.
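
To make "which way you look at the arrows" concrete, here's a toy
sketch (Python, with a made-up graph, not taken from any real tool) of
the source-forward walk; a target-backwards walk over the same graph
visits exactly the same nodes, only the traversal direction differs:

    # Toy graph: {target: set of direct dependencies}.  Names are
    # illustrative only.
    graph = {
        "app":    {"main.o", "util.o"},
        "main.o": {"main.c", "util.h"},
        "util.o": {"util.c", "util.h"},
    }

    def dirty_targets(changed_source):
        """Source-forward walk: everything that (transitively) depends
        on the changed file must at least be revisited."""
        dirty = set()
        frontier = {changed_source}
        while frontier:
            node = frontier.pop()
            for target, deps in graph.items():
                if node in deps and target not in dirty:
                    dirty.add(target)
                    frontier.add(target)
        return dirty

    print(dirty_targets("util.h"))   # {'util.o', 'main.o', 'app'}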

By the way, that elaborate chip design system I mentioned had a neat
feature: you could say "check whether the target actually changed as a
result of reapplying the rule", and if not, don't touch it, don't even
change its modify time, meaning none of the downstream targets (its
dependents) need rebuilding.  This "pruning" saved considerable time in
many real circumstances where a target was in some way an abstraction
of a source, immune to many detailed changes that affect the source but
not the target.
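
Roughly, the pruning worked like this (a sketch only; I'm assuming the
rule can be pointed at a scratch output file, and the names here are
hypothetical):

    import filecmp, os, shutil, subprocess, tempfile

    def rebuild_if_changed(rule_cmd, target):
        """Run the rule into a scratch file and replace the real target
        only if the bytes actually differ, so the target's modify time
        (and everything downstream of it) is left alone."""
        fd, scratch = tempfile.mkstemp()
        os.close(fd)
        try:
            # Assumption: the rule takes its output path as the last argument.
            subprocess.run(rule_cmd + [scratch], check=True)
            if os.path.exists(target) and filecmp.cmp(scratch, target,
                                                      shallow=False):
                return False    # unchanged: prune the downstream rebuild
            shutil.move(scratch, target)
            return True         # changed: downstream targets are now stale
        finally:
            if os.path.exists(scratch):
                os.remove(scratch)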

> You CANNOT specify goal driven building effectively, because it is not
> possible to get the dependencies right.  This is a plain fact of
> reality.

Can you please elaborate on that?  Again, if I imagine the DFD
describing a collection of source and constructed files, with their
rules and dependencies, it doesn't seem to matter much which way you
look at the arrows; it's the results that count.

> Goal driven building also fails to work with multiple outputs.  Many
> programs output several files, eg make some binary code AND generate
> documentation.

Right, this is what I summarize as multi-target rules.  A common problem
is deciding whether all of the targets need updating when a common
source changes.  The pruning I mentioned earlier helped control ripples
in this way.

An even worse problem, usually not well understood, is a multi-rule
target.  This is when several rules contribute to a single repository
(such as a message catalog), blurring the state of that target for its
dependents.  I further divide these into robust and fragile multi-rule
targets.  A robust one can be partially updated correctly at any time
(like revising some database entries), but a fragile multi-rule target
must be wholly rebuilt (running multiple input rules) when any
dependency demands it.  In the worst case there's an ordering
requirement on the rules (the file must be built in the right order),
which is difficult to represent correctly in a "static" DFD.  Wise
designers avoid constructing files that are fragile multi-rule targets,
if at all possible.

In real life this manifested itself, for example, in shipping a bad
patch: a message catalog was broken by an incomplete rebuild, yet the
entire file was redelivered.
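
Schematically (purely illustrative, not from any real system), the
difference between the two kinds looks like this:

    # Robust: each rule owns distinct keys, so any one rule may be
    # re-run on its own at any time.
    def apply_robust(catalog, fragment):
        catalog.update(fragment)          # e.g. {message_id: text}

    # Fragile: entries are numbered as they are appended, so ids depend
    # on every fragment that came before; a partial rebuild leaves the
    # catalog in a blurred, inconsistent state.
    def rebuild_fragile(fragments_in_order):
        catalog, next_id = {}, 0
        for fragment in fragments_in_order:
            for text in fragment:
                catalog[next_id] = text
                next_id += 1
        return catalog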

I think multi-rule targets arise naturally but mistakenly from
old-school thinking where files and file systems were expensive, so we
lumped similar things into common files (a kind of not-really database),
sometimes with an associated "registry" (index) of some type.  I'm more
in favor of what I call "self-registry", like how /etc/rc.d works (if I
recall right).  You drop files/scripts into a "known location" and
their mere presence (when found) acts as the registry, and you can
update each file separately from the others.
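
A sketch of the idea (directory name purely illustrative):

    import os

    def discover(registry_dir):
        """The mere presence of a file in a known location registers it;
        there is no separate index file to keep consistent."""
        return sorted(
            entry for entry in os.listdir(registry_dir)
            if os.path.isfile(os.path.join(registry_dir, entry))
        )

    # e.g. discover("/etc/rc.d"): dropping a script in, or removing one,
    # updates the "registry" all by itself.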

> Also some systems require recursion.  The best example is LaTeX.  This
> requires a concept of fixpoints, that is, you run latex repeatedly
> until the output doesn't change (this is because things like plugging
> in cross references change layout, which change the cross-references).

Yuck.  Cyclic build graphs are anathema and should be avoided entirely,
because no one ever builds them correctly.  I dispute your
assertion that "some systems require recursion."  Good design should
avoid it.  When "magic happens here" is a design rule, miraculous bugs
follow.
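
For the record, that fixpoint loop amounts to roughly the following (a
sketch; the file names and the choice of the .aux file as the
convergence check are my guesses), and it is exactly the kind of "magic
happens here" I object to:

    import hashlib, subprocess

    def run_to_fixpoint(cmd, watched_file, max_passes=5):
        """Re-run the tool until the watched output stops changing,
        or give up after max_passes."""
        last = None
        for _ in range(max_passes):
            subprocess.run(cmd, check=True)
            with open(watched_file, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest == last:
                return True      # reached a fixpoint
            last = digest
        return False             # never converged

    # e.g. run_to_fixpoint(["latex", "paper.tex"], "paper.aux")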

> The canonical example of how goal driven building fails is my own
> product "interscript".  This is a literate programming tool that takes
> a file containing other files and emits them (and maybe typesets stuff
> as well).
>
> It is completely backwards to specify the "targets" here.  You have no
> idea what they are.  Interscript just generates files and it can use
> arbitrary Python code to do that.  The code generation is
> sophisticated.

I would assert that you have a design flaw in your package.  Correct
building demands "full disclosure" to the build control system, in
whatever language.  All files must be listed; hidden temporary or
intermediate files not explicitly stated are accidents waiting to
happen.  Your example of (presumably) unpredictable deliverable targets
is even worse.  It might be expedient for the programmer to just "write
the list as a smart rule," but I think it's bad design.  It makes it
impossible to "manifest" the customer deliverable package in a
predictable and auditable way.  (I have a lot of experience dealing with
CPE = current product engineering...)

I understand WHY programmers like to operate this way.  It's clunky to
have to "redundantly" state information to various parts of the
engineering system.  First I tell the version control system I just
created a new source file...  Now I must tell the build system about
that file and how to build it...  Later I must explain how to handle (at
least clean up) any intermediate files like *.o object files...  And
then tell some kind of package/delivery/update code (often separate
parts flying in formation) once or more about the target deliverable
files.  And Lord help me if I forget, leaving some kind of disconnect
in the DFD, without adequate automated tests/tools to catch my error.
Yuck!  An elegant, comprehensive environment would make this easier and
more integrated.

So being a clever programmer, hell I'll just write a script/program that
embodies some arcane app-specific knowledge about how to create targets
from sources, based on "discovery"...

Believe me, I've seen all kinds of half-assed (well-intended but still
hackish) packages put together around these kinds of issues, with no
overall understanding of what it means to deliver maintainable,
updateable, removable packages to customers.

I don't think the answer is to punt and say, "my targets are
auto-generated."  A better answer is, "I have an easy way to specify
exactly what I'm expecting within and as output from the build system,
and to check that I got what I expected."  This does not mean you must
list every *.o to be created...  I'm OK with generic rules for generic
circumstances...  But such a rule must only be applied in genuinely
generic situations.
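
As a sketch of what I mean by "specify what I'm expecting and check
that I got it" (names entirely hypothetical, not any real tool's API):

    import glob, os

    def check_manifest(build_dir, expected, generic_patterns=()):
        """Everything found under build_dir must be either explicitly
        expected or covered by a generic pattern, and everything
        expected must actually exist.  Paths are relative to the cwd,
        e.g. 'build/app'."""
        produced = {
            p for p in glob.glob(os.path.join(build_dir, "**", "*"),
                                 recursive=True)
            if os.path.isfile(p)
        }
        allowed = set(expected)
        for pattern in generic_patterns:      # e.g. "build/**/*.o"
            allowed.update(glob.glob(pattern, recursive=True))
        missing = set(expected) - produced
        unexpected = produced - allowed
        return missing, unexpected

Missing or unexpected files then fail the build, instead of quietly
skewing the delivered package.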

> Yes.  And the way to do that sophisticated stuff requires a REAL
> programming language like Python.  Trying to do this with micky mouse
> crap like Make cannot possibly work.

Uh, you dismiss it too quickly.  Obviously make is popular because in
many relatively simple contexts it works just fine -- warts and all.
(Although I agree philosophically that it's a far cry from a
comprehensive version control, build, test, package, deliver, and
update/remove solution.)

> Fbuild is a caching build system.  It caches the results of various
> operations (compiling stuff, etc) and knows when the caches are not up
> to date.  So rebuilding is the same as building, except the caching
> allow skipping some parts of the build because the dependencies tell
> fbuild the results will be the same.

Cool, that's the right concept.
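
Boiled down to a toy sketch (not fbuild's actual mechanism, just the
flavor: cache a digest of a step's inputs and skip the step when the
digest matches):

    import hashlib, json, os

    CACHE_FILE = "build-cache.json"     # hypothetical cache location

    def digest_of(paths):
        h = hashlib.sha256()
        for p in sorted(paths):
            with open(p, "rb") as f:
                h.update(f.read())
        return h.hexdigest()

    def cached_step(name, inputs, build_fn):
        """Skip build_fn when the inputs' digest matches the cached one;
        rebuilding is then literally the same as building."""
        cache = {}
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE) as f:
                cache = json.load(f)
        digest = digest_of(inputs)
        if cache.get(name) != digest:
            build_fn()                  # inputs changed: really run the step
            cache[name] = digest
            with open(CACHE_FILE, "w") as f:
                json.dump(cache, f)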

> Fbuild captures dependencies automatically, you not only don't have to
> specify them ..  you CANNOT specify them.

Caution: you appear to be headed down the same path as (now, what was
the name again of Rational Software's kernel-incestuous, over-the-top
version control and build package?).  You couldn't swat a fly in that
system without first writing a doctoral thesis!

> So you see with fbuild, you basically just tell it how to build the
> system, the optimisation is automatic.

  Exceptions prove the rule, and wreck the budget.  -- Miller

How do you let people specify unusual dependencies that aren't as simple
as compile this-to-that?

Cheers,
Alan
