Re: A universal ADT mapper and sorter?

john skaller Mon, 07 Feb 2011 16:39:13 -0800

On 08/02/2011, at 10:31 AM, Alan Silverstein wrote:

> John et al,
> 
>> My concept is that build systems are just programs like any other.
> 
> But not really just like any other because of what they do.  You might
> not get that impression from the Judy makefile I mostly wrote, but I am
> something of an expert on software build and delivery systems.  I won't
> expound at length but want to mention that the idea of a data flow
> diagram (DFD) is a powerful model for what a build system is really all
> about, what it implements.  Repositories (files) flow through processes
> (programs/etc) into other files.  (DFDs are to control flow diagrams as
> higher-level declarative languages are to lower-level functional
> languages.  You can google and read a lot more about DFD theory if it's
> a new idea to you.)


Data flow is not a new idea; it's a subset of the REAL idea: category theory.
I have a degree in math specialising in abstract algebra, and my particular
interest is category theory :)

However, the domain specific languages used for build systems are crap.
I mean they're utterly crap. They're all wrong.

All build systems I know (other than fbuild) totally get the idea backwards.

Building is NOT goal driven. Make gets this completely wrong and everyone
has copied the mistake, and the mistake is fundamental.

The idea of targets is WRONG.

Build systems must be driven bottom up. They're intrinsically imperative
NOT functional.

When you change a source file, that should trigger rebuilding the system
based on what depends on the source file you changed. This is completely
the reverse of target driven building.
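
To make that concrete, here is a rough Python sketch of change-driven
propagation. It is not how fbuild or any real tool does it; the `rdeps`
table and the `affected` function are made-up names for illustration, and
the file names are invented.

```python
from collections import deque

# Hypothetical table: each file -> the files produced directly from it.
rdeps = {
    "parser.y": ["parser.c"],
    "parser.c": ["parser.o"],
    "main.c": ["main.o"],
    "parser.o": ["app"],
    "main.o": ["app"],
}

def affected(changed, rdeps):
    """Walk forward from the changed files and collect everything downstream.

    The result is in discovery order; it is not a full build schedule.
    """
    seen = []
    queue = deque(changed)
    while queue:
        f = queue.popleft()
        for out in rdeps.get(f, []):
            if out not in seen:
                seen.append(out)
                queue.append(out)
    return seen

print(affected(["parser.y"], rdeps))
```

Note that nothing here names a target: you start from what changed and
follow the arrows forward.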

The correct way to construct a build system is by specifying how to build
the system and then optimising it. 

You CANNOT specify goal driven building effectively, because it is not
possible to get the dependencies right. This is a plain fact of reality.

Goal driven building also fails to work with multiple outputs.
Many programs output several files, e.g. they make some binary code
AND generate documentation.

Also some systems require recursion. The best example is LaTeX.
This requires a concept of fixpoints, that is, you run latex repeatedly
until the output doesn't change (this is because things like plugging
in cross references change layout, which change the cross-references).
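
The fixpoint idea can be sketched in a few lines of Python. This is only an
illustration: `run_to_fixpoint` is a made-up name, and in a real build the
`step` function would run latex and return something comparable, such as a
digest of the generated auxiliary files (that part is an assumption and is
not shown).

```python
def run_to_fixpoint(step, state, max_rounds=10):
    """Apply step repeatedly until its result stops changing."""
    for _ in range(max_rounds):
        new_state = step(state)
        if new_state == state:
            return new_state  # nothing changed: we have reached the fixpoint
        state = new_state
    raise RuntimeError("no fixpoint after %d rounds" % max_rounds)

# Toy stand-in for a tool whose output settles after a few passes.
settled = run_to_fixpoint(lambda n: min(n + 1, 3), 0)
print(settled)
```

The round limit matters: nothing guarantees that a given tool converges, so
the loop has to give up eventually.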

The canonical example of how goal driven building fails is my own
product "interscript". This is a literate programming tool that takes
a file containing other files and emits them (and maybe typesets
stuff as well).

It is completely backwards to specify the "targets" here. You have no idea
what they are. Interscript just generates files and it can use arbitrary
Python code to do that. The code generation is sophisticated.

But you don't care. What matters is that when the interscript INPUT file
changes you have to run interscript again to generate the outputs,
whatever they are.

> 
> When the build sources/targets are small enough relative to current
> technology, we just rebuild everything from scratch all the time and the
> build system can be relatively simple.  In real life though we always
> need conditional (re)build systems that understand dependencies and how
> to do efficient partial rebuilds.  Part of the art is how to correctly
> and efficiently "templatize" myriad repetitive patterns (rules) that
> have some variations.

Yes. And doing that sophisticated stuff requires a REAL programming
language like Python. Trying to do this with Mickey Mouse crap like Make
cannot possibly work.

That is why Make comes with a set of other rubbish tools like automake and
autoconf, and you have crud all over the place (*.am files, inputs and other
rubbish), almost all of which is C specific.

> 
> The most complicated build system I've ever seen was for chip design
> flows, where multiple different sets of sources could be used to create
> multiple different sets of outputs depending on what sources were
> available, and there were series of pattern-matching distinguishers for
> when to do what within each "rule group".

You had better look at fbuild then :)

Fbuild is a caching build system. It caches the results of various
operations (compiling stuff, etc) and knows when the caches are
not up to date. So rebuilding is the same as building, except the caching
allows skipping some parts of the build because the dependencies
tell fbuild the results will be the same.

Fbuild captures dependencies automatically. Not only do you not
have to specify them, you CANNOT specify them.

What you do is say something like:

        cc("mycode.c","myout.o")

and fbuild caches the function call, it knows that "mycode.c" is
an input and "myout.o" is an output (because the 'cc' function has
been written that way, it's not magic!).

So you see with fbuild, you basically just tell it how to build
the system, the optimisation is automatic.
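
A toy version of the idea in Python, to show there is no magic. This is NOT
fbuild's API or implementation: the `cached` decorator, the `digest` helper
and the in-memory `_cache` are made up for illustration, and the `cc` here
just copies bytes instead of calling a compiler.

```python
import hashlib
import os
import tempfile

# Hypothetical cache: (function name, src, dst) -> digest of src at last run.
# A real tool would keep this on disk between runs.
_cache = {}

def digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def cached(fn):
    """Skip the call when the input is unchanged and the output still exists."""
    def wrapper(src, dst):
        key = (fn.__name__, src, dst)
        d = digest(src)
        if _cache.get(key) == d and os.path.exists(dst):
            return False  # up to date, nothing run
        fn(src, dst)
        _cache[key] = d
        return True  # the step actually ran
    return wrapper

@cached
def cc(src, dst):
    # Stand-in for a real compiler: copy the input bytes to the output.
    with open(src, "rb") as i, open(dst, "wb") as o:
        o.write(i.read())

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "mycode.c")
dst = os.path.join(workdir, "myout.o")
with open(src, "w") as f:
    f.write("int main(void) { return 0; }\n")

first = cc(src, dst)    # runs: nothing cached yet
second = cc(src, dst)   # skipped: input digest unchanged
with open(src, "a") as f:
    f.write("/* edited */\n")
third = cc(src, dst)    # runs again: input changed
```

The build script itself only ever says "compile this to that"; whether the
step runs is decided by the wrapper.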

> Anyway one other drive-by concept worth mentioning is that real life
> often involves both multi-target rules and multi-rule targets (using
> make(1) terminology)...  The correct handling of those concepts is
> philosophically difficult, and often gotten wrong.

Yes. Which is why you should not use crud like make.
You have to use a powerful expressive language or you can't
possibly hope to get it all right.

It is still hard, even with Python. fbuild still has bugs. And the Felix
build scripts do as well. But the build system really is portable.
Code builds the same way using MSVC on Windows as it does
using gcc on Linux or OSX.


> 
>> Doug:  you and Alan have to decide what to do here.
> 
> To be clear it's 100% Doug's project now, has been for years.  I often
> respond to emails trying to be helpful if he doesn't, and we live in the
> same general area, but haven't worked on libJudy together since 2002.
> He's retired, I'm mostly not yet, and he's the sole owner of the
> library, I'm just a user now.  Which means I'm not reading every word of
> the current discussion as it gets involved!  :-)

You may not be the owner, but you have a level of interest and expertise.

The "rule of 3" says that to get a system like Judy working requires a lot
of resources, including intellectual ones: we need your brain :)

--
john skaller
[email protected]




