Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Nathaniel Smith
On Wed, Nov 11, 2015 at 11:34 AM, Paul Moore  wrote:
> On 11 November 2015 at 18:38, Nathaniel Smith  wrote:
>> Have you tried current dev versions of pip recently?
>
> No, but I did see your work on this, and I appreciate and approve of it.
>
>> The default now is to
>> suppress the actual output but for progress reporting to show a spinner that
>> rotates each time a line of text would have been printed. It's low tech but
>> IMHO very effective. (And obviously you can also flip a switch to either see
>> all or nothing of the output as well, or if that isn't there now it could
>> easily be added.) So I kinda feel like these are solved problems.
>
> And this relies on build tools outputting to stdout, not stderr, and
> not buffering their output.

FWIW the spinner patch actually looks at both stdout and stderr, and
it also takes care to force the child process's sys.stdout/sys.stderr
into line-buffered mode, but of course this buffering tweak only helps
for output printed by python code running in the immediate child. So
yeah, it wouldn't hurt to add a few non-normative words about
buffering to my original one-sentence specification :-).
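[For illustration, the buffering tweak described above might be sketched roughly like this -- this is a hypothetical reconstruction, not the actual pip patch. It forces the immediate Python child to be unbuffered via PYTHONUNBUFFERED; as noted, this does nothing for grandchildren such as compilers.]

```python
import os
import subprocess
import sys

# Hypothetical sketch: run a Python child with buffering disabled so its
# output arrives line by line.  PYTHONUNBUFFERED only affects the
# immediate Python child -- a compiler it spawns in turn still buffers
# however it likes, which is the limitation noted above.
env = dict(os.environ, PYTHONUNBUFFERED="1")
proc = subprocess.Popen(
    [sys.executable, "-c", "print('compiling foo.c')"],
    env=env,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    universal_newlines=True,
)
lines = [line.rstrip("\n") for line in proc.stdout]
proc.wait()
```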

> That's an interface spec. Not everything has to be massively
> complicated, and I wasn't implying it needed to be. Just that we need
> conventions. One constant annoyance for pip is that distutils doesn't
> properly separate stdout and stderr, so we can't suppress unnecessary
> status reports without losing important error messages. Users report
> this as a bug in pip, not in distutils, and I don't imagine that would
> change if a project was using .

Sorry for misunderstanding!

I guess the other thing we could do is to try to convince build
systems to do a better job of separating stdout and stderr, but I'm
dubious about how much this would help, because I think the problem is
more fundamental than that. For outright errors, there isn't really a
problem IMO, because when the build fails that gives you a clear
signal that you should probably show the user the output :-). The case
that's trickier, and could potentially benefit, is warnings that don't
cause the build to fail. If gcc outputs a warning, should we show that
to the user? Yes if this is the developer building their own code...
but probably not if this is pip building from an automatically
downloaded sdist for an end-user -- there are lots and lots of
harmless warnings in the output of popular packages, and dumping those
scary and inscrutable messages on end-users is going to create all the
problems we were trying to avoid by hiding the output in the first
place.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Paul Moore
On 11 November 2015 at 19:31, Nathaniel Smith  wrote:
> This particular subthread is all hanging off of Paul's message [1]
> where he argues that we can't just print arbitrary text to
> stdout/stderr, we need, like, structured JSON messages on stdout that
> pip can parse while the build is running

As I already pointed out, I never said that.

Paul


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Paul Moore
On 11 November 2015 at 18:38, Nathaniel Smith  wrote:
> Have you tried current dev versions of pip recently?

No, but I did see your work on this, and I appreciate and approve of it.

> The default now is to
> suppress the actual output but for progress reporting to show a spinner that
> rotates each time a line of text would have been printed. It's low tech but
> IMHO very effective. (And obviously you can also flip a switch to either see
> all or nothing of the output as well, or if that isn't there now it could
> easily be added.) So I kinda feel like these are solved problems.

And this relies on build tools outputting to stdout, not stderr, and
not buffering their output.

That's an interface spec. Not everything has to be massively
complicated, and I wasn't implying it needed to be. Just that we need
conventions. One constant annoyance for pip is that distutils doesn't
properly separate stdout and stderr, so we can't suppress unnecessary
status reports without losing important error messages. Users report
this as a bug in pip, not in distutils, and I don't imagine that would
change if a project was using .

>
>> Taking all of those requirements into account, pip *has* to have some
>> level of control over the output of a build tool - with setuptools at
>> the moment, we have no such control (other than "we may or may not
>> show the output to the user") and that means we struggle to
>> realistically satisfy all of the conflicting requirements we have.
>>
>> So we do need much better defined contracts over stdin, stdout and
>> stderr, and return codes. This is true whether or not the build system
>> is invoked via a Python API or a CLI.
>
> Even if you really do want to define a generic structured system for build
> progress reporting (it feels pretty second-systemy to me), then in the
> python api approach there are better options than trying to define a
> specific protocol on stdout.

No, no, no. I never said that. All I was saying was that we need a
level of agreement on what pip can expect to do with stdout and
stderr, *given that there are known requirements pip's users expect to
be satisfied*.

Paul


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Nathaniel Smith
On Wed, Nov 11, 2015 at 11:12 AM, Robert Collins
 wrote:
> On 12 November 2015 at 08:07, Nathaniel Smith  wrote:
>> On Wed, Nov 11, 2015 at 10:42 AM, Donald Stufft  wrote:
>>> On November 11, 2015 at 1:38:38 PM, Nathaniel Smith (n...@pobox.com) wrote:
>>>> Guaranteeing a clean stdout/stderr is hard: it means you have
>>>> to be careful to correctly capture and process the output of every
>>>> child you invoke (e.g. compilers), and deal correctly with the
>>>> tricky aspects of pipes (deadlocks, sigpipe, ...). And even
>>>> then you can get thwarted by accidentally importing the wrong
>>>> library into your main process, and discovering that it writes
>>>> directly to stdout/stderr on some error condition. And it may
>>>> or may not respect your resetting of sys.stdout/sys.stderr
>>>> at the python level. So to be really reliable the only thing to
>>>> do is to create some pipes and some threads to read the pipes and
>>>> do the dup2 dance (but not everyone will actually do this, they'll
>>>> just accept corrupted output on errors) and ugh, all of this is
>>>> a huge hassle that massively raises the bar on implementing simple
>>>> build systems.
>>>
>>> How is this not true for a worker.py process as well? If the worker process 
>>> communicates via stdout then it has to make sure it captures the stdout and 
>>> redirects it before calling into the Python API and then undoes that 
>> afterwards. It makes it harder to do incremental output actually because a
>> Python function can’t return in the middle of execution so we’d need to
>> make it some sort of awkward generator protocol to make that happen too.
>>
>> Did you, uh, read the second half of my email? :-) My actual position
>> is that we shouldn't even try to get structured incremental output
>> from the build system, and should stick with the current approach of
>> unstructured incremental output on stdout/stderr. But if we do insist
>> on getting structured incremental output, then I described a system
>> that's much easier for backends to implement, while leaving it up to
>> the frontend to pick whether they want to bother doing complicated
>> redirection tricks, and if so then which particular variety of
>> complicated redirection trick they like best.
>>
>> In both approaches, yeah, any kind of incremental output is eventually
>> going to come down to some Python code issuing some sort of function call that
>> reports progress without returning, whether that's
>> sys.stdout.write(json.dumps(...)) or
>> progress_reporter.report_update(...). Between those two options, it's
>> sys.stdout.write(json.dumps(...)) that looks more awkward to me.
>
> I think there is some big disconnect in the conversation. AIUI Donald
> and Marcus and I are saying that build systems should just use
>
> print("Something happened")
>
> to provide incremental output.

I agree that this is the best approach.

This particular subthread is all hanging off of Paul's message [1]
where he argues that we can't just print arbitrary text to
stdout/stderr, we need, like, structured JSON messages on stdout that
pip can parse while the build is running. (Which implies that you can
*only* have structured JSON messages on stdout, because otherwise
there's no way to tell which bits are supposed to be structured and
which bits are just arbitrary text.) And I said well, I think that's
probably overcomplicated and unnecessary, but if you insist then this
is what it would look like in the different approaches.

(Your current draft does create similar challenges for build backends
because it also uses stdout for passing structured data. But I know
you're in the middle of rewriting it anyway, so maybe this is
irrelevant.)

-n

[1] http://thread.gmane.org/gmane.comp.python.distutils.devel/24760/focus=24792

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Nathaniel Smith
On Wed, Nov 11, 2015 at 4:29 AM, Donald Stufft  wrote:
> On November 11, 2015 at 4:05:11 AM, Nathaniel Smith (n...@pobox.com) wrote:
>> > But even this isn't really true -- the difference between them
>> is that
>> either way you have a subprocess API, but with a Python API, the
>> subprocess interface that pip uses has the option of being improved
>> incrementally over time -- including, potentially, to take
>> further
>> advantage of the underlying richness of the Python semantics.
>> Sure,
>> maybe the first release would just take all exceptions and map
>> them
>> into some text printed to stderr and a non-zero return code, and
>> that's all that pip would get. But if someone had an idea for how
>> pip
>> could do better than this by, I dunno, encoding some structured
>> metadata about the particular exception that occurred and passing
>> this
>> back up to pip to do something intelligent with it, they absolutely
>> could write the code and submit a PR to pip, without having to write
>> a
>> new PEP.
>
> I think I prefer a CLI based approach (my suggestion was to remove the
> formatting/interpolation altogether and just have the file include a list
> of things to install, and a python module to invoke via ``python -m <module
> provided by user>``).
>
> The main reason I think I prefer a CLI based approach is that I worry about 
> the impedance mismatch between the two systems. We’re not actually going to 
> be able to take advantage of Python’s plethora of types in any meaningful 
> capacity because at the end of the day the bulk of the data is either 
> naturally a string or as we start to allow end users to pass options through 
> pip into the build system, we have no real way of knowing what the type is 
> supposed to be other than the fact we got it as a CLI flag. How does a user 
> encode something like “pass an integer into this value in the build system?” 
> on the CLI in a generic way? I can’t think of any way which means that any 
> boundary code in the build system is going to need to be smart enough to 
> handle an array of arguments that come in via the user typing something on 
> the CLI. We have a wide variety of libraries to handle that case already for 
> building CLI apps but we do not have a wide array of libraries handling it 
> for a Python API. It will have to be manually encoded for each and every 
> option that the build system supports.

You're overcomplicating things :-). The solution to this problem is
just "pip's UI only allows passing arbitrary strings as option values,
so build backends had better deal with it". That's what we'd
effectively be doing anyway in the CLI approach.
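[As an illustration of what "deal with it" might mean on the backend side, here is a hypothetical sketch -- the option names and the helper are invented, not part of any spec. The backend coerces the string values it receives at its own boundary.]

```python
# Hypothetical backend-side sketch: pip's UI only ever delivers option
# values as strings, so the backend converts them at its own boundary.
# The option names ("jobs", "debug") are invented for illustration.
def parse_build_options(raw_options):
    known = {
        "jobs": int,                                    # e.g. "4" -> 4
        "debug": lambda s: s.lower() in ("1", "true", "yes"),
    }
    parsed = {}
    for name, value in raw_options.items():
        convert = known.get(name, str)                  # unknown options stay strings
        parsed[name] = convert(value)
    return parsed
```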

> My other concern is that it introduces another potential area for mistake 
> that is a bit harder to test. I don’t believe that any sort of “worker.py” 
> script is ever going to be able to handle arbitrary Python values coming back 
> as a return value from a Python script. Whatever serialization we use to send 
> data back into the main pip process (likely JSON) will simply choke and cause 
> an error if it encounters a type it doesn’t know how to serialize. However 
> this error case will only happen when the build system is being invoked by 
> pip, not when it is being invoked “naturally” in the build system’s unit 
> tests. By forcing build tool authors to write a CLI interface, we push the 
> work of “how do I serialize my internal data structures” down onto them 
> instead of making it some implicit piece of code that pip needs to work.

I think this is another issue that isn't actually a problem. Remember,
we don't need to support translating arbitrary Python function calls
across process boundaries; there will be a fixed, finite set of
methods that we need to support, and those methods' semantics will be
defined in a PEP.

So e.g., if the PEP says that build backends should define a method like this:

    def build_requirements(self, build_options):
        """Calculate the dynamic portion of the build-requirements.

        :param build_options: The build options dictionary.
        :returns: A list of strings, where each string is a PEP XX
            requirement specifier.
        """

then our IPC mechanism doesn't need to be able to handle arbitrary
types as return values, it needs to be able to handle a list of
strings. Which that sketch I sent does handle, so we're good. And the
build tool's unit tests will be checking that it returns a list of
strings, because... that's what unit tests do, they validate that
methods implement the interface that they're defined to implement :-).
So this is a non-problem -- we just have to make sure when we define
the various method interfaces in the PEP that we don't have any
methods that return arbitrary complicated Python types. Which we
weren't going to be tempted to do anyway.
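[A sketch of how the worker side could serialize such a fixed return type -- the backend class, hook name, and requirement strings below are invented for illustration; the point is only that json.dumps never sees anything but a list of strings.]

```python
import json

# Hypothetical worker-side sketch: the hook's return type is fixed by the
# spec (a list of requirement-specifier strings), so plain JSON suffices.
class ExampleBackend:
    def build_requirements(self, build_options):
        # dynamic build requirements; always a list of strings per the spec
        return ["setuptools>=18.0", "cython"]

def run_hook(backend, hook_name, **kwargs):
    result = getattr(backend, hook_name)(**kwargs)
    # This would be written to the pip<->worker channel; json.dumps only
    # ever sees a list of strings, so it cannot choke on an odd type.
    return json.dumps({"result": result})

message = run_hook(ExampleBackend(), "build_requirements", build_options={})
```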

-n

-- 
Nathaniel J. Smith -- http://vorpus.org

Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Donald Stufft
On November 11, 2015 at 2:08:00 PM, Nathaniel Smith (n...@pobox.com) wrote:
> On Wed, Nov 11, 2015 at 10:42 AM, Donald Stufft wrote:
> > On November 11, 2015 at 1:38:38 PM, Nathaniel Smith (n...@pobox.com) wrote:
> >> > Guaranteeing a clean stdout/stderr is hard: it means you have
> >> to be careful to correctly capture and process the output of every
> >> child you invoke (e.g. compilers), and deal correctly with the
> >> tricky aspects of pipes (deadlocks, sigpipe, ...). And even
> >> then you can get thwarted by accidentally importing the wrong
> >> library into your main process, and discovering that it writes
> >> directly to stdout/stderr on some error condition. And it may
> >> or may not respect your resetting of sys.stdout/sys.stderr
> >> at the python level. So to be really reliable the only thing to
> >> do is to create some pipes and some threads to read the pipes and
> >> do the dup2 dance (but not everyone will actually do this, they'll
> >> just accept corrupted output on errors) and ugh, all of this is
> >> a huge hassle that massively raises the bar on implementing simple
> >> build systems.
> >
> > How is this not true for a worker.py process as well? If the worker process
> > communicates via stdout then it has to make sure it captures the stdout and
> > redirects it before calling into the Python API and then undoes that
> > afterwards. It makes it harder to do incremental output actually because a
> > Python function can’t return in the middle of execution so we’d need to
> > make it some sort of awkward generator protocol to make that happen too.
>  
> Did you, uh, read the second half of my email? :-) My actual position
> is that we shouldn't even try to get structured incremental output
> from the build system, and should stick with the current approach of
> unstructured incremental output on stdout/stderr. But if we do insist
> on getting structured incremental output, then I described a system
> that's much easier for backends to implement, while leaving it up to
> the frontend to pick whether they want to bother doing complicated
> redirection tricks, and if so then which particular variety of
> complicated redirection trick they like best.
>  
> In both approaches, yeah, any kind of incremental output is eventually
> going to come down to some Python code issuing some sort of function call that
> reports progress without returning, whether that's
> sys.stdout.write(json.dumps(...)) or
> progress_reporter.report_update(...). Between those two options, it's
> sys.stdout.write(json.dumps(...)) that looks more awkward to me.
>  

I’m confused how the progress indicator you just implemented would work if 
there wasn’t something triggering a “hey I’m still doing work” to incrementally 
output information.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA




Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Robert Collins
On 12 November 2015 at 08:07, Nathaniel Smith  wrote:
> On Wed, Nov 11, 2015 at 10:42 AM, Donald Stufft  wrote:
>> On November 11, 2015 at 1:38:38 PM, Nathaniel Smith (n...@pobox.com) wrote:
>>> > Guaranteeing a clean stdout/stderr is hard: it means you have
>>> to be careful to correctly capture and process the output of every
>>> child you invoke (e.g. compilers), and deal correctly with the
>>> tricky aspects of pipes (deadlocks, sigpipe, ...). And even
>>> then you can get thwarted by accidentally importing the wrong
>>> library into your main process, and discovering that it writes
>>> directly to stdout/stderr on some error condition. And it may
>>> or may not respect your resetting of sys.stdout/sys.stderr
>>> at the python level. So to be really reliable the only thing to
>>> do is to create some pipes and some threads to read the pipes and
>>> do the dup2 dance (but not everyone will actually do this, they'll
>>> just accept corrupted output on errors) and ugh, all of this is
>>> a huge hassle that massively raises the bar on implementing simple
>>> build systems.
>>
>> How is this not true for a worker.py process as well? If the worker process 
>> communicates via stdout then it has to make sure it captures the stdout and 
>> redirects it before calling into the Python API and then undoes that 
>> afterwards. It makes it harder to do incremental output actually because a
>> Python function can’t return in the middle of execution so we’d need to make
>> it some sort of awkward generator protocol to make that happen too.
>
> Did you, uh, read the second half of my email? :-) My actual position
> is that we shouldn't even try to get structured incremental output
> from the build system, and should stick with the current approach of
> unstructured incremental output on stdout/stderr. But if we do insist
> on getting structured incremental output, then I described a system
> that's much easier for backends to implement, while leaving it up to
> the frontend to pick whether they want to bother doing complicated
> redirection tricks, and if so then which particular variety of
> complicated redirection trick they like best.
>
> In both approaches, yeah, any kind of incremental output is eventually
> going to come down to some Python code issuing some sort of function call that
> reports progress without returning, whether that's
> sys.stdout.write(json.dumps(...)) or
> progress_reporter.report_update(...). Between those two options, it's
> sys.stdout.write(json.dumps(...)) that looks more awkward to me.

I think there is some big disconnect in the conversation. AIUI Donald
and Marcus and I are saying that build systems should just use

print("Something happened")

to provide incremental output.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Nathaniel Smith
On Wed, Nov 11, 2015 at 10:42 AM, Donald Stufft  wrote:
> On November 11, 2015 at 1:38:38 PM, Nathaniel Smith (n...@pobox.com) wrote:
>> > Guaranteeing a clean stdout/stderr is hard: it means you have
>> to be careful to correctly capture and process the output of every
>> child you invoke (e.g. compilers), and deal correctly with the
>> tricky aspects of pipes (deadlocks, sigpipe, ...). And even
>> then you can get thwarted by accidentally importing the wrong
>> library into your main process, and discovering that it writes
>> directly to stdout/stderr on some error condition. And it may
>> or may not respect your resetting of sys.stdout/sys.stderr
>> at the python level. So to be really reliable the only thing to
>> do is to create some pipes and some threads to read the pipes and
>> do the dup2 dance (but not everyone will actually do this, they'll
>> just accept corrupted output on errors) and ugh, all of this is
>> a huge hassle that massively raises the bar on implementing simple
>> build systems.
>
> How is this not true for a worker.py process as well? If the worker process 
> communicates via stdout then it has to make sure it captures the stdout and 
> redirects it before calling into the Python API and then undoes that 
>> afterwards. It makes it harder to do incremental output actually because a
>> Python function can’t return in the middle of execution so we’d need to make
>> it some sort of awkward generator protocol to make that happen too.

Did you, uh, read the second half of my email? :-) My actual position
is that we shouldn't even try to get structured incremental output
from the build system, and should stick with the current approach of
unstructured incremental output on stdout/stderr. But if we do insist
on getting structured incremental output, then I described a system
that's much easier for backends to implement, while leaving it up to
the frontend to pick whether they want to bother doing complicated
redirection tricks, and if so then which particular variety of
complicated redirection trick they like best.

In both approaches, yeah, any kind of incremental output is eventually
going to come down to some Python code issuing some sort of function call that
reports progress without returning, whether that's
sys.stdout.write(json.dumps(...)) or
progress_reporter.report_update(...). Between those two options, it's
sys.stdout.write(json.dumps(...)) that looks more awkward to me.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Wes Turner
On Nov 11, 2015 12:31 PM, "Robert Collins" 
wrote:
>
> On 12 November 2015 at 02:30, Paul Moore  wrote:
> > On 10 November 2015 at 22:44, Nathaniel Smith  wrote:
> >> "Stdin is unspecified, and stdout/stderr can be used for printing
> >> status messages, errors, etc. just like you're used to from every
> >> other build system in the world."
> >
> > This is over simplistic.
> >
> > We have real-world requirements from users of pip that they *don't*
> > want to see all of the progress that the various build tools invoke.
> > That is not something we can ignore. We also have some users saying
> > they want access to all of the build tool output. And we also have a
> > requirement for progress reporting.
> >
> > Taking all of those requirements into account, pip *has* to have some
> > level of control over the output of a build tool - with setuptools at
> > the moment, we have no such control (other than "we may or may not
> > show the output to the user") and that means we struggle to
> > realistically satisfy all of the conflicting requirements we have.
> >
> > So we do need much better defined contracts over stdin, stdout and
> > stderr, and return codes. This is true whether or not the build system
> > is invoked via a Python API or a CLI.
>
> Aye.
>
> I'd like everyone to take a breather on this thread btw. I'm focusing
> on the dependency specification PEP and until thats at the point I
> can't move it forward, I won't be updating the draft build abstraction
> PEP:

Presumably, it would be great to list a platform parameter description as
JSONLD-serializable keys and values (e.g. for a bdist/wheel build "imprint"
in the JSONLD build metadata composition file) ... #PEP426JSONLD

> when thats done, with the thing Donald and I hammered out on IRC
> a few days back (Option 3, earlier) then we'll have something to talk
> about and consider.
>
> -Rob
>
> --
> Robert Collins 
> Distinguished Technologist
> HP Converged Cloud


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Donald Stufft
On November 11, 2015 at 1:38:38 PM, Nathaniel Smith (n...@pobox.com) wrote:
> > Guaranteeing a clean stdout/stderr is hard: it means you have  
> to be careful to correctly capture and process the output of every  
> child you invoke (e.g. compilers), and deal correctly with the  
> tricky aspects of pipes (deadlocks, sigpipe, ...). And even  
> then you can get thwarted by accidentally importing the wrong  
> library into your main process, and discovering that it writes  
> directly to stdout/stderr on some error condition. And it may  
> or may not respect your resetting of sys.stdout/sys.stderr  
> at the python level. So to be really reliable the only thing to  
> do is to create some pipes and some threads to read the pipes and  
> do the dup2 dance (but not everyone will actually do this, they'll  
> just accept corrupted output on errors) and ugh, all of this is  
> a huge hassle that massively raises the bar on implementing simple  
> build systems.

How is this not true for a worker.py process as well? If the worker process 
communicates via stdout then it has to make sure it captures the stdout and 
redirects it before calling into the Python API and then undoes that 
afterwards. It makes it harder to do incremental output actually because a
Python function can’t return in the middle of execution so we’d need to make it
some sort of awkward generator protocol to make that happen too.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA




Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Nathaniel Smith
On Nov 11, 2015 5:30 AM, "Paul Moore"  wrote:
>
> On 10 November 2015 at 22:44, Nathaniel Smith  wrote:
> > "Stdin is unspecified, and stdout/stderr can be used for printing
> > status messages, errors, etc. just like you're used to from every
> > other build system in the world."
>
> This is over simplistic.
>
> We have real-world requirements from users of pip that they *don't*
> want to see all of the progress that the various build tools invoke.
> That is not something we can ignore. We also have some users saying
> they want access to all of the build tool output. And we also have a
> requirement for progress reporting.

Have you tried current dev versions of pip recently? The default now is to
suppress the actual output but for progress reporting to show a spinner
that rotates each time a line of text would have been printed. It's low
tech but IMHO very effective. (And obviously you can also flip a switch to
either see all or nothing of the output as well, or if that isn't there now
it could easily be added.) So I kinda feel like these are solved problems.
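[A rough sketch of the spinner idea -- this is illustrative, not pip's actual implementation: suppress the build output but advance a spinner character for every line the build produces, keeping the captured text around in case the build fails.]

```python
import itertools
import subprocess
import sys

# Illustrative sketch, not pip's code: one spinner tick per suppressed line.
SPINNER = itertools.cycle("-\\|/")

def run_with_spinner(cmd):
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    captured = []
    for line in proc.stdout:
        captured.append(line)              # keep it in case the build fails
        sys.stderr.write("\r" + next(SPINNER))
    sys.stderr.write("\r")
    proc.wait()
    return proc.returncode, captured
```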

> Taking all of those requirements into account, pip *has* to have some
> level of control over the output of a build tool - with setuptools at
> the moment, we have no such control (other than "we may or may not
> show the output to the user") and that means we struggle to
> realistically satisfy all of the conflicting requirements we have.
>
> So we do need much better defined contracts over stdin, stdout and
> stderr, and return codes. This is true whether or not the build system
> is invoked via a Python API or a CLI.

Even if you really do want to define a generic structured system for build
progress reporting (it feels pretty second-systemy to me), then in the
python api approach there are better options than trying to define a
specific protocol on stdout.

Guaranteeing a clean stdout/stderr is hard: it means you have to be careful
to correctly capture and process the output of every child you invoke (e.g.
compilers), and deal correctly with the tricky aspects of pipes (deadlocks,
sigpipe, ...). And even then you can get thwarted by accidentally importing
the wrong library into your main process, and discovering that it writes
directly to stdout/stderr on some error condition. And it may or may not
respect your resetting of sys.stdout/sys.stderr at the python level. So to
be really reliable the only thing to do is to create some pipes and some
threads to read the pipes and do the dup2 dance (but not everyone will
actually do this, they'll just accept corrupted output on errors) and ugh,
all of this is a huge hassle that massively raises the bar on implementing
simple build systems.
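[The "dup2 dance" referred to above can be sketched as follows -- a POSIX-oriented, illustrative-only sketch, with no claim that any real build system does exactly this. Redirecting fd 1 at the OS level catches even C-level writes from compilers or misbehaving libraries, which resetting sys.stdout cannot.]

```python
import os
import threading

# Illustrative "dup2 dance": redirect the process-wide stdout fd into a
# pipe and drain it from a thread, so even C-level writes are captured.
def capture_fd1():
    read_fd, write_fd = os.pipe()
    saved_stdout = os.dup(1)          # remember the real stdout
    os.dup2(write_fd, 1)              # fd 1 now points at the pipe
    os.close(write_fd)
    chunks = []

    def drain():
        while True:
            data = os.read(read_fd, 4096)
            if not data:
                break
            chunks.append(data)

    reader = threading.Thread(target=drain)
    reader.start()

    def restore():
        os.dup2(saved_stdout, 1)      # put the real stdout back
        os.close(saved_stdout)
        reader.join()                 # drain sees EOF once fd 1 moves away
        os.close(read_fd)
        return b"".join(chunks)

    return restore
```

The reader thread is what avoids the pipe-deadlock problem mentioned above: without it, a child that fills the pipe buffer would block forever.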

In the subprocess approach you don't really have many options; if you want
live feedback from a build process then you have to get it somehow, and you
can't just say "fine part of the protocol is that we use fd 3 for
structured status updates" because that doesn't work on windows.

In the python api approach, we have better options, though. The way I'd do
this is to define some of progress reporting abstract interface, like

  class BuildUpdater:
      # pass -1 for "unknown"
      def set_total_steps(self, n):
          pass

      # if total is unknown, call this repeatedly to say "something's happening"
      def set_current_step(self, n):
          pass

      def alert_user(self, message):
          pass

And methods like build_wheel would accept an object implementing this
interface as an argument. Stdout/stderr keep the same semantics as they
have today; this is a separate, additional channel.

And then a build frontend could decide how it wants to actually implement
this interface. A simple frontend that didn't want to implement fancy UI
stuff might just have each of those methods print something to stderr to be
captured along with the rest of the chatter. A fancier frontend like pip
could pick whichever ipc mechanism they like best and implement that inside
their worker. (E.g., maybe on POSIX we use fd 3, and on windows we do
incremental writes to a temp file, or use a named pipe. Or maybe we prefer
to stick to using stdout for pip<->worker communication, and the worker
would take the responsibility of robustly redirecting stdout via dup2
before invoking the actual build hook. There are lots of options; the
beauty of the approach, again, is that we don't have to pick one now and
write it in stone.)
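A simple frontend's implementation of the sketched interface could be as small as this (hypothetical code, following the BuildUpdater methods described above):

```python
import sys

class StderrBuildUpdater:
    """Minimal implementation of the hypothetical BuildUpdater
    interface: just narrate progress on stderr, interleaved with the
    rest of the build chatter."""

    def __init__(self):
        self.total = -1

    def set_total_steps(self, n):
        self.total = n                      # -1 means "unknown"

    def set_current_step(self, n):
        if self.total == -1:
            sys.stderr.write(".")           # low-tech "something's happening"
        else:
            sys.stderr.write(f"[{n}/{self.total}]\n")

    def alert_user(self, message):
        sys.stderr.write(f"warning: {message}\n")

# A backend hook would then receive it as an argument, e.g. (names
# hypothetical):
#   build_wheel(sourcedir, wheeldir, updater=StderrBuildUpdater())
```

A fancier frontend would implement the same three methods on top of whatever IPC channel it chose, without the backend needing to know the difference.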

-n
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Robert Collins
On 12 November 2015 at 02:30, Paul Moore  wrote:
> On 10 November 2015 at 22:44, Nathaniel Smith  wrote:
>> "Stdin is unspecified, and stdout/stderr can be used for printing
>> status messages, errors, etc. just like you're used to from every
>> other build system in the world."
>
> This is over simplistic.
>
> We have real-world requirements from users of pip that they *don't*
> want to see all of the progress that the various build tools invoke.
> That is not something we can ignore. We also have some users saying
> they want access to all of the build tool output. And we also have a
> requirement for progress reporting.
>
> Taking all of those requirements into account, pip *has* to have some
> level of control over the output of a build tool - with setuptools at
> the moment, we have no such control (other than "we may or may not
> show the output to the user") and that means we struggle to
> realistically satisfy all of the conflicting requirements we have.
>
> So we do need much better defined contracts over stdin, stdout and
> stderr, and return codes. This is true whether or not the build system
> is invoked via a Python API or a CLI.

Aye.

I'd like everyone to take a breather on this thread btw. I'm focusing
on the dependency specification PEP, and until that's at the point
where I can't move it forward, I won't be updating the draft build
abstraction PEP: when that's done, with the thing Donald and I hammered
out on IRC a few days back (Option 3, earlier) then we'll have
something to talk about and consider.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Paul Moore
On 10 November 2015 at 22:44, Nathaniel Smith  wrote:
> "Stdin is unspecified, and stdout/stderr can be used for printing
> status messages, errors, etc. just like you're used to from every
> other build system in the world."

This is over simplistic.

We have real-world requirements from users of pip that they *don't*
want to see all of the progress that the various build tools invoke.
That is not something we can ignore. We also have some users saying
they want access to all of the build tool output. And we also have a
requirement for progress reporting.

Taking all of those requirements into account, pip *has* to have some
level of control over the output of a build tool - with setuptools at
the moment, we have no such control (other than "we may or may not
show the output to the user") and that means we struggle to
realistically satisfy all of the conflicting requirements we have.

So we do need much better defined contracts over stdin, stdout and
stderr, and return codes. This is true whether or not the build system
is invoked via a Python API or a CLI.

Paul
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Donald Stufft
On November 11, 2015 at 4:05:11 AM, Nathaniel Smith (n...@pobox.com) wrote:
> But even this isn't really true -- the difference between them is that
> either way you have a subprocess API, but with a Python API, the
> subprocess interface that pip uses has the option of being improved
> incrementally over time -- including, potentially, to take further
> advantage of the underlying richness of the Python semantics. Sure,
> maybe the first release would just take all exceptions and map them
> into some text printed to stderr and a non-zero return code, and
> that's all that pip would get. But if someone had an idea for how pip
> could do better than this by, I dunno, encoding some structured
> metadata about the particular exception that occurred and passing this
> back up to pip to do something intelligent with it, they absolutely
> could write the code and submit a PR to pip, without having to write a
> new PEP.

I think I prefer a CLI based approach (my suggestion was to remove the 
formatting/interpolation altogether and just have the file include a list of 
things to install, and a python module to invoke via ``python -m ``).

The main reason I think I prefer a CLI based approach is that I worry about the 
impedance mismatch between the two systems. We’re not actually going to be able 
to take advantage of Python’s plethora of types in any meaningful capacity 
because at the end of the day the bulk of the data is either naturally a string 
or as we start to allow end users to pass options through pip into the build 
system, we have no real way of knowing what the type is supposed to be other 
than the fact we got it as a CLI flag. How does a user encode something like 
“pass an integer into this value in the build system?” on the CLI in a generic 
way? I can’t think of any way which means that any boundary code in the build 
system is going to need to be smart enough to handle an array of arguments that 
come in via the user typing something on the CLI. We have a wide variety of 
libraries to handle that case already for building CLI apps but we do not have 
a wide array of libraries handling it for a Python API. It will have to be 
manually encoded for each and every option that the build system supports.

My other concern is that it introduces another potential area for mistake that 
is a bit harder to test. I don’t believe that any sort of “worker.py” script is 
ever going to be able to handle arbitrary Python values coming back as a return 
value from a Python script. Whatever serialization we use to send data back 
into the main pip process (likely JSON) will simply choke and cause an error if 
it encounters a type it doesn’t know how to serialize. However this error case 
will only happen when the build system is being invoked by pip, not when it is 
being invoked “naturally” in the build system’s unit tests. By forcing build 
tool authors to write a CLI interface, we push the work of “how do I serialize 
my internal data structures” down onto them instead of making it some implicit 
piece of code that pip needs to work.

The other reason I think a CLI approach is nicer is that it gives us a standard 
interface that we can use to define errors that the build system can emit. 
For instance if we wanted to allow the build system to indicate that it can’t 
do a build because it’s missing a mandatory C library, that would be trivial to 
do in a natural way for a CLI approach, we just define an error code and say 
that if the CLI exits with a 2 then we assume it’s missing a mandatory C 
library and we can take additional measures in pip to handle that case. If we 
use a Python API the natural way to signal an error like that is using an 
exception… but we don’t have any way to force a standard exception hierarchy on 
people. There is no “Missing C Library Exception” in Python so either we’d have 
to encode some numerical or string based identifier that we’ll inspect an 
exception for (like Exception().error_code) or we’ll need to make a mandatory 
runtime library that the build systems must utilize to get their exceptions 
from. Alternatively we could have the calling functions return exit codes as 
well just like a process boundary does, however that is also not natural in 
Python and is more natural in a language like C.
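The defined-error-code convention sketched above could be as small as this on the frontend side (illustrative only; exit code 2 is the hypothetical "missing mandatory C library" code from the example):

```python
import subprocess
import sys

MISSING_C_LIBRARY = 2   # hypothetical agreed-upon exit code

def invoke_build(cmd):
    """Run a build backend's CLI and translate its exit status into an
    outcome the frontend can act on (a sketch of the convention
    described above, not a real pip interface)."""
    proc = subprocess.run(cmd)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == MISSING_C_LIBRARY:
        # pip could take additional measures here, e.g. point the
        # user at the system package that provides the library
        return "missing-c-library"
    return "failed"
```

The Python-API equivalent needs either a shared exception hierarchy or an `error_code`-style attribute convention, which is exactly the extra standardization burden being described.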

The main downside to the CLI approach is that it’s harder for the build system 
to send structured information back to the calling process outside of defined 
error code. However I do not believe that is particularly difficult since we 
can have it do something like send messages on stdout that are JSON encoded 
messages that pip can process and understand. I don’t think that it’s a 
requirement or even useful that the same CLI that end users would use to 
directly invoke that build system is the same one that pip would use to invoke 
that build system. So we wouldn’t need to worry about the fact that a bunch of 
JSON blobs being put o

Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Nathaniel Smith
In case it's useful to make this discussion more concrete, here's a
sketch of what the pip code for dealing with a build system defined by
a Python API might look like:

https://gist.github.com/njsmith/75818a6debbce9d7ff48

Obviously there's room to build on this to get much fancier, but
AFAICT even this minimal version is already enough to correctly handle
all the important stuff -- schema version checking, error reporting,
full args/kwargs/return values. (It does assume that we'll only use
json-serializable data structures for argument and return values, but
that seems like a good plan anyway. Pickle would probably be a bad
idea because we're crossing between two different python environments
that may have different or incompatible packages/classes available.)
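The linked gist is the real sketch; as a toy illustration of the same worker pattern (my own code, assuming JSON-only arguments and "module:attribute" hook names, not the gist's actual contents):

```python
import json
import subprocess
import sys

# Script run inside the build environment's python: read a JSON request
# on stdin, import and call the named hook, print a JSON reply.
WORKER = """
import importlib, json, sys
req = json.load(sys.stdin)
mod_name, _, attr = req["hook"].partition(":")
hook = getattr(importlib.import_module(mod_name), attr)
try:
    reply = {"ok": hook(*req["args"], **req["kwargs"])}
except Exception as exc:
    reply = {"error": repr(exc)}
print(json.dumps(reply))
"""

def call_hook(python, hook, *args, **kwargs):
    """Invoke `hook` ("module:attribute") inside the interpreter at
    `python`; args, kwargs, and the result must be JSON-serializable."""
    request = json.dumps({"hook": hook, "args": args, "kwargs": kwargs})
    out = subprocess.run([python, "-c", WORKER], input=request,
                         capture_output=True, text=True, check=True)
    # The reply is the last stdout line, so the hook's own prints
    # don't corrupt the protocol.
    return json.loads(out.stdout.strip().splitlines()[-1])
```

Note how exceptions, kwargs, and return values all survive the process boundary with a few lines of glue, which is the point being argued for.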

-n

On Wed, Nov 11, 2015 at 1:04 AM, Nathaniel Smith  wrote:
> On Tue, Nov 10, 2015 at 11:27 PM, Robert Collins
>  wrote:
>> On 11 November 2015 at 19:49, Nick Coghlan  wrote:
>>> On 11 November 2015 at 16:19, Robert Collins  
>>> wrote:
>> ...>> pip is going to be invoking a CLI *no matter what*. Thats a hard
>>>> requirement unless Python's very fundamental import behaviour changes.
>>>> Slapping a Python API on things is lipstick on a pig here IMO: we're
>>>> going to have to downgrade any richer interface; and by specifying the
>>>> actual LCD as the interface it is then amenable to direct exploration
>>>> by users without them having to reverse engineer an undocumented thunk
>>>> within pip.
>>>
>>> I'm not opposed to documenting how pip talks to its worker CLI - I
>>> just share Nathan's concerns about locking that down in a PEP vs
>>> keeping *that* CLI within pip's boundary of responsibilities, and
>>> having a documented Python interface used for invoking build systems.
>>
>> I'm also very wary of something that would be an attractive nuisance.
>> I've seen nothing suggesting that a Python API would be anything but:
>>  - it won't be usable [it requires the glue to set up an isolated
>> context, which is buried in pip] in the general case
>
> This is exactly as true of a command line API -- in the general case
> it also requires the glue to set up an isolated context. People who go
> ahead and run 'flit' from their global environment instead of in the
> isolated build environment will experience exactly the same problems
> as people who go ahead and import 'flit.build_system_api' in their
> global environment, so I don't see how one is any more of an
> attractive nuisance than the other?
>
> AFAICT the main difference is that "setting up a specified Python
> context and then importing something and exploring its API" is
> literally what I do all day as a Python developer. Either way you have
> to set stuff up, and then once you do, in the Python API case you get
> stuff like tab completion, ipython introspection (? and ??), etc. for
> free.
>
>>  - no matter what we do, pip can't benefit from it beyond the
>> subprocess interface pip needs, because pip *cannot* import and use
>> the build interface
>
> Not sure what you mean by "benefit" here. At best this is an argument
> that the two options have similar capabilities, in which case I would
> argue that we should choose the one that leads to simpler and thus
> more probably bug-free specification language.
>
> But even this isn't really true -- the difference between them is that
> either way you have a subprocess API, but with a Python API, the
> subprocess interface that pip uses has the option of being improved
> incrementally over time -- including, potentially, to take further
> advantage of the underlying richness of the Python semantics. Sure,
> maybe the first release would just take all exceptions and map them
> into some text printed to stderr and a non-zero return code, and
> that's all that pip would get. But if someone had an idea for how pip
> could do better than this by, I dunno, encoding some structured
> metadata about the particular exception that occurred and passing this
> back up to pip to do something intelligent with it, they absolutely
> could write the code and submit a PR to pip, without having to write a
> new PEP.
>
>> tl;dr - I think making the case that the layer we define should be a
>> Python protocol rather than a subprocess protocol requires some really
>> strong evidence. We're *not* dealing with the same moving parts that
>> typical Python stuff requires.
>
> I'm very confused and honestly do not understand what you find
> attractive about the subprocess protocol approach. Even your arguments
> above aren't really even trying to be arguments that it's good, just
> arguments that the Python API approach isn't much better. I'm sure
> there is some reason you like it, and you might even have said it but
> I missed it because I disagreed or something :-). But literally the
> only reason I can think of right now for why one would prefer the
> subprocess approach is that it lets one remove 50 lines of "worker
> process" code from pip and move them into 

Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-11 Thread Nathaniel Smith
On Tue, Nov 10, 2015 at 11:27 PM, Robert Collins
 wrote:
> On 11 November 2015 at 19:49, Nick Coghlan  wrote:
>> On 11 November 2015 at 16:19, Robert Collins  
>> wrote:
> ...>> pip is going to be invoking a CLI *no matter what*. Thats a hard
>>> requirement unless Python's very fundamental import behaviour changes.
>>> Slapping a Python API on things is lipstick on a pig here IMO: we're
>>> going to have to downgrade any richer interface; and by specifying the
>>> actual LCD as the interface it is then amenable to direct exploration
>>> by users without them having to reverse engineer an undocumented thunk
>>> within pip.
>>
>> I'm not opposed to documenting how pip talks to its worker CLI - I
>> just share Nathan's concerns about locking that down in a PEP vs
>> keeping *that* CLI within pip's boundary of responsibilities, and
>> having a documented Python interface used for invoking build systems.
>
> I'm also very wary of something that would be an attractive nuisance.
> I've seen nothing suggesting that a Python API would be anything but:
>  - it won't be usable [it requires the glue to set up an isolated
> context, which is buried in pip] in the general case

This is exactly as true of a command line API -- in the general case
it also requires the glue to set up an isolated context. People who go
ahead and run 'flit' from their global environment instead of in the
isolated build environment will experience exactly the same problems
as people who go ahead and import 'flit.build_system_api' in their
global environment, so I don't see how one is any more of an
attractive nuisance than the other?

AFAICT the main difference is that "setting up a specified Python
context and then importing something and exploring its API" is
literally what I do all day as a Python developer. Either way you have
to set stuff up, and then once you do, in the Python API case you get
stuff like tab completion, ipython introspection (? and ??), etc. for
free.

>  - no matter what we do, pip can't benefit from it beyond the
> subprocess interface pip needs, because pip *cannot* import and use
> the build interface

Not sure what you mean by "benefit" here. At best this is an argument
that the two options have similar capabilities, in which case I would
argue that we should choose the one that leads to simpler and thus
more probably bug-free specification language.

But even this isn't really true -- the difference between them is that
either way you have a subprocess API, but with a Python API, the
subprocess interface that pip uses has the option of being improved
incrementally over time -- including, potentially, to take further
advantage of the underlying richness of the Python semantics. Sure,
maybe the first release would just take all exceptions and map them
into some text printed to stderr and a non-zero return code, and
that's all that pip would get. But if someone had an idea for how pip
could do better than this by, I dunno, encoding some structured
metadata about the particular exception that occurred and passing this
back up to pip to do something intelligent with it, they absolutely
could write the code and submit a PR to pip, without having to write a
new PEP.

> tl;dr - I think making the case that the layer we define should be a
> Python protocol rather than a subprocess protocol requires some really
> strong evidence. We're *not* dealing with the same moving parts that
> typical Python stuff requires.

I'm very confused and honestly do not understand what you find
attractive about the subprocess protocol approach. Even your arguments
above aren't really even trying to be arguments that it's good, just
arguments that the Python API approach isn't much better. I'm sure
there is some reason you like it, and you might even have said it but
I missed it because I disagreed or something :-). But literally the
only reason I can think of right now for why one would prefer the
subprocess approach is that it lets one remove 50 lines of "worker
process" code from pip and move them into the individual build
backends instead, which I guess is a win if one is focused narrowly on
pip itself. But surely there is more I'm missing?

(And even this is lines-of-code argument is actually pretty dubious --
right now your draft PEP is importing-by-reference an entire existing
codebase (!) for shell variable expansion in command lines, which is
code that simply doesn't need to exist in the Python API approach. I'd
be willing to bet that your approach requires more code in pip than
mine :-).)

>> However, I've now realised that we're not constrained even if we start
>> with the CLI interface, as there's still a migration path to a Python
>> API based model:
>>
>> Now: documented CLI for invoking build systems
>> Future: documented Python API for invoking build systems, default
>> fallback invokes the documented CLI
>
> Or we just issue an updated bootstrap schema, and there's no fallback
> or anything needed.

Oh no! But this totally gi

Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-10 Thread Robert Collins
On 11 November 2015 at 19:49, Nick Coghlan  wrote:
> On 11 November 2015 at 16:19, Robert Collins  
> wrote:
...>> pip is going to be invoking a CLI *no matter what*. Thats a hard
>> requirement unless Python's very fundamental import behaviour changes.
>> Slapping a Python API on things is lipstick on a pig here IMO: we're
>> going to have to downgrade any richer interface; and by specifying the
>> actual LCD as the interface it is then amenable to direct exploration
>> by users without them having to reverse engineer an undocumented thunk
>> within pip.
>
> I'm not opposed to documenting how pip talks to its worker CLI - I
> just share Nathan's concerns about locking that down in a PEP vs
> keeping *that* CLI within pip's boundary of responsibilities, and
> having a documented Python interface used for invoking build systems.

I'm also very wary of something that would be an attractive nuisance.
I've seen nothing suggesting that a Python API would be anything but:
 - it won't be usable [it requires the glue to set up an isolated
context, which is buried in pip] in the general case
 - no matter what we do, pip can't benefit from it beyond the
subprocess interface pip needs, because pip *cannot* import and use
the build interface

tl;dr - I think making the case that the layer we define should be a
Python protocol rather than a subprocess protocol requires some really
strong evidence. We're *not* dealing with the same moving parts that
typical Python stuff requires.

> However, I've now realised that we're not constrained even if we start
> with the CLI interface, as there's still a migration path to a Python
> API based model:
>
> Now: documented CLI for invoking build systems
> Future: documented Python API for invoking build systems, default
> fallback invokes the documented CLI

Or we just issue an updated bootstrap schema, and there's no fallback
or anything needed.

> So the CLI documented in the PEP isn't *necessarily* going to be the
> one used by pip to communicate into the build environment - it may be
> invoked locally within the build environment.

No, it totally will be. Exactly as setup.py is today. Thats
deliberate: The *new* thing we're setting out to enable is abstract
build systems, not reengineering pip.

The future - sure, someone can write a new thing, and the necessary
capability we're building in to allow future changes will allow a new
PEP to slot in easily and take on that [non trivial and substantial
chunk of work]. (For instance, how do you do compiler and build system
specific options when you have a CLI to talk to pip with)?

-Rob


-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-10 Thread Nick Coghlan
On 11 November 2015 at 16:19, Robert Collins  wrote:
> On 11 November 2015 at 18:53, Nick Coghlan  wrote:
>> On 11 November 2015 at 08:44, Nathaniel Smith  wrote:
>>> On Mon, Nov 9, 2015 at 6:11 PM, Robert Collins
>>>  wrote:
 On 10 November 2015 at 15:03, Nathaniel Smith  wrote:
>>> Similarly, we still have to specify what the different operations
>>> are, what arguments they take, how they signal errors, etc. The point
>>> though is this specification will be shorter and simpler if we're
>>> specifying Python APIs than if we're specifying IPC APIs, because with
>>> a Python API we get to assume the existence of things like data
>>> structures and kwargs and exceptions and return values instead of
>>> having to build them from scratch.
>>
>> I think the potentially improved quality of error handling arising
>> from a Python API based approach is well worth taking into account.
>> When the backend interface is CLI based, you're limited to:
>>
>> 1. The return code
>> 2. Typically unstructured stderr output
>>
>> This isn't like HTTP+JSON, where there's an existing rich suite of
>> well-defined error codes to use, and an ability to readily include
>> error details in the reply payload.
>>
>> The other thing is that if the core interface is Python API based,
>> then if no hook is specified, there can be a default provider in pip
>> that knows how to invoke the setup.py CLI (or perhaps even implements
>> looking up the CLI to invoke from the source tree metadata).
>
> Its richer, which is both a positive and a negative. I appreciate the
> arguments, but I'm not convinced at this point.
>
> pip is going to be invoking a CLI *no matter what*. Thats a hard
> requirement unless Python's very fundamental import behaviour changes.
> Slapping a Python API on things is lipstick on a pig here IMO: we're
> going to have to downgrade any richer interface; and by specifying the
> actual LCD as the interface it is then amenable to direct exploration
> by users without them having to reverse engineer an undocumented thunk
> within pip.

I'm not opposed to documenting how pip talks to its worker CLI - I
just share Nathan's concerns about locking that down in a PEP vs
keeping *that* CLI within pip's boundary of responsibilities, and
having a documented Python interface used for invoking build systems.

However, I've now realised that we're not constrained even if we start
with the CLI interface, as there's still a migration path to a Python
API based model:

Now: documented CLI for invoking build systems
Future: documented Python API for invoking build systems, default
fallback invokes the documented CLI

So the CLI documented in the PEP isn't *necessarily* going to be the
one used by pip to communicate into the build environment - it may be
invoked locally within the build environment.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-10 Thread Robert Collins
On 11 November 2015 at 18:53, Nick Coghlan  wrote:
> On 11 November 2015 at 08:44, Nathaniel Smith  wrote:
>> On Mon, Nov 9, 2015 at 6:11 PM, Robert Collins
>>  wrote:
>>> On 10 November 2015 at 15:03, Nathaniel Smith  wrote:
>> Similarly, we still have to specify what the different operations
>> are, what arguments they take, how they signal errors, etc. The point
>> though is this specification will be shorter and simpler if we're
>> specifying Python APIs than if we're specifying IPC APIs, because with
>> a Python API we get to assume the existence of things like data
>> structures and kwargs and exceptions and return values instead of
>> having to build them from scratch.
>
> I think the potentially improved quality of error handling arising
> from a Python API based approach is well worth taking into account.
> When the backend interface is CLI based, you're limited to:
>
> 1. The return code
> 2. Typically unstructured stderr output
>
> This isn't like HTTP+JSON, where there's an existing rich suite of
> well-defined error codes to use, and an ability to readily include
> error details in the reply payload.
>
> The other thing is that if the core interface is Python API based,
> then if no hook is specified, there can be a default provider in pip
> that knows how to invoke the setup.py CLI (or perhaps even implements
> looking up the CLI to invoke from the source tree metadata).

Its richer, which is both a positive and a negative. I appreciate the
arguments, but I'm not convinced at this point.

pip is going to be invoking a CLI *no matter what*. Thats a hard
requirement unless Python's very fundamental import behaviour changes.
Slapping a Python API on things is lipstick on a pig here IMO: we're
going to have to downgrade any richer interface; and by specifying the
actual LCD as the interface it is then amenable to direct exploration
by users without them having to reverse engineer an undocumented thunk
within pip.

-Rob

-- 
Robert Collins 
Distinguished Technologist
HP Converged Cloud
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-10 Thread Nick Coghlan
On 11 November 2015 at 08:44, Nathaniel Smith  wrote:
> On Mon, Nov 9, 2015 at 6:11 PM, Robert Collins
>  wrote:
>> On 10 November 2015 at 15:03, Nathaniel Smith  wrote:
> Similarly, we still have to specify what the different operations
> are, what arguments they take, how they signal errors, etc. The point
> though is this specification will be shorter and simpler if we're
> specifying Python APIs than if we're specifying IPC APIs, because with
> a Python API we get to assume the existence of things like data
> structures and kwargs and exceptions and return values instead of
> having to build them from scratch.

I think the potentially improved quality of error handling arising
from a Python API based approach is well worth taking into account.
When the backend interface is CLI based, you're limited to:

1. The return code
2. Typically unstructured stderr output

This isn't like HTTP+JSON, where there's an existing rich suite of
well-defined error codes to use, and an ability to readily include
error details in the reply payload.

The other thing is that if the core interface is Python API based,
then if no hook is specified, there can be a default provider in pip
that knows how to invoke the setup.py CLI (or perhaps even implements
looking up the CLI to invoke from the source tree metadata).
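That default provider could be a very thin shim — something along these lines (a hypothetical sketch, not actual pip code; the method names are illustrative):

```python
import subprocess
import sys

class SetupPyFallback:
    """Hypothetical default build backend: used when a source tree
    names no Python hook, it satisfies the same interface by shelling
    out to the legacy setup.py CLI."""

    def __init__(self, source_dir):
        self.source_dir = source_dir

    def build_wheel(self, wheel_dir):
        subprocess.run(
            [sys.executable, "setup.py", "bdist_wheel",
             "--dist-dir", wheel_dir],
            cwd=self.source_dir, check=True)

    def build_sdist(self, sdist_dir):
        subprocess.run(
            [sys.executable, "setup.py", "sdist",
             "--dist-dir", sdist_dir],
            cwd=self.source_dir, check=True)
```

The frontend then only ever talks to the Python interface, and the setup.py CLI becomes an implementation detail of one backend.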

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-10 Thread Nathaniel Smith
On Mon, Nov 9, 2015 at 6:11 PM, Robert Collins
 wrote:
> On 10 November 2015 at 15:03, Nathaniel Smith  wrote:
>> On Sun, Nov 8, 2015 at 5:28 PM, Robert Collins
>>  wrote:
>>> +The use of a command line API rather than a Python API is a little
>>> +contentious. Fundamentally anything can be made to work, and Robert wants to
>>> +pick something thats sufficiently lowest common denominator that
>>> +implementation is straight forward on all sides. Picking a CLI for that makes
>>> +sense because all build systems will need a CLI for end users to use anyway.
>>
>> I agree that this is not terribly important, and anything can be made
>> to work. Having pondered it all for a few more weeks though I think
>> that the "entrypoints-style" interface actually is unambiguously
>> better, so let me see about making that case.
>>
>> What's at stake?
>> --
>>
>> Option 1, as in Robert's PEP:
>>
>> The build configuration file contains a string like "flit
>> --dump-build-description" (or whatever), which names a command to run,
>> and then a protocol for running this command to get information on the
>> actual build system interface. Build operations are performed by
>> executing these commands as subprocesses.
>>
>> Option 2, my preference:
>>
>> The build configuration file contains a string like
>> "flit:build_system_api" (or whatever) which names a Python object
>> accessed like
>>
>>   import flit
>>   flit.build_system_api
>>
>> (This is the same syntax used for naming entry points.) Which would
>> then have attributes and methods describing the actual build system
>> interface. Build operations are performed by calling these methods.
>
> Option 3 expressed by Donald on IRC

Where is this IRC channel, btw? :-)

> (and implied by his 'smaller step'
> email - hard code the CLI).
>
> A compromise position from 'setup.py ' - keep the 'setup.py' step in pypa.json, but define the rest as a fixed
> contract, e.g. with subcommands like wheel, metadata etc. This drops
> the self describing tool blob and the caching machinery.

So this would give up on having schema versioning for the API, I guess?

> I plan on using that approach in my next draft.
>
> Your point about bugs etc is interesting, but the use of stdin etc in
> a dedicated Python API also needs to be specified.

Yes, but this specification is trivial:

"Stdin is unspecified, and stdout/stderr can be used for printing
status messages, errors, etc. just like you're used to from every
other build system in the world."
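That one-sentence spec is easy to implement on the frontend side. A minimal sketch (function and parameter names are hypothetical, not from any PEP): run the backend command, treat its merged stdout/stderr as free-form status output, and treat a nonzero exit code as failure:

```python
import subprocess
import sys

def run_build_step(cmd, verbose=True):
    """Run a build backend command, treating its stdout/stderr as
    free-form status output, per the one-sentence spec above.
    (Names here are illustrative, not part of any spec.)"""
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,   # stdin is unspecified, so close it off
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge: both streams are just "messages"
        text=True,
    )
    for line in proc.stdout:
        if verbose:
            sys.stdout.write(line)  # or feed a spinner, as pip's dev version does
    proc.wait()
    return proc.returncode == 0
```

A frontend could just as easily keep the two streams separate; merging them here mirrors the "everything is a status message" reading of the spec.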

Similarly, we still have to specify what the different operations
are, what arguments they take, how they signal errors, etc. The point
though is this specification will be shorter and simpler if we're
specifying Python APIs than if we're specifying IPC APIs, because with
a Python API we get to assume the existence of things like data
structures and kwargs and exceptions and return values instead of
having to build them from scratch.
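To illustrate that last point (all names here are hypothetical, for illustration only): a Python-level hook gets argument passing, error signalling, and structured results for free from the language, whereas a subprocess protocol has to reinvent each of them as command-line parsing, exit codes, and serialized JSON:

```python
import json

# Hypothetical Python-API hook: kwargs, return values, and exceptions
# come with the language, so the spec only has to name them.
class BuildError(Exception):
    pass

def build_wheel(wheel_directory, config_settings=None):
    if config_settings and "bad" in config_settings:
        raise BuildError("unsupported setting")    # errors are just exceptions
    return "pkg-1.0-py3-none-any.whl"              # results are return values

# The subprocess equivalent must serialize all of that by hand,
# and the wire format would have to be written into the spec:
def build_wheel_over_ipc(wheel_directory, config_settings=None):
    try:
        result = build_wheel(wheel_directory, config_settings)
        return json.dumps({"ok": True, "wheel": result})
    except BuildError as exc:
        return json.dumps({"ok": False, "error": str(exc)})
```

Everything below the first comment block is what the Python-API approach lets the PEP leave unsaid.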

-n

-- 
Nathaniel J. Smith -- http://vorpus.org
___
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-10 Thread Paul Moore
On 10 November 2015 at 04:03, Marcus Smith  wrote:
> although I wasn't arguing for it in that context, but rather just using it
> to be clear that a python api approach could still be used with build
> environment isolation

Which is a good point - it's easy enough to write adapters from one
convention to another (I'm inclined to think it's easier to adapt a
Python API to a CLI interface than the other way around, but I may be
wrong about that).

Paul


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-09 Thread Marcus Smith
> Because even if we go with the entry-point-style Python
> hooks, the build frontends like pip will still want to spawn a child
> to do the actual calls -- this is important for isolating pip from the
> build backend and the build backend from pip, it's important because
> the build backend needs to execute in a different environment than pip
> itself, etc.

[...]
> Concretely, the way I imagine this would work is that pip would set up
> the build environment, and then it would run
>
>   build-environment/bin/python path/to/pip-worker-script.py 
>

fwiw, such a worker is what I was describing in an earlier thread with
Robert last week

https://mail.python.org/pipermail/distutils-sig/2015-October/027443.html

although I wasn't arguing for it in that context, but rather just using it
to be clear that a python api approach could still be used with build
environment isolation


Re: [Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-09 Thread Robert Collins
On 10 November 2015 at 15:03, Nathaniel Smith  wrote:
> On Sun, Nov 8, 2015 at 5:28 PM, Robert Collins
>  wrote:
>> +The use of a command line API rather than a Python API is a little
>> +contentious. Fundamentally anything can be made to work, and Robert wants to
>> +pick something that's sufficiently lowest-common-denominator that
>> +implementation is straightforward on all sides. Picking a CLI for that makes
>> +sense because all build systems will need a CLI for end users to use anyway.
>
> I agree that this is not terribly important, and anything can be made
> to work. Having pondered it all for a few more weeks though I think
> that the "entrypoints-style" interface actually is unambiguously
> better, so let me see about making that case.
>
> What's at stake?
> --
>
> Option 1, as in Robert's PEP:
>
> The build configuration file contains a string like "flit
> --dump-build-description" (or whatever), which names a command to run,
> and then a protocol for running this command to get information on the
> actual build system interface. Build operations are performed by
> executing these commands as subprocesses.
>
> Option 2, my preference:
>
> The build configuration file contains a string like
> "flit:build_system_api" (or whatever) which names a Python object
> accessed like
>
>   import flit
>   flit.build_system_api
>
> (This is the same syntax used for naming entry points.) Which would
> then have attributes and methods describing the actual build system
> interface. Build operations are performed by calling these methods.

Option 3 expressed by Donald on IRC (and implied by his 'smaller step'
email - hard code the CLI).

A compromise position would be to keep the 'setup.py' step in
pypa.json, but define the rest as a fixed contract, e.g. with
subcommands like wheel, metadata etc. This drops the self describing
tool blob and the caching machinery.

I plan on using that approach in my next draft.

Your point about bugs etc is interesting, but the use of stdin etc in
a dedicated Python API also needs to be specified.

-- 
Robert Collins
Distinguished Technologist
HP Converged Cloud


[Distutils] command line versus python API for build system abstraction (was Re: build system abstraction PEP)

2015-11-09 Thread Nathaniel Smith
On Sun, Nov 8, 2015 at 5:28 PM, Robert Collins
 wrote:
> +The use of a command line API rather than a Python API is a little
> +contentious. Fundamentally anything can be made to work, and Robert wants to
> +pick something that's sufficiently lowest-common-denominator that
> +implementation is straightforward on all sides. Picking a CLI for that makes
> +sense because all build systems will need a CLI for end users to use anyway.

I agree that this is not terribly important, and anything can be made
to work. Having pondered it all for a few more weeks though I think
that the "entrypoints-style" interface actually is unambiguously
better, so let me see about making that case.

What's at stake?
----------------

Option 1, as in Robert's PEP:

The build configuration file contains a string like "flit
--dump-build-description" (or whatever), which names a command to run,
and then a protocol for running this command to get information on the
actual build system interface. Build operations are performed by
executing these commands as subprocesses.

Option 2, my preference:

The build configuration file contains a string like
"flit:build_system_api" (or whatever) which names a Python object
accessed like

  import flit
  flit.build_system_api

(This is the same syntax used for naming entry points.) Which would
then have attributes and methods describing the actual build system
interface. Build operations are performed by calling these methods.
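Resolving such a "module:attribute" string is a few lines of frontend code. A sketch (the helper name is hypothetical; the flit string is just the example from above):

```python
import importlib

def resolve_build_api(spec):
    """Resolve an entry-point-style "module:attribute" string,
    e.g. "flit:build_system_api", to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):   # allow dotted attribute paths too
        obj = getattr(obj, attr)
    return obj
```

For instance, `resolve_build_api("os.path:join")` returns the `os.path.join` function; a build frontend would do the same with the backend's declared string inside the build environment.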

Why does it matter?
-------------------

First, to be clear: I think that no matter which choice we make here,
the final actual execution path is going to end up looking very
similar. Because even if we go with the entry-point-style Python
hooks, the build frontends like pip will still want to spawn a child
to do the actual calls -- this is important for isolating pip from the
build backend and the build backend from pip, it's important because
the build backend needs to execute in a different environment than pip
itself, etc. So no matter what, we're going to have some subprocess
calls and some IPC.

The difference is that in the subprocess approach, the IPC machinery
is all written into the spec, and build frontends like pip implement
one half while build backends implement the other half. In the Python
API approach, the spec just specifies the Python calling conventions,
and both halves of the IPC code live inside each build
frontend.

Concretely, the way I imagine this would work is that pip would set up
the build environment, and then it would run

  build-environment/bin/python path/to/pip-worker-script.py 

where pip-worker-script.py is distributed as part of pip. (In simple
cases it could simply be a file inside pip's package directory; if we
want to support execution from pip-inside-a-zip-file then we need a
bit of code to unpack it to a tempfile before executing it. Creating a
tempfile is not a huge additional burden given that by the time we
call build hooks we will have already created a whole temporary python
environment...)
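The frontend side of that call might look like the following sketch (the helper name, worker protocol, and JSON-over-stdout wire format are all hypothetical; the point is that both halves are owned by one project, so the format is a private detail rather than spec text):

```python
import json
import subprocess

def call_build_hook(env_python, worker_script, hook, **kwargs):
    """Run a frontend-owned worker script inside the build
    environment's Python and read a JSON result back. Because the
    frontend ships both this caller and the worker script, the wire
    format can change without a PEP or a flag day."""
    proc = subprocess.run(
        [env_python, worker_script, hook, json.dumps(kwargs)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

Here `env_python` would be `build-environment/bin/python` and `worker_script` the unpacked `pip-worker-script.py`; the worker would import the backend's declared API object and invoke the named hook with the decoded kwargs.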

In the subprocess approach, we have to write a ton of text describing
all the intricacies of IPC. We have to specify how the command line
gets split (or is it passed to the shell?), and specify a JSON-based
protocol, and what happens to stdin/stdout/stderr, and so on. In
the Python API approach, we still have to do all the work of figuring
these things out, but they would live inside pip's code, instead of in
a PEP. The actual PEP text would be much smaller.

It's not clear which approach leads to smaller code overall. If there
are F frontends and B backends, then in the subprocess approach we
collectively have to write F+B pieces of IPC code, and in the Python
API approach we collectively have to write 2*F pieces of IPC code. So
on this metric the Python API is a win if F < B, which would happen if
e.g. everyone ends up using pip for their frontend but with lots of
different backends, which seems plausible? But who knows.

But now suppose that there's some bug in that complicated IPC protocol
(which I would rate as about a 99.3% likelihood in our first attempt,
because cross-platform compatible cross-process IPC is super annoying
and fiddly). In the subprocess approach, fixing this means that we
need to (a) write a PEP, and then (b) fix F+B pieces of code
simultaneously on some flag day, and possibly test F*B combinations
for correct interoperation. In the Python API approach, fixing this
means patching whichever frontend has the bug, no PEPs or flag days
necessary.

In addition, the ability to evolve the two halves of the IPC channel
together allows for better efficiency. For example, in Robert's
current PEP there's some machinery added that hopes to let pip cache
the result of the "--dump-build-description" call. This is needed
because in the subprocess approach, the minimum number of subprocess
calls you need to do something is two: one to ask what command to
call, and a second to actually execute the