Re: [Avocado-devel] RFC: avocado.utils.process (or, how to handle the split stdout/stderr recording)

2017-11-08 Thread Cleber Rosa


On 11/08/2017 11:01 AM, Lucas Meneghel Rodrigues wrote:
> On Wed, Nov 8, 2017 at 4:01 PM, Cleber Rosa  wrote:
>> Hi everyone,
>>
>> This is kind of brainstorm about the really annoying issue we've trying
>> to deal with in Avocado.  The root problem we're trying to fix is that
>> Avocado's "output check" feature can not be used when the test's
>> generated content combines both stdout and stderr in a single file.  One
>> example is QEMU's iotests[1].
>>
>> The root cause of this limitation is that avocado.utils.process.run()[2]
>> (because of avocado.utils.process.SubProcess[3]) decided a long time ago
>> to split the content of a process STDOUT and STDERR to separate files
>> (and different logging streams).  This is reflected on Avocado's "output
>> check" feature[4].  Users can choose to record the test's generated
>> STDOUT, STDERR, both (all) or none of them.  This is how the command
>> line options look like:
>>
>>   --output-check-record {none,all,stdout,stderr}
>>Record output streams of your tests to reference files
>>(valid options: none (do not record output streams),
>>all (record both stdout and stderr), stdout (record
>>only stderr), stderr (record only stderr). Current:
>>none
>>
>> Which map to the following recorded files:
>>
>> +--+++
>> | OPTION   | RECORDED FILES | CONTENT|
>> +--+++
>> | none |||
>> +--+++
>> |  | stdout | process FD 1   |
>> + all  +++
>> |  | stderr | process FD 2   |
>> +--+++
>> | stdout   | stdout | process FD 1   |
>> +--+++
>> | stderr   | stderr | process FD 2   |
>> +--+++
>>
>> Notice how the "all" option still creates two separate files, with
>> content coming from individual file descriptors.  Adding another record
>> option would actually be pretty simple, that is, something like:
>>
>> +--+++
>> | OPTION   | RECORDED FILES | CONTENT|
>> +--+++
>> | combined | combined_stdout_stderr | process FD 1 and 2 |
>> +--+++
>>
>> What is *not* easily possible, is to have two options such as "all" and
>> "combined" at the same time.  The reason is that it the order of the
>> content written to "combined_stdout_stderr" can not be guaranteed to be
>> correct.
>>
>> I've attempted to allow for both options to be used at once, by using a
>> pair of file-like objects with specialized writes that would either
>> share single list of records or just share a secondary file with a lock
>> (pseudo/prototype code):
>>
>> class TeeFile(file):
>> def __init__(self, path, tee_path, tee_lock, mode='a'):
>> super(TeeFile, self).__init__(path, mode)
>> self._tee_path = tee_path
>> self._tee_lock = tee_lock
>>
>> def write(self, record):
>> self.tee_lock.acquire()
>> super(TeeFile, self).write(record)
>> with open(self._tee_path, 'a') as tee_file:
>> tee_file.write(record)
>> tee_file.flush()
>> self.tee_lock.release()
>>
>> But the fact that the Python standard library "subprocess.Popen"
>> implementation works with file descriptors doesn't make it possible to
>> implement a shared/synchronized write strategy.  Rewriting parts of
>> "subprocess.Popen" or "avocado.utils.process.SubProcess" is, IMO, a
>> clear sign to not attempt something way more complicated and error prone
>> for the sake of this feature.
> 
> Yes, writing to a synchronized, combined output buffer should be
> possible, but it is as you said, complicated and error prone.
> 
> the SubProcess implementation was written considering that stdout and
> stderr streams are separate in POSIX systems, so it seemed the most
> obvious behavior. If having a combined stream is something important
> for the users, it's worthwhile to add a work item for such an
> implementation. Meanwhile, the solution proposed below is a good
> intermediate step.
> 

Yep, I also think it's the most obvious, and most flexible.  It's just
that Avocado needs to be flexible than "average", and those
unanticipated things can appear later on.

>> Accepting the fact that either combined (stdout+stderr) or a split
>> (stdout, stderr) will be available, the point of option names comes into
>> play.  Both "stdout", 

[Avocado-devel] RFC: avocado.utils.process (or, how to handle the split stdout/stderr recording)

2017-11-08 Thread Cleber Rosa
Hi everyone,

This is kind of brainstorm about the really annoying issue we've trying
to deal with in Avocado.  The root problem we're trying to fix is that
Avocado's "output check" feature can not be used when the test's
generated content combines both stdout and stderr in a single file.  One
example is QEMU's iotests[1].

The root cause of this limitation is that avocado.utils.process.run()[2]
(because of avocado.utils.process.SubProcess[3]) decided a long time ago
to split the content of a process STDOUT and STDERR to separate files
(and different logging streams).  This is reflected on Avocado's "output
check" feature[4].  Users can choose to record the test's generated
STDOUT, STDERR, both (all) or none of them.  This is how the command
line options look like:

  --output-check-record {none,all,stdout,stderr}
   Record output streams of your tests to reference files
   (valid options: none (do not record output streams),
   all (record both stdout and stderr), stdout (record
   only stderr), stderr (record only stderr). Current:
   none

Which map to the following recorded files:

+--+++
| OPTION   | RECORDED FILES | CONTENT|
+--+++
| none |||
+--+++
|  | stdout | process FD 1   |
+ all  +++
|  | stderr | process FD 2   |
+--+++
| stdout   | stdout | process FD 1   |
+--+++
| stderr   | stderr | process FD 2   |
+--+++

Notice how the "all" option still creates two separate files, with
content coming from individual file descriptors.  Adding another record
option would actually be pretty simple, that is, something like:

+--+++
| OPTION   | RECORDED FILES | CONTENT|
+--+++
| combined | combined_stdout_stderr | process FD 1 and 2 |
+--+++

What is *not* easily possible, is to have two options such as "all" and
"combined" at the same time.  The reason is that it the order of the
content written to "combined_stdout_stderr" can not be guaranteed to be
correct.

I've attempted to allow for both options to be used at once, by using a
pair of file-like objects with specialized writes that would either
share single list of records or just share a secondary file with a lock
(pseudo/prototype code):

class TeeFile(file):
def __init__(self, path, tee_path, tee_lock, mode='a'):
super(TeeFile, self).__init__(path, mode)
self._tee_path = tee_path
self._tee_lock = tee_lock

def write(self, record):
self.tee_lock.acquire()
super(TeeFile, self).write(record)
with open(self._tee_path, 'a') as tee_file:
tee_file.write(record)
tee_file.flush()
self.tee_lock.release()

But the fact that the Python standard library "subprocess.Popen"
implementation works with file descriptors doesn't make it possible to
implement a shared/synchronized write strategy.  Rewriting parts of
"subprocess.Popen" or "avocado.utils.process.SubProcess" is, IMO, a
clear sign to not attempt something way more complicated and error prone
for the sake of this feature.

Accepting the fact that either combined (stdout+stderr) or a split
(stdout, stderr) will be available, the point of option names comes into
play.  Both "stdout", "stderr" and "none" are aptly named.  But, with
the introduction of a third *real* output record option/file, "all"
becomes a rather bad option name.

To keep things simple, I'd add an alias from "all" to "both",
deprecating the former.  The new option would be named either "combined"
and the generated file named "combined_stdout_stderr" by default.  It's
a rather verbose name, but I'd go with explicit option names rather than
confusing ones.

Comments?

--

[1] -
https://git.qemu.org/?p=qemu.git;a=blob;f=tests/qemu-iotests/check;h=e6b6ff7a047562099c2fb5f0fd35652780c9c006;hb=HEAD#l757
[2] -
http://avocado-framework.readthedocs.io/en/55.0/api/utils/avocado.utils.html#avocado.utils.process.run
[3] -
http://avocado-framework.readthedocs.io/en/55.0/api/utils/avocado.utils.html#avocado.utils.process.SubProcess
[4] -
http://avocado-framework.readthedocs.io/en/55.0/WritingTests.html#test-output-check-and-output-record-mode

-- 
Cleber Rosa
[ Sr Software Engineer - Virtualization Team - Red Hat ]
[ Avocado Test Framework -