On 25 May 2017 at 10:03, Neil Williams <[email protected]> wrote:
> On Wed, 24 May 2017 21:07:45 +0200
> Vincent Guittot <[email protected]> wrote:
>
>> Hi Neil,
>>
>> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" <[email protected]> wrote:
>>
>> On 24 May 2017 at 17:02, Neil Williams <[email protected]> wrote:
>> > On Fri, 19 May 2017 17:02:14 +0100
>> > Neil Williams <[email protected]> wrote:
>> >
>> >> On Fri, 19 May 2017 16:48:11 +0100
>> >> Steve McIntyre <[email protected]> wrote:
>> >>
>> >> > Hi folks!
>> >> >
>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>> >> > >Neil Williams <[email protected]> wrote:
>> >> > >
>> >> >
>> >> > I've just run a local test with an AEP inside lxc on my local
>> >> > machine. As far as I can see, there's nothing particularly magic
>> >> > going on here. The only problem I see is Lisa's config file
>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>> >> > device to talk to. Using:
>> >> >
>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>> >> >
>> >> > I create that device in my container. I build libwebsockets and
>> >> > the arm-probe software in the container, then
>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>> >> > fine:
>> >> >
>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>> >> > # configuration: panda-aep.cfg
>> >> > # config_name: pandaboard
>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>> >> > Configuration: pandaboard
>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>> >> > # host: lxc-aep-test-174524
>> >> > #
>> >> > + /dev/ttyACM0
>> >> > Starting...
>> >> > sending start to 0
>> >> > # VDD_ALL       VDD     ROOT    #ff0000 SoC
>> >> > #
>> >> > #
>> >> > time  VDD(V) VDD(A) VDD(W)
>> >> > 0.000500  5.11 0.0474 0.24196
>> >> > 0.000600  5.11 0.0364 0.18572
>> >> > 0.000700  5.11 0.0314 0.16012
>> >> > 0.000800  5.10 0.0544 0.27734
>> >> > 0.000900  5.10 0.0234 0.11923
>> >> > 0.001000  5.11 0.0304 0.15505
>> >> > ...
>> >> >
>> >> > I don't have any problems running things and getting output here.
>> >> >
>> >> > I *have* seen two real bugs here while trying to get things
>> >> > running, though:
>> >> >
>> >> >  1. If the device specified in the config file doesn't exist, or
>> >> >     is the wrong type of device, or (maybe) there is any other
>> >> >     kind of problem with it, you get *no* useful feedback to say
>> >> >     there's a problem. Running things under strace will show the
>> >> >     background libarmep process attempt to use the device
>> >> >     specified, but there's no error handling. :-(
>> >> >
>> >> >  2. The "-x" option says that the arm-probe program is meant to
>> >> >     exit when you've done capturing, but it just sits there
>> >> >     forever when I'm testing. I've wrapped it using the "timeout"
>> >> >     command to work around that for now.
>> >> >
>> >> > If I knew where to file those bugs, I would, but it's really not
>> >> > obvious. They're really easy to reproduce, I hope...
>> >> >
>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>> >> > says that it creates devices based on their existing entries on
>> >> > the host. Double-check that the host (dispatcher) has an
>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>> >>
>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>> >> been using for the tests of the new code to ensure
>> >> that /dev/ttyACM0 can be attached to the LXC.
>> >>
>> >> That panda and AEP will shortly return to staging and then the
>> >> changes to LAVA and the required changes to the test definition
>> >> can be available for the 2017.6 release.
>> >
>> > OK. staging-panda03 is back and has been running tests. This is what
>> > we've learnt so far:
>> >
>> > 0: This does not appear to be an LXC issue. Running the commands
>> > manually on the worker with the same LXC on the same worker does
>> > return data from the probe.
>> >
>> > 1: Running the same commands in "headless" mode shows that the probe
>> > software starts successfully but something within the protocol
>> > parser or sampler fails to retrieve data.
>>
>>
>> What do you mean by headless mode?
>
> With no controlling terminal.
>
> LAVA runs as a daemon and forks processes to run the tests. This does
> not usually cause issues and is fundamental to automation. When I run
> the same commands in an LXC as a user logged into the machine, I get
> output. When I run the commands from a daemon, the output is not seen.

Even when you redirect the output to a file?

In workload automation, arm_probe is called in a dedicated process
with subprocess.Popen and we are able to get data in the file.
I just wonder what the difference could be in the LAVA case.
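
For reference, here is a minimal sketch of that kind of call, assuming
Python 3 on a POSIX host. It is not the code from workload automation or
LAVA: the function name capture_headless and the stand-in command are
hypothetical. The child is started in a new session with stdin on
/dev/null, so it has no controlling terminal, and its output goes to a
file. The time limit mirrors the "timeout" wrapper Steve used because
"-x" did not exit.

```python
import os
import subprocess
import sys
import tempfile


def capture_headless(cmd, out_path, limit_s):
    """Run cmd without a controlling terminal, logging output to out_path.

    Returns the exit code, or None if the process was still running after
    limit_s seconds and had to be killed.
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no terminal input, as under a daemon
            stdout=out,                 # samples end up in the file
            stderr=subprocess.STDOUT,   # keep error messages in the same file
            start_new_session=True,     # setsid() in the child (POSIX only)
        )
        try:
            return proc.wait(timeout=limit_s)
        except subprocess.TimeoutExpired:
            proc.kill()                 # the command never exited by itself
            proc.wait()
            return None


if __name__ == "__main__":
    # Hypothetical stand-in command; on a real setup this would be the
    # arm-probe command line with its config file.
    demo_cmd = [sys.executable, "-c", "print('time VDD(V) VDD(A) VDD(W)')"]
    log = os.path.join(tempfile.mkdtemp(), "aep.log")
    print("exit code:", capture_headless(demo_cmd, log, 10))
    with open(log) as handle:
        print(handle.read(), end="")
```

Running the same command once like this and once from a login shell
would show whether the missing terminal alone is enough to lose the
data.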

>
>> >
>> > 2: The websockets dependency is completely unnecessary and has been
>> > disabled in the build I've been testing:
>> > https://git.linaro.org/lava-team/arm-probe.git/
>>
>>
>> Yes. I do the same. aepd is only useful for the web interface.
>>
>>
>> >
>> > 3: We've added a *lot* of debug to the arm-probe code
>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>> > was run using
>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>> > but are not much closer to identifying the precise problem with the
>> > code. However, I am satisfied that this is a problem in the
>> > arm-probe software when being run in automation.
>>
>>
>> Can you give details about "this is a problem in arm probe software
>> when being run in automation"? Do you mean workload automation?
>
> No. Not workload automation - that is a specific test framework which
> can use LAVA. I'm talking about the process of running tests on behalf
> of users without the users being logged in or interacting with the
> shell.

OK. I just wanted to be sure about the context.

>
>> >
>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>> > also seems unnecessarily complex.
>> >
>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>> > repository (which has also had a few fixes to compile with gcc6) but
>> > I'm running out of time to work on the arm-probe software myself.
>> >
>> > Someone needs to update the arm-probe software:
>> >
>> > a) to remove websockets as a compile-time option as this only bloats
>> > the build in automation where a web based UI is impossible anyway.
>> > I've done this by brute force in my cloned repo, I just patched out
>> > the dependency.
>> >
>> > b) improve the code to have comments and output about what is
>> > happening and why when verbose mode is used.
>> >
>> > c) Identify what is preventing the software from receiving data from
>> > the probe when run in automation.
>> >
>> > d) the config file still needs fixes to allow for changes in the
>> > device node name from one probe to another.
>> >
>> > --
>>
>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>> respond (if he has anything to say) while I'm on holiday until early
>> June.
>
> Steve & I are also on annual leave next week.
>
> --
>
>
> Neil Williams
> =============
> http://www.linux.codehelp.co.uk/
>
_______________________________________________
linaro-validation mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/linaro-validation
