On 6 June 2017 at 14:32, Vincent Guittot <[email protected]> wrote:
> On 6 June 2017 at 14:25, Neil Williams <[email protected]> wrote:
>> On 6 June 2017 at 13:11, Vincent Guittot <[email protected]> wrote:
>>> On 6 June 2017 at 14:03, Neil Williams <[email protected]> wrote:
>>>> On 6 June 2017 at 12:53, Vincent Guittot <[email protected]> wrote:
>>>>> On 6 June 2017 at 13:38, Neil Williams <[email protected]> wrote:
>>>>>> This problem has been resolved inside the arm-probe configuration; it
>>>>>> is not a fault within LAVA. There was a concern that the probe was not
>>>>>> showing data output because of a theoretical problem of running
>>>>>> daemonized instead of with a controlling terminal. The actual problem
>>>>>> was that the probe software runs more slowly than expected, and
>>>>>> extending the runtime of the utility allows the probe to output data.
>>>>>> https://staging.validation.linaro.org/scheduler/job/175033#L2038
>>>>>> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb
>>>>>
>>>>> OK, so the 2-second timeout was your problem.
>>>>
>>>> That and the problem with the config file.
>>>
>>> ok
>>>
>>>>
>>>>>> (The verbose option was later dropped to output only the interesting 
>>>>>> data.)
>>>>>>
>>>>>> The configuration file in the git repo needs to be modified.
>>>>>>
>>>>>> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd
>>>>>
>>>>> Can you point out the modification that was needed? I can't see any
>>>>> obvious difference except the use of /dev/ttyACM0 instead of
>>>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00.
>>>>> Is that the difference?
>>>>
>>>> Yes, because inside the LXC, /dev/serial/by-id does not get created
>>>> (there is no udev support for that inside containers).
>>>>
>>>>> What about using 2 AEPs ?
>>>>
>>>> That would have to be fixed either in the test shell definitions (e.g.
>>>> using parameters passed through the test job) or within the arm-probe
>>>> code itself. I have no idea at this stage whether the arm-probe
>>>> software can cope with multiple probes - in LAVA that would likely
>>>
>>> arm-probe supports multiple AEPs, and we are using it with multiple AEPs
>>> on the mt8173 evb.
>>> arm-probe just relies on the config file to get the path of each AEP. I
>>> have put the content of the config file below:
>>>
>>> # arm-probe configuration file
>>> #
>>> # setup name
>>> mt8173-evb
>>>
>>> # <device path>
>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
>>>  VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A57/Cache A57_CACHE #ff0000 SoC
>>>  VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A57/Core0 A57_CORE #ff0000 SoC
>>>  VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A57/Core1 A57_CORE #ff0000 SoC
>>>
>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
>>>  VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A53/Cache A53_CACHE #ff0000 SoC
>>>  VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A53/Core0 A53_CORE #ff0000 SoC
>>>  VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A53/Core1 A53_CORE #ff0000 SoC
>>
>> These configuration files may need to be generated within the test
>> shell definition at runtime, based on parameters. The test shell will
>> need to work out which device is which probe and this could be awkward
>> without /dev/serial/by-id support. The enumeration order of ttyUSB0
>> and ttyUSB1 cannot be guaranteed. dmesg remains available inside the
>> LXC, so some automated parsing may be required. If the arm-probe
>
> To be honest, I don't like this way of proceeding; it is just error prone.
>
>> software can be modified to use a more sane configuration file syntax,
>> this could also be addressed there.
>
> I don't see why the config file is insane or how changing it would help
> with this problem.

If the config file is to be generated for each test job, the syntax is
awkward to handle: a device line would need to be inserted at the right
place in the text, because the format does not support a parser or
similar.
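
As a rough illustration of what generating the file per test job could
look like, here is a minimal Python sketch. It is not part of arm-probe
or LAVA: the function name render_aep_config, the shape of its arguments
and the /dev/ttyACM0 and /dev/ttyACM1 assignments are assumptions made
for the example. The channel values are copied from the mt8173-evb file
quoted above, assuming each channel entry is a single line in the real
file (the mail client appears to have wrapped them).

```python
def render_aep_config(setup_name, probes):
    """Return arm-probe style configuration text.

    probes is a list of (device_path, [channel_line, ...]) tuples in the
    order the probes should appear. Channel lines are treated as opaque
    strings and only normalised to start with a single space, matching
    the layout of the quoted mt8173-evb file.
    """
    lines = [
        "# arm-probe configuration file",
        "#",
        "# setup name",
        setup_name,
        "",
    ]
    for device_path, channels in probes:
        lines.append("# <device path>")
        lines.append(device_path)
        for channel in channels:
            lines.append(" " + channel.strip())
        lines.append("")  # blank line between probe sections
    return "\n".join(lines)


if __name__ == "__main__":
    # Device nodes below are hypothetical; the test shell would have to
    # work out which node belongs to which probe.
    a57 = [
        "VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 "
        "SoC/A57/Cache A57_CACHE #ff0000 SoC",
    ]
    a53 = [
        "VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 "
        "SoC/A53/Cache A53_CACHE #ff0000 SoC",
    ]
    print(render_aep_config("mt8173-evb",
                            [("/dev/ttyACM0", a57), ("/dev/ttyACM1", a53)]))
```

This only moves the problem: the generator still has to be told which
device node belongs to which probe, which is the part that is awkward
without /dev/serial/by-id.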

>>>> need secondary connections and MultiNode to separate the output.
>>>
>>> Is it something that Lisa can do by herself or does it need some
>>> changes from your side ?
>>
>> Secondary connections and MultiNode can be adopted by test writers
>> without any changes in LAVA.
>>
>> https://validation.linaro.org/static/docs/v2/dispatcher-design.html#index-4
>> https://validation.linaro.org/static/docs/v2/pipeline-writer-secondary.html#index-0
>>
>> Any testjob using MultiNode has a certain level of complexity, so the
>> change is non-trivial.
>
> Does it also mean that the data from the 2 probes will not be in the
> same file? arm-probe already merges the data from the multiple AEPs
> listed in its config file into one single output.

OK, then if that is what is desired then this can be done without
using secondary connections and therefore without MultiNode. I was
expecting that the two would run simultaneously, causing issues with
interleaving.


>> Note also that physically fitting more AEPs will involve work by the
>> LAB team - especially for devices like the panda, because the power
>> connector which comes with the AEP does not fit the panda and a
>> one-off daughter board is required.
>
> This is something that has been already handled and in the case of the
> mt8173evb everything is already done and working on our server with
> current arm-probe, AEPs and workload automation



> Regards,
> Vincent
>>
>>
>>> Regards,
>>> Vincent
>>>
>>>>
>>>> The syntax of the arm-probe configuration file does not make this easy
>>>> but that section could be patched to use a more sane structure. That
>>>> isn't related to the LAVA support though.
>>>>
>>>>>>
>>>>>>
>>>>>> On 29 May 2017 at 16:45, Vincent Guittot <[email protected]> wrote:
>>>>>>> On 25 May 2017 at 10:03, Neil Williams <[email protected]> wrote:
>>>>>>>> On Wed, 24 May 2017 21:07:45 +0200
>>>>>>>> Vincent Guittot <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Neil,
>>>>>>>>>
>>>>>>>>> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> On 24 May 2017 at 17:02, Neil Williams <[email protected]> wrote:
>>>>>>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>>>>>>> > Neil Williams <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>>>>>>> >> Steve McIntyre <[email protected]> wrote:
>>>>>>>>> >>
>>>>>>>>> >> > Hi folks!
>>>>>>>>> >> >
>>>>>>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>>>>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>>>>>>> >> > >Neil Williams <[email protected]> wrote:
>>>>>>>>> >> > >
>>>>>>>>> >> >
>>>>>>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>>>>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>>>>>>> >> > going on here. The only problem I see is Lisa's config file
>>>>>>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>>>>>>> >> > device to talk to. Using:
>>>>>>>>> >> >
>>>>>>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>>>>>>> >> >
>>>>>>>>> >> > I create that device in my container. I build libwebsockets and
>>>>>>>>> >> > the arm-probe software in the container, then
>>>>>>>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>>>>>>>> >> > fine:
>>>>>>>>> >> >
>>>>>>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>>>>>>>> >> > # configuration: panda-aep.cfg
>>>>>>>>> >> > # config_name: pandaboard
>>>>>>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>>>>>>>> >> > Configuration: pandaboard
>>>>>>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>>>>>>> >> > # host: lxc-aep-test-174524
>>>>>>>>> >> > #
>>>>>>>>> >> > + /dev/ttyACM0
>>>>>>>>> >> > Starting...
>>>>>>>>> >> > sending start to 0
>>>>>>>>> >> > # VDD_ALL       VDD     ROOT    #ff0000 SoC
>>>>>>>>> >> > #
>>>>>>>>> >> > #
>>>>>>>>> >> > time  VDD(V) VDD(A) VDD(W)
>>>>>>>>> >> > 0.000500  5.11 0.0474 0.24196
>>>>>>>>> >> > 0.000600  5.11 0.0364 0.18572
>>>>>>>>> >> > 0.000700  5.11 0.0314 0.16012
>>>>>>>>> >> > 0.000800  5.10 0.0544 0.27734
>>>>>>>>> >> > 0.000900  5.10 0.0234 0.11923
>>>>>>>>> >> > 0.001000  5.11 0.0304 0.15505
>>>>>>>>> >> > ...
>>>>>>>>> >> >
>>>>>>>>> >> > I don't have any problems running things and getting output here.
>>>>>>>>> >> >
>>>>>>>>> >> > I *have* seen two real bugs here while trying to get things
>>>>>>>>> >> > running, though:
>>>>>>>>> >> >
>>>>>>>>> >> >  1. If the device specified in the config file doesn't exist, or
>>>>>>>>> >> >     is the wrong type of device, or (maybe) there is any other
>>>>>>>>> >> >     kind of problem with it, you get *no* useful feedback to say
>>>>>>>>> >> >     there's a problem. Running things under strace will show the
>>>>>>>>> >> >     background libarmep process attempt to use the device
>>>>>>>>> >> >     specified, but there's no error handling. :-(
>>>>>>>>> >> >
>>>>>>>>> >> > 2. The "-x" option says that the arm-probe program is meant to
>>>>>>>>> >> > exit when you've done capturing, but it just sits there forever
>>>>>>>>> >> > when I'm testing. I've wrapped it using the "timeout" command to
>>>>>>>>> >> > work around that for now.
>>>>>>>>> >> >
>>>>>>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>>>>>>> >> > obvious. They're really easy to reproduce, I hope...
>>>>>>>>> >> >
>>>>>>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>>>>>>> >> > says that it creates devices based on their existing entries on
>>>>>>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>>>>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>>>>>>>> >>
>>>>>>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>>>>>>>> >> been using for the tests of the new code to ensure
>>>>>>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>>>>>>> >>
>>>>>>>>> >> That panda and AEP will shortly return to staging and then the
>>>>>>>>> >> changes to LAVA and the required changes to the test definition
>>>>>>>>> >> can be available for the 2017.6 release.
>>>>>>>>> >
>>>>>>>>> > OK. staging-panda03 is back and has been running tests. This is what
>>>>>>>>> > we've learnt so far:
>>>>>>>>> >
>>>>>>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>>>>>>> > manually on the worker with the same LXC on the same worker does
>>>>>>>>> > return data from the probe.
>>>>>>>>> >
>>>>>>>>> > 1: Running the same commands in "headless" mode shows that the probe
>>>>>>>>> > software starts successfully but something within the protocol
>>>>>>>>> > parser or sampler fails to retrieve data.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What do you mean by headless mode?
>>>>>>>>
>>>>>>>> With no controlling terminal.
>>>>>>>>
>>>>>>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>>>>>>> not usually cause issues and is fundamental to automation. When I run
>>>>>>>> the same commands in an LXC as a user logged into the machine, I get
>>>>>>>> output. When I run the commands from a daemon, the output is not seen.
>>>>>>>
>>>>>>> Even when you redirect the output to a file?
>>>>>>>
>>>>>>> In workload automation, arm_probe is called in a dedicated process
>>>>>>> with subprocess.Popen and we are able to get data in the file.
>>>>>>> I just wonder what the difference could be in the LAVA case.
>>>>>>>
>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>>>>>>> > disabled in the build I've been testing:
>>>>>>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>>>>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>>>>>>> > was run using
>>>>>>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>>>>>>> > but are not much closer to identifying the precise problem with the
>>>>>>>>> > code. However, I am satisfied that this is a problem in the
>>>>>>>>> > arm-probe software when being run in automation.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can you give details about "this is a problem in arm probe software
>>>>>>>>> when being run in automation"? Do you mean workload automation?
>>>>>>>>
>>>>>>>> No. Not workload automation - that is a specific test framework which
>>>>>>>> can use LAVA. I'm talking about the process of running tests on behalf
>>>>>>>> of users without the users being logged in or interacting with the
>>>>>>>> shell.
>>>>>>>
>>>>>>> ok. Just to be sure about the context
>>>>>>>
>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>>>>>>>> > also seems unnecessarily complex.
>>>>>>>>> >
>>>>>>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>>>>>>> > repository (which has also had a few fixes to compile with gcc6) but
>>>>>>>>> > I'm running out of time to work on the arm-probe software myself.
>>>>>>>>> >
>>>>>>>>> > Someone needs to update the arm-probe software:
>>>>>>>>> >
>>>>>>>>> > a) to remove websockets as a compile-time option as this only bloats
>>>>>>>>> > the build in automation where a web based UI is impossible anyway.
>>>>>>>>> > I've done this by brute force in my cloned repo, I just patched out
>>>>>>>>> > the dependency.
>>>>>>>>> >
>>>>>>>>> > b) improve the code to have comments and output about what is
>>>>>>>>> > happening and why when verbose mode is used.
>>>>>>>>> >
>>>>>>>>> > c) Identify what is preventing the software from receiving data from
>>>>>>>>> > the probe when run in automation.
>>>>>>>>> >
>>>>>>>>> > d) the config file still needs fixes to allow for changes in the
>>>>>>>>> > device node name from one probe to another.
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>>
>>>>>>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>>>>>>> respond (if he has anything to say) while I'm on holiday until early
>>>>>>>>> June.
>>>>>>>>
>>>>>>>> Steve & I are also on annual leave next week.
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>> Neil Williams
>>>>>>>> =============
>>>>>>>> http://www.linux.codehelp.co.uk/
>>>>>>>>



-- 

Neil Williams
=============
[email protected]
http://www.linux.codehelp.co.uk/
_______________________________________________
linaro-validation mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/linaro-validation
