On 6 June 2017 at 14:32, Vincent Guittot <[email protected]> wrote:
> On 6 June 2017 at 14:25, Neil Williams <[email protected]> wrote:
>> On 6 June 2017 at 13:11, Vincent Guittot <[email protected]> wrote:
>>> On 6 June 2017 at 14:03, Neil Williams <[email protected]> wrote:
>>>> On 6 June 2017 at 12:53, Vincent Guittot <[email protected]>
>>>> wrote:
>>>>> On 6 June 2017 at 13:38, Neil Williams <[email protected]> wrote:
>>>>>> This problem has been resolved inside the arm-probe configuration, it
>>>>>> is not a fault within LAVA. There was a concern that the probe was not
>>>>>> showing data output because of a theoretical problem of running
>>>>>> daemonized instead of with a controlling terminal. The actual problem
>>>>>> was that the probe software is running more slowly than expected and
>>>>>> extending the runtime of the utility allows the probe to output data.
>>>>>> https://staging.validation.linaro.org/scheduler/job/175033#L2038
>>>>>> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb
>>>>>
>>>>> ok so the 2 seconds for timeout was your problem
>>>>
>>>> That and the problem with the config file.
>>>
>>> ok
>>>
>>>>
>>>>>> (The verbose option was later dropped to output only the interesting
>>>>>> data.)
>>>>>>
>>>>>> The configuration file in the git repo needs to be modified.
>>>>>>
>>>>>> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd
>>>>>
>>>>> Can you point out the modification that was needed? I
>>>>> can't see any obvious difference except using /dev/ttyACM0 instead of
>>>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00.
>>>>> Is that the difference?
>>>>
>>>> Yes, because inside the LXC, /dev/serial/by-id does not get created
>>>> (there is no udev support for that inside containers).
>>>>
>>>>> What about using 2 AEPs?
>>>>
>>>> That would have to be fixed either in the test shell definitions (e.g.
>>>> using parameters passed through the test job) or within the arm-probe
>>>> code itself. I have no idea at this stage whether the arm-probe
>>>> software can cope with multiple probes - in LAVA that would likely
>>>
>>> arm-probe supports multiple AEPs and we are using it with multiple AEPs
>>> on the mtk8173 evb.
>>> arm-probe just relies on the config file to get the path of the AEP. I
>>> have put the content of the config file below:
>>>
>>> # arm-probe configuration file
>>> #
>>> # setup name
>>> mt8173-evb
>>>
>>> # <device path>
>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
>>> VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A57/Cache A57_CACHE #ff0000 SoC
>>> VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A57/Core0 A57_CORE #ff0000 SoC
>>> VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A57/Core1 A57_CORE #ff0000 SoC
>>>
>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
>>> VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A53/Cache A53_CACHE #ff0000 SoC
>>> VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A53/Core0 A53_CORE #ff0000 SoC
>>> VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0
>>> SoC/A53/Core1 A53_CORE #ff0000 SoC
>>
>> These configuration files may need to be generated within the test
>> shell definition at runtime, based on parameters. The test shell will
>> need to work out which device is which probe and this could be awkward
>> without /dev/serial/by-id support. The enumeration order of ttyUSB0
>> and ttyUSB1 cannot be guaranteed. dmesg remains available inside the
>> LXC, so some automated parsing may be required.
>> If the arm-probe
>
> To be honest I don't like this way of proceeding; it is just error prone
>
>> software can be modified to use a more sane configuration file syntax,
>> this could also be addressed there.
>
> I don't see why the config file is insane and how this will help with
> this problem

If the config file is to be generated for each test job, the syntax is
awkward to handle as it would need a line inserted instead of
supporting a parser or similar.

>>>> need secondary connections and MultiNode to separate the output.
>>>
>>> Is it something that Lisa can do by herself or does it need some
>>> changes from your side ?
>>
>> Secondary connections and MultiNode can be adopted by test writers
>> without any changes in LAVA.
>>
>> https://validation.linaro.org/static/docs/v2/dispatcher-design.html#index-4
>> https://validation.linaro.org/static/docs/v2/pipeline-writer-secondary.html#index-0
>>
>> Any testjob using MultiNode has a certain level of complexity, so the
>> change is non-trivial.
>
> Does it also mean that the data of the 2 probes will not be in the
> same file, whereas arm-probe already merges data from the multiple AEPs
> in its config file into one single output?

OK, then if that is what is desired then this can be done without using
secondary connections and therefore without MultiNode. I was expecting
that the two would run simultaneously, causing issues with interleaving.

>> Note also that physically fitting more AEPs will involve work by the
>> LAB team - especially for devices like the panda, because the power
>> connector which comes with the AEP does not fit the panda and a
>> one-off daughter board is required.
>
> This is something that has been already handled and in the case of the
> mt8173evb everything is already done and working on our server with
> current arm-probe, AEPs and workload automation
> Regards,
> Vincent
>>
>>
>>> Regards,
>>> Vincent
>>>
>>>>
>>>> The syntax of the arm-probe configuration file does not make this easy
>>>> but that section could be patched to use a more sane structure. That
>>>> isn't related to the LAVA support though.
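
As an illustration of the runtime generation discussed above, here is a minimal Python sketch. It makes several assumptions that are not confirmed in this thread: that dmesg inside the LXC reports each probe with "SerialNumber:" and "cdc_acm ... ttyACMn: USB ACM device" lines (the usual kernel wording, but worth checking on the worker), that the S_NO... part of the by-id name is the USB serial number, and that the config layout is exactly what the quoted mt8173-evb file shows. The function names `map_serial_to_tty` and `render_aep_config` are hypothetical and not part of arm-probe or LAVA.

```python
import re
from typing import Dict, List

# Assumed kernel log wording; check against real dmesg output before relying on it.
SERIAL_RE = re.compile(r"usb (\S+): SerialNumber: (\S+)")
ACM_RE = re.compile(r"cdc_acm (\S+?):\d+\.\d+: (ttyACM\d+): USB ACM device")


def map_serial_to_tty(dmesg_text: str) -> Dict[str, str]:
    """Map each USB serial number seen in dmesg to its /dev/ttyACM* node."""
    serial_by_port: Dict[str, str] = {}
    tty_by_serial: Dict[str, str] = {}
    for line in dmesg_text.splitlines():
        serial = SERIAL_RE.search(line)
        if serial:
            # Remember which USB port announced this serial number.
            serial_by_port[serial.group(1)] = serial.group(2)
            continue
        acm = ACM_RE.search(line)
        if acm and acm.group(1) in serial_by_port:
            # The cdc_acm line names the same port, so join on it.
            tty_by_serial[serial_by_port[acm.group(1)]] = "/dev/" + acm.group(2)
    return tty_by_serial


def render_aep_config(setup_name: str, probes: Dict[str, List[str]]) -> str:
    """Render a config in the layout of the quoted mt8173-evb file.

    `probes` maps a device path to its channel lines, which are copied
    through verbatim; the layout is inferred from that single sample.
    """
    lines = ["# arm-probe configuration file", "#", "# setup name", setup_name, ""]
    for device, channels in probes.items():
        lines.append("# <device path>")
        lines.append(device)
        lines.extend(channels)
        lines.append("")
    return "\n".join(lines)
```

A test shell could feed the output of dmesg to `map_serial_to_tty`, look up each probe by the serial number passed in as a job parameter, and write the result of `render_aep_config` to the file given to arm-probe with -C. Whether this is less error prone than patching arm-probe itself is exactly the open question in this thread.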
>>>>
>>>>>>
>>>>>>
>>>>>> On 29 May 2017 at 16:45, Vincent Guittot <[email protected]>
>>>>>> wrote:
>>>>>>> On 25 May 2017 at 10:03, Neil Williams <[email protected]> wrote:
>>>>>>>> On Wed, 24 May 2017 21:07:45 +0200
>>>>>>>> Vincent Guittot <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Neil,
>>>>>>>>>
>>>>>>>>> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 24 May 2017 at 17:02, Neil Williams <[email protected]> wrote:
>>>>>>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>>>>>>> > Neil Williams <[email protected]> wrote:
>>>>>>>>> >
>>>>>>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>>>>>>> >> Steve McIntyre <[email protected]> wrote:
>>>>>>>>> >>
>>>>>>>>> >> > Hi folks!
>>>>>>>>> >> >
>>>>>>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>>>>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>>>>>>> >> > >Neil Williams <[email protected]> wrote:
>>>>>>>>> >> > >
>>>>>>>>> >> >
>>>>>>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>>>>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>>>>>>> >> > going on here. The only problem I see is Lisa's config file
>>>>>>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>>>>>>> >> > device to talk to. Using:
>>>>>>>>> >> >
>>>>>>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>>>>>>> >> >
>>>>>>>>> >> > I create that device in my container. I build libwebsockets and
>>>>>>>>> >> > the arm-probe software in the container, then
>>>>>>>>> >> > specify /dev/ttyACM0 in the AEP config file.
>>>>>>>>> >> > I can run it just fine:
>>>>>>>>> >> >
>>>>>>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>>>>>>>> >> > # configuration: panda-aep.cfg
>>>>>>>>> >> > # config_name: pandaboard
>>>>>>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>>>>>>>> >> > Configuration: pandaboard
>>>>>>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>>>>>>> >> > # host: lxc-aep-test-174524
>>>>>>>>> >> > #
>>>>>>>>> >> > + /dev/ttyACM0
>>>>>>>>> >> > Starting...
>>>>>>>>> >> > sending start to 0
>>>>>>>>> >> > # VDD_ALL VDD ROOT #ff0000 SoC
>>>>>>>>> >> > #
>>>>>>>>> >> > #
>>>>>>>>> >> > time VDD(V) VDD(A) VDD(W)
>>>>>>>>> >> > 0.000500 5.11 0.0474 0.24196
>>>>>>>>> >> > 0.000600 5.11 0.0364 0.18572
>>>>>>>>> >> > 0.000700 5.11 0.0314 0.16012
>>>>>>>>> >> > 0.000800 5.10 0.0544 0.27734
>>>>>>>>> >> > 0.000900 5.10 0.0234 0.11923
>>>>>>>>> >> > 0.001000 5.11 0.0304 0.15505
>>>>>>>>> >> > ...
>>>>>>>>> >> >
>>>>>>>>> >> > I don't have any problems running things and getting output here.
>>>>>>>>> >> >
>>>>>>>>> >> > I *have* seen two real bugs here while trying to get things
>>>>>>>>> >> > running, though:
>>>>>>>>> >> >
>>>>>>>>> >> > 1. If the device specified in the config file doesn't exist, or
>>>>>>>>> >> > is the wrong type of device, or (maybe) there is any other kind
>>>>>>>>> >> > of problem with it, you get *no* useful feedback to say there's a
>>>>>>>>> >> > problem. Running things under strace will show the background
>>>>>>>>> >> > libarmep process attempt to use the device specified, but
>>>>>>>>> >> > there's no error handling. :-(
>>>>>>>>> >> >
>>>>>>>>> >> > 2. The "-x" option says that the arm-probe program is meant to
>>>>>>>>> >> > exit when you've done capturing, but it just sits there forever
>>>>>>>>> >> > when I'm testing. I've wrapped it using the "timeout" command to
>>>>>>>>> >> > work around that for now.
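
The timeout workaround Steve describes, combined with redirecting the output to a file, can be sketched in Python as follows. This is only an illustration and not how LAVA or workload automation actually invoke arm-probe; the helper name `run_with_timeout` is hypothetical, and the 30 second limit and the aep.data file name in the usage comment are arbitrary choices. The arm-probe arguments in that comment are the ones from Steve's session above.

```python
import subprocess


def run_with_timeout(cmd, seconds, output_path):
    """Run cmd with stdout and stderr written to output_path.

    Returns True if the command exited on its own within `seconds`,
    or False if it had to be killed, as with the -x behaviour above.
    """
    with open(output_path, "wb") as out:
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        try:
            proc.wait(timeout=seconds)
            return True
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the killed process
            return False


# Hypothetical usage, mirroring the command line shown above:
# run_with_timeout(
#     ["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"],
#     30,
#     "aep.data",
# )
```

Because the data goes straight to a file, whatever the probe managed to output before the limit is still available when the process has to be killed.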
>>>>>>>>> >> >
>>>>>>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>>>>>>> >> > obvious. They're really easy to reproduce, I hope...
>>>>>>>>> >> >
>>>>>>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>>>>>>> >> > says that it creates devices based on their existing entries on
>>>>>>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>>>>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>>>>>>>> >>
>>>>>>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>>>>>>>> >> been using for the tests of the new code to ensure
>>>>>>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>>>>>>> >>
>>>>>>>>> >> That panda and AEP will shortly return to staging and then the
>>>>>>>>> >> changes to LAVA and the required changes to the test definition
>>>>>>>>> >> can be available for the 2017.6 release.
>>>>>>>>> >
>>>>>>>>> > OK. staging-panda03 is back and has been running tests. This is what
>>>>>>>>> > we've learnt so far:
>>>>>>>>> >
>>>>>>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>>>>>>> > manually on the worker with the same LXC on the same worker does
>>>>>>>>> > return data from the probe.
>>>>>>>>> >
>>>>>>>>> > 1: Running the same commands in "headless" mode shows that the probe
>>>>>>>>> > software starts successfully but something within the protocol
>>>>>>>>> > parser or sampler fails to retrieve data.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What do you mean by headless mode?
>>>>>>>>
>>>>>>>> With no controlling terminal.
>>>>>>>>
>>>>>>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>>>>>>> not usually cause issues and is fundamental to automation. When I run
>>>>>>>> the same commands in an LXC as a user logged into the machine, I get
>>>>>>>> output. When I run the commands from a daemon, the output is not seen.
>>>>>>>
>>>>>>> even when you redirect the output to a file ?
>>>>>>>
>>>>>>> On workload automation, arm_probe is called in a dedicated process
>>>>>>> with subprocess.Popen and we are able to get data in the file.
>>>>>>> Just wonder what could be the difference in lava case
>>>>>>>
>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>>>>>>> > disabled in the build I've been testing:
>>>>>>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>>>>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>>>>>>> > was run using
>>>>>>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>>>>>>> > but are not much closer to identifying the precise problem with the
>>>>>>>>> > code. However, I am satisfied that this is a problem in the
>>>>>>>>> > arm-probe software when being run in automation.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can you give details about "this is a problem in arm probe software
>>>>>>>>> when being run in automation"? Do you mean workload automation?
>>>>>>>>
>>>>>>>> No. Not workload automation - that is a specific test framework which
>>>>>>>> can use LAVA. I'm talking about the process of running tests on behalf
>>>>>>>> of users without the users being logged in or interacting with the
>>>>>>>> shell.
>>>>>>>
>>>>>>> ok. Just to be sure about the context
>>>>>>>
>>>>>>>>
>>>>>>>>> >
>>>>>>>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>>>>>>>> > also seems unnecessarily complex.
>>>>>>>>> >
>>>>>>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>>>>>>> > repository (which has also had a few fixes to compile with gcc6) but
>>>>>>>>> > I'm running out of time to work on the arm-probe software myself.
>>>>>>>>> >
>>>>>>>>> > Someone needs to update the arm-probe software:
>>>>>>>>> >
>>>>>>>>> > a) to remove websockets as a compile-time option as this only bloats
>>>>>>>>> > the build in automation where a web based UI is impossible anyway.
>>>>>>>>> > I've done this by brute force in my cloned repo, I just patched out
>>>>>>>>> > the dependency.
>>>>>>>>> >
>>>>>>>>> > b) improve the code to have comments and output about what is
>>>>>>>>> > happening and why when verbose mode is used.
>>>>>>>>> >
>>>>>>>>> > c) Identify what is preventing the software from receiving data from
>>>>>>>>> > the probe when run in automation.
>>>>>>>>> >
>>>>>>>>> > d) the config file still needs fixes to allow for changes in the
>>>>>>>>> > device node name from one probe to another.
>>>>>>>>> >
>>>>>>>>> > --
>>>>>>>>>
>>>>>>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>>>>>>> respond (if he has anything to say) while I'm on holiday until early
>>>>>>>>> June.
>>>>>>>>
>>>>>>>> Steve & I are also on annual leave next week.
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>>
>>>>>>>> Neil Williams
>>>>>>>> =============
>>>>>>>> http://www.linux.codehelp.co.uk/
>>>>>>>>

--

Neil Williams
=============
[email protected]
http://www.linux.codehelp.co.uk/
