Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
On Tue, 11 Nov 2014 19:05:37 +0100 Hans Verkuil hverk...@xs4all.nl wrote:

> On 11/11/2014 06:46 PM, Andrey Utkin wrote:
>> At Bluecherry, we have issues with servers which have 3 solo6110 cards (each card has up to 16 analog video cameras connected, all being actively read).
>>
>> This is the kernel I last tested on such a server. It is based on linux-next of October 31, with a few patches of mine (all are in review for upstream): https://github.com/krieger-od/linux/ . The HEAD commit is 949e18db86ebf45acab91d188b247abd40b6e2a1 at the moment.
>>
>> The problem is the following: after ~1 hour of uptime with a working application reading the streams, one card (the same one every time) stops producing interrupts (its counter in /proc/interrupts freezes), and all threads reading from that card hang forever in ioctl(VIDIOC_DQBUF). The application uses the libavformat (ffmpeg) API to read the corresponding /dev/videoX devices of the H264 encoders.
>>
>> Restarting the application doesn't help; the interrupt counter just increases by 64. To recover, we need a reboot or a programmatic PCI device reset by echo 1 > /sys/bus/pci/devices/\:03\:05.0/reset, which requires unloading the app and the driver and is obviously not a solution.
>>
>> We have had this issue for a long time, even before we used libavformat for reading from such sources.
>>
>> A few days ago, we had standalone ffmpeg processes working stably for several days. The kernel was 3.17; the only probably-relevant change in code over the above-mentioned revision is an additional bool variable set in solo_enc_v4l2_isr() and checked in solo_ring_thread() to figure out whether to do or skip solo_handle_ring(). The variable was guarded with spin_lock_irqsave(). I am not sure if it makes any difference; I will try it again eventually.
>>
>> Any thoughts? Can it be a bug in the driver code causing this (please point out which areas of code to review/fix)? Or is it a desperate hardware issue? How can we figure out for sure whether it is the former or the latter?
>
> I would first try to exclude hardware issues: since you say it is always the same card, try either replacing it or swapping it with another solo card and see if the problem follows the card or not. If it does, then it is likely a hardware problem. If it doesn't, then it suggests a race condition in the interrupt handling somewhere.
>
> Regards,
> Hans

CC'ing Curtis, hope you don't mind.

It's just a coincidence. This has been a long-standing issue, and it only depends on having enough cards. One of the problems I had in weeding this one out was that I didn't have the right hardware (only one 16-port card), and my guess is that Andrey is in the same position.
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
On Sat, 15 Nov 2014 21:42:05 +0100 khal...@piap.pl (Krzysztof Hałasa) wrote:

> Andrey Utkin andrey.ut...@corp.bluecherry.net writes:
>
>> In upstream there is no longer a module parameter for the video standard (NTSC/PAL), but there is a VIDIOC_S_STD handler. It turns out not to work correctly: the frame is offset, so that there is a black horizontal bar at the bottom. The S_STD ioctl call does make a difference, because without it the frame slides vertically all the time. But after the call the picture is still not correct.
>
> Which kernel version are you using? I remember there were some problems with earlier versions, where NTSC vs PAL wasn't consistently a bool but rather a raw register value (or something like this), but it was fixed last time I checked.
>
> I'm personally using SOLO6110-based cards with v3.17 and PAL and it works, with minimal unrelated patches.

The selection works correctly for me, tested recently after a server upgrade.
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
Thanks to all for the great help so far, but I've got another issue with the upstream driver.

In upstream there is no longer a module parameter for the video standard (NTSC/PAL), but there is a VIDIOC_S_STD handler. It turns out not to work correctly: the frame is offset, so that there is a black horizontal bar at the bottom. The S_STD ioctl call does make a difference, because without it the frame slides vertically all the time. But after the call the picture is still not correct.

This change didn't help:
https://github.com/krieger-od/linux/commit/55b796c010b622430cb85f5b8d7d14fef6f04fb4

So, temporarily, I've hardcoded PAL for the specific customer who uses it:
https://github.com/krieger-od/linux/commit/2c26302dfa6d7aa74cf17a89793daecbb89ae93a

An rmmod/modprobe cycle works fine and is no different from a reboot, but the picture is still correct only with PAL hardcoded for the first-time initialization.

Any ideas why changing the mode after the driver load wouldn't work? Would it be allowed to add back the kernel module parameter (the one passed at module load time)?

--
Bluecherry developer.
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
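For anyone who wants to reproduce the S_STD behaviour described above from userspace without a full capture application, here is a minimal sketch. It is not taken from the thread. It assumes the generic ioctl number encoding used on x86 and ARM (some architectures, such as MIPS, encode the direction bits differently), and the device path passed to set_standard() is whatever /dev/videoX node is under test. The standard masks are the V4L2_STD_PAL and V4L2_STD_NTSC composite values from videodev2.h.

```python
import fcntl
import os
import struct

# Generic ioctl encoding (asm-generic/ioctl.h). Assumption: x86 or ARM;
# a few architectures use different direction bits.
_IOC_WRITE = 1
_IOC_READ = 2


def _ioc(direction, type_char, nr, size):
    """Build an ioctl request number from its four fields."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


V4L2_STD_ID_SIZE = struct.calcsize("=Q")  # v4l2_std_id is a 64-bit mask
VIDIOC_G_STD = _ioc(_IOC_READ, "V", 23, V4L2_STD_ID_SIZE)
VIDIOC_S_STD = _ioc(_IOC_WRITE, "V", 24, V4L2_STD_ID_SIZE)

V4L2_STD_PAL = 0x000000FF   # PAL B/B1/G/H/I/D/D1/K
V4L2_STD_NTSC = 0x0000B000  # NTSC M, M-JP, M-KR


def set_standard(device_path, std_id):
    """Request a video standard, then return the one the driver reports back."""
    fd = os.open(device_path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, VIDIOC_S_STD, struct.pack("=Q", std_id))
        reply = fcntl.ioctl(fd, VIDIOC_G_STD, struct.pack("=Q", 0))
        return struct.unpack("=Q", reply)[0]
    finally:
        os.close(fd)
```

Calling set_standard("/dev/video0", V4L2_STD_PAL) (the node name is only an example) and comparing the returned mask with the requested one shows whether the driver accepted the switch. It cannot show whether the picture geometry is right, which is the actual symptom reported here.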
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
Hi Andrey,

On 11/15/2014 02:48 PM, Andrey Utkin wrote:
> Thanks to all for the great help so far, but I've got another issue with the upstream driver.
>
> In upstream there is no longer a module parameter for the video standard (NTSC/PAL), but there is a VIDIOC_S_STD handler. It turns out not to work correctly: the frame is offset, so that there is a black horizontal bar at the bottom. The S_STD ioctl call does make a difference, because without it the frame slides vertically all the time. But after the call the picture is still not correct.

That's strange. I know I tested it at the time. I assume it is the PAL standard that isn't working (as opposed to NTSC)? Or does it just always fail when you switch between the two standards?

> This change didn't help:
> https://github.com/krieger-od/linux/commit/55b796c010b622430cb85f5b8d7d14fef6f04fb4
>
> So, temporarily, I've hardcoded PAL for the specific customer who uses it:
> https://github.com/krieger-od/linux/commit/2c26302dfa6d7aa74cf17a89793daecbb89ae93a
>
> An rmmod/modprobe cycle works fine and is no different from a reboot, but the picture is still correct only with PAL hardcoded for the first-time initialization.
>
> Any ideas why changing the mode after the driver load wouldn't work?

Not really. I will have to test this next week (either Monday or Friday) with my solo board.

> Would it be allowed to add back the kernel module parameter (the one passed at module load time)?

No. That's a hack; the S_STD call should just work and we need to figure out why it fails.

Regards,
Hans
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
On Sat, Nov 15, 2014 at 6:08 PM, Hans Verkuil hverk...@xs4all.nl wrote:
> Hi Andrey,
>
> On 11/15/2014 02:48 PM, Andrey Utkin wrote:
>> Thanks to all for the great help so far, but I've got another issue with the upstream driver.
>>
>> In upstream there is no longer a module parameter for the video standard (NTSC/PAL), but there is a VIDIOC_S_STD handler. It turns out not to work correctly: the frame is offset, so that there is a black horizontal bar at the bottom. The S_STD ioctl call does make a difference, because without it the frame slides vertically all the time. But after the call the picture is still not correct.
>
> That's strange. I know I tested it at the time. I assume it is the PAL standard that isn't working (as opposed to NTSC)? Or does it just always fail when you switch between the two standards?

Switching to PAL is not working (NTSC is the default). I'm not sure whether it fails to do _any_ switching, or whether it also fails to switch from (hardcoded or switched) PAL to NTSC. I can test it a bit later.

>> This change didn't help:
>> https://github.com/krieger-od/linux/commit/55b796c010b622430cb85f5b8d7d14fef6f04fb4
>>
>> So, temporarily, I've hardcoded PAL for the specific customer who uses it:
>> https://github.com/krieger-od/linux/commit/2c26302dfa6d7aa74cf17a89793daecbb89ae93a
>>
>> An rmmod/modprobe cycle works fine and is no different from a reboot, but the picture is still correct only with PAL hardcoded for the first-time initialization.
>>
>> Any ideas why changing the mode after the driver load wouldn't work?
>
> Not really. I will have to test this next week (either Monday or Friday) with my solo board.

Thanks in advance.

>> Would it be allowed to add back the kernel module parameter (the one passed at module load time)?
>
> No. That's a hack; the S_STD call should just work and we need to figure out why it fails.

Ok.

--
Bluecherry developer.
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
Andrey Utkin andrey.ut...@corp.bluecherry.net writes:

> In upstream there is no longer a module parameter for the video standard (NTSC/PAL), but there is a VIDIOC_S_STD handler. It turns out not to work correctly: the frame is offset, so that there is a black horizontal bar at the bottom. The S_STD ioctl call does make a difference, because without it the frame slides vertically all the time. But after the call the picture is still not correct.

Which kernel version are you using? I remember there were some problems with earlier versions, where NTSC vs PAL wasn't consistently a bool but rather a raw register value (or something like this), but it was fixed last time I checked.

I'm personally using SOLO6110-based cards with v3.17 and PAL and it works, with minimal unrelated patches.

> Any ideas why changing the mode after the driver load wouldn't work? Would it be allowed to add back the kernel module parameter (the one passed at module load time)?

I don't think this alone would help.

Looking at my patch queue (will try to remember to have them posted)... Well, it could be the SDRAM size detection routine. I'm using cards with 64 MB of RAM and the routine repeatedly detected 128 MB or so (the maximum supported). I have a temporary fix for this but it needs a bit more work: I have seen a case where it failed (I'm using ARM and MIPS platforms and they may differ from x86 in endianness, cache coherency etc.).

If you have a card with 64 MB of RAM, you may want to check whether the driver detects it correctly and, if not, e.g. hardcode the size.

Otherwise, I have no idea what could be wrong; it works for me.

--
Krzysztof Halasa
Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland
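To make the size-detection concern above concrete, here is a small simulation of one generic way to probe memory size: write a marker at address 0, write a different value at a candidate size offset, and see whether address 0 was overwritten because the address wrapped. This is only an illustration of the general technique. It is not the solo6x10 driver's routine, which the thread does not describe, and WrappingMemory and probe_size are hypothetical names invented for the example.

```python
MB = 1 << 20


class WrappingMemory:
    """Simulated RAM whose addresses wrap around at `size` bytes (hypothetical)."""

    def __init__(self, size):
        self.size = size
        self.cells = {}

    def write32(self, addr, value):
        self.cells[addr % self.size] = value & 0xFFFFFFFF

    def read32(self, addr):
        return self.cells.get(addr % self.size, 0)


def probe_size(mem, max_size=128 * MB, min_size=16 * MB):
    """Return the smallest power-of-two size at which address 0 is aliased."""
    marker = 0xA5A55A5A
    size = min_size
    while size < max_size:
        mem.write32(0, marker)
        mem.write32(size, ~marker)  # overwrites address 0 if RAM wraps at `size`
        if mem.read32(0) != marker:
            return size
        size *= 2
    return max_size  # no aliasing seen below the maximum supported size
```

In this simulation probe_size(WrappingMemory(64 * MB)) reports 64 MB. A probe of this kind falls through to the maximum whenever the read-back does not reflect the aliased write, which is one way a 64 MB card could be reported as 128 MB. Whether that is what happens on the hardware discussed here is not established in the thread.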
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
On Sun, Nov 16, 2014 at 12:42 AM, Krzysztof Hałasa khal...@piap.pl wrote:
> Andrey Utkin andrey.ut...@corp.bluecherry.net writes:
>
>> In upstream there is no longer a module parameter for the video standard (NTSC/PAL), but there is a VIDIOC_S_STD handler. It turns out not to work correctly: the frame is offset, so that there is a black horizontal bar at the bottom. The S_STD ioctl call does make a difference, because without it the frame slides vertically all the time. But after the call the picture is still not correct.
>
> Which kernel version are you using?

linux-next from Oct 31 with a few of my patches which are not in linux-next yet.

--
Bluecherry developer.
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
Andrey Utkin andrey.ut...@corp.bluecherry.net writes:

> The problem is the following: after ~1 hour of uptime with a working application reading the streams, one card (the same one every time) stops producing interrupts (its counter in /proc/interrupts freezes), and all threads reading from that card hang forever in ioctl(VIDIOC_DQBUF).

There is a race condition in the IRQ handler, at least in 3.17. I don't know if it's related, will post a patch.

--
Krzysztof Halasa
Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
2014-11-14 15:00 GMT+04:00 Krzysztof Hałasa khal...@piap.pl:
> There is a race condition in the IRQ handler, at least in 3.17. I don't know if it's related, will post a patch.

Thank you for your interest. Looking forward to your patch. If you don't have time, please just say what races with what, and I'll check it myself.

--
Andrey Utkin
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
On Tue, Nov 11, 2014 at 10:16 PM, Andrey Utkin andrey.ut...@corp.bluecherry.net wrote:
> On Tue, Nov 11, 2014 at 8:05 PM, Hans Verkuil hverk...@xs4all.nl wrote:
>> I would first try to exclude hardware issues: since you say it is always the same card, try either replacing it or swapping it with another solo card and see if the problem follows the card or not. If it does, then it is likely a hardware problem. If it doesn't, then it suggests a race condition in the interrupt handling somewhere.
>
> Thanks for the reply, Hans. Surely a valid idea. I will ask for this, but it is out of my physical reach. If you have any suspects in the driver code, please let me know.

(We haven't tested the replacement yet.)

To my big surprise, it turned out that the channels are unstable at FPS=2 but stable at FPS=30.

--
Bluecherry developer.
[RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
At Bluecherry, we have issues with servers which have 3 solo6110 cards (each card has up to 16 analog video cameras connected, all being actively read).

This is the kernel I last tested on such a server. It is based on linux-next of October 31, with a few patches of mine (all are in review for upstream): https://github.com/krieger-od/linux/ . The HEAD commit is 949e18db86ebf45acab91d188b247abd40b6e2a1 at the moment.

The problem is the following: after ~1 hour of uptime with a working application reading the streams, one card (the same one every time) stops producing interrupts (its counter in /proc/interrupts freezes), and all threads reading from that card hang forever in ioctl(VIDIOC_DQBUF). The application uses the libavformat (ffmpeg) API to read the corresponding /dev/videoX devices of the H264 encoders.

Restarting the application doesn't help; the interrupt counter just increases by 64. To recover, we need a reboot or a programmatic PCI device reset by echo 1 > /sys/bus/pci/devices/\:03\:05.0/reset, which requires unloading the app and the driver and is obviously not a solution.

We have had this issue for a long time, even before we used libavformat for reading from such sources.

A few days ago, we had standalone ffmpeg processes working stably for several days. The kernel was 3.17; the only probably-relevant change in code over the above-mentioned revision is an additional bool variable set in solo_enc_v4l2_isr() and checked in solo_ring_thread() to figure out whether to do or skip solo_handle_ring(). The variable was guarded with spin_lock_irqsave(). I am not sure if it makes any difference; I will try it again eventually.

Any thoughts? Can it be a bug in the driver code causing this (please point out which areas of code to review/fix)? Or is it a desperate hardware issue? How can we figure out for sure whether it is the former or the latter?

--
Bluecherry developer.
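The freeze described above shows up as a counter in /proc/interrupts that stops advancing. As a rough aid for catching the moment it happens, here is a minimal sketch that sums the per-CPU counters for matching IRQ lines and compares two samples. It is not part of the original report. It assumes the driver's interrupt lines are labelled "solo6x10" in /proc/interrupts, and the function names are made up for the example.

```python
def irq_totals(proc_interrupts_text, name_fragment):
    """Sum the per-CPU counters of every IRQ line that mentions name_fragment."""
    lines = proc_interrupts_text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if name_fragment not in line:
            continue
        irq, _, rest = line.partition(":")
        fields = rest.split()
        totals[irq.strip()] = sum(int(f) for f in fields[:n_cpus])
    return totals


def stalled_irqs(before, after):
    """Return the IRQ labels whose total did not change between two samples."""
    return sorted(irq for irq, count in after.items() if before.get(irq) == count)


def read_totals(name_fragment="solo6x10", path="/proc/interrupts"):
    """Take one sample from the running system (label is an assumption)."""
    with open(path) as f:
        return irq_totals(f.read(), name_fragment)
```

Taking read_totals() twice a few seconds apart while streams are being read, and passing both results to stalled_irqs(), lists the IRQ numbers that did not move. With one IRQ line per card, that points at the frozen card.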
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
On 11/11/2014 06:46 PM, Andrey Utkin wrote:
> At Bluecherry, we have issues with servers which have 3 solo6110 cards (each card has up to 16 analog video cameras connected, all being actively read).
>
> This is the kernel I last tested on such a server. It is based on linux-next of October 31, with a few patches of mine (all are in review for upstream): https://github.com/krieger-od/linux/ . The HEAD commit is 949e18db86ebf45acab91d188b247abd40b6e2a1 at the moment.
>
> The problem is the following: after ~1 hour of uptime with a working application reading the streams, one card (the same one every time) stops producing interrupts (its counter in /proc/interrupts freezes), and all threads reading from that card hang forever in ioctl(VIDIOC_DQBUF). The application uses the libavformat (ffmpeg) API to read the corresponding /dev/videoX devices of the H264 encoders.
>
> Restarting the application doesn't help; the interrupt counter just increases by 64. To recover, we need a reboot or a programmatic PCI device reset by echo 1 > /sys/bus/pci/devices/\:03\:05.0/reset, which requires unloading the app and the driver and is obviously not a solution.
>
> We have had this issue for a long time, even before we used libavformat for reading from such sources.
>
> A few days ago, we had standalone ffmpeg processes working stably for several days. The kernel was 3.17; the only probably-relevant change in code over the above-mentioned revision is an additional bool variable set in solo_enc_v4l2_isr() and checked in solo_ring_thread() to figure out whether to do or skip solo_handle_ring(). The variable was guarded with spin_lock_irqsave(). I am not sure if it makes any difference; I will try it again eventually.
>
> Any thoughts? Can it be a bug in the driver code causing this (please point out which areas of code to review/fix)? Or is it a desperate hardware issue? How can we figure out for sure whether it is the former or the latter?

I would first try to exclude hardware issues: since you say it is always the same card, try either replacing it or swapping it with another solo card and see if the problem follows the card or not. If it does, then it is likely a hardware problem. If it doesn't, then it suggests a race condition in the interrupt handling somewhere.

Regards,
Hans
Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?
On Tue, Nov 11, 2014 at 8:05 PM, Hans Verkuil hverk...@xs4all.nl wrote:
> I would first try to exclude hardware issues: since you say it is always the same card, try either replacing it or swapping it with another solo card and see if the problem follows the card or not. If it does, then it is likely a hardware problem. If it doesn't, then it suggests a race condition in the interrupt handling somewhere.

Thanks for the reply, Hans. Surely a valid idea. I will ask for this, but it is out of my physical reach. If you have any suspects in the driver code, please let me know.

--
Bluecherry developer.