Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-26 Thread Ismael Luceno
On Tue, 11 Nov 2014 19:05:37 +0100
Hans Verkuil hverk...@xs4all.nl wrote:
 On 11/11/2014 06:46 PM, Andrey Utkin wrote:
  At Bluecherry, we have issues with servers which have 3 solo6110
  cards (and cards have up to 16 analog video cameras connected to
  them, and being actively read).
  This is a kernel which I tested with such a server last time. It is
  based on linux-next of October, 31, with few patches of mine (all
  are in review for upstream).
  https://github.com/krieger-od/linux/ . The HEAD commit is
  949e18db86ebf45acab91d188b247abd40b6e2a1 at the moment.
  
  The problem is the following: after ~1 hour of uptime with working
  application reading the streams, one card (the same one every time)
  stops producing interrupts (counter in /proc/interrupts freezes),
  and all threads reading from that card hang forever in
  ioctl(VIDIOC_DQBUF). The application uses libavformat (ffmpeg) API
  to read the corresponding /dev/videoX devices of H264 encoders.
  Application restart doesn't help, just interrupt counter increases
  by 64. To help that, we need reboot or programmatic PCI device
  reset by echo 1 > /sys/bus/pci/devices/\:03\:05.0/reset,
  which requires unloading app and driver and is not a solution
  obviously.
  
  We had this issue for a long time, even before we used libavformat
  for reading from such sources.
  A few days ago, we had standalone ffmpeg processes working stable
  for several days. The kernel was 3.17, the only probably-relevant
  change in code over the above mentioned revision is an additional
  bool variable set in solo_enc_v4l2_isr() and checked in
  solo_ring_thread() to figure out whether to do or skip
  solo_handle_ring(). The variable was guarded with
  spin_lock_irqsave(). I am not sure if it makes any difference, will
  try it again eventually.
  
  Any thoughts, can it be a bug in driver code causing that (please
  point which areas of code to review/fix)? Or is that desperate
  hardware issue? How to figure out for sure whether it is the former
  or the latter?
 
 I would first try to exclude hardware issues: since you say it is
 always the same card, try either replacing it or swapping it with
 another solo card and see if the problem follows the card or not. If
 it does, then it is likely a hardware problem. If it doesn't, then it
 suggests a race condition in the interrupt handling somewhere.
 
 Regards,
 
   Hans

CC'ing Curtis, hope you don't mind.

It's just coincidence. This has been a long standing issue, and only
depends on having enough cards.

One of the problems I had in weeding this one out was that I didn't
have the right hardware (only one 16-port card), and my guess is that
Andrey is in the same position.




Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-26 Thread Ismael Luceno
On Sat, 15 Nov 2014 21:42:05 +0100
khal...@piap.pl (Krzysztof Hałasa) wrote:
 Andrey Utkin andrey.ut...@corp.bluecherry.net writes:
 
  In upstream there's no more module parameter for video standard
  (NTSC/PAL). But there's VIDIOC_S_STD handling procedure. But it
  turns out not to work correctly: the frame is offset, so that in
  the bottom there's black horizontal bar.
  The S_STD ioctl call actually makes difference, because without that
  the frame slides vertically all the time. But after the call the
  picture is not correct.
 
 Which kernel version are you using?
 I remember there were some problems with earlier versions, where the
 NTSC vs PAL wasn't consistently a bool but rather a raw register value
 (or something like this), but it was fixed last time I checked.
 I'm personally using SOLO6110-based cards with v3.17 and PAL and it
 works, with minimal unrelated patches.
 

The selection works correctly for me, tested recently after a server
upgrade.




Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-15 Thread Andrey Utkin
Thanks to all for the great help so far, but I've got another issue
with upstream driver.

In upstream there's no more module parameter for video standard
(NTSC/PAL). But there's VIDIOC_S_STD handling procedure. But it turns
out not to work correctly: the frame is offset, so that in the bottom
there's black horizontal bar.
The S_STD ioctl call actually makes difference, because without that
the frame slides vertically all the time. But after the call the
picture is not correct.

This change didn't help:
https://github.com/krieger-od/linux/commit/55b796c010b622430cb85f5b8d7d14fef6f04fb4
So, temporarily, I've hardcoded PAL for the specific customer who uses it:
https://github.com/krieger-od/linux/commit/2c26302dfa6d7aa74cf17a89793daecbb89ae93a
An rmmod/modprobe cycle works fine and behaves no differently from a
reboot, but the picture is still correct only when PAL is hardcoded for
the first-time initialization.

Any ideas why changing the mode after the driver has loaded wouldn't work?
Would it be acceptable to add back that kernel module parameter (the one
passed at module load time)?
-- 
Bluecherry developer.
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-15 Thread Hans Verkuil
Hi Andrey,

On 11/15/2014 02:48 PM, Andrey Utkin wrote:
 Thanks to all for the great help so far, but I've got another issue
 with upstream driver.
 
 In upstream there's no more module parameter for video standard
 (NTSC/PAL). But there's VIDIOC_S_STD handling procedure. But it turns
 out not to work correctly: the frame is offset, so that in the bottom
 there's black horizontal bar.
 The S_STD ioctl call actually makes difference, because without that
 the frame slides vertically all the time. But after the call the
 picture is not correct.

That's strange. I know I tested it at the time. I assume it is the PAL
standard that isn't working (as opposed to NTSC)? Or does it just always
fail when you switch between the two standards?

 
 Such change didn't help:
 https://github.com/krieger-od/linux/commit/55b796c010b622430cb85f5b8d7d14fef6f04fb4
 So, temporarily, I've hardcoded this for exact customer who uses PAL:
 https://github.com/krieger-od/linux/commit/2c26302dfa6d7aa74cf17a89793daecbb89ae93a
 rmmod/modprobe cycle works fine and doesn't make any difference from
 reboot, but still it works correctly only with PAL hardcoded for the
 first-time initialization.
 
 Any ideas why wouldn't it work to change the mode after the driver load?

Not really. I will have to test this next week (either Monday or Friday) with
my solo board.

 Would it be allowed to add back that kernel module parameter (the one
 passed at module load time)?

No. That's a hack, the S_STD call should just work and we need to figure out
why it fails.

Regards,

Hans


Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-15 Thread Andrey Utkin
On Sat, Nov 15, 2014 at 6:08 PM, Hans Verkuil hverk...@xs4all.nl wrote:
 Hi Andrey,

 On 11/15/2014 02:48 PM, Andrey Utkin wrote:
 Thanks to all for the great help so far, but I've got another issue
 with upstream driver.

 In upstream there's no more module parameter for video standard
 (NTSC/PAL). But there's VIDIOC_S_STD handling procedure. But it turns
 out not to work correctly: the frame is offset, so that in the bottom
 there's black horizontal bar.
 The S_STD ioctl call actually makes difference, because without that
 the frame slides vertically all the time. But after the call the
 picture is not correct.

 That's strange. I know I tested it at the time. I assume it is the PAL
 standard that isn't working (as opposed to NTSC)? Or does it just always
 fail when you switch between the two standards?

Switching to PAL is not working (NTSC is the default).
I'm not sure whether it fails to switch at all, or only fails to switch
from PAL (hardcoded or switched) to NTSC. I can test it a bit later.


 Such change didn't help:
 https://github.com/krieger-od/linux/commit/55b796c010b622430cb85f5b8d7d14fef6f04fb4
 So, temporarily, I've hardcoded this for exact customer who uses PAL:
 https://github.com/krieger-od/linux/commit/2c26302dfa6d7aa74cf17a89793daecbb89ae93a
 rmmod/modprobe cycle works fine and doesn't make any difference from
 reboot, but still it works correctly only with PAL hardcoded for the
 first-time initialization.

 Any ideas why wouldn't it work to change the mode after the driver load?

 Not really. I will have to test this next week (either Monday or Friday) with
 my solo board.

Thanks in advance.

 Would it be allowed to add back that kernel module parameter (the one
 passed at module load time)?

 No. That's a hack, the S_STD call should just work and we need to figure out
 why it fails.

Ok.

-- 
Bluecherry developer.


Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-15 Thread Krzysztof Hałasa
Andrey Utkin andrey.ut...@corp.bluecherry.net writes:

 In upstream there's no more module parameter for video standard
 (NTSC/PAL). But there's VIDIOC_S_STD handling procedure. But it turns
 out not to work correctly: the frame is offset, so that in the bottom
 there's black horizontal bar.
 The S_STD ioctl call actually makes difference, because without that
 the frame slides vertically all the time. But after the call the
 picture is not correct.

Which kernel version are you using?
I remember there were some problems with earlier versions, where the
NTSC vs PAL wasn't consistently a bool but rather a raw register value
(or something like this), but it was fixed last time I checked.
I'm personally using SOLO6110-based cards with v3.17 and PAL and it
works, with minimal unrelated patches.

 Any ideas why wouldn't it work to change the mode after the driver load?
 Would it be allowed to add back that kernel module parameter (the one
 passed at module load time)?

I don't think this alone would help.

Looking at my patch queue (will try to remember to have them posted)...
Well, it could be the SDRAM size detection routine. I'm using cards with
64 MB of RAM and the routine repeatedly detected 128 MB or so (max
supported). I have a temporary fix for this but it needs a bit more
work, I have seen a case when it failed (I'm using ARM and MIPS
platforms and they may differ from x86 in endianness, cache coherency
etc).

If you have a card with 64 MB RAM you may want to check if the driver
detects it correctly, and if not e.g. hardcode the size. Otherwise,
I have no idea what could be wrong, it works for me.
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland


Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-15 Thread Andrey Utkin
On Sun, Nov 16, 2014 at 12:42 AM, Krzysztof Hałasa khal...@piap.pl wrote:
 Andrey Utkin andrey.ut...@corp.bluecherry.net writes:

 In upstream there's no more module parameter for video standard
 (NTSC/PAL). But there's VIDIOC_S_STD handling procedure. But it turns
 out not to work correctly: the frame is offset, so that in the bottom
 there's black horizontal bar.
 The S_STD ioctl call actually makes difference, because without that
 the frame slides vertically all the time. But after the call the
 picture is not correct.

 Which kernel version are you using?

linux-next from Oct 31 with a few of my patches that are not in linux-next yet.


-- 
Bluecherry developer.


Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-14 Thread Krzysztof Hałasa
Andrey Utkin andrey.ut...@corp.bluecherry.net writes:

 The problem is the following: after ~1 hour of uptime with working
 application reading the streams, one card (the same one every time)
 stops producing interrupts (counter in /proc/interrupts freezes), and
 all threads reading from that card hang forever in
 ioctl(VIDIOC_DQBUF).

There is a race condition in the IRQ handler, at least in 3.17.
I don't know if it's related, will post a patch.
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland


Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-14 Thread Andrey Utkin
2014-11-14 15:00 GMT+04:00 Krzysztof Hałasa khal...@piap.pl:
 There is a race condition in the IRQ handler, at least in 3.17.
 I don't know if it's related, will post a patch.

Thank you for your interest.
Looking forward to your patch. If you don't have time, please just
say what races with what and I'll check it myself.

-- 
Andrey Utkin


Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-13 Thread Andrey Utkin
On Tue, Nov 11, 2014 at 10:16 PM, Andrey Utkin
andrey.ut...@corp.bluecherry.net wrote:
 On Tue, Nov 11, 2014 at 8:05 PM, Hans Verkuil hverk...@xs4all.nl wrote:
 I would first try to exclude hardware issues: since you say it is always
 the same card, try either replacing it or swapping it with another solo
 card and see if the problem follows the card or not. If it does, then it
 is likely a hardware problem. If it doesn't, then it suggests a race
 condition in the interrupt handling somewhere.

 Thanks for the reply, Hans.
 That is certainly a valid idea. I will ask for this, but the hardware is
 out of my physical reach.
 If you have any suspicions about the driver code, please let me know.

(We haven't tested the replacement yet.)
Surprisingly, it turned out that the channels are unstable at FPS=2
but stable at FPS=30.

-- 
Bluecherry developer.


[RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-11 Thread Andrey Utkin
At Bluecherry, we have issues with servers which have 3 solo6110 cards
(and cards have up to 16 analog video cameras connected to them, and
being actively read).
This is the kernel I last tested on such a server. It is
based on linux-next of October 31, with a few patches of mine (all are
in review for upstream).
https://github.com/krieger-od/linux/ . The HEAD commit is
949e18db86ebf45acab91d188b247abd40b6e2a1 at the moment.

The problem is the following: after ~1 hour of uptime with working
application reading the streams, one card (the same one every time)
stops producing interrupts (counter in /proc/interrupts freezes), and
all threads reading from that card hang forever in
ioctl(VIDIOC_DQBUF). The application uses libavformat (ffmpeg) API to
read the corresponding /dev/videoX devices of H264 encoders.
Restarting the application doesn't help; the interrupt counter just
increases by 64. To recover, we need a reboot or a programmatic PCI
device reset via echo 1 > /sys/bus/pci/devices/\:03\:05.0/reset, which
requires unloading the app and the driver and is obviously not a solution.
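
For illustration only, a minimal sketch of how a frozen counter could be
detected automatically. It compares two snapshots of /proc/interrupts and
reports IRQ lines whose total did not advance. All function names here are
hypothetical, and the assumption that the card's IRQ line carries the label
"solo6x10" should be checked against the real /proc/interrupts output.

```python
import time


def parse_interrupts(text):
    """Map each IRQ label to (sum of per-CPU counts, description)."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Some rows (e.g. ERR, MIS) carry fewer counters than there are CPUs.
        for token in fields[:ncpu]:
            if not token.isdigit():
                break
            counts.append(int(token))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def stalled_irqs(before, after, name):
    """Return labels of IRQs matching name whose total did not change."""
    return sorted(
        label
        for label, (total, description) in after.items()
        if name in description
        and label in before
        and before[label][0] == total
    )


def read_snapshot(path="/proc/interrupts"):
    with open(path) as f:
        return parse_interrupts(f.read())


def watch(name="solo6x10", interval=10.0):
    """Poll forever and print any matching IRQ whose counter stopped moving."""
    before = read_snapshot()
    while True:
        time.sleep(interval)
        after = read_snapshot()
        for label in stalled_irqs(before, after, name):
            print(f"IRQ {label} ({name}) made no progress in {interval}s")
        before = after
```

Calling watch() on the affected server would print a line whenever a
matching counter stays flat for one interval. A flat counter is only a hint:
an idle card or a shared IRQ line can look the same, so the interval and the
name filter would need tuning for the actual setup.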

We had this issue for a long time, even before we used libavformat for
reading from such sources.
A few days ago, we had standalone ffmpeg processes running stably for
several days. The kernel was 3.17; the only probably-relevant change
in code over the above-mentioned revision is an additional bool
variable set in solo_enc_v4l2_isr() and checked in solo_ring_thread()
to decide whether to run or skip solo_handle_ring(). The variable
was guarded with spin_lock_irqsave(). I am not sure whether it makes any
difference; I will try it again eventually.
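
For illustration only, a user-space analogy of the flag described above,
written in Python rather than kernel C. It is not the driver code: a
threading.Lock stands in for the spinlock, the class and function names are
hypothetical, and the test-and-clear behaviour on the consumer side is an
assumption about how such a flag would typically be used.

```python
import threading


class PendingFlag:
    """A boolean shared between a producer and a consumer, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = False

    def set(self):
        """Producer side (stands in for the ISR): record that work arrived."""
        with self._lock:
            self._pending = True

    def take(self):
        """Consumer side (stands in for the ring thread).

        Reads and clears the flag in one critical section, so a set() that
        happens right after the read is kept for the next call.
        """
        with self._lock:
            pending, self._pending = self._pending, False
            return pending


def ring_thread_step(flag, handle_ring):
    """Run handle_ring() only if the producer signalled since the last step."""
    if flag.take():
        handle_ring()
        return True
    return False
```

The point of the sketch is that the check and the clear happen under the
same lock. If they were separate steps, a signal arriving between them could
be cleared without ever being handled, which is one way a consumer can end
up waiting forever.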

Any thoughts? Can a bug in the driver code be causing this (please
point out which areas of code to review/fix)? Or is it a hopeless
hardware issue? How can we figure out for sure whether it is the former or
the latter?

-- 
Bluecherry developer.


Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-11 Thread Hans Verkuil
On 11/11/2014 06:46 PM, Andrey Utkin wrote:
 At Bluecherry, we have issues with servers which have 3 solo6110 cards
 (and cards have up to 16 analog video cameras connected to them, and
 being actively read).
 This is a kernel which I tested with such a server last time. It is
 based on linux-next of October, 31, with few patches of mine (all are
 in review for upstream).
 https://github.com/krieger-od/linux/ . The HEAD commit is
 949e18db86ebf45acab91d188b247abd40b6e2a1 at the moment.
 
 The problem is the following: after ~1 hour of uptime with working
 application reading the streams, one card (the same one every time)
 stops producing interrupts (counter in /proc/interrupts freezes), and
 all threads reading from that card hang forever in
 ioctl(VIDIOC_DQBUF). The application uses libavformat (ffmpeg) API to
 read the corresponding /dev/videoX devices of H264 encoders.
 Application restart doesn't help, just interrupt counter increases by
 64. To help that, we need reboot or programmatic PCI device reset by
 echo 1 > /sys/bus/pci/devices/\:03\:05.0/reset, which requires
 unloading app and driver and is not a solution obviously.
 
 We had this issue for a long time, even before we used libavformat for
 reading from such sources.
 A few days ago, we had standalone ffmpeg processes working stable for
 several days. The kernel was 3.17, the only probably-relevant change
 in code over the above mentioned revision is an additional bool
 variable set in solo_enc_v4l2_isr() and checked in solo_ring_thread()
 to figure out whether to do or skip solo_handle_ring(). The variable
 was guarded with spin_lock_irqsave(). I am not sure if it makes any
 difference, will try it again eventually.
 
 Any thoughts, can it be a bug in driver code causing that (please
 point which areas of code to review/fix)? Or is that desperate
 hardware issue? How to figure out for sure whether it is the former or
 the latter?

I would first try to exclude hardware issues: since you say it is always
the same card, try either replacing it or swapping it with another solo
card and see if the problem follows the card or not. If it does, then it
is likely a hardware problem. If it doesn't, then it suggests a race
condition in the interrupt handling somewhere.

Regards,

Hans


Re: [RFC] solo6x10 freeze, even with Oct 31's linux-next... any ideas or help?

2014-11-11 Thread Andrey Utkin
On Tue, Nov 11, 2014 at 8:05 PM, Hans Verkuil hverk...@xs4all.nl wrote:
 I would first try to exclude hardware issues: since you say it is always
 the same card, try either replacing it or swapping it with another solo
 card and see if the problem follows the card or not. If it does, then it
 is likely a hardware problem. If it doesn't, then it suggests a race
 condition in the interrupt handling somewhere.

Thanks for the reply, Hans.
That is certainly a valid idea. I will ask for this, but the hardware is
out of my physical reach.
If you have any suspicions about the driver code, please let me know.

-- 
Bluecherry developer.