subject:"Re\: New DVB\(\!\!\)\-Statistics API"

Re: New DVB-Statistics API - please vote

2010-02-03 Thread Manu Abraham

On Wed, Feb 3, 2010 at 2:40 PM, Hans Verkuil  wrote:
>
>> Mauro Carvalho Chehab wrote:
>>
 after the last thread which asked about signal statistics details
 degenerated into a discussion about the technical possibilites for
 implementing an entirely new API, which lead to nothing so far, I
 wanted
 to open a new thread to bring this forward. Maybe some more people can
 give their votes for the different options
>>
>> Only me and Manu manifested with opinions on this thread. Not sure why
>> nobody else gave their comments. Maybe all interested people just decided
>> to take a long vacation and are not listening to their emails ;)
>>
>> Seriously, from what I understand, this is an API improvement and we need
>> to take a decision on it. So, your opinions are important.
>>
>> ---
>>
>> Let me draw a summary of this subject, trying to be impartial.
>>
>> The original proposal were made by Manu. My proposal is derived from
>> Manu's
>> original one, both will equally provide the same features.
>>
>> The main difference is that Manu's proposal use a struct to get the
>> statistics while my proposal uses DVBS2API.
>>
>> With both API proposals, all values are get at the same time by the
>> driver.
>> the DVBS2API adds a small delay to fill the fields, but the extra delay is
>> insignificant, when compared with the I/O delays needed to retrieve the
>> values from the hardware.
>>
>> Due to the usage of DVBS2API, it is possible to retrieve a subset of
>> statistics.
>> When obtaining a subset, the DVBS2API latency is considerable faster, as
>> less
>> data needed to be transfered from the hardware.
>>
>> The DVBS2API also offers the possibility of expanding the statistics
>> group, since
>> it is not rigid as an struct.
>>
>> One criteria that should be reminded is that, according with Linux Kernel
>> rules,
>> any userspace API is stable. This means that applications compiled against
>> an
>> older API version should keep running with the latest kernel. So, whatever
>> decided,
>> the statistics API should always maintain backward compatibility.
>>
>> ---
>>
>> During the end of the year, I did some work with an ISDB-T driver for
>> Siano, and
>> I realized that the usage of the proposed struct won't fit well for
>> ISDB-T. The
>> reason is that, on ISDB-T, the transmission uses up to 3 hierarchical
>> layers.
>> Each layer may have different OFDM parameters, so the devices (at least,
>> this is the
>> case for Siano) has a group of statistics per layer.
>>
>> I'm afraid that newer standards may also bring different ways to present
>> statistics, and
>> the current proposal won't fit well.
>>
>> So, in my opinion, if it is chosen any struct-based approach, we'll have a
>> bad time to
>> maintain it, as it won't fit into all cases, and we'll need to add some
>> tricks to extend
>> the struct.
>>
>> So, my vote is for the DVBS2API approach, where a new group of statistics
>> can easily be added,
>> on an elegant way, without needing of re-discussing the better API or to
>> find a way to extend
>> the size of an struct without breaking backward compatibility.
>
> From a purely technical standpoint the DVBS2API is by definition more
> flexible and future-proof. The V4L API has taken the same approach with
> controls (basically exactly the same mechanism). Should it be necessary in
> the future to set multiple properties atomically, then the same technique
> as V4L can be used (see VIDIOC_S_EXT_CTRLS).
>
> The alternative is to make structs with lots of reserved fields. It
> depends on how predictable the API is expected to be. Sometimes you can be
> reasonably certain that there won't be too many additions in the future
> and then using reserved fields is perfectly OK.
>
> Just my 5 cents based on my V4L experience.


In fact, I don't see the advantage using a specific get/set, since
what i proposed just reads back basic data types such as a u32 for any
instance and those being "standard " for any digital communication
system. It makes no sense to go for a get/set approach. For example
there cannot be multiple properties for any digital system such as
BER, just that there are different measure s such as kilograms,
pounds, grams etc. Which is what the whole patch is meant to do in a
performance critical manner.
The whole patch is a kind of get/set and and hence it doesn't need a
get/set at the micro level.

To contradict the reverse, as an example, I can state that weight is
not measured in centimeters or inches or feet, to put it in laymans
terms. if you say that it needs to be expressed as such,  then i very
well see that there is something conceptually wrong in your thought.

My 2 cents, for those who don't understand the issue.

Regards,
Manu
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: New DVB-Statistics API - please vote

2010-02-03 Thread Hans Verkuil


> Mauro Carvalho Chehab wrote:
>
>>> after the last thread which asked about signal statistics details
>>> degenerated into a discussion about the technical possibilites for
>>> implementing an entirely new API, which lead to nothing so far, I
>>> wanted
>>> to open a new thread to bring this forward. Maybe some more people can
>>> give their votes for the different options
>
> Only me and Manu manifested with opinions on this thread. Not sure why
> nobody else gave their comments. Maybe all interested people just decided
> to take a long vacation and are not listening to their emails ;)
>
> Seriously, from what I understand, this is an API improvement and we need
> to take a decision on it. So, your opinions are important.
>
> ---
>
> Let me draw a summary of this subject, trying to be impartial.
>
> The original proposal were made by Manu. My proposal is derived from
> Manu's
> original one, both will equally provide the same features.
>
> The main difference is that Manu's proposal use a struct to get the
> statistics while my proposal uses DVBS2API.
>
> With both API proposals, all values are get at the same time by the
> driver.
> the DVBS2API adds a small delay to fill the fields, but the extra delay is
> insignificant, when compared with the I/O delays needed to retrieve the
> values from the hardware.
>
> Due to the usage of DVBS2API, it is possible to retrieve a subset of
> statistics.
> When obtaining a subset, the DVBS2API latency is considerable faster, as
> less
> data needed to be transfered from the hardware.
>
> The DVBS2API also offers the possibility of expanding the statistics
> group, since
> it is not rigid as an struct.
>
> One criteria that should be reminded is that, according with Linux Kernel
> rules,
> any userspace API is stable. This means that applications compiled against
> an
> older API version should keep running with the latest kernel. So, whatever
> decided,
> the statistics API should always maintain backward compatibility.
>
> ---
>
> During the end of the year, I did some work with an ISDB-T driver for
> Siano, and
> I realized that the usage of the proposed struct won't fit well for
> ISDB-T. The
> reason is that, on ISDB-T, the transmission uses up to 3 hierarchical
> layers.
> Each layer may have different OFDM parameters, so the devices (at least,
> this is the
> case for Siano) has a group of statistics per layer.
>
> I'm afraid that newer standards may also bring different ways to present
> statistics, and
> the current proposal won't fit well.
>
> So, in my opinion, if it is chosen any struct-based approach, we'll have a
> bad time to
> maintain it, as it won't fit into all cases, and we'll need to add some
> tricks to extend
> the struct.
>
> So, my vote is for the DVBS2API approach, where a new group of statistics
> can easily be added,
> on an elegant way, without needing of re-discussing the better API or to
> find a way to extend
> the size of an struct without breaking backward compatibility.

>From a purely technical standpoint the DVBS2API is by definition more
flexible and future-proof. The V4L API has taken the same approach with
controls (basically exactly the same mechanism). Should it be necessary in
the future to set multiple properties atomically, then the same technique
as V4L can be used (see VIDIOC_S_EXT_CTRLS).

The alternative is to make structs with lots of reserved fields. It
depends on how predictable the API is expected to be. Sometimes you can be
reasonably certain that there won't be too many additions in the future
and then using reserved fields is perfectly OK.

Just my 5 cents based on my V4L experience.

Regards,

  Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: New DVB-Statistics API - please vote

2010-02-03 Thread Mauro Carvalho Chehab

Mauro Carvalho Chehab wrote:

>> after the last thread which asked about signal statistics details
>> degenerated into a discussion about the technical possibilites for
>> implementing an entirely new API, which lead to nothing so far, I wanted
>> to open a new thread to bring this forward. Maybe some more people can
>> give their votes for the different options

Only me and Manu manifested with opinions on this thread. Not sure why
nobody else gave their comments. Maybe all interested people just decided 
to take a long vacation and are not listening to their emails ;)

Seriously, from what I understand, this is an API improvement and we need
to take a decision on it. So, your opinions are important.

---

Let me draw a summary of this subject, trying to be impartial.

The original proposal were made by Manu. My proposal is derived from Manu's
original one, both will equally provide the same features. 

The main difference is that Manu's proposal use a struct to get the 
statistics while my proposal uses DVBS2API.

With both API proposals, all values are get at the same time by the driver.
the DVBS2API adds a small delay to fill the fields, but the extra delay is
insignificant, when compared with the I/O delays needed to retrieve the 
values from the hardware.

Due to the usage of DVBS2API, it is possible to retrieve a subset of statistics.
When obtaining a subset, the DVBS2API latency is considerable faster, as less
data needed to be transfered from the hardware.

The DVBS2API also offers the possibility of expanding the statistics group, 
since
it is not rigid as an struct.

One criteria that should be reminded is that, according with Linux Kernel rules,
any userspace API is stable. This means that applications compiled against an
older API version should keep running with the latest kernel. So, whatever 
decided,
the statistics API should always maintain backward compatibility.

---

During the end of the year, I did some work with an ISDB-T driver for Siano, and
I realized that the usage of the proposed struct won't fit well for ISDB-T. The
reason is that, on ISDB-T, the transmission uses up to 3 hierarchical layers.
Each layer may have different OFDM parameters, so the devices (at least, this 
is the 
case for Siano) has a group of statistics per layer.

I'm afraid that newer standards may also bring different ways to present 
statistics, and
the current proposal won't fit well.

So, in my opinion, if it is chosen any struct-based approach, we'll have a bad 
time to
maintain it, as it won't fit into all cases, and we'll need to add some tricks 
to extend
the struct.

So, my vote is for the DVBS2API approach, where a new group of statistics can 
easily be added,
on an elegant way, without needing of re-discussing the better API or to find a 
way to extend
the size of an struct without breaking backward compatibility.

Cheers,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: New DVB-Statistics API

2009-12-09 Thread Mauro Carvalho Chehab

Julian Scheel wrote:
> Am 09.12.09 14:02, schrieb Mauro Carvalho Chehab:
>> Manu Abraham wrote:

> I don't think you can just take the average IPC rates into account for
> this. When doing a syscall the processors TLB cache will be cleared,
> which will force the CPU to go to the whole instruction pipeline before
> the first syscall instruction is actually executed. This will introduce
> a delay for each syscall you make. I'm not exactly sure about the length
> of the delay, but I think it should be something like 2 clock cycles.

True, but this delay is common to both S2API and struct.

>> To get all data that the ioctl approach struct has, the delay for
>> S2API will be equal.
>> To get less data, S2API will have a small delay.
>>
> Imho the S2API would be slower when reading all data the ioctl fetches,
> due to the way the instructions would be handled.
> 
> Correct me, if I'm wrong with any of this.

Not sure if I understood your question.

On both cases, just one function call will go to the driver, with one struct
(struct fe, the case of S2API) or two structs (struct fe and the
stats-specific struct(s)) in the case of a new ioctl(s).

As drivers are free to implement any logic, the driver can implement exactly
the same logic with both API calls. So, the delay to get the info will be
equal on both cases.

In a practical case, this will take at least a few milisseconds to retrieve all
data. It may take even more, since, on some drivers, you may need to wait for
some condition to happen before start measuring, in order to be sure that you'll
be getting atomic and accurate values.

After the function return, with an struct, you can just return, while, with 
S2API,
you'll need to put the data into the proper payload fields, but this will add
a delay in the order of nanoseconds.

Let's say that, to get all data, the routine needs 10 milisseconds.

The difference between new ioctl or S2API will be of about 0,00063 milliseconds
on a Pentium MMX. On a machine with 1GHz of clock, 2 IPC, the difference will
be 0,315 milliseconds.

Considering that the Linux kernel is preemptive, and an interrupt or the 
scheduler
could be called during the 10 milliseconds time, I doubt you'll be able to
notice that difference on any practical use case.

On the other hand, if you need for example just the strength of the signal at 
the AGC,
if you call via struct, you'll still be consuming the same 10 ms, while, with 
S2API,
you can do it, let's say, on 1 ms (the real numbers will depend, of course, on 
how
much I/O is needed on hardware, and on how many time do you need to wait there 
to 
get an stable value).

So, if you want to do things like moving a rotor, S2API will give better 
results.
If you want all stats, it will give the same result as a new ioctl.

While I don't think that 0,315 milliseconds is worth enough to cause any 
troubles,
With a simple change like the one bellow, this time can be reduced to 0,105 
milisseconds
with a patch like that (the patch were simplified to change just quality, but, 
of course,
such approach needs to be done on the other fields to get this result):

Index: master/linux/drivers/media/dvb/dvb-core/dvb_frontend.c
===
--- master.orig/linux/drivers/media/dvb/dvb-core/dvb_frontend.c
+++ master/linux/drivers/media/dvb/dvb-core/dvb_frontend.c
@@ -1220,6 +1220,7 @@ static int dtv_property_prepare_get_stat
switch (tvp->cmd) {
case DTV_FE_QUALITY:
fe->dtv_property_cache.need_stats |= FE_NEED_QUALITY;
+   fe->dtv_property_cache.quality = &tvp->u.data;
break;
case DTV_FE_QUALITY_UNIT:
fe->dtv_property_cache.need_stats |= FE_NEED_QUALITY_UNIT;
@@ -1384,9 +1385,6 @@ static int dtv_property_process_get(stru
break;

/* Quality measures */
-   case DTV_FE_QUALITY:
-   tvp->u.data = fe->dtv_property_cache.quality;
-   break;
case DTV_FE_QUALITY_UNIT:
tvp->u.data = fe->dtv_property_cache.quality_unit;
break;
@@ -1696,10 +1697,12 @@ static int dvb_frontend_ioctl_properties
}
}

-   for (i = 0; i < tvps->num; i++) {
-   (tvp + i)->result = dtv_property_process_get(fe,
-   tvp + i, inode, file, need_get_ops);
-   err |= (tvp + i)->result;
+   if (need_get_ops) {
+   for (i = 0; i < tvps->num; i++) {
+   (tvp + i)->result = dtv_property_process_get(fe,
+   tvp + i, inode, file, 
need_get_ops);
+   err |= (tvp + i)->result;
+   }
}

if (copy_to_user(tvps->props, tvp, tvps->num * sizeof(struct 
dtv_property))) {
Index: master/linux/drivers/media/dvb/d

Re: New DVB-Statistics API

2009-12-09 Thread Julian Scheel


Am 09.12.09 14:02, schrieb Mauro Carvalho Chehab:

Manu Abraham wrote:
   

On Wed, Dec 9, 2009 at 3:43 AM, Mauro Carvalho Chehab
  wrote:
 
   

Even with STB, let's assume a very slow cpu that runs at 100 Megabytes/second. 
So, the clock
speed is 10 nanoseconds. Assuming that this CPU doesn't have a good pipeline, 
being
capable of handling only one instruction per second, you'll have one 
instruction at executed
at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is 
faster than this).
   

Incorrect.
A CPU doesn't execute instruction per clock cycle. Clock cycles
required to execute an instruction do vary from 2 cycles 12 cycles
varying from CPU to CPU.
 

See the description of an old Pentium MMX processor (the sucessor of i586, 
running up to 200 MHz):
http://www.intel.com/design/archives/processors/mmx/docs/243185.htm

Thanks to superscalar architecture, it runs 2 instructions per clock cycle 
(IPC).

Newer processors can run more instructions per clock cycle. For example, any 
Pentium-4 processor,
can do 3 IPC:
http://www.intel.com/support/processors/pentium4/sb/CS-017371.htm
   


I don't think you can just take the average IPC rates into account for 
this. When doing a syscall the processors TLB cache will be cleared, 
which will force the CPU to go to the whole instruction pipeline before 
the first syscall instruction is actually executed. This will introduce 
a delay for each syscall you make. I'm not exactly sure about the length 
of the delay, but I think it should be something like 2 clock cycles.

So, even on such bad hardware that is at least 20x slower than a netbook 
running at 1Gbps,
what determines the delay is the amount of I/O you're doing, and not the number 
of extra
code.
   


The I/O overhead required to read 4 registers from hardware is the
same whether you use the ioctl approach or s2api.
 

It seems you got my point. What will determinate the delay is the number of 
I/O's, and not the
amount of instructions.
   
The number of hardware I/Os is constant for both cases, so we do not 
need to discuss them as pro/con for any of the proposals.

Eventually, as you have pointed out yourself, The data struct will be
in the cache all the time for the ioctl approach. The only new
addition to the existing API in the ioctl case is a CALL instruction
as compared to the numerous instructions in comparison to that you
have pointed out as with the s2api approach.
 

True, but, as shown, the additional delay introduced by the code is less than 
0.01%, even on
a processor that has half of the speed of a 12-year old very slow CPU (a 
Pentium MMX @ 100 MHz
is capable of 2 IPC. My calculus assumed 1 IPC).

So, what will affect the delay is the number of I/O you need to do.

To get all data that the ioctl approach struct has, the delay for S2API will be 
equal.
To get less data, S2API will have a small delay.
   
Imho the S2API would be slower when reading all data the ioctl fetches, 
due to the way the instructions would be handled.


Correct me, if I'm wrong with any of this.

Cheers,
Julian

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: New DVB-Statistics API

2009-12-09 Thread Mauro Carvalho Chehab

Manu Abraham wrote:
> On Wed, Dec 9, 2009 at 3:43 AM, Mauro Carvalho Chehab
>  wrote:

>> Even with STB, let's assume a very slow cpu that runs at 100 
>> Megabytes/second. So, the clock
>> speed is 10 nanoseconds. Assuming that this CPU doesn't have a good 
>> pipeline, being
>> capable of handling only one instruction per second, you'll have one 
>> instruction at executed
>> at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is 
>> faster than this).
> 
> Incorrect.
> A CPU doesn't execute instruction per clock cycle. Clock cycles
> required to execute an instruction do vary from 2 cycles 12 cycles
> varying from CPU to CPU.

See the description of an old Pentium MMX processor (the sucessor of i586, 
running up to 200 MHz):
http://www.intel.com/design/archives/processors/mmx/docs/243185.htm

Thanks to superscalar architecture, it runs 2 instructions per clock cycle 
(IPC).

Newer processors can run more instructions per clock cycle. For example, any 
Pentium-4 processor,
can do 3 IPC:
http://www.intel.com/support/processors/pentium4/sb/CS-017371.htm

>> So, even on such bad hardware that is at least 20x slower than a netbook 
>> running at 1Gbps,
>> what determines the delay is the amount of I/O you're doing, and not the 
>> number of extra
>> code.
> 
> 
> The I/O overhead required to read 4 registers from hardware is the
> same whether you use the ioctl approach or s2api.

It seems you got my point. What will determinate the delay is the number of 
I/O's, and not the
amount of instructions.

> Eventually, as you have pointed out yourself, The data struct will be
> in the cache all the time for the ioctl approach. The only new
> addition to the existing API in the ioctl case is a CALL instruction
> as compared to the numerous instructions in comparison to that you
> have pointed out as with the s2api approach.

True, but, as shown, the additional delay introduced by the code is less than 
0.01%, even on
a processor that has half of the speed of a 12-year old very slow CPU (a 
Pentium MMX @ 100 MHz
is capable of 2 IPC. My calculus assumed 1 IPC).

So, what will affect the delay is the number of I/O you need to do.

To get all data that the ioctl approach struct has, the delay for S2API will be 
equal.
To get less data, S2API will have a small delay.

Cheers,
Mauro.
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: New DVB-Statistics API

2009-12-09 Thread Manu Abraham

On Wed, Dec 9, 2009 at 3:43 AM, Mauro Carvalho Chehab
 wrote:
> Manu Abraham wrote:
>
>>> Not true. As pointed at the previous answer, the difference between a new 
>>> ioctl
>>> and S2API is basically the code at dtv_property_prepare_get_stats() and
>>> dtv_property_process_get(). This is a pure code that uses a continuous 
>>> struct
>>> that will likely be at L3 cache, inside the CPU chip. So, this code will run
>>> really quickly.
>>
>>
>>
>> AFAIK, cache eviction occurs with syscalls: where content in the
>> caches near the CPU cores is pushed to the outer cores, resulting in
>> cache misses. Not all CPU's are equipped with L3 caches. Continuous
>> syscalls can result in TLB cache misses from what i understand, which
>> is expensive.
>
> It is very likely that the contents of struct fe to go into the cache during 
> the
> syscall. I was conservative when I talked about L3. Depending on the cache 
> sizes,
> I won't doubt that the needed fields of the fe struct will go to L1 cache.



Ah, so the data structure which is there in the ioctl approach as well
and "less likely" to get cache hits since the calls are lesser.


>>> As current CPU's runs at the order of Teraflops (as the CPU clocks are at 
>>> gigahertz
>>> order, and CPU's can handle multiple instructions per clock cycle), the 
>>> added delay
>>> is in de order of nanosseconds.
>>
>>
>> Consider STB's where DVB is generally deployed rather than the small
>> segment of home users running a stack on a generic PC.
>
> Even with STB, let's assume a very slow cpu that runs at 100 
> Megabytes/second. So, the clock
> speed is 10 nanoseconds. Assuming that this CPU doesn't have a good pipeline, 
> being
> capable of handling only one instruction per second, you'll have one 
> instruction at executed
> at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is 
> faster than this).

Incorrect.
A CPU doesn't execute instruction per clock cycle. Clock cycles
required to execute an instruction do vary from 2 cycles 12 cycles
varying from CPU to CPU.


> An I/O operation at i2c is of the order of 10^-3. Assuming that an additional 
> delay of 10%
> (10 ^ -4) will deadly affect realtime capability (with it is very doubtful), 
> this means that
> the CPU can run up to 10,000 (!!!) instructions to generate such delay. If 
> you compile that code
> and check the number or extra instructions I bet it will be shorter enough to 
> not cause any
> practical effect.
>
> So, even on such bad hardware that is at least 20x slower than a netbook 
> running at 1Gbps,
> what determines the delay is the amount of I/O you're doing, and not the 
> number of extra
> code.


The I/O overhead required to read 4 registers from hardware is the
same whether you use the ioctl approach or s2api.


>  > Hardware I/O is the most expensive operation involved.
>
> True. That's what I said.
>
>> Case #1: the ioctl approach
>
>>
>> Now Case #2: based on s2api
>
>
>> Now that we can see the actual code flow, we can see the s2api
>> approach requires an unnecessary large tokenizer/serializer, involving
>> multiple function calls.
>
> Are you seeing there 10.000 assembler instructions or so? If not, the size of 
> the code is
> not relevant.
>
> Also: gcc optimizer transforms switches into a binary tree. So, if you have 64
> cases on switch, it will require 7 comparations (log2(64)) to get a match.
>
> For example, a quick look at the code you've presented, let's just calculate
> the number of operations on the dtv_property_proccess_get() routine, without
> debug stuff:
>
> static int dtv_property_process_get() {
>CMP (if fe->ops.get_property)
>CMP (if r < 0)   < This if only happens if the 
> first one is executed. On my patch, it is not executed
>(the code you posted is the 
> one before my patch)
>SWITCH (7 CMP's) < due to binary tree optimization 
> done by gcc
>MOV
> }
>
> So, that entire code (that has about 200 lines) has, in fact
> 9 comparations and one move instruction.
>
> At dtv_property_prepare_get_stats(), the code is even cheaper: just a switch 
> with 8
> elements (log2(8) = 3), so 3 comparations, and one move instruction.
>
> The additional cost on dvb_frontend_ioctl_properties is:
>2 MOVs
>One loop calling dtv_property_prepare_get_stats() - up to 4 times to 
> retrieve
> all quality values
>one INC
>one CMP and function CALL (the same cost exists also with the struct)
>One loop calling dtv_property_get_stats() - up to 4 times to retrieve
> all quality values
>
> So, if I've calculated it right, we're talking about 2+1+16+1+2+1+40 = 63 
> instructions.
>
> 2) the userspace->kernelspace payload.
>
> Case #1: The size of S2API structs. It will range from 24 to 84 (depending on 
> what
> you want to get, from one to 4 different value pairs).
>
> Case #2: The size of the ioctl struct: ab

Re: New DVB-Statistics API

2009-12-08 Thread Mauro Carvalho Chehab

Manu Abraham wrote:

>> Not true. As pointed at the previous answer, the difference between a new 
>> ioctl
>> and S2API is basically the code at dtv_property_prepare_get_stats() and
>> dtv_property_process_get(). This is a pure code that uses a continuous struct
>> that will likely be at L3 cache, inside the CPU chip. So, this code will run
>> really quickly.
> 
> 
> 
> AFAIK, cache eviction occurs with syscalls: where content in the
> caches near the CPU cores is pushed to the outer cores, resulting in
> cache misses. Not all CPU's are equipped with L3 caches. Continuous
> syscalls can result in TLB cache misses from what i understand, which
> is expensive.

It is very likely that the contents of struct fe to go into the cache during the
syscall. I was conservative when I talked about L3. Depending on the cache 
sizes,
I won't doubt that the needed fields of the fe struct will go to L1 cache.

>> As current CPU's runs at the order of Teraflops (as the CPU clocks are at 
>> gigahertz
>> order, and CPU's can handle multiple instructions per clock cycle), the 
>> added delay
>> is in de order of nanosseconds.
> 
> 
> Consider STB's where DVB is generally deployed rather than the small
> segment of home users running a stack on a generic PC.

Even with STB, let's assume a very slow cpu that runs at 100 Megabytes/second. 
So, the clock
speed is 10 nanoseconds. Assuming that this CPU doesn't have a good pipeline, 
being
capable of handling only one instruction per second, you'll have one 
instruction at executed
at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is 
faster than this).

An I/O operation at i2c is of the order of 10^-3. Assuming that an additional 
delay of 10%
(10 ^ -4) will deadly affect realtime capability (with it is very doubtful), 
this means that
the CPU can run up to 10,000 (!!!) instructions to generate such delay. If you 
compile that code
and check the number or extra instructions I bet it will be shorter enough to 
not cause any
practical effect.

So, even on such bad hardware that is at least 20x slower than a netbook 
running at 1Gbps,
what determines the delay is the amount of I/O you're doing, and not the number 
of extra
code.

 > Hardware I/O is the most expensive operation involved. 

True. That's what I said.

> Case #1: the ioctl approach

>
> Now Case #2: based on s2api

> Now that we can see the actual code flow, we can see the s2api
> approach requires an unnecessary large tokenizer/serializer, involving
> multiple function calls.

Are you seeing there 10.000 assembler instructions or so? If not, the size of 
the code is
not relevant. 

Also: gcc optimizer transforms switches into a binary tree. So, if you have 64
cases on switch, it will require 7 comparations (log2(64)) to get a match.

For example, a quick look at the code you've presented, let's just calculate
the number of operations on the dtv_property_proccess_get() routine, without
debug stuff:

static int dtv_property_process_get() {
CMP (if fe->ops.get_property)
CMP (if r < 0)   < This if only happens if the 
first one is executed. On my patch, it is not executed
(the code you posted is the one 
before my patch)
SWITCH (7 CMP's) < due to binary tree optimization 
done by gcc
MOV
}

So, that entire code (that has about 200 lines) has, in fact
9 comparations and one move instruction.

At dtv_property_prepare_get_stats(), the code is even cheaper: just a switch 
with 8
elements (log2(8) = 3), so 3 comparations, and one move instruction.

The additional cost on dvb_frontend_ioctl_properties is:
2 MOVs
One loop calling dtv_property_prepare_get_stats() - up to 4 times to 
retrieve
all quality values
one INC
one CMP and function CALL (the same cost exists also with the struct)
One loop calling dtv_property_get_stats() - up to 4 times to retrieve
all quality values

So, if I've calculated it right, we're talking about 2+1+16+1+2+1+40 = 63 
instructions.

So, the real executed code is very cheap. On that very slow CPU, the delay will 
be
about 63 x 10 nanosseconds = 630 nanosseconds of delay (being 220ns before the
call and 410ns after it).

This means 0.063% of a delay at the order of milisseconds required just one
I2C read (and, in general, you need more than one single read to get the stats).

Even if the code would double the size, the latency won't be affected.

> Aspect #1. Comparing Case #1 and Case #2, i guess anyone will agree
> that case 2 has very much code overhead in terms of the serialization
> and serialization set/unset.

It has some overhead, but if you are concerned about the realtime case, 
just run kernel perf tools. You'll see my point. If you have less I/O
to do (to retrieve just the data you need), Case #2 is much faster than Case #1.

> Aspect #2, Comparing the data payload between Case #1 and Case #2,

Re: New DVB-Statistics API

2009-12-08 Thread Manu Abraham

On Tue, Dec 8, 2009 at 5:22 PM, Mauro Carvalho Chehab
 wrote:
> Hi Julian,
>
> Let me add some corrections to your technical analysis.
>
> Julian Scheel wrote:
>> Hello together,
>>
>> after the last thread which asked about signal statistics details
>> degenerated into a discussion about the technical possibilites for
>> implementing an entirely new API, which lead to nothing so far, I wanted
>> to open a new thread to bring this forward. Maybe some more people can
>> give their votes for the different options
>>
>> Actually Manu did propose a new API for fetching enhanced statistics. It
>> uses new IOCTL to directly fetch the statistical data in one go from the
>> frontend. This propose was so far rejected by Mauro who wants to use the
>> S2API get/set calls instead.
>>
>> Now to give everyone a quick overview about the advantages and
>> disadvantages of both approaches I will try to summarize it up:
>>
>> 1st approach: Introduce new IOCTL
>>
>> Pros:
>> - Allows a quick fetch of the entire dataset, which ensures that:
>>  1. all values are fetched in one go (as long as possible) from the
>> frontend, so that they can be treated as one united and be valued in
>> conjunction
>>  2. the requested values arrive the caller in an almost constant
>> timeframe, as the ioctl is directly executed by the driver
>> - It does not interfere with the existing statistics API, which has to
>> be kept alive as it is in use for a long time already. (unifying it's
>> data is not the approach of this new API)
>>
>> Cons:
>> - Forces the application developers to interact with two APIs. The S2API
>> for tuning on the one hand and the Statistics API for reading signal
>> values on the other hand.
>>
>> 2nd approach: Set up S2API calls for reading statistics
>>
>> Pros:
>> - Continous unification of the linuxtv API, allowing all calls to be
>> made through one API. -> easy for application developers
>
> - Scaling values can be retrieved/negotiated (if we implement the set
> mode) before requesting the stats, using the same API;
>
> - API can be easily extended to support other statistics that may be needed
> by newer DTV standards.
>
>>
>> Cons:
>> - Due to the key/value pairs used for S2API getters the statistical
>> values can't be read as a unique block, so we loose the guarantee, that
>> all of the values can be treatend as one unit expressing the signals
>> state at a concrete time.
>
> You missed the point here. The proposal patch groups all S2API
> pairs into a _single_ call into the driver:
>
>> + for (i = 0; i < tvps->num; i++)
>> + need_get_ops += dtv_property_prepare_get_stats(fe,
>> +  tvp + i, inode, file);
>> +
>> + if (!fe->dtv_property_cache.need_stats) {
>> + need_get_ops++;
>> + } else {
>> + if (fe->ops.get_stats) {
>> + err = fe->ops.get_stats(fe);
>> + if (err < 0)
>> + return err;
>> + }
>> + }
>
> The dtv_property_prepare_get_stats will generate a bitmap field (need_stats) 
> that
> will describe all value pairs that userspace is expecting. After doing it,
> a single call is done to get_stats() callback.
>
> All the driver need to do is to fill all values at dtv_property_cache. If the 
> driver
> fills more values than requested by the user, the extra values will simply be 
> discarded.
>
> In order to reduce latency, the driver may opt to not read the register 
> values for the
> data that aren't requested by the user, like I did on cx24123 driver.
>
> Those values will all be returned at the same time to userspace by a single 
> copy_to_user()
> operation.
>
>> - Due to the general architecture of the S2API the delay between
>> requesting and receiving the actual data could become too big to allow
>> realtime interpretation of the data (as it is needed for applications
>> like satellite finders, etc.)
>
> Not true. As pointed at the previous answer, the difference between a new 
> ioctl
> and S2API is basically the code at dtv_property_prepare_get_stats() and
> dtv_property_process_get(). This is a pure code that uses a continuous struct
> that will likely be at L3 cache, inside the CPU chip. So, this code will run
> really quickly.



AFAIK, cache eviction occurs with syscalls: where content in the
caches near the CPU cores is pushed to the outer cores, resulting in
cache misses. Not all CPU's are equipped with L3 caches. Continuous
syscalls can result in TLB cache misses from what i understand, which
is expensive.


These are the numbers Intel lists for a Pentium M:

To  Cycles
Register<= 1
L1d ~3
L2  ~14
Main Memory ~240



> As current CPU's runs at the order of Teraflops (as the CPU clocks are at 
> gigahertz
> order, and CPU's can handle multiple instructions per clock cycle), the a

Re: New DVB-Statistics API

2009-12-08 Thread Mauro Carvalho Chehab

Hi Julian,

Let me add some corrections to your technical analysis.

Julian Scheel wrote:
> Hello together,
> 
> after the last thread which asked about signal statistics details
> degenerated into a discussion about the technical possibilites for
> implementing an entirely new API, which lead to nothing so far, I wanted
> to open a new thread to bring this forward. Maybe some more people can
> give their votes for the different options
> 
> Actually Manu did propose a new API for fetching enhanced statistics. It
> uses new IOCTL to directly fetch the statistical data in one go from the
> frontend. This propose was so far rejected by Mauro who wants to use the
> S2API get/set calls instead.
> 
> Now to give everyone a quick overview about the advantages and
> disadvantages of both approaches I will try to summarize it up:
> 
> 1st approach: Introduce new IOCTL
> 
> Pros:
> - Allows a quick fetch of the entire dataset, which ensures that:
>  1. all values are fetched in one go (as long as possible) from the
> frontend, so that they can be treated as one united and be valued in
> conjunction
>  2. the requested values arrive the caller in an almost constant
> timeframe, as the ioctl is directly executed by the driver
> - It does not interfere with the existing statistics API, which has to
> be kept alive as it is in use for a long time already. (unifying it's
> data is not the approach of this new API)
> 
> Cons:
> - Forces the application developers to interact with two APIs. The S2API
> for tuning on the one hand and the Statistics API for reading signal
> values on the other hand.
> 
> 2nd approach: Set up S2API calls for reading statistics
> 
> Pros:
> - Continous unification of the linuxtv API, allowing all calls to be
> made through one API. -> easy for application developers

- Scaling values can be retrieved/negotiated (if we implement the set
mode) before requesting the stats, using the same API;

- API can be easily extended to support other statistics that may be needed
by newer DTV standards.

> 
> Cons:
> - Due to the key/value pairs used for S2API getters the statistical
> values can't be read as a unique block, so we loose the guarantee, that
> all of the values can be treatend as one unit expressing the signals
> state at a concrete time.

You missed the point here. The proposal patch groups all S2API
pairs into a _single_ call into the driver:

> + for (i = 0; i < tvps->num; i++)
> + need_get_ops += dtv_property_prepare_get_stats(fe,
> +  tvp + i, inode, file);
> +
> + if (!fe->dtv_property_cache.need_stats) {
> + need_get_ops++;
> + } else {
> + if (fe->ops.get_stats) {
> + err = fe->ops.get_stats(fe);
> + if (err < 0)
> + return err;
> + }
> + }

The dtv_property_prepare_get_stats will generate a bitmap field (need_stats) 
that
will describe all value pairs that userspace is expecting. After doing it,
a single call is done to get_stats() callback.

All the driver need to do is to fill all values at dtv_property_cache. If the 
driver
fills more values than requested by the user, the extra values will simply be 
discarded.

In order to reduce latency, the driver may opt to not read the register values 
for the
data that aren't requested by the user, like I did on cx24123 driver.

Those values will all be returned at the same time to userspace by a single 
copy_to_user()
operation.

> - Due to the general architecture of the S2API the delay between
> requesting and receiving the actual data could become too big to allow
> realtime interpretation of the data (as it is needed for applications
> like satellite finders, etc.)

Not true. As pointed at the previous answer, the difference between a new ioctl
and S2API is basically the code at dtv_property_prepare_get_stats() and
dtv_property_process_get(). This is a pure code that uses a continuous struct
that will likely be at L3 cache, inside the CPU chip. So, this code will run
really quickly.

As current CPU's runs at the order of Teraflops (as the CPU clocks are at 
gigahertz
order, and CPU's can handle multiple instructions per clock cycle), the added 
delay
is in de order of nanosseconds. 

On the other hand, a simple read of a value from an i2c device is in the order
of milisseconds, since I2C serial bus, speed is slow (typically operating at
100 Kbps).

So, the delay is determined by the number of I2C calls you have at the code.

With the new ioctl proposal, as you need to read all data from I2C (even the 
ones
that userspace don't need), you'll have two situations:
- if you use S2API call to request all data provided by ioctl approach,
the delay will be the same;
- if you use S2API call to request less stats, the S2API delay will
be shorter. 

For example,

Re: New DVB(!!)-Statistics API

2009-12-08 Thread Julian Scheel


Am 08.12.09 10:16, schrieb Julian Scheel:

Hello together,

after the last thread which asked about signal statistics details 
degenerated into a discussion about the technical possibilites for 
implementing an entirely new API, which lead to nothing so far, I 
wanted to open a new thread to bring this forward. Maybe some more 
people can give their votes for the different options


Actually Manu did propose a new API for fetching enhanced statistics. 
It uses new IOCTL to directly fetch the statistical data in one go 
from the frontend. This propose was so far rejected by Mauro who wants 
to use the S2API get/set calls instead.


Now to give everyone a quick overview about the advantages and 
disadvantages of both approaches I will try to summarize it up:


1st approach: Introduce new IOCTL

Pros:
- Allows a quick fetch of the entire dataset, which ensures that:
 1. all values are fetched in one go (as long as possible) from the 
frontend, so that they can be treated as one united and be valued in 
conjunction
 2. the requested values arrive the caller in an almost constant 
timeframe, as the ioctl is directly executed by the driver
- It does not interfere with the existing statistics API, which has to 
be kept alive as it is in use for a long time already. (unifying it's 
data is not the approach of this new API)


Cons:
- Forces the application developers to interact with two APIs. The 
S2API for tuning on the one hand and the Statistics API for reading 
signal values on the other hand.


2nd approach: Set up S2API calls for reading statistics

Pros:
- Continous unification of the linuxtv API, allowing all calls to be 
made through one API. -> easy for application developers


Cons:
- Due to the key/value pairs used for S2API getters the statistical 
values can't be read as a unique block, so we loose the guarantee, 
that all of the values can be treatend as one unit expressing the 
signals state at a concrete time.
- Due to the general architecture of the S2API the delay between 
requesting and receiving the actual data could become too big to allow 
realtime interpretation of the data (as it is needed for applications 
like satellite finders, etc.)


I hope that this summary is technically correct in all points, if not 
I'd be thankful for objective corrections. I am not going to 
articulate my personal opinion in this mail, but I will do it in 
another mail in reply to this one, so that this mail can be seen as a 
neutral summary of the current situation.


So now it's everyones turn to think about the options and give an 
opinion. In the end I am quite sure that all of us would have great 
benefits of an improved statistics API.


Cheers,
Julian
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


The above mail of course has nothing to do with USB-Statistics. I must 
have been slightly confused when typing the message title! Sorry for that!

--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: New DVB-Statistics API - please vote

Re: New DVB-Statistics API - please vote

Re: New DVB-Statistics API - please vote

Re: New DVB-Statistics API

Re: New DVB-Statistics API

Re: New DVB-Statistics API

Re: New DVB-Statistics API

Re: New DVB-Statistics API

Re: New DVB-Statistics API

Re: New DVB-Statistics API

Re: New DVB(!!)-Statistics API

11 matches

Site Navigation

Mail list logo

Footer information