Re: New DVB-Statistics API - please vote
On Wed, Feb 3, 2010 at 2:40 PM, Hans Verkuil wrote: > >> Mauro Carvalho Chehab wrote: >> after the last thread which asked about signal statistics details degenerated into a discussion about the technical possibilites for implementing an entirely new API, which lead to nothing so far, I wanted to open a new thread to bring this forward. Maybe some more people can give their votes for the different options >> >> Only me and Manu manifested with opinions on this thread. Not sure why >> nobody else gave their comments. Maybe all interested people just decided >> to take a long vacation and are not listening to their emails ;) >> >> Seriously, from what I understand, this is an API improvement and we need >> to take a decision on it. So, your opinions are important. >> >> --- >> >> Let me draw a summary of this subject, trying to be impartial. >> >> The original proposal were made by Manu. My proposal is derived from >> Manu's >> original one, both will equally provide the same features. >> >> The main difference is that Manu's proposal use a struct to get the >> statistics while my proposal uses DVBS2API. >> >> With both API proposals, all values are get at the same time by the >> driver. >> the DVBS2API adds a small delay to fill the fields, but the extra delay is >> insignificant, when compared with the I/O delays needed to retrieve the >> values from the hardware. >> >> Due to the usage of DVBS2API, it is possible to retrieve a subset of >> statistics. >> When obtaining a subset, the DVBS2API latency is considerable faster, as >> less >> data needed to be transfered from the hardware. >> >> The DVBS2API also offers the possibility of expanding the statistics >> group, since >> it is not rigid as an struct. >> >> One criteria that should be reminded is that, according with Linux Kernel >> rules, >> any userspace API is stable. This means that applications compiled against >> an >> older API version should keep running with the latest kernel. So, whatever >> decided, >> the statistics API should always maintain backward compatibility. >> >> --- >> >> During the end of the year, I did some work with an ISDB-T driver for >> Siano, and >> I realized that the usage of the proposed struct won't fit well for >> ISDB-T. The >> reason is that, on ISDB-T, the transmission uses up to 3 hierarchical >> layers. >> Each layer may have different OFDM parameters, so the devices (at least, >> this is the >> case for Siano) has a group of statistics per layer. >> >> I'm afraid that newer standards may also bring different ways to present >> statistics, and >> the current proposal won't fit well. >> >> So, in my opinion, if it is chosen any struct-based approach, we'll have a >> bad time to >> maintain it, as it won't fit into all cases, and we'll need to add some >> tricks to extend >> the struct. >> >> So, my vote is for the DVBS2API approach, where a new group of statistics >> can easily be added, >> on an elegant way, without needing of re-discussing the better API or to >> find a way to extend >> the size of an struct without breaking backward compatibility. > > From a purely technical standpoint the DVBS2API is by definition more > flexible and future-proof. The V4L API has taken the same approach with > controls (basically exactly the same mechanism). Should it be necessary in > the future to set multiple properties atomically, then the same technique > as V4L can be used (see VIDIOC_S_EXT_CTRLS). > > The alternative is to make structs with lots of reserved fields. It > depends on how predictable the API is expected to be. Sometimes you can be > reasonably certain that there won't be too many additions in the future > and then using reserved fields is perfectly OK. > > Just my 5 cents based on my V4L experience. In fact, I don't see the advantage using a specific get/set, since what i proposed just reads back basic data types such as a u32 for any instance and those being "standard " for any digital communication system. It makes no sense to go for a get/set approach. For example there cannot be multiple properties for any digital system such as BER, just that there are different measure s such as kilograms, pounds, grams etc. Which is what the whole patch is meant to do in a performance critical manner. The whole patch is a kind of get/set and and hence it doesn't need a get/set at the micro level. To contradict the reverse, as an example, I can state that weight is not measured in centimeters or inches or feet, to put it in laymans terms. if you say that it needs to be expressed as such, then i very well see that there is something conceptually wrong in your thought. My 2 cents, for those who don't understand the issue. Regards, Manu -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New DVB-Statistics API - please vote
> Mauro Carvalho Chehab wrote: > >>> after the last thread which asked about signal statistics details >>> degenerated into a discussion about the technical possibilites for >>> implementing an entirely new API, which lead to nothing so far, I >>> wanted >>> to open a new thread to bring this forward. Maybe some more people can >>> give their votes for the different options > > Only me and Manu manifested with opinions on this thread. Not sure why > nobody else gave their comments. Maybe all interested people just decided > to take a long vacation and are not listening to their emails ;) > > Seriously, from what I understand, this is an API improvement and we need > to take a decision on it. So, your opinions are important. > > --- > > Let me draw a summary of this subject, trying to be impartial. > > The original proposal were made by Manu. My proposal is derived from > Manu's > original one, both will equally provide the same features. > > The main difference is that Manu's proposal use a struct to get the > statistics while my proposal uses DVBS2API. > > With both API proposals, all values are get at the same time by the > driver. > the DVBS2API adds a small delay to fill the fields, but the extra delay is > insignificant, when compared with the I/O delays needed to retrieve the > values from the hardware. > > Due to the usage of DVBS2API, it is possible to retrieve a subset of > statistics. > When obtaining a subset, the DVBS2API latency is considerable faster, as > less > data needed to be transfered from the hardware. > > The DVBS2API also offers the possibility of expanding the statistics > group, since > it is not rigid as an struct. > > One criteria that should be reminded is that, according with Linux Kernel > rules, > any userspace API is stable. This means that applications compiled against > an > older API version should keep running with the latest kernel. So, whatever > decided, > the statistics API should always maintain backward compatibility. > > --- > > During the end of the year, I did some work with an ISDB-T driver for > Siano, and > I realized that the usage of the proposed struct won't fit well for > ISDB-T. The > reason is that, on ISDB-T, the transmission uses up to 3 hierarchical > layers. > Each layer may have different OFDM parameters, so the devices (at least, > this is the > case for Siano) has a group of statistics per layer. > > I'm afraid that newer standards may also bring different ways to present > statistics, and > the current proposal won't fit well. > > So, in my opinion, if it is chosen any struct-based approach, we'll have a > bad time to > maintain it, as it won't fit into all cases, and we'll need to add some > tricks to extend > the struct. > > So, my vote is for the DVBS2API approach, where a new group of statistics > can easily be added, > on an elegant way, without needing of re-discussing the better API or to > find a way to extend > the size of an struct without breaking backward compatibility. >From a purely technical standpoint the DVBS2API is by definition more flexible and future-proof. The V4L API has taken the same approach with controls (basically exactly the same mechanism). Should it be necessary in the future to set multiple properties atomically, then the same technique as V4L can be used (see VIDIOC_S_EXT_CTRLS). The alternative is to make structs with lots of reserved fields. It depends on how predictable the API is expected to be. Sometimes you can be reasonably certain that there won't be too many additions in the future and then using reserved fields is perfectly OK. Just my 5 cents based on my V4L experience. Regards, Hans -- Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New DVB-Statistics API - please vote
Mauro Carvalho Chehab wrote: >> after the last thread which asked about signal statistics details >> degenerated into a discussion about the technical possibilites for >> implementing an entirely new API, which lead to nothing so far, I wanted >> to open a new thread to bring this forward. Maybe some more people can >> give their votes for the different options Only me and Manu manifested with opinions on this thread. Not sure why nobody else gave their comments. Maybe all interested people just decided to take a long vacation and are not listening to their emails ;) Seriously, from what I understand, this is an API improvement and we need to take a decision on it. So, your opinions are important. --- Let me draw a summary of this subject, trying to be impartial. The original proposal were made by Manu. My proposal is derived from Manu's original one, both will equally provide the same features. The main difference is that Manu's proposal use a struct to get the statistics while my proposal uses DVBS2API. With both API proposals, all values are get at the same time by the driver. the DVBS2API adds a small delay to fill the fields, but the extra delay is insignificant, when compared with the I/O delays needed to retrieve the values from the hardware. Due to the usage of DVBS2API, it is possible to retrieve a subset of statistics. When obtaining a subset, the DVBS2API latency is considerable faster, as less data needed to be transfered from the hardware. The DVBS2API also offers the possibility of expanding the statistics group, since it is not rigid as an struct. One criteria that should be reminded is that, according with Linux Kernel rules, any userspace API is stable. This means that applications compiled against an older API version should keep running with the latest kernel. So, whatever decided, the statistics API should always maintain backward compatibility. --- During the end of the year, I did some work with an ISDB-T driver for Siano, and I realized that the usage of the proposed struct won't fit well for ISDB-T. The reason is that, on ISDB-T, the transmission uses up to 3 hierarchical layers. Each layer may have different OFDM parameters, so the devices (at least, this is the case for Siano) has a group of statistics per layer. I'm afraid that newer standards may also bring different ways to present statistics, and the current proposal won't fit well. So, in my opinion, if it is chosen any struct-based approach, we'll have a bad time to maintain it, as it won't fit into all cases, and we'll need to add some tricks to extend the struct. So, my vote is for the DVBS2API approach, where a new group of statistics can easily be added, on an elegant way, without needing of re-discussing the better API or to find a way to extend the size of an struct without breaking backward compatibility. Cheers, Mauro -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New DVB-Statistics API
Julian Scheel wrote: > Am 09.12.09 14:02, schrieb Mauro Carvalho Chehab: >> Manu Abraham wrote: > I don't think you can just take the average IPC rates into account for > this. When doing a syscall the processors TLB cache will be cleared, > which will force the CPU to go to the whole instruction pipeline before > the first syscall instruction is actually executed. This will introduce > a delay for each syscall you make. I'm not exactly sure about the length > of the delay, but I think it should be something like 2 clock cycles. True, but this delay is common to both S2API and struct. >> To get all data that the ioctl approach struct has, the delay for >> S2API will be equal. >> To get less data, S2API will have a small delay. >> > Imho the S2API would be slower when reading all data the ioctl fetches, > due to the way the instructions would be handled. > > Correct me, if I'm wrong with any of this. Not sure if I understood your question. On both cases, just one function call will go to the driver, with one struct (struct fe, the case of S2API) or two structs (struct fe and the stats-specific struct(s)) in the case of a new ioctl(s). As drivers are free to implement any logic, the driver can implement exactly the same logic with both API calls. So, the delay to get the info will be equal on both cases. In a practical case, this will take at least a few milisseconds to retrieve all data. It may take even more, since, on some drivers, you may need to wait for some condition to happen before start measuring, in order to be sure that you'll be getting atomic and accurate values. After the function return, with an struct, you can just return, while, with S2API, you'll need to put the data into the proper payload fields, but this will add a delay in the order of nanoseconds. Let's say that, to get all data, the routine needs 10 milisseconds. The difference between new ioctl or S2API will be of about 0,00063 milliseconds on a Pentium MMX. On a machine with 1GHz of clock, 2 IPC, the difference will be 0,315 milliseconds. Considering that the Linux kernel is preemptive, and an interrupt or the scheduler could be called during the 10 milliseconds time, I doubt you'll be able to notice that difference on any practical use case. On the other hand, if you need for example just the strength of the signal at the AGC, if you call via struct, you'll still be consuming the same 10 ms, while, with S2API, you can do it, let's say, on 1 ms (the real numbers will depend, of course, on how much I/O is needed on hardware, and on how many time do you need to wait there to get an stable value). So, if you want to do things like moving a rotor, S2API will give better results. If you want all stats, it will give the same result as a new ioctl. While I don't think that 0,315 milliseconds is worth enough to cause any troubles, With a simple change like the one bellow, this time can be reduced to 0,105 milisseconds with a patch like that (the patch were simplified to change just quality, but, of course, such approach needs to be done on the other fields to get this result): Index: master/linux/drivers/media/dvb/dvb-core/dvb_frontend.c === --- master.orig/linux/drivers/media/dvb/dvb-core/dvb_frontend.c +++ master/linux/drivers/media/dvb/dvb-core/dvb_frontend.c @@ -1220,6 +1220,7 @@ static int dtv_property_prepare_get_stat switch (tvp->cmd) { case DTV_FE_QUALITY: fe->dtv_property_cache.need_stats |= FE_NEED_QUALITY; + fe->dtv_property_cache.quality = &tvp->u.data; break; case DTV_FE_QUALITY_UNIT: fe->dtv_property_cache.need_stats |= FE_NEED_QUALITY_UNIT; @@ -1384,9 +1385,6 @@ static int dtv_property_process_get(stru break; /* Quality measures */ - case DTV_FE_QUALITY: - tvp->u.data = fe->dtv_property_cache.quality; - break; case DTV_FE_QUALITY_UNIT: tvp->u.data = fe->dtv_property_cache.quality_unit; break; @@ -1696,10 +1697,12 @@ static int dvb_frontend_ioctl_properties } } - for (i = 0; i < tvps->num; i++) { - (tvp + i)->result = dtv_property_process_get(fe, - tvp + i, inode, file, need_get_ops); - err |= (tvp + i)->result; + if (need_get_ops) { + for (i = 0; i < tvps->num; i++) { + (tvp + i)->result = dtv_property_process_get(fe, + tvp + i, inode, file, need_get_ops); + err |= (tvp + i)->result; + } } if (copy_to_user(tvps->props, tvp, tvps->num * sizeof(struct dtv_property))) { Index: master/linux/drivers/media/dvb/d
Re: New DVB-Statistics API
Am 09.12.09 14:02, schrieb Mauro Carvalho Chehab: Manu Abraham wrote: On Wed, Dec 9, 2009 at 3:43 AM, Mauro Carvalho Chehab wrote: Even with STB, let's assume a very slow cpu that runs at 100 Megabytes/second. So, the clock speed is 10 nanoseconds. Assuming that this CPU doesn't have a good pipeline, being capable of handling only one instruction per second, you'll have one instruction at executed at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is faster than this). Incorrect. A CPU doesn't execute instruction per clock cycle. Clock cycles required to execute an instruction do vary from 2 cycles 12 cycles varying from CPU to CPU. See the description of an old Pentium MMX processor (the sucessor of i586, running up to 200 MHz): http://www.intel.com/design/archives/processors/mmx/docs/243185.htm Thanks to superscalar architecture, it runs 2 instructions per clock cycle (IPC). Newer processors can run more instructions per clock cycle. For example, any Pentium-4 processor, can do 3 IPC: http://www.intel.com/support/processors/pentium4/sb/CS-017371.htm I don't think you can just take the average IPC rates into account for this. When doing a syscall the processors TLB cache will be cleared, which will force the CPU to go to the whole instruction pipeline before the first syscall instruction is actually executed. This will introduce a delay for each syscall you make. I'm not exactly sure about the length of the delay, but I think it should be something like 2 clock cycles. So, even on such bad hardware that is at least 20x slower than a netbook running at 1Gbps, what determines the delay is the amount of I/O you're doing, and not the number of extra code. The I/O overhead required to read 4 registers from hardware is the same whether you use the ioctl approach or s2api. It seems you got my point. What will determinate the delay is the number of I/O's, and not the amount of instructions. The number of hardware I/Os is constant for both cases, so we do not need to discuss them as pro/con for any of the proposals. Eventually, as you have pointed out yourself, The data struct will be in the cache all the time for the ioctl approach. The only new addition to the existing API in the ioctl case is a CALL instruction as compared to the numerous instructions in comparison to that you have pointed out as with the s2api approach. True, but, as shown, the additional delay introduced by the code is less than 0.01%, even on a processor that has half of the speed of a 12-year old very slow CPU (a Pentium MMX @ 100 MHz is capable of 2 IPC. My calculus assumed 1 IPC). So, what will affect the delay is the number of I/O you need to do. To get all data that the ioctl approach struct has, the delay for S2API will be equal. To get less data, S2API will have a small delay. Imho the S2API would be slower when reading all data the ioctl fetches, due to the way the instructions would be handled. Correct me, if I'm wrong with any of this. Cheers, Julian -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New DVB-Statistics API
Manu Abraham wrote: > On Wed, Dec 9, 2009 at 3:43 AM, Mauro Carvalho Chehab > wrote: >> Even with STB, let's assume a very slow cpu that runs at 100 >> Megabytes/second. So, the clock >> speed is 10 nanoseconds. Assuming that this CPU doesn't have a good >> pipeline, being >> capable of handling only one instruction per second, you'll have one >> instruction at executed >> at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is >> faster than this). > > Incorrect. > A CPU doesn't execute instruction per clock cycle. Clock cycles > required to execute an instruction do vary from 2 cycles 12 cycles > varying from CPU to CPU. See the description of an old Pentium MMX processor (the sucessor of i586, running up to 200 MHz): http://www.intel.com/design/archives/processors/mmx/docs/243185.htm Thanks to superscalar architecture, it runs 2 instructions per clock cycle (IPC). Newer processors can run more instructions per clock cycle. For example, any Pentium-4 processor, can do 3 IPC: http://www.intel.com/support/processors/pentium4/sb/CS-017371.htm >> So, even on such bad hardware that is at least 20x slower than a netbook >> running at 1Gbps, >> what determines the delay is the amount of I/O you're doing, and not the >> number of extra >> code. > > > The I/O overhead required to read 4 registers from hardware is the > same whether you use the ioctl approach or s2api. It seems you got my point. What will determinate the delay is the number of I/O's, and not the amount of instructions. > Eventually, as you have pointed out yourself, The data struct will be > in the cache all the time for the ioctl approach. The only new > addition to the existing API in the ioctl case is a CALL instruction > as compared to the numerous instructions in comparison to that you > have pointed out as with the s2api approach. True, but, as shown, the additional delay introduced by the code is less than 0.01%, even on a processor that has half of the speed of a 12-year old very slow CPU (a Pentium MMX @ 100 MHz is capable of 2 IPC. My calculus assumed 1 IPC). So, what will affect the delay is the number of I/O you need to do. To get all data that the ioctl approach struct has, the delay for S2API will be equal. To get less data, S2API will have a small delay. Cheers, Mauro. -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New DVB-Statistics API
On Wed, Dec 9, 2009 at 3:43 AM, Mauro Carvalho Chehab wrote: > Manu Abraham wrote: > >>> Not true. As pointed at the previous answer, the difference between a new >>> ioctl >>> and S2API is basically the code at dtv_property_prepare_get_stats() and >>> dtv_property_process_get(). This is a pure code that uses a continuous >>> struct >>> that will likely be at L3 cache, inside the CPU chip. So, this code will run >>> really quickly. >> >> >> >> AFAIK, cache eviction occurs with syscalls: where content in the >> caches near the CPU cores is pushed to the outer cores, resulting in >> cache misses. Not all CPU's are equipped with L3 caches. Continuous >> syscalls can result in TLB cache misses from what i understand, which >> is expensive. > > It is very likely that the contents of struct fe to go into the cache during > the > syscall. I was conservative when I talked about L3. Depending on the cache > sizes, > I won't doubt that the needed fields of the fe struct will go to L1 cache. Ah, so the data structure which is there in the ioctl approach as well and "less likely" to get cache hits since the calls are lesser. >>> As current CPU's runs at the order of Teraflops (as the CPU clocks are at >>> gigahertz >>> order, and CPU's can handle multiple instructions per clock cycle), the >>> added delay >>> is in de order of nanosseconds. >> >> >> Consider STB's where DVB is generally deployed rather than the small >> segment of home users running a stack on a generic PC. > > Even with STB, let's assume a very slow cpu that runs at 100 > Megabytes/second. So, the clock > speed is 10 nanoseconds. Assuming that this CPU doesn't have a good pipeline, > being > capable of handling only one instruction per second, you'll have one > instruction at executed > at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is > faster than this). Incorrect. A CPU doesn't execute instruction per clock cycle. Clock cycles required to execute an instruction do vary from 2 cycles 12 cycles varying from CPU to CPU. > An I/O operation at i2c is of the order of 10^-3. Assuming that an additional > delay of 10% > (10 ^ -4) will deadly affect realtime capability (with it is very doubtful), > this means that > the CPU can run up to 10,000 (!!!) instructions to generate such delay. If > you compile that code > and check the number or extra instructions I bet it will be shorter enough to > not cause any > practical effect. > > So, even on such bad hardware that is at least 20x slower than a netbook > running at 1Gbps, > what determines the delay is the amount of I/O you're doing, and not the > number of extra > code. The I/O overhead required to read 4 registers from hardware is the same whether you use the ioctl approach or s2api. > > Hardware I/O is the most expensive operation involved. > > True. That's what I said. > >> Case #1: the ioctl approach > >> >> Now Case #2: based on s2api > > >> Now that we can see the actual code flow, we can see the s2api >> approach requires an unnecessary large tokenizer/serializer, involving >> multiple function calls. > > Are you seeing there 10.000 assembler instructions or so? If not, the size of > the code is > not relevant. > > Also: gcc optimizer transforms switches into a binary tree. So, if you have 64 > cases on switch, it will require 7 comparations (log2(64)) to get a match. > > For example, a quick look at the code you've presented, let's just calculate > the number of operations on the dtv_property_proccess_get() routine, without > debug stuff: > > static int dtv_property_process_get() { >CMP (if fe->ops.get_property) >CMP (if r < 0) < This if only happens if the > first one is executed. On my patch, it is not executed >(the code you posted is the > one before my patch) >SWITCH (7 CMP's) < due to binary tree optimization > done by gcc >MOV > } > > So, that entire code (that has about 200 lines) has, in fact > 9 comparations and one move instruction. > > At dtv_property_prepare_get_stats(), the code is even cheaper: just a switch > with 8 > elements (log2(8) = 3), so 3 comparations, and one move instruction. > > The additional cost on dvb_frontend_ioctl_properties is: >2 MOVs >One loop calling dtv_property_prepare_get_stats() - up to 4 times to > retrieve > all quality values >one INC >one CMP and function CALL (the same cost exists also with the struct) >One loop calling dtv_property_get_stats() - up to 4 times to retrieve > all quality values > > So, if I've calculated it right, we're talking about 2+1+16+1+2+1+40 = 63 > instructions. > > 2) the userspace->kernelspace payload. > > Case #1: The size of S2API structs. It will range from 24 to 84 (depending on > what > you want to get, from one to 4 different value pairs). > > Case #2: The size of the ioctl struct: ab
Re: New DVB-Statistics API
Manu Abraham wrote: >> Not true. As pointed at the previous answer, the difference between a new >> ioctl >> and S2API is basically the code at dtv_property_prepare_get_stats() and >> dtv_property_process_get(). This is a pure code that uses a continuous struct >> that will likely be at L3 cache, inside the CPU chip. So, this code will run >> really quickly. > > > > AFAIK, cache eviction occurs with syscalls: where content in the > caches near the CPU cores is pushed to the outer cores, resulting in > cache misses. Not all CPU's are equipped with L3 caches. Continuous > syscalls can result in TLB cache misses from what i understand, which > is expensive. It is very likely that the contents of struct fe to go into the cache during the syscall. I was conservative when I talked about L3. Depending on the cache sizes, I won't doubt that the needed fields of the fe struct will go to L1 cache. >> As current CPU's runs at the order of Teraflops (as the CPU clocks are at >> gigahertz >> order, and CPU's can handle multiple instructions per clock cycle), the >> added delay >> is in de order of nanosseconds. > > > Consider STB's where DVB is generally deployed rather than the small > segment of home users running a stack on a generic PC. Even with STB, let's assume a very slow cpu that runs at 100 Megabytes/second. So, the clock speed is 10 nanoseconds. Assuming that this CPU doesn't have a good pipeline, being capable of handling only one instruction per second, you'll have one instruction at executed at each 10 nanoseconds (as a reference, a Pentium 1, running at 133 Mbps is faster than this). An I/O operation at i2c is of the order of 10^-3. Assuming that an additional delay of 10% (10 ^ -4) will deadly affect realtime capability (with it is very doubtful), this means that the CPU can run up to 10,000 (!!!) instructions to generate such delay. If you compile that code and check the number or extra instructions I bet it will be shorter enough to not cause any practical effect. So, even on such bad hardware that is at least 20x slower than a netbook running at 1Gbps, what determines the delay is the amount of I/O you're doing, and not the number of extra code. > Hardware I/O is the most expensive operation involved. True. That's what I said. > Case #1: the ioctl approach > > Now Case #2: based on s2api > Now that we can see the actual code flow, we can see the s2api > approach requires an unnecessary large tokenizer/serializer, involving > multiple function calls. Are you seeing there 10.000 assembler instructions or so? If not, the size of the code is not relevant. Also: gcc optimizer transforms switches into a binary tree. So, if you have 64 cases on switch, it will require 7 comparations (log2(64)) to get a match. For example, a quick look at the code you've presented, let's just calculate the number of operations on the dtv_property_proccess_get() routine, without debug stuff: static int dtv_property_process_get() { CMP (if fe->ops.get_property) CMP (if r < 0) < This if only happens if the first one is executed. On my patch, it is not executed (the code you posted is the one before my patch) SWITCH (7 CMP's) < due to binary tree optimization done by gcc MOV } So, that entire code (that has about 200 lines) has, in fact 9 comparations and one move instruction. At dtv_property_prepare_get_stats(), the code is even cheaper: just a switch with 8 elements (log2(8) = 3), so 3 comparations, and one move instruction. The additional cost on dvb_frontend_ioctl_properties is: 2 MOVs One loop calling dtv_property_prepare_get_stats() - up to 4 times to retrieve all quality values one INC one CMP and function CALL (the same cost exists also with the struct) One loop calling dtv_property_get_stats() - up to 4 times to retrieve all quality values So, if I've calculated it right, we're talking about 2+1+16+1+2+1+40 = 63 instructions. So, the real executed code is very cheap. On that very slow CPU, the delay will be about 63 x 10 nanosseconds = 630 nanosseconds of delay (being 220ns before the call and 410ns after it). This means 0.063% of a delay at the order of milisseconds required just one I2C read (and, in general, you need more than one single read to get the stats). Even if the code would double the size, the latency won't be affected. > Aspect #1. Comparing Case #1 and Case #2, i guess anyone will agree > that case 2 has very much code overhead in terms of the serialization > and serialization set/unset. It has some overhead, but if you are concerned about the realtime case, just run kernel perf tools. You'll see my point. If you have less I/O to do (to retrieve just the data you need), Case #2 is much faster than Case #1. > Aspect #2, Comparing the data payload between Case #1 and Case #2,
Re: New DVB-Statistics API
On Tue, Dec 8, 2009 at 5:22 PM, Mauro Carvalho Chehab wrote: > Hi Julian, > > Let me add some corrections to your technical analysis. > > Julian Scheel wrote: >> Hello together, >> >> after the last thread which asked about signal statistics details >> degenerated into a discussion about the technical possibilites for >> implementing an entirely new API, which lead to nothing so far, I wanted >> to open a new thread to bring this forward. Maybe some more people can >> give their votes for the different options >> >> Actually Manu did propose a new API for fetching enhanced statistics. It >> uses new IOCTL to directly fetch the statistical data in one go from the >> frontend. This propose was so far rejected by Mauro who wants to use the >> S2API get/set calls instead. >> >> Now to give everyone a quick overview about the advantages and >> disadvantages of both approaches I will try to summarize it up: >> >> 1st approach: Introduce new IOCTL >> >> Pros: >> - Allows a quick fetch of the entire dataset, which ensures that: >> 1. all values are fetched in one go (as long as possible) from the >> frontend, so that they can be treated as one united and be valued in >> conjunction >> 2. the requested values arrive the caller in an almost constant >> timeframe, as the ioctl is directly executed by the driver >> - It does not interfere with the existing statistics API, which has to >> be kept alive as it is in use for a long time already. (unifying it's >> data is not the approach of this new API) >> >> Cons: >> - Forces the application developers to interact with two APIs. The S2API >> for tuning on the one hand and the Statistics API for reading signal >> values on the other hand. >> >> 2nd approach: Set up S2API calls for reading statistics >> >> Pros: >> - Continous unification of the linuxtv API, allowing all calls to be >> made through one API. -> easy for application developers > > - Scaling values can be retrieved/negotiated (if we implement the set > mode) before requesting the stats, using the same API; > > - API can be easily extended to support other statistics that may be needed > by newer DTV standards. > >> >> Cons: >> - Due to the key/value pairs used for S2API getters the statistical >> values can't be read as a unique block, so we loose the guarantee, that >> all of the values can be treatend as one unit expressing the signals >> state at a concrete time. > > You missed the point here. The proposal patch groups all S2API > pairs into a _single_ call into the driver: > >> + for (i = 0; i < tvps->num; i++) >> + need_get_ops += dtv_property_prepare_get_stats(fe, >> + tvp + i, inode, file); >> + >> + if (!fe->dtv_property_cache.need_stats) { >> + need_get_ops++; >> + } else { >> + if (fe->ops.get_stats) { >> + err = fe->ops.get_stats(fe); >> + if (err < 0) >> + return err; >> + } >> + } > > The dtv_property_prepare_get_stats will generate a bitmap field (need_stats) > that > will describe all value pairs that userspace is expecting. After doing it, > a single call is done to get_stats() callback. > > All the driver need to do is to fill all values at dtv_property_cache. If the > driver > fills more values than requested by the user, the extra values will simply be > discarded. > > In order to reduce latency, the driver may opt to not read the register > values for the > data that aren't requested by the user, like I did on cx24123 driver. > > Those values will all be returned at the same time to userspace by a single > copy_to_user() > operation. > >> - Due to the general architecture of the S2API the delay between >> requesting and receiving the actual data could become too big to allow >> realtime interpretation of the data (as it is needed for applications >> like satellite finders, etc.) > > Not true. As pointed at the previous answer, the difference between a new > ioctl > and S2API is basically the code at dtv_property_prepare_get_stats() and > dtv_property_process_get(). This is a pure code that uses a continuous struct > that will likely be at L3 cache, inside the CPU chip. So, this code will run > really quickly. AFAIK, cache eviction occurs with syscalls: where content in the caches near the CPU cores is pushed to the outer cores, resulting in cache misses. Not all CPU's are equipped with L3 caches. Continuous syscalls can result in TLB cache misses from what i understand, which is expensive. These are the numbers Intel lists for a Pentium M: To Cycles Register<= 1 L1d ~3 L2 ~14 Main Memory ~240 > As current CPU's runs at the order of Teraflops (as the CPU clocks are at > gigahertz > order, and CPU's can handle multiple instructions per clock cycle), the a
Re: New DVB-Statistics API
Hi Julian, Let me add some corrections to your technical analysis. Julian Scheel wrote: > Hello together, > > after the last thread which asked about signal statistics details > degenerated into a discussion about the technical possibilites for > implementing an entirely new API, which lead to nothing so far, I wanted > to open a new thread to bring this forward. Maybe some more people can > give their votes for the different options > > Actually Manu did propose a new API for fetching enhanced statistics. It > uses new IOCTL to directly fetch the statistical data in one go from the > frontend. This propose was so far rejected by Mauro who wants to use the > S2API get/set calls instead. > > Now to give everyone a quick overview about the advantages and > disadvantages of both approaches I will try to summarize it up: > > 1st approach: Introduce new IOCTL > > Pros: > - Allows a quick fetch of the entire dataset, which ensures that: > 1. all values are fetched in one go (as long as possible) from the > frontend, so that they can be treated as one united and be valued in > conjunction > 2. the requested values arrive the caller in an almost constant > timeframe, as the ioctl is directly executed by the driver > - It does not interfere with the existing statistics API, which has to > be kept alive as it is in use for a long time already. (unifying it's > data is not the approach of this new API) > > Cons: > - Forces the application developers to interact with two APIs. The S2API > for tuning on the one hand and the Statistics API for reading signal > values on the other hand. > > 2nd approach: Set up S2API calls for reading statistics > > Pros: > - Continous unification of the linuxtv API, allowing all calls to be > made through one API. -> easy for application developers - Scaling values can be retrieved/negotiated (if we implement the set mode) before requesting the stats, using the same API; - API can be easily extended to support other statistics that may be needed by newer DTV standards. > > Cons: > - Due to the key/value pairs used for S2API getters the statistical > values can't be read as a unique block, so we loose the guarantee, that > all of the values can be treatend as one unit expressing the signals > state at a concrete time. You missed the point here. The proposal patch groups all S2API pairs into a _single_ call into the driver: > + for (i = 0; i < tvps->num; i++) > + need_get_ops += dtv_property_prepare_get_stats(fe, > + tvp + i, inode, file); > + > + if (!fe->dtv_property_cache.need_stats) { > + need_get_ops++; > + } else { > + if (fe->ops.get_stats) { > + err = fe->ops.get_stats(fe); > + if (err < 0) > + return err; > + } > + } The dtv_property_prepare_get_stats will generate a bitmap field (need_stats) that will describe all value pairs that userspace is expecting. After doing it, a single call is done to get_stats() callback. All the driver need to do is to fill all values at dtv_property_cache. If the driver fills more values than requested by the user, the extra values will simply be discarded. In order to reduce latency, the driver may opt to not read the register values for the data that aren't requested by the user, like I did on cx24123 driver. Those values will all be returned at the same time to userspace by a single copy_to_user() operation. > - Due to the general architecture of the S2API the delay between > requesting and receiving the actual data could become too big to allow > realtime interpretation of the data (as it is needed for applications > like satellite finders, etc.) Not true. As pointed at the previous answer, the difference between a new ioctl and S2API is basically the code at dtv_property_prepare_get_stats() and dtv_property_process_get(). This is a pure code that uses a continuous struct that will likely be at L3 cache, inside the CPU chip. So, this code will run really quickly. As current CPU's runs at the order of Teraflops (as the CPU clocks are at gigahertz order, and CPU's can handle multiple instructions per clock cycle), the added delay is in de order of nanosseconds. On the other hand, a simple read of a value from an i2c device is in the order of milisseconds, since I2C serial bus, speed is slow (typically operating at 100 Kbps). So, the delay is determined by the number of I2C calls you have at the code. With the new ioctl proposal, as you need to read all data from I2C (even the ones that userspace don't need), you'll have two situations: - if you use S2API call to request all data provided by ioctl approach, the delay will be the same; - if you use S2API call to request less stats, the S2API delay will be shorter. For example,
Re: New DVB(!!)-Statistics API
Am 08.12.09 10:16, schrieb Julian Scheel: Hello together, after the last thread which asked about signal statistics details degenerated into a discussion about the technical possibilites for implementing an entirely new API, which lead to nothing so far, I wanted to open a new thread to bring this forward. Maybe some more people can give their votes for the different options Actually Manu did propose a new API for fetching enhanced statistics. It uses new IOCTL to directly fetch the statistical data in one go from the frontend. This propose was so far rejected by Mauro who wants to use the S2API get/set calls instead. Now to give everyone a quick overview about the advantages and disadvantages of both approaches I will try to summarize it up: 1st approach: Introduce new IOCTL Pros: - Allows a quick fetch of the entire dataset, which ensures that: 1. all values are fetched in one go (as long as possible) from the frontend, so that they can be treated as one united and be valued in conjunction 2. the requested values arrive the caller in an almost constant timeframe, as the ioctl is directly executed by the driver - It does not interfere with the existing statistics API, which has to be kept alive as it is in use for a long time already. (unifying it's data is not the approach of this new API) Cons: - Forces the application developers to interact with two APIs. The S2API for tuning on the one hand and the Statistics API for reading signal values on the other hand. 2nd approach: Set up S2API calls for reading statistics Pros: - Continous unification of the linuxtv API, allowing all calls to be made through one API. -> easy for application developers Cons: - Due to the key/value pairs used for S2API getters the statistical values can't be read as a unique block, so we loose the guarantee, that all of the values can be treatend as one unit expressing the signals state at a concrete time. - Due to the general architecture of the S2API the delay between requesting and receiving the actual data could become too big to allow realtime interpretation of the data (as it is needed for applications like satellite finders, etc.) I hope that this summary is technically correct in all points, if not I'd be thankful for objective corrections. I am not going to articulate my personal opinion in this mail, but I will do it in another mail in reply to this one, so that this mail can be seen as a neutral summary of the current situation. So now it's everyones turn to think about the options and give an opinion. In the end I am quite sure that all of us would have great benefits of an improved statistics API. Cheers, Julian -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html The above mail of course has nothing to do with USB-Statistics. I must have been slightly confused when typing the message title! Sorry for that! -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html