[linrad] Re: Network speed problems.
> Is there something wrong in what I do? Does Windows behave the same way?

On Linux, try this:

- if it's not installed already, install the netcat tool through your distribution's package system
- make sure your multicast routing is set up properly
- do:

    dd if=/dev/zero bs=1k count=100k | nc -u -q 1 239.255.0.16 1234

  (on some distributions the nc command is called netcat, with the same syntax)

This sends 100MB of zeroes to the multicast address. On my PII/350 this gives:

    102400+0 records in
    102400+0 records out
    104857600 bytes (105 MB) copied, 8.88197 seconds, 11.8 MB/s

...which is as close to the wire speed as one can expect. I cannot test this on my PI/166, since it has no Fast Ethernet interface.

Note that this test may give false negatives. The netcat tool is not optimized for speed (or processor loading), but if your computer can get >11MB/sec with netcat, you can expect linrad to be able to get the same amount of bandwidth. The opposite may not be true.

JD 'insomnia' B.

--
LART. 250 MIPS under one Watt. Free hardware design files.
http://www.lartmaker.nl/

# This message is sent to you because you are subscribed to the mailing list .
To unsubscribe, E-mail to: <[EMAIL PROTECTED]>
To switch to the DIGEST mode, E-mail to <[EMAIL PROTECTED]>
To switch to the INDEX mode, E-mail to <[EMAIL PROTECTED]>
Send administrative queries to <[EMAIL PROTECTED]>
[linrad] Re: Network speed problems.
> Presumably this is well known, I am just a newcomer to networking and I could not guess it would behave this way.

It's not well known; with modern hardware (P3 and newer) you should easily be able to saturate a 100Base-TX network (ie get full bandwidth). Why else would all modern machines come with 1Gb networking cards ? Not all network hardware is created equal; some Ethernet chips have trouble running at full speed (I seem to recall older RTL8xxx parts, but it's been a while).

> Are there some settings I should change? One possibility seems to be to open several sockets on different ports simultaneously to increase the throughput.

The kernel socket layer should never be the bottleneck on non-ancient hardware.

> Is there something wrong in what I do?

Do you have a short piece of code that shows this behaviour ? A standalone program would be best.

> To be able to run at 96 kHz with fft1 transfer it seems I will have to use at least 6 sockets in parallel with different port numbers.

That should never happen.

> Trying to find some hints on the Internet I came across this:
> http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IBMp690/IBM/usr/share/man/info/en_US/a_doc_lib/aixbman/prftungd/2365c93.htm
> It seems to indicate that I should send much larger packets??? (a multiple of 4096 bytes 'header' included)

That's for a completely different operating system, and has nothing to do with the way Linux works.

JDB.

--
Riddoch's Myth of computing: Any computer problem is invariably the fault of the closest sysadmin.
[linrad] Re: Network standards for SDR
>> As for the ReceiveData() function, that line can be directly replaced with your recvfrom() call (I just was too lazy to look up recvfrom() when I wrote that example).

> Well, if you care to write the full code you will find that this statement is not quite what it looks like.

It's not much different. This is what a modified version of your program looks like:

    typedef struct {
      short header_len; // This field is always present, and always first
      short data_len;   // This field is always present, and always second
      [ header contents, including header version, type, etc. goes here ]
      char data[NET_MULTICAST_PAYLOAD];
    } NET_RX_STRUCT;

    NET_RX_STRUCT msg;

    rxin_char=(void*)(&timf1_char[timf1p_pa]);
    timf1p_pa=(timf1p_pa+ad_read_bytes)&timf1_bytemask;
    for(j=0; j [... text lost in the archived copy ...]
      memcpy(&rxin_char[j], ((char *)&msg) + msg.header_len, NET_MULTICAST_PAYLOAD);
    }

For the time being I kept the data size constant, to not mix the issues (variable data size adds 5-6 lines). And yes, there is a memcpy. But read below...

By the way, there appears to be an inconsistency in your program. Here:

    timf1p_pa=(timf1p_pa+ad_read_bytes)&timf1_bytemask;

you make sure that the address pointer for the circular buffer wraps around, but I see no such protection in the for() loop. Or am I missing something ?

>> even if it did, it matters very little on modern CPUs (packets this size will remain entirely within the CPU cache).

> Linrad is intended to run on elderly computers and it is also intended to run at much higher bandwidths on modern ones. You suggest that the data is put into a buffer to which a pointer is returned by ReceiveData. The next step would be to store the payload into a circular buffer. This will cause the data to be written into memory twice.

An extra memcpy makes little difference in this loop. Note that the recvfrom() needs to do the equivalent of a memcpy() anyway.
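The wrap-around concern raised above can be handled by splitting the copy in two whenever it would run past the end of the circular buffer. What follows is only a minimal sketch, assuming a buffer whose size is a power of two (as the `timf1_bytemask` masking suggests); `ring_write` and its parameters are hypothetical names and are not taken from Linrad.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Linrad code.
 * Copy len bytes from src into a circular buffer whose size is a power of
 * two (mask == size - 1), starting at offset pos. The copy is split in two
 * when it would run past the end of the buffer.
 * Returns the new write offset, already wrapped.
 * Assumes pos <= mask and len <= size. */
size_t ring_write(char *ring, size_t mask, size_t pos,
                  const void *src, size_t len)
{
    size_t size = mask + 1;
    size_t first = size - pos;      /* room left before the end of the buffer */

    if (first > len)
        first = len;

    memcpy(ring + pos, src, first);                        /* up to the end */
    memcpy(ring, (const char *)src + first, len - first);  /* wrapped part  */

    return (pos + len) & mask;
}
```

With an 8-byte buffer and a write offset of 6, writing 4 bytes puts two bytes at offsets 6 and 7, two at offsets 0 and 1, and returns 2 as the next offset.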
I wrote a little test program (see the bottom of this mail) to test the speed difference between 1 and 2 copy instructions if the destination buffer is larger than the cache. On a Pentium MMX 166MHz, a Thinkpad laptop with X running, I get:

    Single copy: 1000 loops in 129.44 seconds, or 79.11 MiBps.
    Double copy: 1000 loops in 147.21 seconds, or 69.56 MiBps.

The first copy takes about two cycles per byte, adding a second copy adds less than 0.3 cycles per byte. On a Pentium II 350MHz (rescued from the garbage a month ago):

    Single copy: 1000 loops in 55.76 seconds, or 183.64 MiBps.
    Double copy: 1000 loops in 66.36 seconds, or 154.30 MiBps.

The ratio is similar: first copy just under 2 cycles/byte, second copy adds 0.36 cycles/byte. Your scenario will likely be even closer, since the kernel will need to read the UDP datagrams from main memory, too.

> Processing of the data is in another thread that requires hundreds of packages in the circular buffer. It will fetch its input from memory because other threads have been using the cache in the meantime.

Is there any way at all that you can avoid that, and process the data as it comes in ? My first big multi-threaded program was a real-time streaming video encoder for a quad Pentium Pro machine, and switching processing from a frame at a time to a macroblock (16x16 pixels) at a time sped the encoder up tremendously, even though the required number of operations almost doubled.

>> Zero-copy architectures make sense for hi-speed packet switching on slow computers; as soon as you add any processing on the data, that single extra copy gets lost in the noise. Cache line/block alignment is much more important for performance.

> Actually this is not in agreement with my observations. It does depend on how efficient "processing" is done.

In some cases, yes. I've re-written a fixed-point FFT for ARM so that reading the first word would trigger the loading of a full cache line, so that the FFT would never have to wait for its data.
But even that would get lost in the noise once you actually started processing the data.

> The most demanding task is the full bandwidth, full dynamic range FFT. It would be identical in all computers and it does not make any sense to do it in more than one computer.

Why ? Because this one computer would be much faster than the others ?

>> Do you want to have an exact, synchronized display on multiple machines ?

> This would be the case also if raw data were used. [snip] The "innocent" slave does not have to know that a data stream is "cooked". It can be processed as if it were raw data, but a clever slave can make use of complex information that it might want to ask for. If you want to compute the noise floor power density you want to know what percentage of samples were blanked out because of noise pulses for example. Normally one would not care at all.

So would it be correct to say that: (a) if all comp
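The cycles-per-byte figures quoted with the copy benchmark in this message follow from dividing the CPU clock by the measured throughput. A minimal sketch of that arithmetic, assuming the nominal clock (exactly 166 MHz for the Pentium MMX) and 1 MiB = 1048576 bytes; `cycles_per_byte` is a hypothetical helper, not part of the original test program.

```c
/* Hypothetical helper: clock cycles spent per byte copied, given the CPU
 * clock in Hz and the measured throughput in MiB per second
 * (1 MiB = 1048576 bytes). */
double cycles_per_byte(double clock_hz, double mib_per_s)
{
    return clock_hz / (mib_per_s * 1048576.0);
}
```

For the Pentium MMX numbers, `cycles_per_byte(166e6, 79.11)` comes out at about 2.0 for the single copy, and the difference to `cycles_per_byte(166e6, 69.56)` stays below 0.3, in line with the figures stated in the message.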
[linrad] Re: Network standards for SDR
>> This is still easy to parse, since all a user needs to do is something like
>>
>>     NET_RX_STRUCT *rx_packet;
>>     char *my_data;
>>     short i, my_data_len;
>>
>>     rx_packet = ReceiveData();
>>     my_data = ((char *) rx_packet) + rx_packet->header_len;
>>     for(i = 0; i < rx_packet->data_len; i++)
>>       DoSomethingWithMyData(my_data[i]);

> Yes, but these modern ways of writing scare off all my friends who can use old-fashioned C but not C++. First of all ReceiveData() has to be written, separate buffers of size NET_RX_STRUCT have to be allocated and managed etc. I do not currently have such code and I suspect it involves needless copy operations. I am looking for bandwidths of 2 MHz and above (for VHF noise blanking to remove static rain) so needless copy - probably up and down to main memory - is something I want to avoid.

I've never written a single line of C++ in my life. As for the ReceiveData() function, that line can be directly replaced with your recvfrom() call (I just was too lazy to look up recvfrom() when I wrote that example). I don't see that having a header in front of the package needs more copy calls than one after the package; even if it did, it matters very little on modern CPUs (packets this size will remain entirely within the CPU cache).

Zero-copy architectures make sense for hi-speed packet switching on slow computers; as soon as you add any processing on the data, that single extra copy gets lost in the noise. Cache line/block alignment is much more important for performance.

>> This is enough for basic decoding of any stream, no ? Even the center frequency can be seen as superfluous (since it's only for display and not strictly needed for decoding).

> The primary usage of the Linrad network was for the second operator in a contest station. It is an obvious advantage that the display is always correct - particularly if several bands are monitored simultaneously.

>> I cannot imagine what systems would evolve over the coming five years that couldn't fit in this framework.
> Linrad can also send data in the frequency domain and there is quite a lot of info that a slave will need. Admittedly those formats are likely to be used by Linrad only but they carry many more complications.

OK, I see. I was assuming that the multicast connection would be used for distributing raw data only. Re-reading your earlier posts it looks like you want to be able to send both raw *and* cooked (processed) data. Why ? Do you expect the slaves to be much slower than the master ? Do you want to have an exact, synchronized display on multiple machines ?

Looking at other network protocols (especially streaming), it has historically been a bad idea to combine multiple modes into one protocol, for reasons of maintainability, performance and clarity. Linrad is your code, and it's completely up to you, but might I suggest you consider either splitting the transmission modes in raw and cooked, or (better) multicasting only raw, unprocessed data and sending all filter parameters over a separate channel ?

>> [about an ADC-to-Ethernet]

> Probably it would be better to connect it to a socket on a dedicated ethernet port on one computer, the one which has the controls for the radio hardware connected to this soundcard. The master wants 100% reliable data because I assume you do not want to put the master system clock on the audio-to-Ethernet converter.

Why not ? It has a GPS-controlled OCXO to synchronize all sampling clocks and to keep time, isn't that sufficient ?

> Are you aware of any standard format for streaming unprocessed audio data?

At my previous job we had a few, but they were paper-only. There is MADI (digital audio) and SMPTE-259M (digital video) over ATM, but that doesn't quite apply here. I believe the AES have some, but those are for-pay documents, and I'm not an AES member anymore.

> What data format were you contemplating before this discussion started?
Pretty much what I described above: header with header length, data length, number of channels, sample size, sample rate, timestamp and a few descriptive fields (with a version field, so that -- if truly necessary -- upgrades are possible). Nothing that isn't strictly required: less is more.

It's what everybody else does. I know that that's not much of an argument ('50 million Elvis fans can't be wrong'), but in 15 years of working on network protocols, this is pretty much the only way that I've seen working reliably for successful sampled AV or radio projects (I've seen a similar system used on an antenna array for MIMO trials). Conversely, I have never ever seen a combined raw/cooked protocol that worked, or better: that remained working. Or it evolved into something like WAV: a historical accident that everyone loves to hate.

JDB.

--
In protocol design, perfection has been reached not when there is nothing left to add, but when there is nothing left to take away.
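A header along the lines listed in this message could be written to the wire field by field, so that the layout does not depend on the compiler's struct padding or on the sender's CPU. This is only a sketch under stated assumptions: the field names, widths, ordering and the 20-byte size are all hypothetical, and little-endian byte order is chosen because Leif states elsewhere in the thread that Linrad's formats will be IA32 little endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header; names, widths and order are assumptions. */
typedef struct {
    uint16_t header_len;   /* bytes in the header; always first  */
    uint16_t data_len;     /* bytes of payload; always second    */
    uint8_t  version;      /* header version                     */
    uint8_t  channels;     /* number of channels                 */
    uint8_t  sample_bits;  /* sample size in bits                */
    uint8_t  reserved;     /* padding, sent as zero              */
    uint32_t sample_rate;  /* samples per second                 */
    uint32_t time_sec;     /* seconds since the Unix epoch       */
    uint32_t time_usec;    /* microseconds within that second    */
} sdr_header;

#define SDR_HEADER_WIRE_LEN 20

/* Store a 16-bit value as two bytes, least significant byte first. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/* Store a 32-bit value as four bytes, least significant byte first. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)(v >> 24);
}

/* Serialize h into out (at least SDR_HEADER_WIRE_LEN bytes).
 * Returns the number of bytes written. */
size_t sdr_header_pack(const sdr_header *h, uint8_t *out)
{
    put_le16(out + 0, h->header_len);
    put_le16(out + 2, h->data_len);
    out[4] = h->version;
    out[5] = h->channels;
    out[6] = h->sample_bits;
    out[7] = 0;
    put_le32(out + 8,  h->sample_rate);
    put_le32(out + 12, h->time_sec);
    put_le32(out + 16, h->time_usec);
    return SDR_HEADER_WIRE_LEN;
}
```

Writing bytes explicitly keeps the sender portable: the same code produces the same packet on a PC, a PowerPC machine or a small embedded controller.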
[linrad] Re: Network standards for SDR
> Leif and all,
>
>> Would you agree on milliseconds since midnight? From JDB I learned that a double with seconds since Unix epoch would be a bad idea since conversion may be difficult on non-PC platforms. (It is the internal time format within Linrad however)
>
> Yes, milliseconds since UTC would be OK. Maybe you should send BOTH this quantity AND a double with seconds since Unix epoch (which I would actually prefer). I don't see the conversion issue as a big deal; little-endian to big-endian conversion is trivial, and doesn't nearly everybody use IEEE floating point these days?

Pretty hard to fit in a 256-cell CPLD.

JDB.
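An integer timestamp of the kind discussed here needs no floating point at all. A minimal sketch, assuming the sender has seconds since the Unix epoch plus microseconds available (the pair that POSIX `gettimeofday()` returns); `ms_since_midnight_utc` is a hypothetical helper name, and leap seconds are ignored, as Unix time itself ignores them.

```c
#include <stdint.h>

/* Hypothetical helper: milliseconds since midnight UTC, computed from
 * seconds since the Unix epoch and the microseconds within that second.
 * The result is always below 86400000, so it fits in 27 bits. */
uint32_t ms_since_midnight_utc(int64_t epoch_sec, int32_t usec)
{
    int64_t sec_of_day = epoch_sec % 86400;

    if (sec_of_day < 0)          /* dates before 1970 */
        sec_of_day += 86400;

    return (uint32_t)(sec_of_day * 1000 + usec / 1000);
}
```

For example, an epoch time that falls 1 hour, 1 minute and 1 second after midnight, with 250000 microseconds, yields 3661250.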
[linrad] Re: Network standards for SDR
> The newcomer who wants to write his own software does not have to know anything about the header, he can just use the 1024 bytes of data and ignore whatever has been appended. Having a header which has to be properly decoded in order to extract the data builds a threshold that makes it more difficult to get started. (Processing simple .wav files has a pretty high threshold in decoding the header. Common practice among amateurs has been to just discard the header and read the actual data using the information supplied with the file. Those who worked with the UNKN422 challenge were advised to do so for example.)

WAV is an example of a file format where *everyone* added their own custom headers/chunks, without any planning. As a result, no program can read all existing WAV files; WAV is considered an example of how *not* to do a file format.

Would you consider a very simple, easy to parse header like this:

    typedef struct {
      short header_len; // This field is always present, and always first
      short data_len;   // This field is always present, and always second
      [ header contents, including header version, type, etc. goes here ]
      char data[];
    } NET_RX_STRUCT;

This is still easy to parse, since all a user needs to do is something like

    NET_RX_STRUCT *rx_packet;
    char *my_data;
    short i, my_data_len;

    rx_packet = ReceiveData();
    my_data = ((char *) rx_packet) + rx_packet->header_len;
    for(i = 0; i < rx_packet->data_len; i++)
      DoSomethingWithMyData(my_data[i]);

>> That, too, makes it harder for dedicated hardware receivers; ideally these would not need _any_ communication from the slave to the master. As I see it, encoding this information in the header of each package is a low-overhead way to reduce ambiguity, too.

> The problem is that there are so many possibilities. I do not want to invent a complicated scheme for describing the myriad of things I can imagine now only to discover in a few years that something entirely different has evolved.
I would suggest keeping it extremely simple. There is not very much information that varies between sampled systems:

- sample size
- sample rate
- number of channels (could even be fixed to 'always I/Q')

and, for radio systems,

- center frequency

This is enough for basic decoding of any stream, no ? Even the center frequency can be seen as superfluous (since it's only for display and not strictly needed for decoding). I cannot imagine what systems would evolve over the coming five years that couldn't fit in this framework.

At 17:47 +0100 04-01-2007, Leif Asbrink wrote (in another mail):

> Would you agree on milliseconds since midnight? From JDB I learned that a double with seconds since Unix epoch would be a bad idea since conversion may be difficult on non-PC platforms. (It is the internal time format within Linrad however)

I would use the same interface that gettimeofday() uses: a long with seconds since the Epoch (Jan 1 1970), and a long with microseconds.

> The formats I intend to use within Linrad will use IA32 little endian (as well as IA32 float). I have no intention to make Linrad portable to other platforms and I am pretty sure I will not change my mind on this point for the next 5 years or more. Probably never.

OK, that's fine, so please document this somewhere so those of us on non-IA32 can deal with it.

As an example: I'm currently soldering the prototype of an audio-to-Ethernet converter as part of a portable hard disk recorder. This design uses the CS5381 ADC, one of the best professional audio converters on the market with a dynamic range approaching 120dB. This is an open-hardware system[1], and with a few modifications I could see it being usable for Linrad. A lot of the limitations (time jitter on the system clock etc) that are present on a PC platform simply do not appear for such a dedicated device.

How would you like me to interface such a system to Linrad ? Should it be able to act as a Linrad master ?
JDB

[1] Converter schematics are here:
http://www.lartmaker.nl/recbox-adc-cs5381-main.png
http://www.lartmaker.nl/recbox-adc-cs5381-power.png
http://www.lartmaker.nl/recbox-adc-cs5381.pdf
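The header-first layout proposed in this message (header_len first, data_len second) can be decoded without knowing anything else about the header, which is what lets old receivers skip fields added later. A minimal sketch of such a receiver-side check, assuming two-byte little-endian length fields as Leif describes for Linrad; `packet_payload` is a hypothetical function and performs the bounds checks that the short example in the message leaves out.

```c
#include <stddef.h>

/* Hypothetical helper: locate the payload inside a received packet whose
 * first two fields are header_len and data_len (16 bits each, assumed
 * little-endian). Returns a pointer to the payload and stores its length
 * in *data_len, or returns NULL when the lengths do not fit the packet. */
const unsigned char *packet_payload(const unsigned char *pkt, size_t pkt_len,
                                    size_t *data_len)
{
    size_t header_len, dlen;

    if (pkt_len < 4)                    /* too short for both length fields */
        return NULL;

    header_len = (size_t)pkt[0] | ((size_t)pkt[1] << 8);
    dlen       = (size_t)pkt[2] | ((size_t)pkt[3] << 8);

    if (header_len < 4 || header_len > pkt_len)
        return NULL;                    /* header claims more than we got  */
    if (dlen > pkt_len - header_len)
        return NULL;                    /* payload claims more than we got */

    *data_len = dlen;
    return pkt + header_len;            /* unknown header fields are skipped */
}
```

A receiver would pass the buffer and byte count returned by `recvfrom()` and then process `*data_len` bytes starting at the returned pointer.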
[linrad] Re: Network standards for SDR
A general point: in virtually all communications protocols the (descriptive) header comes before the data block, since the receiver usually needs to decode the header to be sure what to do with the data. This also makes it possible to vary the length of the data block, if desired (for instance, to tune to FFT block sizes or sampling hardware word length).

>> - I would want a timestamp in there somewhere. It might be derived from block_no, but why not make it explicit ?

> I do not see what it would be good for. Why do you want the clock from the master while there is another one in the slave?

Array processing. It would be very useful for a situation where you have multiple masters on one network (either during a contest, or -in my case- with a few servers each connected to an antenna+receiver). Time sync is not hard over either GPS/TAC or ntp. Even in one-master situations it could be useful: with timestamps, it is very easy to make something similar to the Time Machine.

>> - how is the sampling rate communicated ?

> The slave (client) asks the server for the meaning of the data. Number of channels, nominal sampling rate, whether the format is real or complex etc.

That, too, makes it harder for dedicated hardware receivers; ideally these would not need _any_ communication from the slave to the master. As I see it, encoding this information in the header of each package is a low-overhead way to reduce ambiguity, too.

>> - if you are not doing so already, please please _please_ use the functions htons() / ntohs() and friends to convert between host byte order and network byte order (or forever determine that linrad communicates with either little endian (IA32) or big endian (PowerPC, SPARC etc) byte order). I would want to be able to use a PC as the server and my PowerBook as the client, for instance.

> I do not see how it matters.
> Linrad does not put port numbers or addresses in the packages, that is done by the operating system and the inner workings of Linrad are not visible from the network.

Byte ordering is not restricted to port numbers or addresses. Every time you put an integer which is larger than one byte into a packet, the transmitter and receiver need to agree on the byte order. See http://en.wikipedia.org/wiki/Endianness for details.

Taking my example, if the master runs on an Intel machine and the slave on my PowerBook, if the master transmits a block_no of 0x01020304, my PowerBook will see that as 0x04030201. Not good.

JDB.

--
Years from now, if you are doing something quick and dirty, you imagine that I am looking over your shoulder and say to yourself, "Dijkstra would not like this," well that would be immortality for me.
-- Edsger Dijkstra, 1930 - 2002
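The block_no example can be reproduced on any machine by looking at the four bytes directly. The sketch below first shows how the same bytes read as two different numbers, then stores the value in network byte order with the POSIX `htonl()` / `ntohl()` pair so that both ends agree. The helper names (`read_le32`, `read_be32`, `store_block_no`, `load_block_no`) are hypothetical.

```c
#include <arpa/inet.h>   /* htonl(), ntohl() (POSIX) */
#include <stdint.h>
#include <string.h>

/* Interpret four bytes as a little-endian 32-bit integer. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Interpret the same four bytes as a big-endian 32-bit integer. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Sender side: store block_no in network (big-endian) byte order. */
void store_block_no(unsigned char *out, uint32_t block_no)
{
    uint32_t wire = htonl(block_no);
    memcpy(out, &wire, sizeof wire);
}

/* Receiver side: convert back to host byte order, whatever the host is. */
uint32_t load_block_no(const unsigned char *in)
{
    uint32_t wire;
    memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

The bytes 04 03 02 01, which is how a little-endian machine stores 0x01020304 in memory, read back as 0x04030201 through `read_be32()`. A value passed through `store_block_no()` and `load_block_no()` comes back unchanged on either kind of host.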
[linrad] Re: Network standards for SDR
> // Structure for multicasting receive data on the network.
> #define NET_MULTICAST_PAYLOAD 1024
> typedef struct {
>   char buf[NET_MULTICAST_PAYLOAD];
>   double passband_center;
>   float userx_freq;
>   unsigned int block_no;
>   unsigned char userx_no;
>   char passband_direction;
> } NET_RX_STRUCT;

Very interesting ! A couple of observations:

- I would want a timestamp in there somewhere. It might be derived from block_no, but why not make it explicit ?
- how is the sampling rate communicated ?
- using float/double makes it much harder for dedicated hardware receivers to act as server.
- if you are not doing so already, please please _please_ use the functions htons() / ntohs() and friends to convert between host byte order and network byte order (or forever determine that linrad communicates with either little endian (IA32) or big endian (PowerPC, SPARC etc) byte order). I would want to be able to use a PC as the server and my PowerBook as the client, for instance.

JDB.
[linrad] Re: complete answer from Roger
>> - it is just like opening a "regular" socket with a group target address. See
>> http://www.cs.unc.edu/~jeffay/dirt/FAQ/comp249-001-F99/mcast-socket.html or the bible of Stevens

> My problem is that this is far too cryptic. I could spend a lot of time searching the net, but maybe someone can point me to something a little more novice oriented. I do not have "the bible of Stevens".

A little Googling produces:

http://jungla.dit.upm.es/~jmseyas/linux/mcast.lj/mcast-lj.html

...the original of which appears to be here:

http://www.linuxjournal.com/article/3041

JDB
[ta-ta-ta-talking to myself]
[linrad] Re: complete answer from Roger
>> - it is just like opening a "regular" socket with a group target address. See
>> http://www.cs.unc.edu/~jeffay/dirt/FAQ/comp249-001-F99/mcast-socket.html or the bible of Stevens

> My problem is that this is far too cryptic. I could spend a lot of time searching the net, but maybe someone can point me to something a little more novice oriented. I do not have "the bible of Stevens".

A little Googling produces:

http://jungla.dit.upm.es/~jmseyas/linux/mcast.lj/mcast-lj.html
http://www.linuxjunkies.org/html/Multicast-HOWTO.html#s6
http://www.wlug.org.nz/SourceSpecificMulticastExample

(easiest one first)

HTH,

JDB
[looking into bolting Ethernet+multicasting onto an audio ADC]