Thanks for that info. The "box cars" analogy helps with describing the issue.
Yes, we are very, very guilty of sending out trains with just 1 box. The direct memory API was essentially a patch to our code base to correct that issue, especially for large memory block transfers.

Our current target code base was developed, literally, decades ago for ISA and PCI bus probes, where this was not an issue. And our first USB based probes ran our entire software stack on the probe's CPU, so it still wasn't much of an issue.

The XDS110 firmware design is constrained by the fact that the target code always wants a response to each and every request, which limits our ability to send multiple "boxes" on each "train." The memory APIs help with this issue by filling the "boxes" with memory data going up or down the pipe.

So if I understand correctly, the target code in OpenOCD optimizes this by queueing up DAP register accesses and then calling the execute queue command to run them all at once. I can add APIs to the XDS110 firmware to enable that support and help speed things up. That should be sufficient and fully compatible with OpenOCD.

In the meantime, though, I would still like to issue a patch to add support for the XDS110 with the current firmware. Even after adding new APIs, I think we still need to support users that haven't updated their XDS110s yet, but issue them a warning that they could get better performance if they get the firmware update.

Does that sound reasonable?

Edward

-----Original Message-----
From: Duane Ellis [mailto:open...@duaneellis.com]
Sent: Sunday, June 04, 2017 11:52 AM
To: Fewell, Edward
Cc: Andreas Fritiofson; openocd-devel@lists.sourceforge.net
Subject: Re: [OpenOCD-devel] Debug probe hardware that can read/write target memory directly

ed>> Keep in mind this is a full speed device, not high speed. Cutting down on the number of USB packets needed does help considerably.

Full vs. high speed is not important here. Better: draw a timeline of the actual USB traffic captured on the wire and work on fixing that.
I am not talking about one of those “software sniffers”. Those do not capture the NAKs and other bus traffic. You need a capture from a real hardware sniffer (TotalPhase Beagle, Ellisys, or CatC USB Chief).

In theory, a USB 1.1 device transfers at 12 Mbit/s, or 1.5 MByte/s, but USB has overhead, so you only get about 2/3 of the total data rate, or about 1 MByte per second. But how do you get that? That’s the key.

Let’s assume that the command/data overhead takes another 2/3, so about 600 KBytes/second would be unbelievable. Nothing is currently that fast.

I like to describe the problem as train “box cars”. Periodically USB sends a packet (think of the packet as a train box car). The true limitation is not bit rate but packet rate, aka box cars per second.

First, if you send each box car out the door with 1 box, leaving room for 63 other boxes, your efficiency is horrible.

Second, if you send a command and that command requires a reply, these two transfers often become very inefficient USB transfers (i.e., the box cars are mostly empty).

Third, if there is any delay in preparing the *next* packet, your time slot for that packet is LOST; effectively you just pushed an empty box car down the train track.

This is why I say: draw a timeline of actual USB traffic, as captured by an “on-the-wire” USB protocol analyzer.

You need to include the NAKs in that timeline graph and work on getting rid of those. You also need to include the SOF counts with the dead time between them.

Each one of those is a missed opportunity to send data; effectively a box car with zero boxes just rolled down the track.

-Duane.
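
The box car arithmetic can be put into numbers with a rough sketch. It assumes 64 bytes per packet (taken from the "1 box, room for 63 others" figure above) and a made-up packet rate of 1000 packets per second. The function names and the rate are illustrative only; they are not measured values and not part of OpenOCD or the XDS110 firmware.

```python
import math

# Assumption: one packet ("box car") holds 64 bytes ("boxes"),
# per the "1 box, leaving room for 63 other boxes" figure.
PACKET_CAPACITY = 64


def packets_needed(total_bytes, payload_per_packet):
    """Number of packets needed to move total_bytes at a given fill level."""
    if not 1 <= payload_per_packet <= PACKET_CAPACITY:
        raise ValueError("payload must be between 1 and PACKET_CAPACITY bytes")
    return math.ceil(total_bytes / payload_per_packet)


def throughput(payload_per_packet, packets_per_second):
    """Useful bytes per second when every packet carries the same payload."""
    return payload_per_packet * packets_per_second


def efficiency(payload_per_packet):
    """Fraction of each packet that carries useful data."""
    return payload_per_packet / PACKET_CAPACITY


if __name__ == "__main__":
    rate = 1000  # hypothetical packets per second, not a measured figure
    for payload in (1, 16, 64):
        print(
            f"{payload:2d} bytes/packet: "
            f"{throughput(payload, rate):6d} bytes/s, "
            f"{efficiency(payload):.1%} full, "
            f"{packets_needed(4096, payload)} packets for a 4096-byte block"
        )
```

At a fixed packet rate, useful throughput scales directly with how full each packet is, which is why queueing several requests into one packet matters more than the raw bit rate.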