On Mon, Jan 18, 2010 at 05:46:34AM +0100, Peter Stuge wrote:
> sascha wrote:
> > the test values were not correct: the seeker utility used the
> > current number of seconds as the seed for rand(), and so the
> > processes used the same random offsets. the number is 10000
> > seeks/second on all 16 usb drives when reading from /dev/sd*,
>
> Nice! Is that with a 512 byte read between seeks? If not, it's a lot
> less than 1/10 of high-speed USB performance. If with reads, it might
> still improve if there were more devices. (But each USB bus can only
> have 127 devices including hubs.) The USB interface chips in the
> memory sticks could be the limiting factor for you.
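as an aside on the seed bug quoted above: processes started in the same second all get the identical rand() sequence when each one seeds with time(NULL) alone. a minimal sketch of a fix, mixing the PID into the seed (the helper name is mine, not from the seeker source):

```c
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical fix: processes launched in the same second would get
 * identical sequences from srand(time(NULL)), so fold the PID into
 * the seed as well.  Each process then draws different offsets. */
unsigned int per_process_seed(void)
{
    return (unsigned int)time(NULL) ^ ((unsigned int)getpid() << 16);
}
```

with this, each seeker process would call srand(per_process_seed()) once at startup instead of srand(time(NULL)).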
the block size is 512. smaller sizes bring no benefit with the access
methods tested (mmap, non-blocking I/O, normal I/O), and 512 is also
the smallest possible transfer size for files opened with O_DIRECT.

A 790M-chain table uses 75 bytes per index region on average, which
means that either less RAM would be needed for the index (coalescing 4
adjacent regions and loading only 25% of the index values into RAM),
or that the start and end values can be combined in a single file, so
no extra access is needed when a match is found.

> The one-transfer-per-frame limit applies only to interrupt USB
> transfers, and USB storage devices use either a
> control/bulk/interrupt protocol or a bulk-only protocol. I
> discovered that only floppy drives use the CBI interface. Many bulk
> transfers can go in one frame if the host controller implements it.
> One SCSI command transfer and one response transfer is needed per
> seek. Max 512 bytes data per packet, but SCSI protocol overhead
> means that each data block will always be two data packets (= more
> overhead).

Since concurrent access does not slow down the transfers compared to a
single USB device, either more than one USB transaction is in flight
on the bus, or the storage device transaction is split into multiple
USB transactions. I guess.

> 512 bytes is a fairly large data block, so the overhead could
> perhaps be an easy tradeoff for the genericness and availability of
> memory sticks.
>
> How many instructions are needed to process each block before the
> next seek? 10? 100? 1000? Can processing be done in parallel also?

the positions to seek to come out of the chain generator at a rate
between 10k and 20k per second.

> > 800 when reading from a 64GB LVM2 logical volume. And then 6000
> > when seeking in the files on 3 partially filled LVM2 volumes. The
> > reason for the last 2 timings is not yet clear to me.
>
> Interactions between the different layers combined with scheduling
> would be my guess.
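a minimal sketch of such a 512-byte seek-and-read, assuming O_DIRECT is available (with a buffered fallback so it also runs on filesystems that refuse O_DIRECT); the function name is illustrative, not from the seeker source:

```c
#define _GNU_SOURCE           /* for O_DIRECT on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 512               /* smallest transfer size with O_DIRECT */

/* Illustrative helper: read one 512-byte block at a block-aligned
 * offset.  O_DIRECT needs both the buffer and the file offset to be
 * block-aligned; if opening with O_DIRECT fails (e.g. on tmpfs), fall
 * back to buffered I/O so the sketch still runs everywhere. */
int read_block(const char *path, long blockno, void *out)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
        fd = open(path, O_RDONLY);      /* buffered fallback */
    if (fd < 0)
        return -1;

    void *buf;
    if (posix_memalign(&buf, BLK, BLK) != 0) {
        close(fd);
        return -1;
    }

    ssize_t n = pread(fd, buf, BLK, (off_t)blockno * BLK);
    if (n == BLK)
        memcpy(out, buf, BLK);

    free(buf);
    close(fd);
    return n == BLK ? 0 : -1;
}
```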
The answer to this problem was that on a RAID of 4 devices you need 4
threads, and you have to make sure that your random numbers are sorted
in a way that the devices are accessed in round-robin fashion. LVM is
also no bottleneck.

The timing on the LV device file:

msecs  seeks/s  usec/seek
14066      710       1406
14321      698       1432
14291      699       1429
14320      698       1432
14322     2792        358

A file on the filesystem:

msecs  seeks/s  usec/seek
14627      683       1462
14646      682       1464
14967      668       1496
14590      685       1459
14978     2670        374

For the SSD:

msecs  seeks/s  usec/seek
 2886     3464        288
 2886     3464        288

the last line is always the summary for the threads above.

The SSD is 32 GByte, the RAID is 64 GByte, so for the USB setup to be
faster you would have to use 4 GByte sticks (which are also cheaper
per GByte than 16 GByte ones) and 3 host controllers to access the 256
devices plus hubs.

cost factor:
2 Eur/GB for a 32 GByte SSD
1.2 Eur/GB for a 4 GByte USB flash stick

Whether 256 devices are even possible in reality is another question.
64 devices per bus would already saturate the 32 MByte/s USB speed
(512 * 1000 bytes per device per second).

_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
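P.S. the round-robin sorting of the random offsets can be sketched as
follows; the striped offset-to-device mapping and the function names
are my assumptions for illustration, not the actual LVM layout:

```c
#include <stdlib.h>

#define NDEV 4  /* devices in the RAID */

/* Assumed mapping from byte offset to device: a plain stripe layout
 * with a fixed stripe size.  The real LVM/RAID mapping will differ. */
static int device_of(long off, long stripe)
{
    return (int)((off / stripe) % NDEV);
}

/* Reorder n random seek offsets so consecutive requests rotate
 * through the devices, keeping one worker thread per device busy.
 * Offsets are bucketed per device, then emitted 0, 1, 2, 3, 0, ... */
int reorder_round_robin(const long *in, size_t n, long stripe, long *out)
{
    long *bucket[NDEV];
    size_t cnt[NDEV] = {0}, pos[NDEV] = {0}, k = 0;

    for (int d = 0; d < NDEV; d++)
        if ((bucket[d] = malloc(n * sizeof(long))) == NULL)
            return -1;  /* leak on partial failure, fine for a sketch */

    for (size_t i = 0; i < n; i++) {
        int d = device_of(in[i], stripe);
        bucket[d][cnt[d]++] = in[i];
    }

    while (k < n)                       /* interleave the buckets */
        for (int d = 0; d < NDEV; d++)
            if (pos[d] < cnt[d])
                out[k++] = bucket[d][pos[d]++];

    for (int d = 0; d < NDEV; d++)
        free(bucket[d]);
    return 0;
}
```

each worker thread would then service every NDEV-th entry of the
reordered list, so no two threads queue on the same spindle at once.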
