On Tuesday 21 February 2006 15:44, Rudolf Cejka wrote:
> Kern Sibbald wrote (2006/02/21):
> > > One thread would read data and compute md5, the second thread would
> > > just try to write data to the tape.
> >
> > This is already the case. Reading the data and computing the md5 sum is
> > done in the FD, writing the data is done in the SD. Not only are they
> > separate threads, but they are separate processes.
>
> Oops, I'm sorry - I meant to say crc32 instead of md5, my fault.
> I tried to measure the time for read(), the time for computing crc32 and
> the time for write(), and computing crc32 added a significant delay
> between reading and writing at a speed of about 80 MB/s on
> a Xeon 3.6 GHz.

Ah, yes, the crc32 is computed for each block by the SD just before writing 
the block to the tape. Interesting that it slows things down. It sounds like 
it is worth looking into.
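
For anyone who wants to check the cost on their own hardware, a rough
sketch of such a measurement follows. It uses Python's zlib.crc32 rather
than the SD's own routine, and the function name, block size and block
count are made up for illustration, so the number it reports only
indicates the order of magnitude.

```python
import time
import zlib


def crc32_throughput_mb_s(block_size=64 * 1024, n_blocks=2000):
    """Return the measured CRC32 rate in MB/s over in-memory blocks."""
    # One reusable block of non-constant bytes; no disk or tape I/O involved.
    block = bytes(range(256)) * (block_size // 256)
    start = time.perf_counter()
    for _ in range(n_blocks):
        zlib.crc32(block)
    elapsed = time.perf_counter() - start
    total_mb = len(block) * n_blocks / 1e6
    # Guard against a zero interval on very coarse clocks.
    return total_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    print("crc32: %.0f MB/s" % crc32_throughput_mb_s())
```

If the reported rate is not far above the drive's streaming rate, the
checksum is a plausible contributor to the delay seen between read() and
write().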

>
> However, now I'm looking into the sources again and could not find it
> anymore: whether the full data stream is secured by crc32, or just part
> of it, or nothing anymore. I'll look at it once more.

Unless there is a bug, the crc32 is still computed and checked when reading 
back each block.
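
The idea of "compute on write, check on read" can be sketched as below.
The header layout (CRC32 followed by payload length) and the function
names are hypothetical and are not the actual Bacula block format; the
checksum again comes from zlib.

```python
import struct
import zlib

# Hypothetical block header: 32-bit CRC, then 32-bit payload length.
HEADER = struct.Struct("!II")


def pack_block(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 and length before writing."""
    return HEADER.pack(zlib.crc32(payload), len(payload)) + payload


def unpack_block(block: bytes) -> bytes:
    """Verify the stored CRC32 when reading a block back."""
    crc, length = HEADER.unpack_from(block)
    payload = block[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("block checksum mismatch")
    return payload
```

A block that was altered between writing and reading fails the check in
unpack_block() instead of being restored silently.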

>
> The other point was that one thread may just read and the second may
> just write, with some memory buffering and synchronization, which can
> (or, yes, might not) help in feeding data to the tape. This is inspired
> by star (ftp://ftp.berlios.de/pub/star/), which helped us with performance
> tuning several times in the past.

Yes, I have always planned to have a sort of circular pool of buffers that 
would be filled by the job threads by reading the data from the FD, then 
handed off to a single device thread that actually writes the buffer to the 
Volume.  This would probably speed things up a good deal.
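
A minimal sketch of that scheme, assuming a fixed pool of reusable
buffers, one reader thread per job and one writer thread per device. All
names here (run_pipeline, write_block, pool_size) are invented for the
example, and any object with a readinto() method stands in for the FD
connection.

```python
import queue
import threading


def run_pipeline(sources, write_block, pool_size=8, block_size=64 * 1024):
    """Copy blocks from several sources to one sink through a bounded pool."""
    free = queue.Queue()      # empty buffers waiting to be filled
    filled = queue.Queue()    # (buffer, nbytes) pairs waiting for the writer
    for _ in range(pool_size):
        free.put(bytearray(block_size))

    def reader(src):
        while True:
            buf = free.get()              # blocks while the writer is behind
            n = src.readinto(buf)
            if not n:                     # end of this job's data
                free.put(buf)
                return
            filled.put((buf, n))

    def writer():
        while True:
            item = filled.get()
            if item is None:              # sentinel: all readers finished
                return
            buf, n = item
            write_block(bytes(buf[:n]))   # the only place that touches the device
            free.put(buf)                 # recycle the buffer

    writer_thread = threading.Thread(target=writer)
    reader_threads = [threading.Thread(target=reader, args=(s,)) for s in sources]
    writer_thread.start()
    for t in reader_threads:
        t.start()
    for t in reader_threads:
        t.join()
    filled.put(None)
    writer_thread.join()
```

Because the pool is bounded, the readers stall when the device cannot
keep up, while the writer always has the next buffer ready as long as the
readers are faster than the tape.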

>
> > There are *many* ways to speed up the performance, both external to
> > Bacula and internal to Bacula. For the moment, internal changes will have
> > to wait for some contributor because I am still adding essential
> > features.
>
> Yes - fully agree. I mainly meant that it is good to have
> it widely known...
>
> > Maybe I am not understanding you correctly, but the above is not strictly
> > true. Depending on how the user configures the SD, there can be one
> > spooling area per device.
>
> I meant more "one spooling area per thread" - if I allow a maximum of N
> parallel flows storing to bacula-sd, it would be interesting to have
> 1 to N spool areas, where, while despooling is in progress, a particular
> spool area would have an option to pause spooling during that time,
> so that the spool area is reserved just for reading or just for writing.

Yes, perhaps this would help, but it seems to me a lot of complexity that will 
only benefit users with separate spool disks.  Implementing a separate write 
thread as discussed above would *probably* give a good performance boost at 
low programming cost -- I say *probably* because performance enhancements can 
be tricky to the point of being counterproductive if one is not careful to 
identify the real bottlenecks before changing things ...

>
> > This is not generally a problem worth changing since disks
> > tend to be significantly faster than tapes ...
>
> The situation is changing, since new tape drives have a native speed
> above 50 MB/s, whereas disks are still limited by head movements
> measured in milliseconds, so the real reading speed from disks
> is not changing very much over time.
>
> > If one really wants tapes to write faster, someone might come up with a
> > patch to the Bacula SD tape driver that uses overlapped I/O (I think this
> > is sometimes called asynchronous I/O).
>
> I thought about it too, however I am not sure how much it would help.
> I do not trust the OS, SW and HW that much in this respect.

After thinking about it a bit more, an asynchronous write to a tape is a bad 
idea and is better handled by the separate write thread.  However, to get 
more data across a network, asynchronous socket writes, or using a multiple 
value write to reduce the current two socket writes per data item to one, 
could significantly increase the data rate across a fast network ...
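
To illustrate the second option, here is a sketch of a gather write that
sends a header and its data in one call instead of two. It assumes the
two writes are a small length header followed by the payload; that
framing and the function names are hypothetical and are not Bacula's wire
protocol. socket.sendmsg() is available on Unix-like systems.

```python
import socket
import struct


def send_record(sock, payload: bytes) -> None:
    """Send a 4-byte length header and the payload with one sendmsg() call."""
    header = struct.pack("!I", len(payload))
    sent = sock.sendmsg([header, payload])    # gather write: one system call
    total = len(header) + len(payload)
    if sent < total:
        # Short write: push the unsent tail with sendall().
        sock.sendall((header + payload)[sent:])


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_record(sock) -> bytes:
    """Read one length-prefixed record written by send_record()."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

Whether this helps in practice depends on how much of the cost is per
system call rather than per byte, so it would need measuring on a fast
network before and after.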


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
