> From: "John Scully"
> To: <[email protected]>
> Cc: <[email protected]>
I'm not sending this to [email protected] because I'm not subscribed to
that list.

> I wonder if Vernon Schryver at rhyolite could tie fuzzy OCR into the DCC
> (distributed Checksum) project.
...

Perhaps the easiest way to do that would be to pretend the OCR'ed mail
messages were plain text to start with, and feed them to dccifd or dccproc.

I'm not really suggesting that, because I'm not sure whether it is a good
idea on various non-technical grounds. If this OCR system is a product sold
in an appliance or as a managed service, its vendor would need to buy a
commercial license to use the DCC programs. Besides the license issue on the
source, it would simply be wrong to take and sell the CPU cycles, bandwidth,
disk space, and, most important, the human system administration work of the
public DCC server operators.

> To give you an idea, our DCC server currently has these stats: The key
> items - 22,057,457 checksums in memory, using a little over 1.1G of RAM. We
> receive about 4,000 reports per minute from the network and send about 200
> per minute from emails we process.

I currently recommend more than 3 GBytes of RAM for a DCC server. 4 GBytes is
not too much, but 8 GBytes is probably more than needed unless you are
handling several tens of millions of mail messages per day.

There are private systems that stuff up to 10 million mail messages per day
through their DCC clients and servers. That's about 7,000 messages/minute, or
a little over 100 messages/second. DCC servers do that with a special-purpose
hash table database.

The public DCC servers handle requests from perhaps 40,000 small, anonymous
DCC client installations. Some of the public DCC servers handle more than
20 million requests per day. Each request involves accumulating the total
number of recipients for 3 message checksums and sending the answer back to
the DCC client.

Vernon Schryver    [EMAIL PROTECTED]
_______________________________________________
DCC mailing list
[email protected]
http://www.rhyolite.com/mailman/listinfo/dcc
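A toy sketch of the per-request bookkeeping described above (accumulating the total number of recipients for 3 message checksums and answering with those totals), assuming a plain Python dict in place of the special-purpose hash table database that DCC servers actually use. The three checksum kinds (exact, folded, letters-only) and the names toy_checksums and ToyCountTable are hypothetical illustrations, not DCC's real checksums or code; MD5 is only a convenient stand-in digest.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

# A checksum is keyed by (kind, digest) so different kinds never collide.
Checksum = Tuple[str, bytes]


def toy_checksums(body: str) -> List[Checksum]:
    """Return three stand-in checksums for a message body.

    Hypothetical kinds: exact text, case/whitespace-folded text, and
    letters only. These are NOT the checksums DCC actually computes.
    """
    forms = {
        "exact": body,
        "folded": " ".join(body.lower().split()),
        "letters": "".join(ch for ch in body.lower() if ch.isalpha()),
    }
    return [(kind, hashlib.md5(text.encode("utf-8")).digest())
            for kind, text in forms.items()]


class ToyCountTable:
    """Hypothetical in-memory table mapping checksum -> total recipients."""

    def __init__(self) -> None:
        self.totals: Dict[Checksum, int] = defaultdict(int)

    def report(self, checksums: List[Checksum], recipients: int) -> List[int]:
        """Add this message's recipient count to each checksum and return
        the accumulated totals, one per checksum, as the answer."""
        answer = []
        for ck in checksums:
            self.totals[ck] += recipients
            answer.append(self.totals[ck])
        return answer


if __name__ == "__main__":
    table = ToyCountTable()
    # Two copies of one bulk message, differing only in spacing and case.
    print(table.report(toy_checksums("Buy CHEAP   stuff now"), 1))  # [1, 1, 1]
    print(table.report(toy_checksums("buy cheap stuff now"), 5))    # [5, 6, 6]
```

The exact checksum treats the second copy as new, while the two fuzzier kinds already show 6 total recipients. For scale, 10 million messages/day is 10,000,000 / 86,400 ≈ 116 such reports per second, and 20 million requests/day is about 231 per second.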
