Also, while munching Fritos and looking at this: we could assume that any asset that is new to one avatar but was created by a different avatar is a high-probability candidate for being a duplicate, and should be checked out.
That would capture a good chunk (over 50%?) of duplicates without having to touch the renaming-or-making-a-copy processes. Again, this could be event-driven, or db-trigger-driven on INSERT, etc. (Or does MySQL not have transactions and on-insert triggers? I'm used to Oracle.)

Wade

On Thu, Mar 8, 2012 at 8:06 PM, Wade Schuette <[email protected]> wrote:

> Justin,
>
> I have to respectfully agree with Rory.
>
> Wouldn't something like the following address your valid concerns about
> complexity, and reduce both total load and perceived system response
> time for filing and retrieving assets?
>
> First, if you use event-driven processes, there's no reason to rescan the
> entire database, and by separating the processes into distinct streams,
> they are decoupled, which is actually a good thing and simplifies both
> sides. There's no reason I can see that they need to be coupled, and
> separating them allows them to be optimized and tested separately, which
> is a good thing.
>
> In fact, the entire deduplication process could run overnight at a
> low-load time, which is even better, or have multiple "worker" processes
> assigned to it if it's taking too long. Seems very flexible.
>
> I'm assuming that a hash code isn't unique, but just specifies the bucket
> into which an item can be categorized.
>
> When a new asset arrives, if the hash code already exists, put the
> unique ID in a pipe, finish filing it, and move on. If the hash code
> doesn't already exist, just file it and move on.
>
> At the other end of the pipe, this wakes up a process that can, as time
> allows, check in the background whether not only the hash code but the
> entire item is the same, and if so, change the handle to point to the
> existing copy. (For all I know, this can be done in one step if CRC
> codes are sufficiently unique, but computing such a code is CPU-intensive
> unless you can do it in hardware.)
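[Editor's note: the queue-and-verify scheme described above can be sketched in a few lines of Python. This is only an illustration, not OpenSim code — `asset_store`, `hash_index`, `alias_of`, and the in-process queue are hypothetical stand-ins for the asset DB, its hash column, the asset handle table, and a real pipe or message queue.]

```python
import hashlib
import queue

# Hypothetical in-memory stand-ins; in a real deployment these would be
# the asset DB and a persistent queue fed by an event or INSERT trigger.
asset_store = {}            # asset_id -> raw bytes
alias_of = {}               # asset_id -> canonical asset_id (the "handle")
hash_index = {}             # digest -> first asset_id filed with that digest
dedup_pipe = queue.Queue()  # the "pipe" of candidate duplicates

def file_asset(asset_id, data):
    """Fast path: file the asset immediately, defer the expensive check."""
    digest = hashlib.sha256(data).hexdigest()
    asset_store[asset_id] = data
    alias_of[asset_id] = asset_id
    if digest in hash_index:
        # Hash bucket already occupied: a *candidate* duplicate.
        dedup_pipe.put((asset_id, digest))
    else:
        hash_index[digest] = asset_id

def run_dedup_worker():
    """Background path: confirm byte-for-byte equality, then repoint."""
    while not dedup_pipe.empty():
        asset_id, digest = dedup_pipe.get()
        canonical = hash_index[digest]
        if asset_store[asset_id] == asset_store[canonical]:
            alias_of[asset_id] = canonical   # handle -> existing copy
            del asset_store[asset_id]        # reclaim the duplicate bytes
```

The point of the split is exactly what the mail argues: `file_asset` stays cheap and synchronous, while `run_dedup_worker` can run overnight or across several workers draining the same queue.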
> Of course, now the question arises of what happens when the original
> person DELETES the shared item. If you have solid database integrity, you
> only need to know how many pointers to it exist, and if someone deletes
> "their copy", you decrease the count by one, and when the count gets to
> one, the next delete can actually delete the entry.
>
> Wade
>
> On 3/8/12 7:41 PM, Justin Clark-Casey wrote:
>> On 08/03/12 22:00, Rory Slegtenhorst wrote:
>>> @Justin
>>> Can't we do the data de-duplication on a database level? Eg find the
>>> duplicates and just get rid of them on a regular interval (cron)?
>>
>> This would be enormously intricate. Not only would you have to keep
>> rescanning the entire asset db but it adds another moving part to an
>> already complex system.

--
R. Wade Schuette, CDP, MBA, MPH
698 Monterey Ave
Morro Bay CA 93442
cell: 1 (734) 635-0508
fax: 1 (734) 864-0318
[email protected]
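[Editor's note: the reference-counted delete quoted above can also be sketched briefly. Again a hypothetical illustration, not OpenSim code — `ref_count`, `add_handle`, and `delete_handle` are made-up names; the counter would live in the DB, ideally maintained under foreign-key integrity rather than in application code.]

```python
# One counter per stored entry: how many handles point at it.
ref_count = {}   # asset_id -> number of handles

def add_handle(asset_id):
    """A new handle (e.g. a deduplicated 'copy') starts pointing here."""
    ref_count[asset_id] = ref_count.get(asset_id, 0) + 1

def delete_handle(asset_id, store):
    """Deleting 'my copy' just decrements; the real row goes only when
    the last handle is gone."""
    ref_count[asset_id] -= 1
    if ref_count[asset_id] == 0:
        del ref_count[asset_id]
        store.pop(asset_id, None)   # the actual delete
```

This matches the mail's description: with two handles outstanding, the first delete merely drops the count, and only the final delete removes the entry itself.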
_______________________________________________
Opensim-dev mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/opensim-dev
