-- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]
I just revived a database that was on a version 3.23 server and moved it
to a 4.1 server. It has big fields of TEXT-based data. The application
compresses the amount of TEXT data by identifying common subchunks,
putting them in a "subchunk" table, and replacing them in the main text
with a marker that pulls the subchunk back in whenever the parent
chunk is requested.

This subchunking seems to have been done somewhat ad hoc, because I've
noticed the database still has quite a few duplicated chunks from one
record to another. The client does not want to buy another drive to
store data (even though he really should for other reasons anyway, but
who cares what I think), so he wants it compressed, and I look on it
as an opportunity for some housecleaning.

Now that we have 4.1, what is the best practice for automatically
finding common subchunks, factoring them out, and then replacing each
original parent text with a copy that has the chunk cut out and a
marker inserted? The hard part is finding them, obviously. The rest
is easy.
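For what it's worth, here is a minimal sketch of one way to do the hard part outside the server: split each TEXT value into paragraph-sized candidate chunks, count how often each chunk recurs across records, then factor the repeated ones into a subchunk map and substitute a marker. The `{{chunk:ID}}` marker format, the paragraph-level granularity, and the 40-character minimum are all my assumptions, not anything from the original schema; you would adapt them to whatever marker convention the existing subchunk table already uses.

```python
import hashlib
from collections import defaultdict

def find_common_chunks(records, min_len=40):
    """Count paragraph-level chunks across all records; return those seen more than once.

    records: dict mapping record id -> TEXT value.
    min_len is an assumed threshold to skip chunks too small to be worth factoring.
    """
    counts = defaultdict(int)
    for text in records.values():
        # Blank-line-separated paragraphs are the assumed chunk granularity.
        for chunk in (p.strip() for p in text.split("\n\n")):
            if len(chunk) >= min_len:
                counts[chunk] += 1
    return {c for c, n in counts.items() if n > 1}

def factor_out(records, common):
    """Move each common chunk into a subchunk map and replace it in place with a marker."""
    subchunks = {}  # marker id -> chunk text (stand-in for the "subchunk" table)
    for chunk in sorted(common):
        # A short content hash makes a stable, reproducible marker id (assumed scheme).
        cid = hashlib.md5(chunk.encode()).hexdigest()[:8]
        subchunks[cid] = chunk
        for rid, text in records.items():
            records[rid] = text.replace(chunk, "{{chunk:%s}}" % cid)
    return subchunks
```

In practice you would pull the rows with a cursor, run the two passes, and write the markers and subchunk rows back inside a transaction. Paragraph splitting only catches chunks that repeat verbatim on paragraph boundaries; catching near-duplicates or arbitrary common substrings needs something heavier, like a suffix-array or shingling pass.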
- best practices for finding duplicate chunks Gerald Taylor
- Re: best practices for finding duplicate chunks Gerald Taylor
- Re: best practices for finding duplicate chunks Alexey Polyakov