dads <wayne.dads.b...@gmail.com> writes:

> I've been tidying up the archived xml and have been thinking about
> the best way to approach this issue, as it took a long time to deal
> with big quantities of xml. Say you have 5-6 years' worth of 26000+
> xml files per year, each 5-20k. The archived stuff is zipped, but
> which is better: 26000 files in one big zip file, 26000 files in one
> big zip file but in folders for months and days, or zip files inside
> zip files?
If I'm reading that properly, you have 5-6 years' worth of files,
26000 files per year, 5-20k bytes per file? At 10k bytes/file that's
roughly 26000 x 5 x 10k, or about 1.3GB, which isn't all that much
data by today's standards.

> Generally the requests are for files less than 3 months old, so that
> got me thinking: should I create a script that finds all the file
> names and corresponding web numbers of the old xml and bungs them
> into a db table, one for each year, and another script that archives
> the xml after every day and, after 3 months, zips it up, bungs the
> info into a table, etc.? Sorry for the ramble, I just want other
> people's opinions on the matter. =)

Extract all the files and put them into some kind of indexed database
or search engine. I've used solr (http://lucene.apache.org/solr) for
this purpose, and while it has limitations, it's fairly easy to set up
and use for basic purposes.
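Something along these lines is all it would take to load them
(untested sketch; it assumes a Solr instance at localhost:8983 and the
third-party pysolr client, and the field names are just illustrative,
use whatever your schema actually defines):

import os
import pysolr

solr = pysolr.Solr('http://localhost:8983/solr', timeout=30)

batch = []
for root, dirs, files in os.walk('archive'):
    for name in files:
        if not name.endswith('.xml'):
            continue
        path = os.path.join(root, name)
        with open(path) as f:
            batch.append({
                'id': path,           # the path makes a handy unique key
                'filename': name,
                'content': f.read(),  # raw xml; pull out real fields as needed
            })
        if len(batch) >= 1000:        # don't hold ~1.3GB of xml in memory
            solr.add(batch)
            batch = []

if batch:
    solr.add(batch)
solr.commit()                         # make everything searchable at once

Once it's indexed (and assuming you also store a date field), "give me
everything from the last 3 months" becomes a simple range query instead
of a crawl through zip files.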