Hi Kasper,

Files come in all kinds, and zipping them may not be the best option for every use case (for example, files that are already compressed gain little from it). Maybe you are referring to text files? In that case, zipping them would indeed save quite a bit of disk space.
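To illustrate the difference, here is a minimal sketch that compares how well DEFLATE (the method ZIP archives typically use) shrinks text-like data versus data that already looks compressed. The helper name `deflate_ratio` and both sample payloads are made up for the illustration: random bytes stand in for an already-compressed file, and the repeated HTML line is far more regular than real documents, so the ratio it yields overstates what you would see in practice.

```python
import os
import zlib


def deflate_ratio(data: bytes) -> float:
    """Return original size divided by DEFLATE-compressed size."""
    return len(data) / len(zlib.compress(data, 6))


# Artificially repetitive stand-in for scraped text or HTML documents.
text_like = b"<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>\n" * 2000

# Random bytes as a stand-in for files that are already compressed
# (images, archives, and so on), which leave DEFLATE nothing to remove.
compressed_like = os.urandom(len(text_like))

print("text-like ratio:       %.1f" % deflate_ratio(text_like))
print("compressed-like ratio: %.3f" % deflate_ratio(compressed_like))
```

Running the same function over a sample of your own downloads should tell you quickly whether zipping is worth the effort for your data.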
> FilesPipeline does not allow me to provide my own files store, unless I hack away on the scrapy code itself

> Are you interested in a patch that allows the FilesPipeline to accept custom store schemes?

As far as I know, this is already possible. I would second Lhassan here on using a subclassed FilesPipeline, with a custom STORE_SCHEMES entry referencing your store.

Have you tried this? Or are you having trouble getting it to work? Maybe the documentation is a bit daunting, in which case we can improve it.

Regards,
Paul.

On Thursday, August 18, 2016 at 3:28:49 PM UTC+2, Kasper Marstal wrote:
>
> Hi Lhassan
>
> Okay, thank you for your reply.
>
> Kasper
>
> On Thursday, August 18, 2016 at 2:50:18 PM UTC+2, Lhassan Baazzi wrote:
>>
>> Hi,
>>
>> If you are going with the zip option, just create your own pipeline that
>> extends the base file pipeline and publish it as a package on GitHub, in
>> case someone else needs it and wants to use it.
>>
>> Best Regards.
>> Lhassan
>> On 18 August 2016 at 13:16, "Kasper Marstal" <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I am scraping a couple of million documents and need to save space on my
>>> disk to store the data. An attractive option is to save the files directly
>>> to a ZIP file since the compression ratio is really good with this kind of
>>> data (~18). However, the FilesPipeline does not allow me to provide my own
>>> files store, unless I hack away on the scrapy code itself, which I would
>>> like to avoid. So, a couple of questions for the scrapy developers:
>>>
>>> - Are you interested in a patch that allows the FilesPipeline to accept
>>> custom store schemes? OR
>>> - Are you interested in a patch with a ZipFilesStore? In addition,
>>> - Is this ZIP-file approach a common way of dealing with large amounts
>>> of data, or do you have best-practices on this subject that I am not aware
>>> of?
>>>
>>> Kind Regards,
>>> Kasper Marstal
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "scrapy-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/scrapy-users.
>>> For more options, visit https://groups.google.com/d/optout.
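
P.S. To make the subclassing suggestion above a bit more concrete, here is a rough sketch of what a ZIP-backed store could look like. It assumes, as far as I understand the pipeline, that FilesPipeline picks a store class from its `STORE_SCHEMES` dict based on the URI scheme of `FILES_STORE`, instantiates it with that URI, and then calls `persist_file` and `stat_file` on it. The names `ZipFilesStore`, `ZipFilesPipeline`, the `zip://` scheme and the `myproject` module path are all hypothetical, and this is untested against a real crawl.

```python
import hashlib
import time
import zipfile
from urllib.parse import urlparse


class ZipFilesStore:
    """Hypothetical store that appends every downloaded file to one ZIP archive."""

    def __init__(self, uri):
        # Assumption: FILES_STORE looks like "zip:///abs/path/archive.zip".
        parsed = urlparse(uri)
        self.archive_path = parsed.path if parsed.scheme == "zip" else uri

    def persist_file(self, path, buf, info, meta=None, headers=None):
        # Mode "a" creates the archive on first use and appends afterwards.
        # Persisting the same path twice adds a duplicate entry.
        buf.seek(0)
        with zipfile.ZipFile(self.archive_path, "a", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(path, buf.read())

    def stat_file(self, path, info):
        # An empty dict signals "not stored yet", so the file gets downloaded.
        try:
            with zipfile.ZipFile(self.archive_path) as zf:
                entry = zf.getinfo(path)
                data = zf.read(path)  # decompresses the entry; fine for a sketch
        except (FileNotFoundError, KeyError):
            return {}
        # ZIP entries carry a local-time 6-tuple; pad it for time.mktime().
        last_modified = time.mktime(entry.date_time + (0, 0, -1))
        return {"last_modified": last_modified,
                "checksum": hashlib.md5(data).hexdigest()}


try:
    from scrapy.pipelines.files import FilesPipeline
except ImportError:  # lets the store above be tried without Scrapy installed
    FilesPipeline = None

if FilesPipeline is not None:
    class ZipFilesPipeline(FilesPipeline):
        # Keep the built-in schemes and register the hypothetical "zip" one.
        STORE_SCHEMES = dict(FilesPipeline.STORE_SCHEMES, zip=ZipFilesStore)
```

With something like this in place, the settings would point at the subclass, e.g. `ITEM_PIPELINES = {"myproject.pipelines.ZipFilesPipeline": 1}` and `FILES_STORE = "zip:///data/files.zip"`. Reopening the archive for every file keeps the sketch simple, but it will probably need rethinking for a couple of million documents.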
