Hi Kasper,

Files come in all kinds, and zipping them may not be the best option 
for every use case (already-compressed formats, for example, gain little).
Maybe you are referring to text files? In that case, zipping them would 
indeed save quite a bit of disk space.

> FilesPipeline does not allow me to provide my own files store, unless I 
> hack away on the scrapy code itself
> Are you interested in a patch that allows the FilesPipeline to accept 
> custom store schemes? 

As far as I know, this is already possible.
I would second Lhassan here: subclass FilesPipeline and add your store 
class to its STORE_SCHEMES mapping, keyed by a custom URI scheme.
Have you tried this?
Or are you having trouble getting it to work? Maybe the documentation is 
a bit daunting, in which case we can improve it.
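
For what it's worth, here is a minimal sketch of that approach. The 
names ZipFilesStore and ZipFilesPipeline, the "zip" scheme and the 
archive path are all made up for illustration. The store mirrors the 
two methods the built-in filesystem store exposes (persist_file and 
stat_file); I have not run it against a real crawl, so treat it as a 
starting point rather than a finished implementation.

```python
import hashlib
import io
import time
import zipfile


class ZipFilesStore:
    """Hypothetical store that appends every downloaded file to one ZIP archive."""

    def __init__(self, uri):
        # Assumed URI form: zip:///absolute/path/to/archive.zip
        prefix = "zip://"
        self.archive_path = uri[len(prefix):] if uri.startswith(prefix) else uri

    def persist_file(self, path, buf, info, meta=None, headers=None):
        # Mode "a" creates the archive on first use and appends afterwards.
        # Persisting the same path twice adds a duplicate member.
        with zipfile.ZipFile(self.archive_path, "a", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(path, buf.getvalue())

    def stat_file(self, path, info):
        # An empty dict tells the pipeline the file is not stored yet.
        try:
            with zipfile.ZipFile(self.archive_path, "r") as zf:
                member = zf.getinfo(path)
                data = zf.read(path)
        except (FileNotFoundError, KeyError):
            return {}
        # ZIP members carry a local-time (Y, M, D, h, m, s) tuple.
        last_modified = time.mktime(member.date_time + (0, 0, -1))
        return {
            "last_modified": last_modified,
            "checksum": hashlib.md5(data).hexdigest(),
        }


# Wiring sketch (requires Scrapy, so it is left commented out here):
#
# from scrapy.pipelines.files import FilesPipeline
#
# class ZipFilesPipeline(FilesPipeline):
#     STORE_SCHEMES = dict(FilesPipeline.STORE_SCHEMES, zip=ZipFilesStore)
#
# settings.py (module path and archive location are placeholders):
# ITEM_PIPELINES = {"myproject.pipelines.ZipFilesPipeline": 1}
# FILES_STORE = "zip:///data/scraped/files.zip"
```

Reopening the archive on every call keeps the sketch simple but will be 
slow for millions of files, and it is not safe with several writers on 
the same archive. A real store would probably keep the archive open or 
shard across several archives.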

Regards,
Paul.

On Thursday, August 18, 2016 at 3:28:49 PM UTC+2, Kasper Marstal wrote:
>
> Hi Lhassan
>
> Okay, thank you for your reply.
>
> Kasper
>
> On Thursday, August 18, 2016 at 2:50:18 PM UTC+2, Lhassan Baazzi wrote:
>>
>> Hi,
>>
>> If you are going with the zip option, just create your own pipeline that 
>> extends the base files pipeline and publish it as a package on GitHub, in 
>> case someone else needs it and wants to use it.
>>
>> Best Regards.
>> Lhassan
>> On 18 August 2016 at 13:16, "Kasper Marstal" <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I am scraping a couple of million documents and need to save space on my 
>>> disk to store the data. An attractive option is to save the files directly 
>>> to a ZIP file since the compression ratio is really good with this kind of 
>>> data (~18). However, the FilesPipeline does not allow me to provide my own 
>>> files store, unless I hack away on the scrapy code itself, which I would 
>>> like to avoid. So, a couple of questions for the scrapy developers:
>>>
>>> - Are you interested in a patch that allows the FilesPipeline to accept 
>>> custom store schemes? OR
>>> - Are you interested in a patch with a ZipFilesStore? In addition, 
>>> - Is this ZIP-file approach a common way of dealing with large amounts 
>>> of data, or do you have best-practices on this subject that I am not aware 
>>> of? 
>>>
>>> Kind Regards,
>>> Kasper Marstal
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "scrapy-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/scrapy-users.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>