#252: BibDocFile: Allow using custom HTTP headers to retrieve URLs
--------------------------+-------------------------------------------------
Reporter: bthiell | Owner: bthiell
Type: enhancement | Status: assigned
Priority: major | Milestone:
Component: WebSubmit | Version:
Resolution: | Keywords: headers
--------------------------+-------------------------------------------------
Comment (by simko):
1) BibDocFile is mostly seen as a library for manipulating internal
files, so while this functionality would indeed be useful, we should
delimit it better name-wise. By "accessing" external files, do you mean
(i) accessing for uploading or (ii) accessing for indexing?
* In the former case, the library that fetches external files would be
presented as part of the upload process, hence BibUpload. The variable
would live alongside, or be merged with,
CFG_BIBUPLOAD_FFT_ALLOWED_LOCAL_PATHS.
* In the latter case, the library that indexes external files would be
presented as part of the indexing process, hence BibIndex. The variable
would live alongside, or replace,
CFG_BIBINDEX_FULLTEXT_INDEX_LOCAL_FILES_ONLY.
So, depending on the needs, we could invent possibly two variables of
the kind you propose, since they may serve two independent purposes.
2) Such a dictionary could instruct Invenio which external files to
upload and which not, by the presence/absence of a final catch-all
wildcard entry:
{{{
CFG_BIBUPLOAD_FFT_ALLOWED_EXTERNAL_URLS = {
    'http://myurl.com/*': {'User-Agent': 'Me'},
    'http://yoururl.com/*': {'User-Agent': 'You'},
    'http://*': {'User-Agent': 'invenio-crawler'},
}
}}}
which would permit replacing some of the existing CFG variables
mentioned above.
(BTW this is kind of similar to, but more complete than,
CFG_BIBINDEX_PERFORM_OCR_ON_DOCNAMES.)
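
To illustrate how such a variable might be consulted, here is a rough sketch, not existing Invenio code. The helper name get_headers_for_url is hypothetical, and using shell-style matching via the standard fnmatch module is an assumption about what the wildcards would mean. The first matching pattern wins, and a URL that matches nothing is treated as not allowed:

```python
from fnmatch import fnmatchcase

# Proposed configuration variable from the ticket (not yet in Invenio).
CFG_BIBUPLOAD_FFT_ALLOWED_EXTERNAL_URLS = {
    'http://myurl.com/*': {'User-Agent': 'Me'},
    'http://yoururl.com/*': {'User-Agent': 'You'},
    'http://*': {'User-Agent': 'invenio-crawler'},
}


def get_headers_for_url(url, rules=CFG_BIBUPLOAD_FFT_ALLOWED_EXTERNAL_URLS):
    """Hypothetical helper: return the HTTP headers of the first pattern
    matching `url`, or None when no pattern matches (URL not allowed)."""
    for pattern, headers in rules.items():
        # Shell-style match; '*' also matches '/' characters.
        if fnmatchcase(url, pattern):
            return dict(headers)  # copy, so callers cannot alter the config
    return None
```

Note that first-match-wins relies on dictionary iteration order, which is only guaranteed to be insertion order from Python 3.7 on. On older interpreters a list of (pattern, headers) pairs would be the safer representation, so that the catch-all is certain to be tried last.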
--
Ticket URL: <http://invenio-software.org/ticket/252#comment:3>
Invenio <http://invenio-software.org>