Dear Devin,
On Tue, 15/06/2010 at 16:56 +0200, Devin Bougie wrote:
> When we upload .docx files to our installation of v0.99.1, Invenio
> appears to change the extension of the file from "docx" to "ocx". Any
> suggestions for changing this behavior would be greatly appreciated.
> Please let me know if there is any more information I can provide.
The heuristics for recognizing extensions have changed quite a lot
since release 0.99.1, and I currently don't have a machine at hand to
perform a quick test.
However, you might try adding docx (and all the other new Microsoft
Office extensions) to the config variable
CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS.
I also see that this triggers a bug in the 0.99.1 heuristic, because
ocx (a substring of docx) happens to be a valid extension.
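To make the failure mode concrete, here is a minimal Python 3 sketch of plain suffix matching. It is not the actual 0.99.1 code. It assumes (this is not confirmed anywhere above) that "ocx" sits in the known-extension list without its leading dot; the list and the helper names naive_strip_ext and with_leading_dot are hypothetical.

```python
# Hypothetical known-extension list. Storing "ocx" without its leading dot
# is an assumption made for illustration only.
KNOWN_EXTENSIONS = [".pdf", ".gz", "ocx"]


def naive_strip_ext(name, extensions):
    """Strip the first known extension found by plain suffix matching."""
    lowered = name.lower()
    for ext in extensions:
        if lowered.endswith(ext):
            return name[:len(name) - len(ext)]
    return name


def with_leading_dot(extensions):
    """Normalize every extension so that it starts with exactly one dot."""
    return ["." + ext.lstrip(".") for ext in extensions]


# "foo.docx" ends with "ocx", so the bare suffix is wrongly taken as the extension.
print(naive_strip_ext("foo.docx", KNOWN_EXTENSIONS))

# With a leading dot, ".ocx" no longer matches the tail of "foo.docx".
print(naive_strip_ext("foo.docx", with_leading_dot(KNOWN_EXTENSIONS)))
```

Under that assumption the first call leaves "foo.d" as the base name, while the second leaves "foo.docx" untouched until docx itself is added to the list.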
Could you also test this patch (after backing up the bibdocfile.py
module)? It contains an algorithm backported from the latest Git master.
To apply just do:
$ cd /opt/cds-invenio/lib/python/invenio
$ patch -p4 < /tmp/0001-BibDocFile-backport-extension-guessing-algorithm.patch
Let me know if this fixes your problem (and whether you see any side
effects).
Best regards,
Samuele
From 618f4727fe406ec186fc0e89a7fe2cbd8dabfcaa Mon Sep 17 00:00:00 2001
From: Samuele Kaplun <[email protected]>
Date: Tue, 15 Jun 2010 17:24:54 +0200
Subject: [PATCH] BibDocFile: backport extension guessing algorithm
* Fix extension guessing algorithm by backporting latest version from
master. Previous algorithm was guessing "foo.docx" as having extension
"ocx". This is fixed.
---
modules/websubmit/lib/bibdocfile.py | 70 ++++++++++++++++++++++++----------
1 files changed, 49 insertions(+), 21 deletions(-)
diff --git a/modules/websubmit/lib/bibdocfile.py b/modules/websubmit/lib/bibdocfile.py
index 0689771..a44749a 100644
--- a/modules/websubmit/lib/bibdocfile.py
+++ b/modules/websubmit/lib/bibdocfile.py
@@ -66,32 +66,60 @@ CFG_BIBDOCFILE_STRONG_FORMAT_NORMALIZATION = False
 KEEP_OLD_VALUE = 'KEEP-OLD-VALUE'
-_mimes = MimeTypes()
+_mimes = MimeTypes(strict=False)
 _mimes.suffix_map.update({'.tbz2' : '.tar.bz2'})
 _mimes.encodings_map.update({'.bz2' : 'bzip2'})
-_extensions = _mimes.encodings_map.keys() + \
-              _mimes.suffix_map.keys() + \
-              _mimes.types_map[1].keys() + \
-              CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS
-_extensions.sort()
-_extensions.reverse()
-_extensions = set([ext.lower() for ext in _extensions])
-class InvenioWebSubmitFileError(Exception):
-    pass
+def _generate_extensions():
+    """
+    Generate the regular expression to match all the known extensions.
+
+    @return: the regular expression.
+    @rtype: regular expression object
+    """
+    _tmp_extensions = _mimes.encodings_map.keys() + \
+                      _mimes.suffix_map.keys() + \
+                      _mimes.types_map[1].keys() + \
+                      CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS
+    extensions = []
+    for ext in _tmp_extensions:
+        if ext.startswith('.'):
+            extensions.append(ext)
+        else:
+            extensions.append('.' + ext)
+    extensions.sort()
+    extensions.reverse()
+    extensions = set([ext.lower() for ext in extensions])
+    extensions = '\\' + '$|\\'.join(extensions) + '$'
+    extensions = extensions.replace('+', '\\+')
+    return re.compile(extensions, re.I)
+
+#: Regular expression to recognized extensions.
+_extensions = _generate_extensions()
 def file_strip_ext(afile):
-    """Strip in the best way the extension from a filename"""
-    lowfile = afile.lower()
-    ext = '.'
-    while ext:
-        ext = ''
-        for c_ext in _extensions:
-            if lowfile.endswith(c_ext):
-                lowfile = lowfile[0:-len(c_ext)]
-                ext = c_ext
-                break
-    return afile[:len(lowfile)]
+    """
+    Strip in the best way the extension from a filename.
+
+    >>> file_strip_ext("foo.tar.gz")
+    'foo'
+    >>> file_strip_ext("foo.buz.gz")
+    'foo.buz'
+    >>> file_strip_ext("foo.buz")
+    'foo'
+
+    @param afile: the path/name of a file.
+    @type afile: string
+    @return: the name/path without the extension (and version).
+    @rtype: string
+    """
+    nextfile = _extensions.sub('', afile)
+    if nextfile == afile:
+        nextfile = os.path.splitext(afile)[0]
+    while nextfile != afile:
+        afile = nextfile
+        nextfile = _extensions.sub('', afile)
+    return nextfile
 def normalize_format(format):
     """Normalize the format."""
--
1.7.0.4
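
For anyone who wants to try the idea outside of Invenio, the following is a rough, self-contained Python 3 sketch of the approach taken in the patch: normalize every extension to a leading dot, build one end-anchored regular expression, and strip repeatedly. It is an illustration under assumptions, not the patched module itself. The extension list is a hypothetical stand-in for the MimeTypes maps plus CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS, the function names are made up, and re.escape is used in place of the manual backslash escaping in the patch.

```python
import os
import re

# Hypothetical stand-in for the MimeTypes maps plus the additional
# extensions from the configuration; entries may or may not carry a dot.
SAMPLE_EXTENSIONS = [".gz", ".tar", ".bz2", ".pdf", "docx", "ocx"]


def build_extension_regex(extensions):
    """Compile a regex matching any known extension at the end of a name."""
    # Normalize to lower case with exactly one leading dot, without duplicates.
    dotted = {"." + ext.lower().lstrip(".") for ext in extensions}
    # re.escape replaces the manual escaping done in the patch.
    pattern = "|".join(re.escape(ext) + "$" for ext in sorted(dotted))
    return re.compile(pattern, re.I)


def strip_ext(name, regex):
    """Strip known extensions repeatedly, e.g. both '.tar' and '.gz'."""
    stripped = regex.sub("", name)
    if stripped == name:
        # Unknown extension: fall back to a plain splitext.
        stripped = os.path.splitext(name)[0]
    while stripped != name:
        name = stripped
        stripped = regex.sub("", name)
    return stripped


EXT_RE = build_extension_regex(SAMPLE_EXTENSIONS)

for sample in ("foo.docx", "foo.tar.gz", "foo.buz.gz", "foo.buz"):
    print(sample, "->", strip_ext(sample, EXT_RE))
```

With this sample list, "foo.docx" reduces to "foo" (so the extension is ".docx" rather than "ocx"), and the other three names give the same results as the doctests in the patch.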