Re: Please allow any indicator in any field
Hello Samuele,

Dear Ferran, Alexander, I will try to explain to you how Invenio is evolving, and let's see if my understanding is correct and if this will satisfy your MARC needs (Tibor, Esteban, please correct me anytime I am wrong).

Sorry, I disagree with the expression «your MARC needs». I think the correct expression is «Marc21 compliance». Marc21 is a common agreement among a large, worldwide library community. UAB is a tiny, tiny fraction of this community, and UAB chose to change from CATMARC to Marc21 some years ago, together with all Catalan libraries, because we were following a general, worldwide movement towards a global standard; it was a strategic decision. We could perfectly well disappear and the Marc21 community wouldn't notice at all. We don't have any special needs. Marc21 allows for local fields and/or subfields, and we find some uses for them.

[...] Take a look at:

https://github.com/inveniosoftware/invenio-demosite/blob/pu/invenio_demosite/recordext/fields/atlantis.cfg

this is the default BibField configuration of Atlantis Demo Site. Note: this is not the default of Invenio, just Atlantis. And in Invenio core? Well no configuration is enforced there. It's up to you. You can start off Atlantis configuration and encode the whole MARC21 (or the part of MARC21 that you need), and Invenio will speak MARC21.

I still remember my first Invenio installation, back when it was called CDSware. I still remember the confusion of trying a large and complex piece of software (and it wasn't as large then as it is now) that I didn't fully understand. I have helped some other local institutions interested in trying Invenio, and I have seen that they are as confused as I was. What you call «Atlantis Demo Site» is the starting point for those prospective new Invenio users. This is what they install and this is what they expect to work.

My opinion is that you cannot expect prospective new Invenio users to fiddle with Jinja2 templates just to make a 245 title with indicators appear as a title, be indexed as a title and be exported as a title in whatever format. The default values (what you call «Atlantis Demo Site») should comply with Marc21 as much as possible. If they don't, (a) the barriers for newcomers to adopt Invenio are unnecessarily difficult to overcome, and (b) you are asking each of them to repeat the same exercise just to load a few records and see the result. If you just correct the current, sub-standard records in your Atlantis Demo Site into more realistic ones, and make the default configuration recognize them, the goal will be accomplished. I sincerely think that it is not difficult. It only requires that you be interested in making the Invenio community grow. Are you? If there is some magical parameter to change it to something else like Unimarc, all the better. But it should be easy, trivial and clear. Please help newcomers, or Invenio will always be something small, exotic and marginal in the digital libraries and repositories landscape.

Best regards,
Ferran
Re: Please allow any indicator in any field
Hello Rob,

CERN may use a subset of Marc21 where most of the indicators are __. Ok, that's CERN's library decision. But if the Invenio developers claim that Invenio supports Marc21, you *must* allow other indicators there, and consider it valid.

Then don't say it supports MARC21. Simple solution.

Sorry, they do:

Invenio complies with standards such as the Open Archives metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic format.
http://invenio-software.org/

Flexible metadata: Standard metadata format (MARC)
http://invenio-software.org/wiki/General/Features

MARC format is the standard in the library world. It is well established and has been used since 1960s. [...]
http://invenio-demo.cern.ch/help/admin/howto-marc

And those reasons were the ones we used to choose Invenio over the other alternatives quite some years ago.

The primary goal of invenio should be to meet the needs of its original institution(s).

«Primary goal» should not exclude others, especially when an easy, compatible solution exists: take any indicator as valid. Not perfect, but *much* better than now.

If marc indicators are not necessary in the database functions of the originating institution, then feel free to ignore them. Avoid getting dragged back 40 years to the days of library catalogs by any mandates to follow every rule to the letter. Those rules may have made sense in 1970 but they don't always now. And MARC development has been under the control of library association committees, made up of librarians, who make decisions based on cataloging rules for description of items (as typed on paper cards) and not contemporary technology.

That's why some institutions, like CERN or FNAL, may take a subset of Marc21. But it turns out that there are millions, hundreds of millions of Marc21 records out there, outside those HEP institutions, where those indicators exist. And if the library community has decided that indicators are useful, please respect their decisions.

That's not what I'm discussing here. What I'm proposing is that, just by changing the developers' mindset to take into account that indicators may have values other than __, and by changing the code to use %% or whatever wildcard character is relevant, those existing records will be recognized by any institution testing Invenio with default values, without having to patch it as some of us have done. I have discouraged more than one institution from adopting Invenio for those reasons.

It is important to remember that marc, was originally a U.S. Library of Congress file format designed for large main-frame machines in a day of top down programming, and magnetic tape reel storage. Access was entirely sequential which explains some of the record architecture. The format was intended to be used to generate paper file cards.

Yes. For similar technological limitations Unix has those cryptic abbreviations. But both Marc and Unix, curiously both born at the end of the 1960s, have proven much better than the alternatives, which is the reason they are still in use. And backwards compatibility with the existing heritage is one of the reasons for their current value.

Modern computing should have freedom to use marc in any way that makes it as suitable as possible for the job and not be hindered by any marc or cataloging rules that are no longer really applicable.

Yes, of course. But as compatibility is not so difficult, I argue that it should be a goal for Invenio. Even more so as the current users have records *with* indicators. And let the librarians decide whether those cataloging rules are applicable or not. Most of the time they are right.

Best regards,
Ferran
Re: Please allow any indicator in any field
Hello Alexander,

[...] I know Marc21 reasonably well, and I don't remember now any case where having different indicators means something so different that it has to be treated differently.

Here I would be more careful. Basically, I would treat Marc fields and indicators not as 3 digits plus two other funny chars but consider the whole bunch as a 5 character wide field designation. I think, here I'm in fact a bit more in line with Esteban's approach. At least if I understand it correctly. (Though I agree with you that one might not come up with a complete bibfield list, but just with a set of most common usages.)

But «most common usages» won't cover them all, and so you cannot load arbitrary records coming from unknown sources and expect Invenio to do the expected thing with them. So, what I'm proposing is: let «any» be the rule, and let's cover the exceptions later. What I'm proposing is to follow Postel's rule: be conservative in what you do, be liberal in what you accept from others.

http://en.wikipedia.org/wiki/Robustness_principle

[...] The JSON structure that we create from Marc21 (or anything else) contains as much information from the master format as you want (even the indicators). Meaning that if your data model is well written it is a lossless conversion and there is a one to one mapping that makes possible doing Marc21 to JSON to Marc21.

I hope that.

+1

I share some concerns about this with Ferran and Martin and some others, and I'm very sure it's quite a task...

I don't think it is so difficult if the code just accepts 245%% for title, 100%% for first author, etc. With a 10% effort we could cover more than 95% of the cases. Alexander, would you accept exchanging the current Invenio default behaviour for the default I'm proposing? Knowing that it would not be perfect, do you think that it would be better?

Best regards,
Ferran
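
The «accept any indicator» default proposed above can be illustrated with a small sketch. It is not Invenio code: the record layout (a list of pairs made of tag plus indicators and a value, with `_` for a blank indicator) and the helper names `compile_tag_pattern` and `select_fields` are assumptions made only for this illustration. The single idea it shows is that `%` in a 5-character designation such as `245%%` matches any indicator, while `245__` matches blank indicators only.

```python
import re


def compile_tag_pattern(pattern):
    """Compile a 5-character designation such as '245%%' into a regex.

    '%' stands for any single indicator character; every other
    character must match literally. (Hypothetical helper.)
    """
    if len(pattern) != 5:
        raise ValueError("expected a tag plus two indicators, got %r" % pattern)
    return re.compile("".join("." if ch == "%" else re.escape(ch) for ch in pattern))


def select_fields(record, pattern):
    """Return the values of all fields whose designation matches the pattern."""
    regex = compile_tag_pattern(pattern)
    return [value for key, value in record if regex.fullmatch(key)]


# Hypothetical record imported from another catalogue.
record = [
    ("1001_", "Cervantes Saavedra, Miguel de"),
    ("24510", "Don Quijote de la Mancha"),
    ("260__", "Madrid"),
]

titles_strict = select_fields(record, "245__")   # blank indicators only
titles_liberal = select_fields(record, "245%%")  # any indicators
```

With the strict pattern the imported title is silently ignored; with the liberal one it is found, which is the behaviour a newcomer loading existing records would expect.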
Re: [pu jsonalchemy] Aggregation of several fields into now
Hello Alexander, Esteban et al, Alexander Wagner a.wag...@fz-juelich.de wrote: On 27.03.2014 15:38, Esteban Gabancho wrote: I think the second solution is the closest one to reality, the `None` express that the record doesn’t have a first author. And I also think that we could apply this solution for other cases where we have this kind of situation (like with the `110__` and `710__`). What do you think? If I may: as a librarian you have a 100. You may not have a 700, but in case you have only one author it is 100 by definition. [to the non librarians in the crowd; Alexander knows it already] Or a 110, or a 111, I'd like to remind. An author is not only a 100 (personal author), but it may be a corporate one (110) or a conference (111). And please, don't forget that any of those tags may have arbitrary values as indicators. Best regards, Ferran
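
As a rough sketch of the point above, a «first author» lookup would check 100, 110 and 111 regardless of indicators, and return `None` (the convention Esteban suggests) when the record has no main entry at all. The record layout and the function name `main_entry` are hypothetical and chosen only for this example.

```python
# Main entry tags: personal name, corporate name, meeting name.
MAIN_ENTRY_TAGS = ("100", "110", "111")


def main_entry(record):
    """Return (tag, value) for the first 1XX main entry, or None.

    `record` is assumed to be a list of (designation, value) pairs where
    the designation is the 3-digit tag followed by two indicators.
    Indicators are deliberately ignored.
    """
    for designation, value in record:
        if designation[:3] in MAIN_ENTRY_TAGS:
            return designation[:3], value
    return None


corporate_record = [
    ("1102_", "Universitat Autònoma de Barcelona"),
    ("24510", "Memòria del curs acadèmic"),
]
anonymous_record = [
    ("24500", "Cantar de mio Cid"),
]

corporate_author = main_entry(corporate_record)
no_author = main_entry(anonymous_record)
```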
Re: RFC unifying phrase search behaviour
Hi, Alexander Wagner a.wag...@fz-juelich.de wrote: On 24.02.2014 11:30, Tibor Simko wrote: Hi! People don't easily distinguish between the following queries: title:'some phrase' substring title:some phrase exact search [...] Once more, I agree with Alexander. The whole reply. Danke, Ferran
Re: Invenio i18n in crowdin?
Hello Theodoros and Tibor,

On Wed, 18 Dec 2013, th...@physics.auth.gr wrote:

Crowdin seems nice with lots of features. For 'open' opensource projects (and also for academic/research(?) institutions) it's completely free.

[...] Personally I prefer Emacs's po-mode. Offline, ultra fast, auto-completion, one-key access to the phrase context, language syntax checking, etc.

I also appreciate Emacs po-mode, in addition to the reasons Tibor mentions, for switching between the four or five languages I use: I translate into two but, after the English original, I often read the French and Italian translations to look for inspiration or alternatives, just by pressing the 'a' key. Match that!

Ferran
Re: RFC: enable bibsched scripting capabilities [draft patch]
Hi Samuele,

On Fri, 28 Jun 2013 09:26:03 +0200, Samuele Kaplun samuele.kap...@cern.ch wrote:

[...] As I feel that it may be useful to somebody else, I submit it for your consideration.

Indeed it could be useful to manipulate tasks in the queue programmatically... so thanks for your idea! The only thing is that we are about to integrate into main Invenio a refactored bibsched which is the result of several improvements we have introduced for the INSPIRE use case... We can see how easy it is to apply your suggested CLI improvements.

Good.

[...] You can have a peek at it here:

http://invenio-software.org/repo/personal/invenio-adeiana/log/?h=bibsched-refactoring

I'll take a look in the next few days, thank you.

Ferran
Re: RFC: enable bibsched scripting capabilities [draft patch]
Hi Samuele,

On Fri, 28 Jun 2013 13:38:00 +0200, Samuele Kaplun samuele.kap...@cern.ch wrote:

On Friday 28 June 2013 13:32:01 Alexander Wagner wrote:

Or, the other way round: I had expected, that if the job as such has a problem it wouldn't reach running status /regardless/ if I hit R interactively, or the scheduler launches the same thing automatically.

Maybe if Ferran would hit manually R at midnight then the task would actually not run? There is indeed this bug addressed here: http://invenio-software.org/ticket/1432

Except that, and this is also valid for the date parse hypothesis, in my case it happens in only one Invenio on each server. The other one runs flawlessly. And there are no code differences; I've checked several times.

Ferran
RFC: enable bibsched scripting capabilities [draft patch]
Hi all,

since our 1.1 migration we've had some mysterious behaviours with bibsched that, apparently, nobody else has. It may be related to having more than one Invenio on a single system, but I haven't been able to prove it.

One very curious behaviour is that the first task after midnight switches to SCHEDULED state but doesn't run. On the test server it happens to Traces, and on the production server it happens to DDD. No matter how many hours I spend (and I have spent many!) finding out why and how, the mystery continues. I get a friendly daily mail like this:

  Emergency from http://ddd.uab.cat: BibSched halted: Process bibsort (task_id: 140997) was launched but seems not to be able to reach RUNNING status.

Anyhow, I needed a mechanism to automate my daily manual task: put bibsched into manual mode, find out which task is in SCHEDULED state, run it, and put bibsched back into automatic mode.

I have been patching bibsched to allow, at least, those basic scripting capabilities. I don't know how many more commands I will need (acKnowledge errors, maybe?). I'm unsure about the names I have chosen. As I feel that it may be useful to somebody else, I submit it for your consideration.

Comments welcome,
Ferran

BibSched: enable scripting commands [DRAFT]

BibSched is a curses program with only a few command-line options. This first patch adds a new command (appropriately called command) with two options:

  --mode=[automatic, manual]
  --key=k:task_id

The first one allows switching between manual and automatic modes, and the second allows applying commands to tasks. Currently only one is implemented: R for run.
---
 modules/bibsched/lib/bibsched.py |  155 +
 1 files changed, 138 insertions(+), 17 deletions(-)

diff --git a/modules/bibsched/lib/bibsched.py b/modules/bibsched/lib/bibsched.py
index 01314ae..dfba9fb 100644
--- a/modules/bibsched/lib/bibsched.py
+++ b/modules/bibsched/lib/bibsched.py
@@ -31,6 +31,7 @@ import getopt
 from itertools import chain
 from socket import gethostname
 from subprocess import Popen
+from cStringIO import StringIO
 import signal
 
 from invenio.bibtask_config import \
@@ -281,9 +282,78 @@ def bibsched_send_signal(proc, task_id, sig):
             return False
     return False
 
+def parse_report_queue_status():
+    '''Get queue status parsing the output of report_queue_status.
+
+    Returns: a dictionary with the numeric task_id as key and a
+    dictionary for each value
+    '''
+    # print "calling report_queue_status..."
+    out = StringIO()
+    report_queue_status(verbose=True, status=('WAITING', 'RUNNING', 'SCHEDULED'), stream=out)
+    report = out.getvalue()
+    tasks = {}
+    for line in report.split('\n'):
+        fields = {}
+        if '"' in line:
+            words = line.split('"')
+            while words:
+                word = words.pop(0)
+                if word.endswith('='):
+                    key = word[:-1].split()[-1]
+                    value = words.pop(0)
+                    fields[key] = value
+            key = int(fields['ID'])
+            tasks[key] = fields
+    return tasks
+
+
+def command(opt="", arg=""):
+    '''Check command parameters and call Manager with the appropriate values'''
+
+    print "opt = [%s] arg = [%s]" % (opt, arg)
+    if opt in ('-m', '--mode'):
+        arg = arg.upper()
+        if 'AUTOMATIC'.startswith(arg):
+            mode = 'A'
+        elif 'MANUAL'.startswith(arg):
+            mode = 'M'
+        else:
+            mode = None
+            print >> sys.stderr, 'Unknown mode: %s' % (arg)
+            sys.exit(1)
+        if mode:
+            print 'Manager, mode = %s' % (mode)
+            print 'redirect...'
+            # old_stdout, old_stderr = redirect_stdout_and_stderr()
+            old_stdout = sys.stdout
+            Manager(old_stdout, mode)
+    elif opt in ('-k', '--key'):
+        if arg.count(':') != 1:
+            print >> sys.stderr, "Error: syntax: K:task_id"
+            sys.exit(1)
+        else:
+            (cmd, task_id) = arg.split(':')
+            if len(cmd) == 1:
+                cmd = cmd.upper()
+            else:
+                print >> sys.stderr, "Error: Key must be single character"
+                sys.exit(1)
+            if not task_id.isdigit():
+                print >> sys.stderr, "Error: task id not numeric"
+                sys.exit(1)
+            print 'Manager, command = %s. [%s] [%s]' % (arg, cmd, task_id)
+            print 'redirect...'
+            # old_stdout, old_stderr = redirect_stdout_and_stderr()
+            old_stdout = sys.stdout
+            task_id = int(task_id)
+            if cmd == "R":
+                Manager(old_stdout, cmd, task_id)
+
+
 
 class Manager(object):
-    def __init__(self, old_stdout):
+    def __init__(self, old_stdout, key='', task_id=0):
         import curses
         import curses.panel
         from curses.wrapper import wrapper
@@ -316,8 +386,40 @@ class Manager(object):
             self.header_lines = 3
         except IOError:
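
The option checks in the draft patch above can also be sketched as a standalone snippet. This is not part of the patch: it swaps in `argparse` from the Python 3 standard library instead of bibsched's own option handling, the names `parse_mode`, `parse_key` and `build_parser` are hypothetical, and nothing here talks to a real bibsched. It only shows the two validations described in the commit message: a mode given as a prefix of «automatic» or «manual», and a key given as `K:task_id`.

```python
import argparse


def parse_mode(text):
    """Map a (possibly abbreviated) mode name to 'A' or 'M'."""
    upper = text.upper()
    # Unlike the draft patch, an empty string is rejected explicitly,
    # because every string starts with the empty string.
    if upper and "AUTOMATIC".startswith(upper):
        return "A"
    if upper and "MANUAL".startswith(upper):
        return "M"
    raise argparse.ArgumentTypeError("unknown mode: %s" % text)


def parse_key(text):
    """Split 'K:task_id' into an upper-case key and a numeric task id."""
    if text.count(":") != 1:
        raise argparse.ArgumentTypeError("syntax: K:task_id")
    cmd, task_id = text.split(":")
    if len(cmd) != 1:
        raise argparse.ArgumentTypeError("key must be a single character")
    if not task_id.isdecimal():
        raise argparse.ArgumentTypeError("task id not numeric")
    return cmd.upper(), int(task_id)


def build_parser():
    """Build a parser accepting the two options named in the patch."""
    parser = argparse.ArgumentParser(prog="bibsched-command-sketch")
    parser.add_argument("-m", "--mode", type=parse_mode)
    parser.add_argument("-k", "--key", type=parse_key)
    return parser


args = build_parser().parse_args(["--mode=man", "--key=r:140997"])
```

Here `args.mode` is `'M'` and `args.key` is `('R', 140997)`; an invalid value makes `argparse` print a usage error and exit, much as the patch does with `sys.exit(1)`.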
Re: Exceptions due to attacks
Hi Theodoros,

Hello Ferran, My dev 1.0.1.1218 and latest maint-1.1 sites correctly display a 404 not found page for either

  /record/xxx/files/wp-whatever
  /record/xxx/wp-whatever
  /record/wp-whatever

without sending me an exception error. The same applies if wp-whatever is replaced by ../../etc/passwd and the likes. I tried the same with cds.lib.auth.gr and it also displays a 404 error (I don't know if an error is logged)

but try an index.php or any other missing hit at http://cds.cern.ch. It is effectively handled by Invenio. So, I understand that we need a general solution to provide (a) a 404 not found to the attacker, and/or (b) a digested summary to the admin. Aren't the other sites having this flood of attacks? I doubt we are the only ones.

Ferran
Re: Exceptions due to attacks
Hello Theodoros,

On 25/4/2013 12:37 PM, Ferran Jorba wrote:

but try an index.php or any other missing hit at http://cds.cern.ch. It is effectively handled by Invenio.

My point exactly. I see that both my installations and CERN's correctly handle those 'attacks'. I even tried with .php and .py files and there is no exception raised and sent to the admin even if you set CFG_SITE_ADMIN_EMAIL_EXCEPTIONS = 2 in invenio(-local).conf

To tell you the truth, I wasn't aware of this variable. I have the default value:

  ## CFG_SITE_ADMIN_EMAIL_EXCEPTIONS -- set this to 0 if you do not want
  ## to receive any captured exception via email to CFG_SITE_ADMIN_EMAIL
  ## address. Captured exceptions will still be available in
  ## var/log/invenio.err file. Set this to 1 if you want to receive
  ## some of the captured exceptions (this depends on the actual place
  ## where the exception is captured). Set this to 2 if you want to
  ## receive all captured exceptions.
  CFG_SITE_ADMIN_EMAIL_EXCEPTIONS = 1

Unless I'm missing something here, I suspect something weird happening only with your installation...

Watching my installation more carefully (thanks!), I see that it returns a 404 Not found (http://traces.uab.cat/abc), even in the HTTP headers. So now, with a better understanding of what is happening, what I'd like is a value for CFG_SITE_ADMIN_EMAIL_EXCEPTIONS that doesn't send me an email for a 404 status. I'll try to do a local fix and provide it upstream if there is interest.

Thanks for your feedback,
Ferran
Re: Exceptions due to attacks
Hi Samuele, [...] wait :-) This is already implemented, as Theodoros reported ;-) Just have a look in maint-1.1 to: commit 0aeb9fa7e8a6b6809be5d586bcdcf0e7a9784e05 Author: Samuele Kaplun samuele.kap...@cern.ch Date: Tue Oct 27 16:48:22 2009 +0100 WebStyle: configurable alerts for HTTP errors [...] Isn’t this commit available in your repo and already doing what you are looking for? I have it in my 1.1.0 production system! I've just modified the value (removed 404r), executed inveniocfg --update-config-py and no more mailbombs! Thanks to all, Ferran
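
To make the effect of removing `404r` more concrete, here is a hedged sketch of how a list of alertable HTTP statuses could be evaluated. It is not the WebStyle implementation: the function name `should_alert`, the example lists, the use of shell-style wildcards and, in particular, the reading of a trailing `r` as «alert only when the request carried a referer» are all assumptions made for this illustration.

```python
from fnmatch import fnmatchcase


def should_alert(status, has_referer, alert_list):
    """Decide whether an HTTP status should trigger an admin email.

    Each entry is a shell-style pattern such as '5*'. A trailing 'r'
    is assumed to mean: alert only if the request had a referer.
    """
    for entry in alert_list:
        needs_referer = entry.endswith("r")
        pattern = entry[:-1] if needs_referer else entry
        if fnmatchcase(str(status), pattern):
            if has_referer or not needs_referer:
                return True
    return False


# Illustrative lists only; the real defaults may differ.
noisy_list = ["404r", "400", "5*", "41*"]
quiet_list = ["400", "5*", "41*"]  # same list with '404r' removed

alert_before = should_alert(404, True, noisy_list)
alert_after = should_alert(404, True, quiet_list)
```

Under these assumptions a missing page stops generating mail once `404r` is dropped, while server errors (5xx) are still reported.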
Re: Exceptions due to attacks
Hello,

On Wednesday 20 March 2013 08:19:27, Johnny Mariéthoz wrote:

every day I have some exceptions due to attacks such as: IOError: request data read error (webinterface_handler_wsgi.py:377:readline) an example of request is: /record/17041/files/wp-content/plugins/mm-forms-community/includes/doajaxfileupload.php Is it possible to return a 404 status for such a request?

which version of Invenio are you running? Depending on it this is indeed the default configuration. I will check the commit log, and point you out the missing patches...

Is there any progress on this issue? Under 1.1 the missing pages produce much more noise than with the old mod_python.

Thanks,
Ferran
Re: Celery integration for next
Hello Lars,

I've finished initial integration of Invenio in Celery for next:

http://invenio-software.org/repo/personal/invenio-lnielsen/commit/?h=next-celery&id=6d09ef545f03edfa6d7c77cd3a2447873b16c87e

It basically follows what we discussed in DevForum (https://invenio-software.org/wiki/Tools/Celery/InvenioIntegration). Take a look if you have a minute and let me know if there's issues. [...]

Yes, please, I have a doubt: at UAB, we have more than one Invenio installation on the same host, installed as plain users (not root nor www-data), and served by Apache (specifically, apache-itk) with virtual hosts. Will this Celery integration be compatible with our setup?

Thanks,
Ferran

PS And congratulations on your zenodo branch. It looks gorgeous! We are constantly looking at it for inspiration, and we are taking some ideas for our forthcoming 1.1 upgrade.
Re: Celery integration for next
Hello Lars,

I think there should be no problem in your case. On your host, you would install one RabbitMQ server (the broker) - on the broker you would create 1 RabbitMQ virtual host per Apache virtual host. For each invenio installation you would start 1 worker.

Great, good to know it's so simple!

[...] Do you install each Invenio installation in virtual environments? If not, this might be the only issue, however I think at most a worker-start-script per invenio installation would need to be created.

No, we are not using virtual environments; I'm trying to keep it as flat as possible, and I'm not using any extra layer that I don't really need.

Alternatively, we are also thinking of a lite-solution, so you won't even need to install a broker (RabbitMQ) and start the Celery workers. Celery has a flag so that it can run tasks synchronously instead of asynchronously (so the lite version would seem slower, but still do the job in the end).

In which situations does it «seem slower»? In the end-user front end or in the librarians' and systems back end?

Currently there's an overlap between bibsched and Celery, which we haven't completely sorted out what goes where. For now, bibsched is still the master of bibupload and friends. In the short term it seems most natural that Celery would take over bibtasklets + new territories. On the long run, we'll have to get some experiences first.

Sure. Thanks for answering so fast,
Ferran

Cheers, Lars
Re: Celery integration for next
Hello Lars, [...] So definitely for most cases you do want to run celery, but for small installations without big requirements this can be an easy way to get up and running. Sure, thanks again, Ferran
Re: Websearch Collections
Hello Yigit,

I have a question about web search and collections. Is it possible to remove/add some collections from/into the list under «Search Collections» so that users are not confused with so many collection names? For instance, what should I do if I want to erase the parent collections, but keep all the children there. What I actually want is to be able to edit the select field in a flexible manner. Might we be able to do that via Invenio's Admin Interface? If not, I will edit websearch_webinterface.py and websearch_templates.py, but I'm looking for the easiest way to do that. Does anyone have an idea on that?

That would be a nice idea, and I'd like you to share it if you can implement it. I made a more general proposal five years ago that got some support but is now sleeping in the Invenio ticket database, and I suspect that it addresses the same problem that you are talking about. I suggested a method to restrict further searches to the current collection (and thus, subcollections):

http://invenio-software.org/ticket/450

I still think that it can be done with little effort. The problem is that I don't have the time to do it myself.

Sigh,
Ferran
Re: Results of Advanced Search: Missing Message
Hello Alexander, [...] Unfortunately, I fear I have to start the year with a bug report. One of our users just noted that if you use the advanced search and enter a query without any results you do not get any notification in certain circumstances. E.g. http://goo.gl/gsJ3p (searching müller and wert in the author index of JuSER) triggers this behaviour. However, if I fill in only one field it seems to report properly. I think this should be fixed. We have also been bitten by this bug on our test 1.1 installation: the same arguments give different results on simple and advanced search. I searched my mail and I've seen that there is no answer. Yet ;-) Thanks, Ferran
Re: Add all search results to a basket?
Hello Alexander, Alexander Wagner a.wag...@fz-juelich.de wrote: Hi! Is there currently a way to add all records retrieved by a search to a basket? I mean without hooking on every hit and then add per page? Probably even some hook all records on this page function? I just got this question from our users and at the moment have to admit that the only answer I know would be to do it per record. (Note: I do not want to store the query in the basket but really the resulting records.) +1 Ferran
Re: Add all search results to a basket?
Hi Lars and Alexander,

Would it be all records for an entire search or just all records displayed on one result page? In case of the latter, there's a toggle all button in the next-branch (see example on: http://invenio-demo-next.cern.ch/search?p=&action_search=).

Wow, I like this next demo! Congratulations! For me it would be good enough, given that there is this «Search settings» where the number of results can be increased. Probably it would be insufficient for some specific cases, but I'd say it is a great solution for most users.

Thanks for your great work,
Ferran
Translations, branches, and a plea for a Translator's Corner
Hello Tibor,

branches have made the translators' job a little more confusing, especially because there is no reference page for us to read. You have probably written a mail to a list about which branch we should be working with but, frankly, I cannot find it.

One of the (many) great things about Debian is the Debian Developers' Corner (http://www.debian.org/devel/), which surely you know much better than I do. It would be very useful for us translators to have a Translators' Corner in the Trac pages where we could find reference information. I'd be satisfied even if it starts humble and small, with something like which branch (or branches?) we should work with. There is a lot more to be added, like .po and webdoc and specific hints, etc., but as Laozi said, even the longest journey begins with a single step :)

OTOH, my reformatting of the websearch webdoc pages is still waiting for your approval, and decisions like how to organize sections and languages should go on this page as well.

Thanks,
Ferran
Re: Formatting numbers for display
Hello, [...] We used websearch_templates.tmpl_nice_number instead. Shouldn't the locale aware version be used? I need to use it outside of websearch_templates. It seems like this functionality should be in miscutils instead. Yes, please! Ferran
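
For illustration, a number formatter that lives outside the search templates could look like the following minimal sketch. The function name `nice_number` and its `thousands_separator` parameter are hypothetical and are not the signature of `tmpl_nice_number`; a truly locale-aware version would take the separator from the user's language settings instead of a parameter.

```python
def nice_number(number, thousands_separator=","):
    """Format an integer with a thousands separator.

    Groups digits in threes using Python's ',' format option, then
    swaps in the requested separator.
    """
    return format(int(number), ",").replace(",", thousands_separator)


english_style = nice_number(1234567)
catalan_style = nice_number(1234567, thousands_separator=".")
```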
Re: [INSPIRE-ADMIN] Queue blocked
Hello Thorsten and Tibor, On Fri, 02 Mar 2012, Thorsten S wrote: however, why are individual bad input records not simply skipped *and logged as such for later inspection* and the queue continues otherwise unaffected? a single bad edit shouldn't stall everything afterwards, unless there is a convincing reason that I am not aware of Briefly put, upload jobs are usually run sequentially, since in general a future upload job may depend on the result of a previous upload job. So, if an upload job ends in an error, manual inspection and resolution may be required. Historically, in this case, the upload queue was stopped and an SMS alert was sent to the admin support person, who then logged in and inspected the problem and unblocked the queue. This `easy technical solution' works, but requires human intervention, which is definitely sub-optimal. After several years of checking that, in my installations, I never found an error that required, so to say, a general stop-panic-and-see situation, I wrote a crude yes-I-know script that I run every 20 minutes and leaves a couple of logs for later inspection. I can leave the office for whatever reason with confidence that the systems will keep running. You may have to adjust some Invenio paths, but it may help you to decide to roll your own. 
Hope it helps,
Ferran

# -*- coding: utf-8 -*-
# Time-stamp: 2010.07.19 08:59:38 error2ack.sh d...@homs.uab.es
# This script acknowledges bibtask errors and resumes the scheduler

what=error2ack
cd ~/tmp

# Keep up to ten generations of $filename ($filename.0 ... $filename.9)
filerotate() {
    if [ -s $filename ]; then
        test -s $filename.8 && mv $filename.8 $filename.9
        test -s $filename.7 && mv $filename.7 $filename.8
        test -s $filename.6 && mv $filename.6 $filename.7
        test -s $filename.5 && mv $filename.5 $filename.6
        test -s $filename.4 && mv $filename.4 $filename.5
        test -s $filename.3 && mv $filename.3 $filename.4
        test -s $filename.2 && mv $filename.2 $filename.3
        test -s $filename.1 && mv $filename.1 $filename.2
        test -s $filename.0 && mv $filename.0 $filename.1
        mv $filename $filename.0
    fi
}

for filename in $what.log $what.sql; do
    filerotate
done

# List the tasks that ended in error
echo "select id, status, runtime, proc, arguments from schTASK where status in ('ERROR', 'DONE WITH ERRORS');" | \
    ~/invenio/bin/dbexec > $what.log

# If there are any, acknowledge them and restart the scheduler
if [ -s $what.log ]; then
    awk 'NR > 1 { q = sprintf("%c", 39); \
        printf("update schTASK set status=%s%s%s where id=%s;\n", \
               q, "ACK ERROR", q, $1) }' $what.log > $what.sql
    cat $what.sql | ~/invenio/bin/dbexec
    ~/invenio/bin/bibsched restart
fi
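
For readers who prefer Python, the `filerotate` step of the script above can be sketched as follows. This is a loose translation and not part of the original script: the function name `rotate` and its `keep` parameter are assumptions, and the demonstration runs in a temporary directory rather than in `~/tmp`.

```python
import os
import tempfile


def rotate(filename, keep=9):
    """Shift filename -> filename.0 -> filename.1 ... up to filename.<keep>.

    Like the shell version, nothing happens unless `filename` exists and
    is not empty, and only non-empty older generations are shifted.
    """
    def non_empty(path):
        return os.path.isfile(path) and os.path.getsize(path) > 0

    if not non_empty(filename):
        return
    for n in range(keep - 1, -1, -1):  # 8, 7, ..., 0 when keep is 9
        older = "%s.%d" % (filename, n)
        if non_empty(older):
            os.replace(older, "%s.%d" % (filename, n + 1))
    os.replace(filename, filename + ".0")


# Small demonstration with two consecutive runs.
with tempfile.TemporaryDirectory() as workdir:
    log = os.path.join(workdir, "error2ack.log")
    for content in ("first run\n", "second run\n"):
        rotate(log)
        with open(log, "w") as handle:
            handle.write(content)
    rotate(log)
    with open(log + ".0") as handle:
        newest = handle.read()
    with open(log + ".1") as handle:
        older = handle.read()
    still_there = os.path.exists(log)
```

After the final rotation the most recent log is in `error2ack.log.0`, the previous one in `error2ack.log.1`, and the unsuffixed file is gone until the next run writes it again.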
Re: Time based po snapshots?
Hello Tibor, Tibor Simko tibor.si...@cern.ch wrote: On Wed, 21 Dec 2011, Ferran Jorba wrote: How often and, specially, *when* are you plan to update them? Have you thought on a periodicity, as I suggested (ex: first day of each month)? I thought about 4 times a year. Considering how many updates the PO files currently get, 12 times a year would probably be an overkill. Anyway, the update frequency should be rather tightly coupled with release frequency model, so we'll see later this year. Fair enough. But please let us know a few days (a week, perhaps?) in advance so we can do the final sprint. Thanks, Ferran
Re: Slow MySQL queries with large data structures in memory
Hello Benoit,

[...]

In [4]: %time res = run_sql("SELECT id_bibrec FROM bibrec_bib03x LIMIT 100")
CPU times: user 1.96 s, sys: 0.06 s, total: 2.02 s
Wall time: 2.30 s

Any idea about why we're seeing this and how we can fix it? It is quite a big problem for us as our citation dictionaries are so big.

I have noticed in more than one case that, for some minimally complex (?!) operations, the bottleneck is MySQL, not Python, so if you can move part of the manipulation from one to the other you may get surprises. I cannot remember the exact case, but the equivalent for yours would be changing:

  res = run_sql("SELECT id_bibrec FROM bibrec_bib03x LIMIT 100")

to:

  res = run_sql("SELECT id_bibrec FROM bibrec_bib03x")
  res = res[:100]

I remember gains of 10x. YMMV, but you can try it.

Ferran
Re: Time based po snapshots?
Hello Tibor,

a couple more questions:

> I've updated PO files not only for the maintenance branch (v0.99.4), but also in the master and next branches, see for e.g. commit b85da374b1e.

How often and, especially, *when* do you plan to update them? Have you thought of a periodicity, as I suggested (e.g. the first day of each month)? This way we, the translators, would know when to send you the updated files before they are re-synchronized with the source code, without worrying about working on an obsolete file.

> You can checkout latest Invenio master and work on the PO files there and send me patches as described in:

So, do you recommend that we work only on the master branch, and forget about next or maint?

Thanks again,

Ferran
Re: Time based po snapshots?
Hello Tibor,

> On Mon, 31 Oct 2011, Tibor Simko wrote:
>> We plan to enter a soft feature freeze for the v1.0 branch by next week, so this will be a perfect opportunity to release updated PO files at the same time. So stay tuned.
>
> Took a while, but the new PO files are available now.

Ok, I see the new files for the 0.99.4 release. But how are they updated on the next or maint branches? How do I get and patch them? Have you thought of a workflow? Please assume no deep git mastery on my side; I have no experience yet with remote branches, sorry ;-(

Thanks,

Ferran
Re: Author handling bfe_authors.py et al
Hello Alexander,

> Currently, bfe_authors.py uses
>
>     authors = []
>     authors_1 = bfo.fields('100__')
>     authors_2 = bfo.fields('700__')
>
> This enforces authors to be considered only if both indicators are blank. However, you may notice that from the authoritative definition at http://www.loc.gov/marc/bibliographic/bd100.html

This is an example of the default values of Invenio not being Marc21 compliant. I've complained about this a few times, and I should have filed a task about those defaults, and sent a few patches, although I haven't yet ;-(.

Incorrect default values have these consequences: if you import records from another catalog, those values misbehave in Invenio. Librarians (and library-educated computer people) expect those records to behave as they do in the other system. So it is a matter of interoperability and economy.

The problem arises not only in those Python bibformat snippets, but also in the bibindex definitions and in all export formats. For example, my bfe_author.py has those fields:

    if (authors_type in ['','personal']):
        authors.extend(bfo.fields('100%%'))
    if (authors_type in ['','corporate']):
        authors.extend(bfo.fields('110%%'))
    if (authors_type in ['','meeting']):
        authors.extend(bfo.fields('111%%'))
    if (authors_type in ['','personal']):
        authors.extend(bfo.fields('700%%'))
    if (authors_type in ['','corporate']):
        authors.extend(bfo.fields('710%%'))
    if (authors_type in ['','meeting']):
        authors.extend(bfo.fields('711%%'))
    if (authors_type in ['','personal','corporate','meeting']):
        authors.extend(bfo.fields('720%%'))

(In my case I sometimes need to show personal or corporate authors, depending on the collection; I understand that this is not always the case.)
In bibindex (/admin/bibindex/bibindexadmin.py/field), you also have to add those fields:

    author    100%, 110%, 111%, 700%, 710%, 711%, 720%

In the MARCXML to DC XSL as well:

    <xsl:for-each select="datafield[(@tag=100 or @tag=110 or @tag=111)]">
      <dc:creator>
        <xsl:call-template name="subfieldSelect">
          <xsl:with-param name="codes">ab</xsl:with-param>
        </xsl:call-template>
      </dc:creator>
    </xsl:for-each>
    <xsl:for-each select="datafield[(@tag=700 or @tag=710 or @tag=711 or @tag=720)]">
      <dc:contributor>
        <xsl:call-template name="subfieldSelect">
          <xsl:with-param name="codes">ab</xsl:with-param>
        </xsl:call-template>
      </dc:contributor>
    </xsl:for-each>

I borrowed the following XSL function from somewhere (LC, I think):

    <!-- Added FJ 5-feb-2010 to resolve template -->
    <xsl:template name="subfieldSelect">
      <xsl:param name="codes">abcdefghijklmnopqrstuvwxyz</xsl:param>
      <xsl:param name="delimeter">
        <xsl:text> </xsl:text>
      </xsl:param>
      <xsl:variable name="str">
        <xsl:for-each select="subfield">
          <xsl:if test="contains($codes, @code)">
            <xsl:value-of select="text()"/>
            <xsl:value-of select="$delimeter"/>
          </xsl:if>
        </xsl:for-each>
      </xsl:variable>
      <xsl:value-of select="substring($str,1,string-length($str)-string-length($delimeter))"/>
    </xsl:template>

And so on. It is a major task, but much needed. Newcomers are likely to feel frustrated when the system does not behave as expected.

Ferran
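The selection logic discussed in this message (pick author fields by tag, whatever the indicators say) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the record is a hypothetical list of (tag, ind1, ind2, subfields) tuples, not Invenio's bfo object, and select_author_fields is a made-up helper that returns fields in record order rather than grouped by tag.

```python
# Tags per author type, following the 100/110/111 and 700/710/711 pairs above.
AUTHOR_TAGS = {
    "personal": ("100", "700"),
    "corporate": ("110", "710"),
    "meeting": ("111", "711"),
}
UNCONTROLLED_TAG = "720"  # added for every author type, as in the snippet above


def select_author_fields(record, authors_type=""):
    """Return the author fields of the requested type, ignoring indicators.

    `record` is a hypothetical list of (tag, ind1, ind2, subfields) tuples;
    an empty `authors_type` means "all kinds of authors".
    """
    if authors_type == "":
        wanted = {tag for tags in AUTHOR_TAGS.values() for tag in tags}
    else:
        wanted = set(AUTHOR_TAGS.get(authors_type, ()))
    if authors_type in ("", "personal", "corporate", "meeting"):
        wanted.add(UNCONTROLLED_TAG)
    # Only the tag is compared, so '1001_' and '100__' are treated alike.
    return [field for field in record if field[0] in wanted]


if __name__ == "__main__":
    record = [
        ("100", "1", " ", {"a": "Surname, Name"}),
        ("245", "1", "0", {"a": "A title"}),
        ("710", "2", " ", {"a": "Some University"}),
    ]
    for field in select_author_fields(record, "personal"):
        print(field)
```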
Re: Author handling bfe_authors.py et al
Hello Alexander,

>> this is an example of the default values of Invenio not being Marc21 compliant.
>
> Right. And then these are bad defaults.
>
>> I've complained about this a few times, and I should have filed a task about those defaults, and sent a few patches, although I haven't yet ;-(.

The reasons why I haven't done it myself, besides the lack-of-time issue (bad excuse), are that on my instances I have a mix of better-than-default values and local ones; I don't have (or I don't have the resources to have) a reasonably recent Invenio instance running anywhere (we are still at 0.99.1), so I'd be patching something old; and, even with those restrictions, when I tried, I found those example records (modules/miscutil/sql/tabfill.sql) and the testing infrastructure that I didn't know how to handle. So I feel overwhelmed each time I try ;-(

But ideally one should be able to go, for example, to http://www.archive.org/details/ol_data, get and load the whole University of Toronto Library catalog into the local Invenio and use it, maybe just adjusting some valid collection field value. Right now this is not the case. And it is a pity, because after suitable adjustments Invenio is perfectly able to handle those records. It is even possible to have something like authority records in it (at least we have them more-or-less working at http://traces.uab.cat/).

Best regards,

Ferran
Re: Author handling bfe_authors.py et al
Hi again,

>> The reasons why I haven't done it myself, besides the lack-of-time issue (bad excuse) are that on my instances I have a mix of better-than-default values and local ones;
>
> Well, it would be great if you could drop me some sort of list in case your previous post was not complete. We're about to roll out some installation here based on recent Invenio so we might work that in if it's not already done.

Let me publish my logical fields list here on the list, because it is easy and likely to be useful to most readers (I've left off a few local fields):

4. Logical fields overview

+------------+-------------------------------+---------------------------------+
| Field      | MARC Tags                     | Translations                    |
+------------+-------------------------------+---------------------------------+
| any_field  | 00%, 01%, 02%, 03%, 04%, 05%, | ca, cs, de, el, en, es, fr, it, |
|            | 06%, 07%, 08%, 09%, 10%, 11%, | no, pt, ru, sk, sv, uk          |
|            | 12%, 13%, 14%, 15%, 16%, 17%, |                                 |
|            | 18%, 19%, 20%, 21%, 22%, 23%, |                                 |
|            | 24%, 25%, 26%, 27%, 28%, 29%, |                                 |
|            | 30%, 31%, 32%, 33%, 34%, 35%, |                                 |
|            | 36%, 37%, 38%, 39%, 40%, 41%, |                                 |
|            | 42%, 43%, 44%, 45%, 46%, 47%, |                                 |
|            | 48%, 49%, 50%, 51%, 52%, 53%, |                                 |
|            | 54%, 55%, 56%, 57%, 58%, 59%, |                                 |
|            | 60%, 61%, 62%, 63%, 64%, 65%, |                                 |
|            | 66%, 67%, 68%, 69%, 70%, 71%, |                                 |
|            | 72%, 73%, 74%, 75%, 76%, 77%, |                                 |
|            | 78%, 79%, 80%, 81%, 82%, 83%, |                                 |
|            | 84%, 85%, 86%, 87%, 88%, 89%, |                                 |
|            | 90%, 91%, 92%, 93%, 94%, 95%, |                                 |
|            | 96%, 97%, 98%                 |                                 |
+------------+-------------------------------+---------------------------------+
| title      | 130%, 210%, 222%, 240%, 245%, | ca, cs, de, el, en, es, fr, it, |
|            | 246%, 247%, 730%, 740%        | no, pt, ru, sk, sv, uk          |
+------------+-------------------------------+---------------------------------+
| author     | 100%, 110%, 111%, 700%, 710%, | ca, cs, de, el, en, es, fr, it, |
|            | 711%, 720%                    | no, pt, ru, sk, sv, uk          |
+------------+-------------------------------+---------------------------------+
| abstract   | 520%                          | ca, cs, de, el, en, es, fr, it, |
|            |                               | no, pt, ru, sk, sv, uk          |
+------------+-------------------------------+---------------------------------+
| keyword    | 653%                          | ca, cs, de, el, en, es, fr, it, |
|            |                               | no, pt, ru, sk, sv, uk          |
+------------+-------------------------------+---------------------------------+
| series     | 830%, 440%, 490%              | ca, en, es                      |
+------------+-------------------------------+---------------------------------+
| subject    | 600%, 610%, 611%, 650%, 651%  | ca, cs, de, el, en, es, fr, it, |
|            |                               | no, pt, ru, sk, sv, uk          |
+------------+-------------------------------+---------------------------------+
| fulltext   | 8564%u                        | ca, cs, de, el, en, es, fr, it, |
|            |                               | no, pt, ru, sk, sv, uk          |
+------------+-------------------------------+---------------------------------+
| collection | 980%                          | ca, cs, de, el, en, es, fr, it, |
|            |                               | no, pt, ru, sk, sv, uk          |
+------------+-------------------------------+---------------------------------+
| year       | 260%c, 973%y                  | ca, cs, de, el, en, es, fr, it, |
|            |                               | no, pt, ru, sk, sv, uk          |
+------------+-------------------------------+---------------------------------+
| record_ID  | 001                           | ca, cs, de, el, en, es, fr, it, |
|            |                               | no, pt, ru, sk, sv, uk          |
+------------+-------------------------------+---------------------------------+
| issn       | 773%x, 022%a                  | ca, en, es, fr                  |
+------------+-------------------------------+---------------------------------+

The indexes are one-to-one with this list *except* for keyword. What we have done is to keep the proper subject tags on the official 600, 610, 611, 650 and 651, and keyword on 653, but to merge them as *indexes*, so the index for keyword (Pàgina inicial > Admin Area > Manage Indexes) has both the subject and the keyword fields. That's the solution we've come up with.

And about bibformat and friends, that is:

    lib/python/invenio/bibformat_elements/
    etc/bibformat/format_templates/
    etc/bibformat/output_formats/

I keep them under guilt patches (http://repo.or.cz/w/guilt.git or http://packages.debian.org/guilt), but they would only apply to a 0.99.1 release. I can happily send you a tarball for each; but please understand that there is a mix of better, worse and bad solutions, as I have been learning to tame the beast over those years.

I'll come back to you in a while.

Cheers,

Ferran
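As a rough illustration of how such a logical fields list can be read, here is a small sketch that resolves a full MARC tag (tag, two indicators, subfield code) to the logical fields and indexes that would pick it up. The dictionaries copy the list above (any_field is left out for brevity); the function names, the INDEX_MERGES structure and the regex-based matching of '%' are assumptions made for this example, not Invenio's implementation.

```python
import re

# Logical fields and their tag patterns, copied from the list above.
LOGICAL_FIELDS = {
    "title": ["130%", "210%", "222%", "240%", "245%", "246%", "247%", "730%", "740%"],
    "author": ["100%", "110%", "111%", "700%", "710%", "711%", "720%"],
    "abstract": ["520%"],
    "keyword": ["653%"],
    "series": ["830%", "440%", "490%"],
    "subject": ["600%", "610%", "611%", "650%", "651%"],
    "fulltext": ["8564%u"],
    "collection": ["980%"],
    "year": ["260%c", "973%y"],
    "record_ID": ["001"],
    "issn": ["773%x", "022%a"],
}

# Indexes are one-to-one with the fields, except that the keyword index
# also covers the subject field (hypothetical way of writing that down).
INDEX_MERGES = {"keyword": ["keyword", "subject"]}


def tag_matches(pattern, full_tag):
    """Match a pattern where '%' stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, full_tag) is not None


def logical_fields_for(full_tag):
    """Names of the logical fields whose patterns match the tag, sorted."""
    return sorted(
        name
        for name, patterns in LOGICAL_FIELDS.items()
        if any(tag_matches(pattern, full_tag) for pattern in patterns)
    )


def indexes_for(full_tag):
    """Names of the indexes that would receive the tag, sorted."""
    fields = set(logical_fields_for(full_tag))
    return sorted(
        index
        for index in LOGICAL_FIELDS
        if fields & set(INDEX_MERGES.get(index, [index]))
    )


if __name__ == "__main__":
    for tag in ("24510a", "65017a", "653__a", "260__c"):
        print(tag, logical_fields_for(tag), indexes_for(tag))
```

The point of the '%' patterns is visible in the first example: a 245 with indicators 1 and 0 still lands in the title field, which a '245__' pattern would miss.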
Re: Time based po snapshots?
Hello Tibor,

> On Tue, 25 Oct 2011, Ferran Jorba wrote:
>> Could the project create official updated po files? Monthly (better), bimonthly or quarterly (worse) would be fine for me.
>
> We plan to enter a soft feature freeze for the v1.0 branch by next week, so this will be a perfect opportunity to release updated PO files at the same time. So stay tuned.

Sorry, I wrote my other mail at the same time you were writing yours.

> As for the future periodical updates, I agree with you that we can indeed be updating PO files more frequently. Many services are still being run off master, but with the maint-vX.Y/master/next branch policy now firmly in place, the whole release model of Invenio can be turned more towards time-oriented, not feature-oriented, model. A bit like the Django guys are doing. This is where we are heading with stable/unstable features, so PO files can come along.

Good.

Thanks,

Ferran
Re: Re-implementation of OAI repository in Invenio
Hello Alexander,

> Good morning!

[I think that your intention was to respond to the whole list, so let me continue the discussion here.]

> I agree with Ferran that 024 should be taken into account. From a marcish perspective it seems better suited than 035. You may want to

I was suggesting 024 instead of a 909, not instead of 035. 035 is intended for external identifiers, so an external OAI id, a handle, an LCCN, a local library catalog number or whatever other identifier should definitely go to 035.

http://www.loc.gov/marc/bibliographic/bd035.html

The question is where we store our own OAI id when it is generated by the oaiarchive daemon, and we think that 024 is correct.

> note the marc mechanism using 024 7_ $2src. If I get your reqs correctly, this is exactly what is required, as you can store provenance together with the Id in question. (Cf. storage of doi according to LoC.)

The use of a $2src seems to be allowed only if your src belongs to an approved list maintained by the Library of Congress. Or at least this is what we understood following the links from

http://www.loc.gov/marc/bibliographic/bd024.html
http://www.loc.gov/standards/sourcelist/standard-identifier.html
http://www.loc.gov/standards/sourcelist/index.html

so we thought that a first indicator 8 (Unspecified type of standard number or code) was a better choice.

> Unfortunately, and I think this as a reason for its 'misuse', dupe checking is currently bound to 035. For compatibility with foreign data this should, however, be expanded to 024.

I don't follow you here. If I harvest your http://example.de/record/1234, I think that it should have a 024.8x $a oai:example.de:1234 value, and I'll be storing your oai:example.de:1234 in my 035 $a, with a $9 of my own choice. And I can re-expose it as oai:example.cat:2345, keeping your oai:example.de:1234 as dc:identifier (in our Marc21 to DC XSL we export them both). Example:

http://ddd.uab.cat/record/77021/export/hm
http://ddd.uab.cat/record/77021/export/xd

Best regards,

Ferran
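To make the layout described above concrete, here is a small sketch that builds the two MARCXML datafields for a harvested record: the local OAI id in 024 with first indicator 8, and the harvested id in 035 $a with a provenance label in $9. It uses only Python's standard xml.etree module; the helper names and the "example.de" provenance value are hypothetical, and this is not how Invenio's oaiarchive or bibupload write these fields.

```python
import xml.etree.ElementTree as ET


def datafield(tag, ind1, ind2, subfields):
    """Build a MARCXML <datafield> element from (code, value) pairs."""
    field = ET.Element("datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
    for code, value in subfields:
        subfield = ET.SubElement(field, "subfield", {"code": code})
        subfield.text = value
    return field


def oai_identifier_fields(local_oai_id, harvested_oai_id, provenance):
    """Return (own, external): 024 8_ $a for our id, 035 $a $9 for theirs."""
    own = datafield("024", "8", " ", [("a", local_oai_id)])
    external = datafield(
        "035", " ", " ", [("a", harvested_oai_id), ("9", provenance)]
    )
    return own, external


if __name__ == "__main__":
    own, external = oai_identifier_fields(
        "oai:example.cat:2345",  # id under which the record is re-exposed
        "oai:example.de:1234",   # id it had in the harvested repository
        "example.de",            # hypothetical $9 value, "of my own choice"
    )
    print(ET.tostring(own, encoding="unicode"))
    print(ET.tostring(external, encoding="unicode"))
```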
Re: Re-implementation of OAI repository in Invenio
Hello Samuele,

[...]
>> I don't follow you here. If I harvest your http://example.de/record/1234, I think that it should have a 024.8x $a oai:example.de:1234 value, and I'll be storing your oai:example.de:1234 in my 035 $a, with a $9 of my own choice. And I can re-expose it as oai:example.cat:2345, keeping your oai:example.de:1234 as dc:identifier (in our Marc21 to DC XSL we export them both). Example:
>>
>> http://ddd.uab.cat/record/77021/export/hm
>> http://ddd.uab.cat/record/77021/export/xd
>
> And this is precisely what is finally supported by the new branch that will soon be merged. OTOH you use the default shipped invenio.conf

Well, no, sorry for the poor example. What happens is that this particular record is not re-exposed, and I didn't find a quick example of a record with an external OAI id *and* a local 024. This one is better, because it has a few 035 (but no OAI ids, due to a recent external migration) and a local 024:

http://ddd.uab.cat/record/70053/export/hm

> (which is trying to be as much backward compatible as possible, and therefore keeping on using 909CO as default), but with the more OAI-PMH

I'm afraid I'll challenge you here ;-) Soon I'll be opening a task in the invenio-software.org trac requesting that the default Marc values of Invenio match the standard, for the benefit of all, especially newcomers. Experienced Invenio admins already know how to tune, change, etc. But we should be more friendly to newcomers, and if we say that we follow Marc21, we should comply much better than we do now, don't you think?

Thanks,

Ferran
Re: Re-implementation of OAI repository in Invenio
Hello Samuele,

[...]
> In Invenio there is a special treatment for this 035 field, namely that the couple OAIID_TAG + OAIID_PROVENANCE_TAG is used to identify uniquely a record. So shall I simply add by default to 035 the above mentioned attributes? E.g.
>
>  * baseURL - $u (different than $9 which is a semantic string. The baseURL might change because of technical reasons, and therefore the $9 subfield, when present will receive priority in identify a record).
>  * identifier - $a (as per CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG)
>  * datestamp - $d
>  * metadataNamespace - $m
>  * originDescription - $o
>  * harvestDate - $h
>  * altered - $a
>
> Is there anyone in the Invenio community whose system is harvesting record (putting external IDs in 035) and is trying to expose them?

We also do it. After backporting your 2fb7275849e83f5afbb7915000e208a3e053889a patch to 0.99.1, we now store in 035 $a all kinds of external identifiers, including external OAI ids, with $9 acting as a kind of «namespace identifier» to avoid conflicts.

As for our own OAI id, we spent some time before concluding that, instead of a local 9XX field, it should go to 024.8_:

http://www.loc.gov/marc/bibliographic/bd024.html

    CFG_OAI_SET_FIELD = 0248_9
    CFG_OAI_ID_FIELD = 0248_a

So, even if we are re-exposing a small part of our harvested holdings, at this moment we don't reuse the same tag for both uses.

I understand the need for your suggested fields ($d, $m, etc.), but please don't hurry to add non-standard subfields to 035. The more your default values depart from the Marc21 standard, the more difficult you make it to interchange records with other databases, and the more trouble you cause to potential Invenio newcomers. I don't have the solution right now, but your fields don't appear in the standard:

http://www.loc.gov/marc/bibliographic/bd035.html

Maybe you can ask a librarian before deciding on them.

Thanks,

Ferran
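For readers unfamiliar with the compact notation used in the two configuration values above, the following sketch splits such a string into tag, indicators and subfield code. It assumes the layout visible in this thread (three characters of tag, one character per indicator with '_' meaning blank, then an optional subfield code); parse_marc_spec is a made-up helper, not a function from Invenio.

```python
def parse_marc_spec(spec):
    """Split e.g. '0248_a' into ('024', '8', ' ', 'a').

    Assumed layout: 3 characters of tag, 1 per indicator ('_' is read as a
    blank indicator), then an optional 1-character subfield code.
    """
    if len(spec) < 5 or len(spec) > 6:
        raise ValueError("expected 5 or 6 characters, got %r" % spec)
    tag = spec[:3]
    ind1 = " " if spec[3] == "_" else spec[3]
    ind2 = " " if spec[4] == "_" else spec[4]
    code = spec[5] if len(spec) == 6 else ""
    return tag, ind1, ind2, code


if __name__ == "__main__":
    for spec in ("0248_a", "0248_9", "909CO"):
        print(spec, parse_marc_spec(spec))
```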
Re: How are Invenio Trac tickets priorised and assigned?
Hi Samuele,

[...]
>> Is there anything I can do to increase the interest for some of them?
>
> your mail is helping a lot, in this sense :-)

Glad to hear it,

thanks,

Ferran
Re: Invenio and INSPIRE code swarm movie
Hello Tibor and Samuele,

> On Mon, 18/04/2011 at 13.46 +0200, Tibor Simko wrote:
>> As we have discussed some weeks ago, here are two short animations with visual representation of Invenio and INSPIRE commit history:
>>
>> http://invenio-software.org/download/invenio-code-swarm.avi
>> http://invenio-software.org/download/inspire-code-swarm.avi
>
> what tool have you used to generate these? Is it Gource?
>
> http://code.google.com/p/gource/

It seems to me that it is code_swarm, right?

http://code.google.com/p/codeswarm/

Thanks for sharing it, Tibor. I've been imagining many possibilities since: processing Apache logs, with the names being either the collections or the submitters; collection growth (with different variations of data from either the sbmSUBMISSIONS or bibrec tables, maybe crossing it with guess_primary_collection_of_a_record); etc. etc.

But wandering around the http://www.michaelogawa.com/research/ site, I feel that the evolines (or storylines) visualisation prototype is also fascinating:

http://www.michaelogawa.com/research/storylines/

Ars longa, vita brevis,

Ferran
Is it enough to submit a ticket and leave as new? (re #425)
Hello Tibor,

I'm really interested in ticket http://invenio-software.org/ticket/425, because I hope it can speed up our indexing time, as well as provide a more general way to remove diacritics. As we have a bunch of digitised old material, with OCR provided by different software packages, we have lots of funny combinations of characters.

I'm sorry I don't have the time or the infrastructure to build and test a patch for it, but given the unit tests that Invenio provides, it should not be difficult to test.

Can I do anything else? Is it up to you to assign it to somebody else? Do you think it is feasible to get it into the 1.0 release?

Thanks,

Ferran
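As an illustration of what "a more general way to remove diacritics" could look like, here is a minimal sketch based on Unicode decomposition with Python's standard unicodedata module. It is an assumption about the general technique, not the approach proposed in the ticket, and strip_diacritics is a made-up name.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after compatibility decomposition (NFKD).

    Accented letters are split into base letter plus combining mark, and
    the marks are dropped. Letters without a decomposition (for example
    the Scandinavian o with stroke) are left untouched.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for sample in ("Pàgina inicial", "numériques", "Ferran"):
        print(sample, "->", strip_diacritics(sample))
```

Because NFKD also unfolds compatibility characters such as typographic ligatures, the same step tends to tidy up some of the odd character combinations that OCR output contains.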
Re: BibClassify with RDF and MySQL store
Hello,

> On Sat, Dec 18, 2010 at 12:33 PM, Samuele Kaplun samuele.kap...@cern.ch wrote:
>> Hi Roman,
>>
>> On Sat, 18/12/2010 at 12.17 +0100, Roman Chyla wrote:
>>> I agree this is cool, but something doesn't fit, at least I don't understand how this could be used for the task of bibclassify, the dict is good if you know (more or less) what you are looking for, but the task of bibclassify is to find entities inside the fulltext - and to find that out, bibclassify has to search for it - and it is not exactly the same thing as the spell checking. I must be missing something, could you explain to me what advantage at all there would be in using the dict? As a fast cache of single level entries? I could see how it would be more useful for the cache, citation links etc., but not for bibclassify.

I suggested looking at dict for these reasons:

1. It doesn't need 24 GB of RAM to start working, regardless of the size of the corpora. ;-)
2. It easily permits shared and reused corpora.
3. The protocol itself is easy to understand, not unlike HTTP.
4. The *meaning* of the returned value is up to the client, not unlike the Unix way of doing things. In dict, you just return data; it is up to you to interpret it. You can tag relations, codes, etc.
5. Integrating a dict client in the Python Invenio code has a small cost: dicoclient.py from the dico client (see http://packages.debian.org/dico or http://puszcza.gnu.org.ua/software/dico/) is only 13 KB and doesn't have dependencies other than the standard Python libs.

>> I am not that aware of how BibClassify works right now, but if its final goal is to look for the most frequent keywords (from a controlled set) inside a fulltext, then, postponing the issue of the grammar (plural, genders, conjugations :-S), I think that it would be indeed possible to use dictd in a orthogonal way than we currently do with ontologies. Currently for each word in the ontology (correct me if I am wrong) we look how many times it appears in the text.
>> On the other hand with dict, we might simply take all the words in the text, and filter them against the dictionary (which is built after the ontology), and then sum up the occurrences of repeated words.
>
> OK, I see what you mean - could work, but would 'work' mean 'improved'? If you take an average of 3000 words times the real time reported for lookup above:
>
>     3000 * 0.004 = 12s
>
> or
>
>     3000 * 0.006 = 18s
>
> that is two to three times slower than the current bibclassify implementation (in case of HEP). It could be faster for bigger dictionaries, like Eurovoc, because bibclassify will slow down -- or if we manage to cut down the lookup time (by making it local process?)

With dict you can use a stateless connection (like HTTP), but you can also reuse an already opened session, so the latency should be better.

>> The two methods should accomplish the same goal (if I am not wrong on BibClassify algorithm) but the latter should be in principle extremely fast, unless the grammar issue is the bottleneck.
>
> in principle, direct lookups must be replaced by some approximate lookups (btw, I think dictd could handle grammar variations better than the current regex patterns, so that would be a gain) - but it will return more entries in many cases, then it is necessary to choose the right one. Might be easy for limited domains - for Eurovoc, you will need some sort of disambiguation
>
> Another interesting problem is the single keyword made of several tokens, like 'search engine' in the sentence: Invenio comes with its own search engine implementation?
>
> will you ask for:
>
>     1. invenio
>     2. comes
>     6 search
>     7 engine
>     8 implementation
>
> -- somehow combine 6+7 based on the responses? or create collocations and ask for them (will double the number of lookups, and does not skip inserted words)
>
>     Invenio comes
>     comes with
>     ...
>     search engine
>
> Don't get me wrong, dictd is cool. I am just saying it is tiny bit more complicated.

Maybe.
I don't know the details of BibClassify, sorry, and I wasn't advocating rewriting all of BibClassify using dict, of course. What I'm suggesting is that those dictionaries, ontologies, etc. that you need for BibClassify to work could be served and easily reused with dict, cheaper, faster and maybe better than with SQL or Solr or whatever other alternative.

Thanks,

Ferran

PS Eric Lease Morgan did some experiments a while ago using dict for serving the LC Authorities Catalog, see http://serials.infomotions.com/code4lib/archive/2008/200803/0557.html
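The "filter the words of the text against the dictionary and sum up the occurrences" idea from this thread, together with the collocation variant for multi-word keywords such as 'search engine', can be sketched as follows. An in-memory Python set stands in for the dict server, so this says nothing about lookup latency; count_keywords and the sample vocabulary are hypothetical, and this is not BibClassify's algorithm.

```python
import re
from collections import Counter


def count_keywords(text, vocabulary, max_words=2):
    """Count vocabulary entries found among the word n-grams of `text`.

    Every run of 1..max_words consecutive tokens is looked up, so allowing
    two-word keywords roughly doubles the number of lookups, as noted in
    the discussion above.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for size in range(1, max_words + 1):
        for start in range(len(tokens) - size + 1):
            candidate = " ".join(tokens[start:start + size])
            if candidate in vocabulary:  # stand-in for one dictionary lookup
                counts[candidate] += 1
    return counts


if __name__ == "__main__":
    text = ("Invenio comes with its own search engine implementation. "
            "The search engine is fast.")
    vocabulary = {"invenio", "search engine"}  # made-up controlled set
    print(count_keywords(text, vocabulary))
```

Exact matching like this leaves the grammar and disambiguation questions raised above untouched: plurals are missed, and a vocabulary holding both 'engine' and 'search engine' would count the same occurrence twice.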
Re: BibClassify with RDF and MySQL store
Hello Samuele,

[warning: I may be way off-road]

> I am starting to play a bit with the EuroVoc http://eurovoc.europa.eu/ ontology in order to integrate it into OpenAIRE Orphan Record Repository, for automatic keyword extraction for EU documents. This ontology is *big*! and multilingual. I can't even load it with RDFLIB on my laptop (4GB of RAM).
[...]

Blame XML bloat (again).

For dictionaries and such, that is, a large corpus of data that doesn't change much (in other words, that is not transactional), why don't you use specialised software? Enter http://dict.org, a protocol (http://www.dict.org/rfc2229.txt) and a canonical implementation for dictionaries, blazingly fast, veteran and well known (see for example http://packages.debian.org/dictd and http://packages.debian.org/dict), plus several other implementations (http://www.dict.org/w/software/start), among others in Python (even curl is a dict client).

Creating and indexing a dict server of about half a million entries using the standard dict.org utilities takes less than a minute, and the searches are resolved in milliseconds, for positive, negative and approximate answers, for example:

$ time dict -h localhost 00075743be0748c4965848c62c2f5a70
1 definition found

From unknown [md5sums]:

  00075743be0748c4965848c62c2f5a70
     00075743be0748c4965848c62c2f5a70 /mnt/VOLUM-I/3-12/ddd/veterinaria/revhigsanvet/tif/revhigsanvet_a1915m11t5n8/revhigsanvet_a1915m11t5n8_21.tif
     00075743be0748c4965848c62c2f5a70 /mnt/VOLUM-Ib/3-12/ddd/veterinaria/revhigsanvet/tif/revhigsanvet_a1915m11t5n8/revhigsanvet_a1915m11t5n8_21.tif

real    0m0.004s
user    0m0.000s
sys     0m0.000s

$ time dict -h localhost 00075743be0748c4
No definitions found for 00075743be0748c4

real    0m0.004s
user    0m0.000s
sys     0m0.000s

$ time dict -h localhost 00075743be0748c4965848c62c2f5a7
No definitions found for 00075743be0748c4965848c62c2f5a7, perhaps you mean:
md5sums: 00075743be0748c4965848c62c2f5a70

real    0m0.006s
user    0m0.000s
sys     0m0.000s

$ dict -h localhost -I
dictd 1.10.11/rf on Linux 2.6.26-2-amd64
On nuix.uab.es: up 21+03:06:13, 813 forks (1.6/hour)

Database      Headwords    Index      Data    Uncompressed
md5sums          580225    23 MB     32 MB       165 MB

$ dict -h dict.org -I
dictd 1.9.15/rf on Linux 2.6.30-bpo.1-686
On miranda.org: up 51+17:54:33, 16914217 forks (13619.5/hour)

Database      Headwords    Index      Data    Uncompressed
gcide            203645  3859 kB     12 MB        38 MB
wn               154563  3089 kB   8744 kB        26 MB
moby-thes         30263   528 kB     10 MB        28 MB
elements            130     2 kB     14 kB        45 kB
vera               9203   103 kB    160 kB       558 kB
jargon             2374    42 kB    621 kB      1430 kB
[...]

Part of the reason for this speed is that the input file for creating the dictionary is sorted, and then it does binary searches on a mmapped file. As the protocol is inherently client-server, the same ontology (dictionary) can be (re-)used among different Invenio instances.

It is not a toy. I haven't been able to cause any noticeable resource use on my instance, even when querying it massively. You can follow part of my experiments here:

http://news.gmane.org/gmane.network.protocols.dict.user

Sorry, I had to say it,

Ferran
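The reason given above for the lookup speed (a sorted index searched by bisection) can be illustrated with a few lines of Python. This is only a sketch of the general technique on an in-memory list with made-up keys and paths; it is not how dictd itself is implemented, and build_index and lookup are hypothetical helper names.

```python
from bisect import bisect_left


def build_index(entries):
    """Sort (headword, definition) pairs once, so lookups can bisect."""
    return sorted(entries)


def lookup(index, headword):
    """Return all definitions stored under exactly `headword`.

    bisect_left needs about log2(len(index)) comparisons to find the first
    candidate; the 1-tuple (headword,) sorts just before every
    (headword, definition) pair that shares the same headword.
    """
    position = bisect_left(index, (headword,))
    found = []
    while position < len(index) and index[position][0] == headword:
        found.append(index[position][1])
        position += 1
    return found


if __name__ == "__main__":
    index = build_index([
        ("b2", "/volume-one/page_21.tif"),   # made-up checksum keys and paths
        ("a1", "/volume-one/page_20.tif"),
        ("b2", "/volume-two/page_21.tif"),
    ])
    print(lookup(index, "b2"))
    print(lookup(index, "b"))
```

For an index of about 580000 headwords, like the md5sums database shown above, a bisection needs roughly 20 comparisons per lookup.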
[patch] BibHarvest: sort remote set names
Hello,

please consider this tiny patch for inclusion. It applies cleanly to the current git tree.

Thanks,

Ferran

PS Is it ok to send it to Jerome CCing the list, or should I address it to Tibor instead?

BibHarvest: sort remote set names

* When remote OAI site has a large number of sets, showing them in
  random order makes very difficult to make a sensible selection.
  Sort them.
---
 modules/bibharvest/lib/oai_harvest_admin.py |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/modules/bibharvest/lib/oai_harvest_admin.py b/modules/bibharvest/lib/oai_harvest_admin.py
index 2684122..4a06472 100644
--- a/modules/bibharvest/lib/oai_harvest_admin.py
+++ b/modules/bibharvest/lib/oai_harvest_admin.py
@@ -208,6 +208,7 @@ def perform_request_editsource(oai_src_id=None, oai_src_name='',
     sets = findSets(oai_src_baseurl)
     if sets:
+        sets.sort()
         # Show available sets to users
         sets_specs = [set[0] for set in sets]
         sets_names = [set[1] for set in sets]
Re: [patch] BibHarvest: sort remote set names
Hi Tibor,

> Thanks; I have added sort() to one more place so that the order would be the same also when adding new OAI sources, and committed.

You are right, I've been bitten by it myself afterwards. And also, I see that the .sort() lines should probably be *after* the '# Show available sets to users' comment line, not before. Sorry!

>> PS Is it ok to send it to Jerome CCing the list, or should I address it to Tibor instead?
>
> For smaller patches of this kind, it is perfectly OK to use the list or to send them to us privately. For bigger patches, it would be better to make a ticket at http://invenio-software.org/ and attach the patch there.

Understood.

Thanks again,

Ferran
Freshness of po files (was: Re: [patch + tar.gz] I18N: updates to Catalan and Spanish translations)
Hello Tibor,

[CCing the list, as somebody else may be interested in my question]

> On Wed, 13 Oct 2010, Ferran Jorba wrote:
>> a few more translations.
>
> Thanks, committed.

Fine.

>> So I'm sending the patch and tar.gz as safety measure.
>
> Typically I just use tar.gz, the PO files there are always perfect :)

Good to know. However, I have a question: how fresh are the PO files I get from git? Sometimes I hit S in emacs po-mode to see the context and I don't see the string in the code snippet. Does anybody, either you in the repository or me at home, have to do anything to update them?

Thanks,

Ferran
Where to create a new ticket (was: Recommend lynx instead of html2text)
Hello Tibor et al,

a few weeks ago, I found out that html2text is inadequate for creating plain text from HTML, because it only knows about iso-8859-1. I suggest lynx instead, as I explained in this mail (attached fragment). As I haven't seen any news about this, maybe I should create a ticket for it. But I don't know where, because I'm not up to date about your trac migration. Should I do it myself? Your guidance will be appreciated.

Ferran

---BeginMessage---
[...]
A second issue I'm having is that, in our site, we have a lot of HTML documents, and a bunch of them are in non-utf8 charsets (mostly iso-8859-1 and windows-1251). I have been watching and debugging it the whole morning. In a word, bibindex_engine expects everything in utf8, and when it is not, it complains loudly. Adding the exception to the message, I got:

2010-03-15 09:47:05 -- Error: Cannot put word num??riques with sign 1 for recID 10 (exception: 'utf8' codec can't decode bytes in position 9-11: invalid data).

How to get utf8 clean text from any HTML document, from any charset? html2text has the -ascii option to output unaccented text, but it didn't do anything good with my files. Fortunately, lynx does it cleanly. This quick-and-dirty patch allows me to make some progress:

@@ -417,6 +417,8 @@ def get_words_from_fulltext(url_direct_or_indirect, stemming_language=None):
     elif os.path.basename(conv_program) == "html2text":
         cmd = "%s %s > %s" % \
               (conv_program, tmp_name, tmp_dst_name)
+        cmd = "lynx -dump -display_charset=utf8 %s > %s" % \
+              (tmp_name, tmp_dst_name)
     else:
         write_message("Error: Do not know how to handle %s conversion program." % conv_program, sys.stderr)
     # try to run it:
[...]
---End Message---
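Independently of which converter produces the text, the "everything must be utf8 before indexing" requirement mentioned in the attached message can be enforced with a small defensive step. The sketch below is an assumption-laden illustration, not Invenio code: to_utf8 is a made-up helper, and the fallback order is a guess that suits Western European pages only (a windows-1251 Cyrillic page would decode without error but yield wrong letters, so the declared charset should be preferred whenever it is known).

```python
def to_utf8(raw, fallbacks=("windows-1252", "iso-8859-1")):
    """Return `raw` (bytes) re-encoded as UTF-8.

    Valid UTF-8 input is returned unchanged; otherwise each fallback
    charset is tried in order. As a last resort undecodable bytes are
    replaced, so the result is always valid UTF-8.
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        pass
    for charset in fallbacks:
        try:
            return raw.decode(charset).encode("utf-8")
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace").encode("utf-8")


if __name__ == "__main__":
    latin1_bytes = "numériques".encode("iso-8859-1")
    print(to_utf8(latin1_bytes).decode("utf-8"))
```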
Re: The invasion of the XML entities
Hello Benoit,

[...]
> However in the example attached, we have XML entities both in the title and in the abstract. The abstract seems to be correctly unescaped but the title remains escaped leading to some bad results (Title is &#1069;&#1083;&#1077;&#1082;&#1090;&#1088;&#1086;&#1085;&#1085;&#1072; &#1103;j&#1090;&#1077;&#1086;&#1088;&#1080;&#1103; &#1084; ...). I don't seem to be able to find the cause of this weird behavior. Maybe one of you can?

Maybe this record was imported from somewhere? I've had similar cases from sources with a mix of different encodings. Recode is your friend. I've included this step in my problematic workflow:

    $ recode --diacritics html..utf8 < %s > %s

In your case, it produces this Cyrillic result:

    Электронна яjтеория м ...

Hope it helps,

Ferran
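For those who would rather do this step inside Python than call recode, the standard library's html.unescape handles numeric character references like the ones in the title above. A minimal sketch, assuming Python 3; unescape_entities is just an illustrative wrapper name, and the sample string is the first word of the escaped title quoted in this message.

```python
import html


def unescape_entities(text):
    """Turn character references such as '&#1069;' or '&amp;' into characters."""
    return html.unescape(text)


if __name__ == "__main__":
    escaped_title = (
        "&#1069;&#1083;&#1077;&#1082;&#1090;&#1088;"
        "&#1086;&#1085;&#1085;&#1072;"
    )
    print(unescape_entities(escaped_title))
```

Text that contains no references passes through unchanged, so the step is safe to apply to records that are already clean.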
Re: The Multivio project
Hello Miguel,

> I hope you'll excuse me for using the list, but I have an announcement that might be of interest to you. There's a project going on here at RERO called Multivio whose goal is to provide a presentation layer for archives of digital documents:
>
> https://www.multivio.org/
[...]
> Please don't hesitate to take a look at the project site https://www.multivio.org/, try some examples, try it with your own documents and send us some feedback at i...@multivio.org.

It certainly looks interesting! I've tested a couple of PDF files from our site. The first one happened to weigh 11 MB (an old scanned journal, from http://ddd.uab.cat/record/53804), and it took so long that I had to abort it:

http://demo.multivio.org/client/#geturl=http://ddd.uab.cat/pub/garbanzo/garbanzo_a1873n47.pdf

The second one, a modern native PDF (from http://ddd.uab.cat/record/5), was slightly better:

http://demo.multivio.org/client/#geturl=http://ddd.uab.cat/pub/autonoma/autonoma_a2010m3n233.pdf

I'd certainly choose Multivio instead of our Flash based equivalent (no, it's not my fault, I give it to you so you can see a proprietary alternative):

http://www.uab.es/revista-autonoma/

However, I'd say that the quality of the thumbnails can be improved. Which tool are you using? In my case, I've found that, by far, the fastest and best results come from a combination of Xpdf's pdftoppm and ImageMagick's convert. We create the thumbnails of the first page of our PDFs this way (simplified):

    $ pdftoppm -f $page -l $page $file.pdf $file
    $ convert -thumbnail 85 $file-0$page.ppm $file.png
    $ rm $file-0$page.ppm

pdftoppm converts all pages to ppm if no -f or -l parameters are given. If you rely on ImageMagick's own PDF to PNG (or any other graphic format) conversion, the route goes through Ghostscript, which brings any system to its knees, and the quality is worse.

Hope it helps,

Ferran
Re: The Multivio project
Hello Samuele,

> On Monday 3 May 2010 11:19:23, Ferran Jorba wrote:
>> It certainly looks interesting! I've tested a couple of PDF files from our site. The first one happened to weigh 11 MB (an old scanned journal, from http://ddd.uab.cat/record/53804), and it took so long that I had to abort it:
>
> just for reference, in case it's needed also by other users, the pdfopt utils (from ghostscript) can transform any PDF into a linearized PDF (also called fast web view mode), that will add hints to the PDF to reference single pages without downloading the full file. I guess this would make the multivio able to open your 11Mb scanned document without any problem.

Thanks for the suggestion. I've tried it on one of our 100 MB+ monsters and what I've seen is that the size barely varies. But Xpdf's pdfinfo certainly notes the change in the «Optimized» field:

                 before             after pdfopt
Pages:           294                294
Encrypted:       no                 no
File size:       112858067 bytes    112838493 bytes
Optimized:       no                 yes
PDF version:     1.5                1.5

Another task in our TODO list...

Thanks,

Ferran
Re: Fwd: Trac
Hello Tibor, Travis et al, On Tue, 20 Apr 2010, Brooks, Travis C. wrote: So, shall we set up a parallel INSPIRE Trac instance? Yes, I think so. Let's use the Invenio Trac instance for a few days, configure it to the will, and I'll then clone it for INSPIRE. Sorry to jump in without being asked. A couple of years ago we evaluated Trac for internal use for our DDD site and other library related projectS. The last S is important: we, like many others, are involved in more than one project. Doing that in Trac was not possible at that time; rather, you had to set up several Trac instances, and duplicate users, permissions, and so on. Then it is not possible to see all tickets (tasks, or whatever) assigned to a specific person, or to prioritise them across projects. There are a couple of pages in the Trac site about this: http://trac.edgewall.org/wiki/TracMultipleProjects and the infamous http://trac.edgewall.org/ticket/130. In this last ticket (http://trac.edgewall.org/ticket/130#comment:52) I learned about DrProject (https://www.drproject.org/), a Trac fork with multiproject support, that we are very pleased to use here at UAB. The trouble is that DrProject is now a dead project. The students at the University of Toronto who wrote it under the leadership of Dr Greg Wilson are rewriting it on the Django framework and it will be called Basie (https://basieproject.org/), but it is not finished yet. However, a couple of alternatives have appeared since then: Retrospectiva (http://retrospectiva.org/) and Redmine (http://www.redmine.org/), both in Ruby. Redmine seems to be more mature and it is already packaged for Debian, a big plus for us (http://packages.debian.org/redmine). It is also multidatabase, multilingual, multi DVCS (including git), etc. etc.
We have not decided yet which one we will migrate to, since we are not in a hurry, but my question is: are you sure you want to migrate to an isolated tool where there will be no relation between your (at least) two different instances? A couple of posts about Trac vs Redmine:
http://changelog.complete.org/archives/696-thoughts-on-redmine
http://changelog.complete.org/archives/701-at-long-last-softwarecompleteorg-migrated-to-redmine
Again, please excuse my 0.2 cents here, Ferran
Re: Fwd: Trac
Hello Roman, Thank you for the links, it was very interesting reading. Your experience also seems to caution against similar situations. In my previous job, we were also using Trac for multiple projects (namely 4), but they were really separate projects so I can't say a lot about MultiProjects settings - but on the other hand, I can easily imagine it. But the basic question IMHO is whether CDS and INSPIRE need to be separate - whether they are INDEPENDENT - I don't think anyone can answer yes to that question, at least for inspire. Or if they are SEPARATE - answer might be yes, but separate in which way? In our case, the centre of gravity was people rather than projects. *I* want to know which high priority tickets *I* have, regardless of how (un)related the projects are. We want to know how many tickets have been open since whenever, regardless of the project. You can't do that with multiple Trac instances. If not technically, then is there a real need for MultiProjects setup? By this simple consideration an array of technical nightmares might be gone, and if that is solved, there is no need to solve other stuff. Especially when 7 days ago Trac rolled out their MultiRepository support: http://trac.edgewall.org/ticket/130#comment:145 I didn't know about this new development, but they insist so much that their target is single-instance-single-project, and the #130 ticket is so old, that I fear that it might not be very clean. There is something more about Trac that confused me and my coworkers, and it is shared with Retrospectiva, I think (we haven't evaluated them thoroughly): the confusion between Login (which is not needed to file tickets) and Preferences. In our internal working scenario, we thought it was clearer to force a login before adding tickets or editing wiki pages. In both DrProject and Redmine you have to log in and then you can set your preferences. I don't know enough about Basie. Ferran
Re: What does 'holding pen' mean?
Hello all, sorry to bother, but I don't know what 'holding pen' means, and how to translate it into Catalan and Spanish. Does anybody have a tentative translation into similar languages like French, Italian or Portuguese? In French: File des notices en attente? I like this one. And a free translation could be: «Registres a revisar» (Catalan) or «Registros para revisar» (Spanish)? Funnily enough, we accomplish this function using a couple (or more, it depends) of collections not attached to the main collection. Librarians can visit them, modify records, decide, and then we move either the records or the whole collection to the public tree. They are not really secret, as the pull-down list of collections shows them, but it does the job well. Thanks, Ferran
What does 'holding pen' mean?
Hello, sorry to bother, but I don't know what 'holding pen' means, and how to translate it into Catalan and Spanish. Does anybody have a tentative translation into similar languages like French, Italian or Portuguese? Thanks, Ferran
Thank you for 0.99.1
Hello developers team, after our long overdue 0.99.1 migration, we would like to thank you Invenio developers for your fine work. Some of the improvements I (we) have appreciated the most during this migration are: - CFG_ACCESS_CONTROL_LEVEL_SITE, to put the old system readonly, - the newer, faster, faster! bibindex, - debugging bibformat information from bibformatadmin, - reuse of old bibsched numbers, - etc, etc. Again, our deepest appreciation to your wonderful job, Ferran
[patch] Enable fulltext indexing for 8564% (was: Re: Config option (or hack) to disable fulltext indexing?)
Hello Tibor, this thread has put more pressure on me to fix the non-working fulltext indexing in my sites. Our new 0.99.1 is in beta this week (http://traces.uab.cat/). We'll announce it later on. I have seen that fulltext does not work in our case because the second indicator of 856 is hardcoded to be _. This is not necessarily so, according to the LC (http://www.loc.gov/marc/bibliographic/bd856.html). The following patch fixes it. What I don't understand is why the engine stops after (several) not found URLs. How can I convince it to ignore those URLs and carry on with the other ones? Thanks, Ferran
BibIndex: allow second indicator of 856 to be any value
* Second indicator of 856 can have several values, not just _ (http://www.loc.gov/marc/bibliographic/bd856.html)
Re: [patch] Enable fulltext indexing for 8564% (was: Re: Config option (or hack) to disable fulltext indexing?)
Oops, I forgot to refresh it. Here it comes in full. Ferran
Re: [patch] Enable fulltext indexing for 8564% (was: Re: Config option (or hack) to disable fulltext indexing?)
I'm sorry about the noise. Here comes the patch. Ferran
BibIndex: allow second indicator of 856 to be any value
* Second indicator of 856 can have several values, not just _ (http://www.loc.gov/marc/bibliographic/bd856.html)
---
 modules/bibindex/lib/bibindex_engine.py | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: cds-invenio/modules/bibindex/lib/bibindex_engine.py
===
--- cds-invenio.orig/modules/bibindex/lib/bibindex_engine.py 2009-12-03 13:09:53.0 +0100
+++ cds-invenio/modules/bibindex/lib/bibindex_engine.py 2009-12-03 13:10:24.0 +0100
@@ -463,7 +463,7 @@
 def get_nothing_from_phrase(phrase, stemming_language=None):
     A dump implementation of get_words_from_phrase to be used when
     when a tag should not be indexed (such as when trying to extract phrases from
-    8564_u).
+    8565%u).
     return []
@@ -768,7 +768,7 @@
     @param table_name_pattern: i.e. idxWORD%02dF or idxPHRASE%02dF
     @parm default_get_words_fnc: the default function called to extract words from a metadata
     @param tag_to_words_fnc_map: a mapping to specify particular function to
-    extract words from particular metdata (such as 8564_u)
+    extract words from particular metdata (such as 8565%u)
     self.index_id = index_id
     self.tablename = table_name_pattern % index_id
@@ -1475,7 +1475,7 @@
     if task_get_option("cmd") == "check":
         wordTables = get_word_tables(task_get_option("windex"))
         for index_id, index_tags in wordTables:
-            wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
+            wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8565%u': get_words_from_fulltext})
             _last_word_table = wordTable
             wordTable.report_on_table_consistency()
             task_sleep_now_if_required(can_stop_too=True)
@@ -1485,7 +1485,7 @@
     if task_get_option("cmd") == "check":
         wordTables = get_word_tables(task_get_option("windex"))
         for index_id, index_tags in wordTables:
-            wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
+            wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8565%u': get_nothing_from_phrase}, False)
             _last_word_table = wordTable
             wordTable.report_on_table_consistency()
             task_sleep_now_if_required(can_stop_too=True)
@@ -1499,7 +1499,7 @@
             if task_get_option("reindex"):
                 reindex_prefix = "tmp_"
                 init_temporary_reindex_tables(index_id, reindex_prefix)
-            wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
+            wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxWORD%02dF', get_words_from_phrase, {'8565%u': get_words_from_fulltext})
             _last_word_table = wordTable
             wordTable.report_on_table_consistency()
             try:
@@ -1555,7 +1555,7 @@
             task_sleep_now_if_required(can_stop_too=True)

             # Let's work on phrases now
-            wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
+            wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPHRASE%02dF', get_phrases_from_phrase, {'8565%u': get_nothing_from_phrase}, False)
             _last_word_table = wordTable
             wordTable.report_on_table_consistency()
             try:
Re: [patch] Enable fulltext indexing for 8564% (was: Re: Config option (or hack) to disable fulltext indexing?)
Well, well, well. 8565%u should be 8564%u, of course. This mess happened because I tested it on an installed 0.99.1 system and then ported it to my updated git tree. Again, sorry for that. Ferran
BibIndex: allow second indicator of 856 to be any value
* Second indicator of 856 can have several values, not just _ (http://www.loc.gov/marc/bibliographic/bd856.html)
---
 modules/bibindex/lib/bibindex_engine.py | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: cds-invenio/modules/bibindex/lib/bibindex_engine.py
===
--- cds-invenio.orig/modules/bibindex/lib/bibindex_engine.py 2009-12-03 13:27:27.0 +0100
+++ cds-invenio/modules/bibindex/lib/bibindex_engine.py 2009-12-03 13:28:10.0 +0100
@@ -463,7 +463,7 @@
 def get_nothing_from_phrase(phrase, stemming_language=None):
     A dump implementation of get_words_from_phrase to be used when
     when a tag should not be indexed (such as when trying to extract phrases from
-    8564_u).
+    8564%u).
     return []
@@ -768,7 +768,7 @@
     @param table_name_pattern: i.e. idxWORD%02dF or idxPHRASE%02dF
     @parm default_get_words_fnc: the default function called to extract words from a metadata
     @param tag_to_words_fnc_map: a mapping to specify particular function to
-    extract words from particular metdata (such as 8564_u)
+    extract words from particular metdata (such as 8564%u)
     self.index_id = index_id
     self.tablename = table_name_pattern % index_id
@@ -1476,7 +1476,7 @@
     if task_get_option("cmd") == "check":
         wordTables = get_word_tables(task_get_option("windex"))
         for index_id, index_tags in wordTables:
-            wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
+            wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8564%u': get_words_from_fulltext})
             _last_word_table = wordTable
             wordTable.report_on_table_consistency()
             task_sleep_now_if_required(can_stop_too=True)
@@ -1486,7 +1486,7 @@
     if task_get_option("cmd") == "check":
         wordTables = get_word_tables(task_get_option("windex"))
         for index_id, index_tags in wordTables:
-            wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
+            wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564%u': get_nothing_from_phrase}, False)
             _last_word_table = wordTable
             wordTable.report_on_table_consistency()
             task_sleep_now_if_required(can_stop_too=True)
@@ -1500,7 +1500,7 @@
             if task_get_option("reindex"):
                 reindex_prefix = "tmp_"
                 init_temporary_reindex_tables(index_id, reindex_prefix)
-            wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
+            wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxWORD%02dF', get_words_from_phrase, {'8564%u': get_words_from_fulltext})
             _last_word_table = wordTable
             wordTable.report_on_table_consistency()
             try:
@@ -1556,7 +1556,7 @@
             task_sleep_now_if_required(can_stop_too=True)

             # Let's work on phrases now
-            wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
+            wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564%u': get_nothing_from_phrase}, False)
             _last_word_table = wordTable
             wordTable.report_on_table_consistency()
             try:
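The intent behind replacing 8564_u with 8564%u can be illustrated with a small sketch. It is an assumption of this sketch, not a description of BibIndex internals, that % stands for any run of characters (as in SQL LIKE) and that every other character, including the _ used for a blank indicator, is taken literally. tag_matches is a hypothetical helper written only for this illustration.

```python
import re


def tag_matches(pattern, tag):
    """Return True if a MARC tag string fits the pattern.

    Assumption of this sketch: '%' is a wildcard for any run of
    characters; everything else, '_' included, must match literally.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, tag) is not None


# Compare the hardcoded pattern with the wildcard one on 856 fields
# that differ only in their second indicator.
for tag in ("8564_u", "85640u", "85641u", "85642u"):
    print(tag, tag_matches("8564_u", tag), tag_matches("8564%u", tag))
```

Under these assumptions only the wildcard pattern accepts 856 fields whose second indicator is 0, 1 or 2, which is the situation described in the message.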
[patch] Capture WebSubmit bibconvert errors
Hello Tibor, please adapt or add this patch if you think that it could help others like it helped me. It applies cleanly to current git sources. Thanks, Ferran
WebSubmit: capture stderr bibconvert messages
* After submitting a form, errors go silent unless captured.
---
 modules/websubmit/lib/functions/Make_Record.py |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: cds-invenio/modules/websubmit/lib/functions/Make_Record.py
===
--- cds-invenio.orig/modules/websubmit/lib/functions/Make_Record.py 2009-11-18 15:18:52.0 +0100
+++ cds-invenio/modules/websubmit/lib/functions/Make_Record.py 2009-11-18 15:19:29.0 +0100
@@ -39,7 +39,7 @@
     source = parameters['sourceTemplate'].replace(" ","")
     create = parameters['createTemplate'].replace(" ","")
     # We use bibconvert to create the xml record
-    call_uploader_txt = "%s/bibconvert -l1 -d'%s' -Cs'%s/%s' -Ct'%s/%s' > %s/recmysql" % (CFG_BINDIR,curdir,CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR,source,CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR,create,curdir)
+    call_uploader_txt = "%s/bibconvert -l1 -d'%s' -Cs'%s/%s' -Ct'%s/%s' > %s/recmysql 2>%s/recmysql.err" % (CFG_BINDIR,curdir,CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR,source,CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR,create,curdir,curdir)
     os.system(call_uploader_txt)
     # Then we have to format this record (turn & into &amp; and < into &lt;
     # After all we know nothing about the text entered by the users at submission time
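The same idea, keeping the error stream next to the output instead of losing it, can be sketched without shell redirection. This is an illustration only, not the WebSubmit code: run_and_capture and demo are hypothetical names, and the child process is a stand-in for bibconvert.

```python
import os
import subprocess
import sys
import tempfile


def run_and_capture(command, out_path, err_path):
    """Run a command, sending stdout and stderr to separate files."""
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        return subprocess.call(command, stdout=out, stderr=err)


def demo():
    """Run a child that writes to both streams; return (code, stdout, stderr)."""
    workdir = tempfile.mkdtemp()
    out_path = os.path.join(workdir, "recmysql")
    err_path = os.path.join(workdir, "recmysql.err")
    # Stand-in for bibconvert: prints a record and complains on stderr.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write('record'); sys.stderr.write('oops')"]
    code = run_and_capture(child, out_path, err_path)
    with open(out_path) as out, open(err_path) as err:
        return code, out.read(), err.read()
```

Passing the command as a list also avoids quoting problems with paths that contain spaces.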
[task #12167] Allow the selection of full and brief format from the WebSearch collection interface
This is an automated notification sent by LCG Savannah. It relates to: task #12167, project CDS Invenio
== OVERVIEW of task #12167: ==
URL: http://savannah.cern.ch/task/?12167
Summary: Allow the selection of full and brief format from the WebSearch collection interface
Project: CDS Invenio
Submitted by: fjorba
Submitted on: 2009-10-27 11:06
Should Start On: 2009-10-27 00:00
Should be Finished on: 2009-10-27 00:00
Category: BibFormat
Priority: 5 - Normal
Status: None
Privacy: Public
Percent Complete: 0%
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00
___
The tools to create formats (.bft and .py) are powerful and programmer oriented. However, there is no easy way to assign those brief and full formats to a given collection. It would be nice to integrate those options into the WebSearch collection definition, like portalboxes, search options and other collection specific preferences. Maybe this could make .bfo files obsolete? I understand that HB and HD should be the default, good enough for most collections and sites, and they should remain the valid ones unless a specific choice has been made. Please note that this preference should also be taken into account when creating the first, cached page (http://cdsware.cern.ch/repo/?p=cds-invenio.gita=searchh=HEADst=greps=def+create_latest_additions_info)
___
Carbon-Copy List: CC Address | Comment +- 2111| -SUB-
==
This item URL is: http://savannah.cern.ch/task/?12167
___
Message sent via/by LCG Savannah http://savannah.cern.ch/
[patch] Websearch: add a newline before printing Marc tags
Hi Tibor, could you please apply this tiny patch? I've refreshed it to apply cleanly to current git tree. Thanks, Ferran
Websearch: add a newline before printing Marc tags
The output of Marc format in search_engine.py is useful for offline processing. Adding a newline before <pre> simplifies it.
---
Index: cds-invenio/modules/websearch/lib/search_engine.py
===
--- cds-invenio.orig/modules/websearch/lib/search_engine.py 2009-10-15 10:22:58.0 +0200
+++ cds-invenio/modules/websearch/lib/search_engine.py 2009-10-15 10:29:38.0 +0200
@@ -3371,16 +3371,16 @@

     elif format == "hm":
         if record_exist_p == -1:
-            out += "<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"])) + "</pre>"
+            out += "\n<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"])) + "</pre>"
         else:
-            out += "<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ot)) + "</pre>"
+            out += "\n<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ot)) + "</pre>"

     elif format.startswith("h") and ot:
         ## user directly asked for some tags to be displayed only
         if record_exist_p == -1:
-            out += "<pre>" + get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"]) + "</pre>"
+            out += "\n<pre>" + get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"]) + "</pre>"
         else:
-            out += "<pre>" + get_fieldvalues_alephseq_like(recID, ot) + "</pre>"
+            out += "\n<pre>" + get_fieldvalues_alephseq_like(recID, ot) + "</pre>"

     elif format == "hd":
         # HTML detailed format
Re: [patch] Websearch: add a newline before printing Marc tags
Hello Tibor, Please note that for offline non-XML MARC processing, you may want to prefer the `Text MARC' output format (tm) rather than the `HTML MARC' (hm) one.
  $ wget -O z.txt 'http://invenio-demo.cern.ch/search?p=ellis&of=tm'
Oh, that's great! I didn't know this one. I see that it even works with 0.92.1. Thanks again, Ferran
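For scripted harvesting, the same request can be built programmatically. A minimal sketch, using only the p and of parameters shown in the wget example; search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def search_url(base, pattern, output_format="tm"):
    """Build a search URL asking for a given output format."""
    query = urlencode([("p", pattern), ("of", output_format)])
    return "%s/search?%s" % (base.rstrip("/"), query)


print(search_url("http://invenio-demo.cern.ch", "ellis"))
```

Letting urlencode assemble the query string keeps the separators and any special characters in the search pattern correctly escaped.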
Re: bibupload permission issue
Hello Tibor, On Thu, 09 Jul 2009, Theodoropoulos Theodoros wrote: I uploaded several files (with bibupload, using FFT syntax) and I realized that the actual files/directories were created with root:root permissions (probably because it was root user that run bibupload). This is OK in general, but later web-submitted actions with SRV/bibdocfile for that record's docfile produce errors (permission denied). Yes; the dirs had better be owned by (or made writable by) Apache. In the git/master version of CDS Invenio, we have improved user checking so as to strictly enforce that bibsched tasks (including bibupload) would run under the same user identity as the Apache application. (See also CFG_BIBSCHED_PROCESS_USER.) Maybe we could go one step further and, as part of the installation process, instruct how to create a new user called `invenio' and to run the whole shebang under its identity... That's what I have been doing since I started with Invenio a while ago [1], following your suggestion [2]. Otherwise the whole issue of file and directory ownership becomes a mess. And now, while (still) working on two different instances of Invenio on a single server, this solution has proved even clearer. Today we are working on how to run two instances of Apache as two different users while still keeping as much of Debian's original configuration as possible. If you are thinking about this `invenio' user, please don't hardcode it, because in my case I prefer to use the specific name of each of my instances. Thanks, Ferran
[1] http://cdsware.cern.ch/lists/project-cdsware-users/archive/msg00355.shtml
[2] http://cdsware.cern.ch/lists/project-cdsware-users/archive/msg00109.shtml
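A quick way to spot the ownership mess described above is to list what is not owned by the expected account. This is a rough sketch for Unix systems, not part of Invenio; foreign_files is a hypothetical helper and the expected uid is whatever account the instance is supposed to run under.

```python
import os
import tempfile


def foreign_files(top, expected_uid):
    """List paths under top whose owner is not expected_uid."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat, so symbolic links are judged by their own owner.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)


if __name__ == "__main__":
    # A freshly created tree belongs to the current user, so nothing
    # should be reported for our own uid.
    workdir = tempfile.mkdtemp()
    open(os.path.join(workdir, "fulltext.pdf"), "w").close()
    print(foreign_files(workdir, os.geteuid()))
```

Taking the uid as a parameter, rather than a fixed account name, fits the case of several instances each running under its own user.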
[patch] Allow no apache password and group files
Hello, am I the only one? If I don't configure CFG_APACHE_{PASSWORD,GROUP}_FILE, I'm getting this error:
$ ~/invenio/bin/inveniocfg --create-tables
Going to create and fill tables...
Testing DB connection... ok
Testing Python/MySQL/MySQLdb UTF-8 chain... ok
Going to reset CFG_SITE_NAME and CFG_SITE_NAME_INTL...
You may want to restart Apache now.
CFG_SITE_NAME and CFG_SITE_NAME_INTL* reset successfully.
Going to reset CFG_SITE_ADMIN_EMAIL...
You may want to restart Apache now.
CFG_SITE_ADMIN_EMAIL reset successfully.
Going to reset I18N field names...
I18N field names reset successfully.
Traceback (most recent call last):
  File "/home/invenio/invenio/bin/webaccessadmin", line 28, in <module>
    from invenio.webaccessadmin_lib import main
  File "/usr/local/lib/python2.5/site-packages/invenio/webaccessadmin_lib.py", line 48, in <module>
    import invenio.access_control_engine as acce
  File "/usr/local/lib/python2.5/site-packages/invenio/access_control_engine.py", line 31, in <module>
    from invenio import webuser
  File "/usr/local/lib/python2.5/site-packages/invenio/webuser.py", line 1049, in <module>
    _apache_passwords = _load_apache_password_file()
  File "/usr/local/lib/python2.5/site-packages/invenio/webuser.py", line 1044, in _load_apache_password_file
    for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
IOError: [Errno 21] Is a directory
I attach a trivial and obvious solution, unless I'm missing something. (BTW, it is relative to 0.99.1) Best regards, Ferran
WebUser: allow no apache password and group files
* Check whether apache password and group file exists before trying to open the file to prevent an error when creating tables.
---
 lib/python/traces/webuser.py | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

Index: invenio/lib/python/traces/webuser.py
===
--- invenio.orig/lib/python/traces/webuser.py 2009-05-20 16:21:34.0 +0200
+++ invenio/lib/python/traces/webuser.py 2009-05-20 16:29:03.0 +0200
@@ -1041,10 +1041,11 @@

 def _load_apache_password_file(apache_password_file=CFG_APACHE_PASSWORD_FILE):
     ret = {}
-    for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
-        row = row.split(':')
-        if len(row) == 2:
-            ret[row[0].strip()] = row[1].strip()
+    if apache_password_file:
+        for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
+            row = row.split(':')
+            if len(row) == 2:
+                ret[row[0].strip()] = row[1].strip()
     return ret

 _apache_passwords = _load_apache_password_file()
@@ -1060,16 +1061,17 @@

 def _load_apache_group_file(apache_group_file=CFG_APACHE_GROUP_FILE):
     ret = {}
-    for row in open(os.path.join(CFG_TMPDIR, apache_group_file)):
-        row = row.split(':')
-        if len(row) == 2:
-            group = row[0].strip()
-            users = row[1].strip().split(' ')
-            for user in users:
-                user = user.strip()
-                if user not in ret:
-                    ret[user] = []
-                ret[user].append(group)
+    if apache_group_file:
+        for row in open(os.path.join(CFG_TMPDIR, apache_group_file)):
+            row = row.split(':')
+            if len(row) == 2:
+                group = row[0].strip()
+                users = row[1].strip().split(' ')
+                for user in users:
+                    user = user.strip()
+                    if user not in ret:
+                        ret[user] = []
+                    ret[user].append(group)
     return ret

 _apache_groups = _load_apache_group_file()
Re: [patch] Allow no apache password and group files
Replying to myself, there was a leftover of my test paths, so the patch could not be applied. Please discard the previous patch. I attach my correction. Sorry, Ferran
WebUser: allow no apache password and group files
* Check whether apache password and group file exists before trying to open the file to prevent an error when creating tables.
---
 lib/python/invenio/webuser.py | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

Index: invenio/lib/python/invenio/webuser.py
===
--- invenio.orig/lib/python/invenio/webuser.py 2009-05-20 16:21:34.0 +0200
+++ invenio/lib/python/invenio/webuser.py 2009-05-20 16:29:03.0 +0200
@@ -1041,10 +1041,11 @@

 def _load_apache_password_file(apache_password_file=CFG_APACHE_PASSWORD_FILE):
     ret = {}
-    for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
-        row = row.split(':')
-        if len(row) == 2:
-            ret[row[0].strip()] = row[1].strip()
+    if apache_password_file:
+        for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
+            row = row.split(':')
+            if len(row) == 2:
+                ret[row[0].strip()] = row[1].strip()
     return ret

 _apache_passwords = _load_apache_password_file()
@@ -1060,16 +1061,17 @@

 def _load_apache_group_file(apache_group_file=CFG_APACHE_GROUP_FILE):
     ret = {}
-    for row in open(os.path.join(CFG_TMPDIR, apache_group_file)):
-        row = row.split(':')
-        if len(row) == 2:
-            group = row[0].strip()
-            users = row[1].strip().split(' ')
-            for user in users:
-                user = user.strip()
-                if user not in ret:
-                    ret[user] = []
-                ret[user].append(group)
+    if apache_group_file:
+        for row in open(os.path.join(CFG_TMPDIR, apache_group_file)):
+            row = row.split(':')
+            if len(row) == 2:
+                group = row[0].strip()
+                users = row[1].strip().split(' ')
+                for user in users:
+                    user = user.strip()
+                    if user not in ret:
+                        ret[user] = []
+                    ret[user].append(group)
     return ret

 _apache_groups = _load_apache_group_file()
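The "Is a directory" error reported earlier in this thread follows from how os.path.join treats an empty file name: joining a directory with "" returns the directory itself (with a trailing slash), and open() then fails on it. A standalone sketch of the guard, written for modern Python and not taken from webuser.py: load_password_file is a hypothetical name, and unlike the patch it also tolerates a configured file that does not exist.

```python
import os
import tempfile


def load_password_file(directory, filename):
    """Read 'user:hash' rows; an unset or missing file yields no users."""
    passwords = {}
    if not filename:
        # os.path.join(directory, "") would point at the directory itself.
        return passwords
    path = os.path.join(directory, filename)
    if not os.path.isfile(path):
        return passwords
    with open(path) as rows:
        for row in rows:
            fields = row.split(":")
            if len(fields) == 2:
                passwords[fields[0].strip()] = fields[1].strip()
    return passwords


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "passwd"), "w") as handle:
        handle.write("alice:secret\n")
    print(load_password_file(workdir, "passwd"))
    print(load_password_file(workdir, ""))
```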
[patch] draft: add xpdf and ghostscript as alternatives to Acrobat
Hi Tibor, when configuring 0.99.1, I've noticed that you only consider Adobe Reader and Distiller as converters to/from PDF and PS. Fortunately, both xpdf (with pdftops and pstopdf) and Ghostscript (pdf2ps and ps2pdf) are valid alternatives. I started to draft a patch, but I'm ignorant enough about autotools and friends that at this moment I cannot spend the time I would need to complete it. So I've done the trivial part of the easy half, leaving the rest to you ;-) BTW, do you prefer the patches sent as attachments or inline? Thanks, Ferran
WebSubmit: add xpdf and ghostscript as alternatives to Acrobat
* This is just an early draft
* Both xpdf and ghostscript provide converters from/to PDF and PS.
* TODO: add them in the INSTALL, configure, etc.
---
 modules/websubmit/lib/functions/Shared_Functions.py | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: cds-invenio-0.99.1/modules/websubmit/lib/functions/Shared_Functions.py
===
--- cds-invenio-0.99.1.orig/modules/websubmit/lib/functions/Shared_Functions.py 2009-05-18 11:58:05.0 +0200
+++ cds-invenio-0.99.1/modules/websubmit/lib/functions/Shared_Functions.py 2009-05-18 12:01:18.0 +0200
@@ -41,8 +41,16 @@
     filename, extension = os.path.splitext(filename)
     extension = extension.lower()
     if extension == ".pdf":
+        conversion = ""
         # Create PostScript
-        os.system("%s -toPostScript %s" % (CFG_PATH_ACROREAD, fullpath))
+        if (CFG_PATH_PDFTOPS):
+            conversion = "%s %s" % (CFG_PATH_PDFTOPS, fullpath)
+        elif (CFG_PATH_PDF2PS):
+            conversion = "%s %s" % (CFG_PATH_PDF2PS, fullpath)
+        elif (CFG_PATH_ACROREAD):
+            conversion = "%s -toPostScript %s" % (CFG_PATH_ACROREAD, fullpath)
+        if conversion:
+            os.system(conversion)
         if os.path.exists("%s/%s.ps" % (basedir, filename)):
             os.system("%s %s/%s.ps" % (CFG_PATH_GZIP, basedir, filename))
             createdpaths.append("%s/%s.ps.gz" % (basedir, filename))
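Outside of autotools, the "use whichever converter is installed" logic of this draft can be sketched at run time. This is an illustration in modern Python, not the WebSubmit code: pdf_to_ps_command is a hypothetical helper, and it relies on both pdftops and pdf2ps accepting an input PDF followed by an output PS file.

```python
import shutil


def pdf_to_ps_command(pdf_path, ps_path, candidates=("pdftops", "pdf2ps")):
    """Return the argument list for the first converter found on PATH.

    Returns None when none of the candidate programs is installed.
    """
    for program in candidates:
        executable = shutil.which(program)
        if executable:
            return [executable, pdf_path, ps_path]
    return None


command = pdf_to_ps_command("document.pdf", "document.ps")
print(command if command else "no PDF to PostScript converter found")
```

The order of the candidates tuple plays the role of the if/elif chain in the patch: the first program that is available wins.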
Support for Canonical Link in Invenio?
Hello, I just came across this information, new to me, and before I forget, I thought to share it with you: http://www.dullest.com/blog/canonical-link-tag/ It'd say Invenio would benefit from it. Best regards, Ferran
[patch] bibupload --correct documentation fix
Hi, yesterday I ran 'bibupload --correct' for the first time on my test machine, and it works great *except* that it is not stated in the public documentation that both the tags *and* the indicators must be identical in order to be replaced. It is documented in the source code: http://cdsware.cern.ch/repo/?p=cds-invenio.git;a=blob;f=modules/bibupload/lib/bibupload.py;h=47c88a583f73462d5200479b132ac7ac5e8d8964;hb=bbd312365602dcee3ba2904b1b3851c6b3f8e604#l1668 But not in the documentation: http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide#3.2 This patch (relative to current git sources) may fix it. Thanks, Ferran
BibUpload: improve --correct documentation
Make clear that both the tags and the indicators must match when the --correct option is used.
---
 modules/bibupload/doc/admin/bibupload-admin-guide.webdoc |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: cds-invenio/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc
===
--- cds-invenio.orig/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc 2009-04-29 08:51:21.0 +0200
+++ cds-invenio/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc 2009-04-29 08:58:04.0 +0200
@@ -151,8 +151,9 @@
 -c, --correct Correct fields of existing records by those from XML MARC file. The original record
- content is modified only in the fields from
- the XML MARC file: the original fields are
+ content is modified only on those fields from
+ the XML MARC file where both the tags and the
+ indicators match: the original fields are
 removed and replaced by those from the XML MARC file. Fields not present in XML MARC file are not changed (unlike the -r option).
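The matching rule described in this patch can be pictured with a toy model. This is a simplified sketch, not bibupload's implementation: fields are plain (tag, ind1, ind2, value) tuples and correct_record is a hypothetical name.

```python
def correct_record(original, corrections):
    """Replace fields whose (tag, ind1, ind2) appears in corrections.

    Both arguments are lists of (tag, ind1, ind2, value) tuples, a
    simplified stand-in for MARC fields used only in this sketch.
    """
    corrected_keys = {field[:3] for field in corrections}
    # Original fields survive unless tag AND both indicators match.
    kept = [field for field in original if field[:3] not in corrected_keys]
    return sorted(kept + list(corrections))


record = [
    ("245", " ", " ", "Old title"),
    ("856", "4", " ", "http://example.org/old"),
    ("856", "4", "1", "http://example.org/other"),
]
update = [("856", "4", " ", "http://example.org/new")]
for field in correct_record(record, update):
    print(field)
```

In this model the 856 field with second indicator 1 is left alone, because only the field with identical tag and indicators is removed and replaced.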
Mailing list public archives not updated since June
Hello, please note that the public archives haven't been updated since June: http://cdsware.cern.ch/lists/project-cdsware-users/archive/date.shtml#01220 Thanks, Ferran
Re: post-release fun, episode 1
Hello Tibor et al, guess you have also found this link, courtesy of a Google alert for 'cds invenio': http://www.ohloh.net/projects/invenio Enjoy, Ferran
Re: post-release fun, episode 1
Hi Samuele, it was me who added Invenio to Ohloh for experimenting (since we're an OpenSource project) :-) Good, then. But Ohloh has good features in order to discover implicit links between CDS Invenio and the surrounding software environment... Bye! Samuele Or comparing it with others:
http://www.ohloh.net/projects/compare?metric=Codebase&project_0=EPrints&project_1=CDS+Invenio&project_2=DSpace
http://www.ohloh.net/projects/compare?metric=Activity&project_0=EPrints&project_1=CDS+Invenio&project_2=DSpace
http://www.ohloh.net/projects/compare?metric=Contributors&project_0=EPrints&project_1=CDS+Invenio&project_2=DSpace
So much for Java verbosity... Thanks, Guido :-) ! Ferran
Re: post-release fun, episode 1
Hello Tibor, [...] P.S. I do agree with your Java remark; my favourite example is anonymous functions: As for myself, with some white hairs in my beard, Java verbosity reminds me so much of this:
001000$control optimize
001100 identification division.
001200 program-id.
001300 staglist.
001400 author.
001500 Ferran Jorba.
001600 installation.
001700 UAB.
001800 date-written.
001900 January 1991.
002000 Rewritten and translated into English, August, 1992.
002100 Change default-tags, 31 Jan 1994
002200 remarks.
002300 This program takes the output of MARC records
002400 from MARCPRT or HOLDPRT and reformats the output
002500 to one physical record per field or tag, for wide-
002600 paper printer (132 chars).
002700 The output record consists on the control number
002800 (BIB-ID, AUTH-ID or HOLDINGS-ID), level information,
002900 tag number, indicators and the textual data.
003000 It also allows the selection of a the set of tags
003100 to be printed.
003200
003300
003400 environment division.
003500
003600 configuration section.
003700
003800 source-computer.
003900 HP-3000.
004000 object-computer.
004100 HP-3000 sequence is hpascii.
004200
004300 special-names.
004400 hpascii is standard-1.
etc..., before the meat begins. For me, Java is the new Cobol, necktie included ;-( Ferran
Re: RFC: replacing CVS
Hello Tibor et al,

I know that my contribution to this discussion is absolutely marginal, because I'm not a core contributor, and for my little patches I'm perfectly happy with quilt. But, like you, I have been following all this SCM-DSCM business during the last few years, and I need to keep track of my own scripts and utilities. My only real experience (read+write) with SCM was with CVS many years ago; I have used Subversion for downloading a couple of software packages and have done some tests with tla, Git and Mercurial.

I can subscribe to your original conclusions, and I have not much to add to what the others have said, except some random thoughts.

As we have settled on DrProject[1], a trac fork that can handle multiple projects, for our wiki and task handling, and both trac and DrProject have good integration with Subversion, in principle I was favouring this option. But my read-only experience with Subversion has been quite disappointing: when downloading software, it cannot keep the original timestamps of the files, something that CVS does perfectly. This annoys me so much!

My tests with Git and Mercurial have been driven by a quest to find a nice tool for distributed storage for digital preservation[2]. We stress-tested both by ingesting 500 GB of fat (25-75 MB) TIFF files plus some technical metadata[3]. What we found is that both are similarly capable and both handled this load with similar timings and overhead. We were especially interested in Git's content-addressable-filesystem[4] concept and the hash-tree ability to check whether two repositories are identical. We haven't concluded anything yet except that, compared with the mighty Git, Mercurial is no toy either.

What else? As there is no attractive (to me) centralised SCM option, and being myself a low-profile developer, I'd be happy to go to a DSCM (it can be fun), but the easier the better. I'd be more than happy with Mercurial. Moreover, Mercurial has reached 1.0 this week, it has enthusiastic followers[5], and it has an Emacs frontend[6].

My cent,
Ferran

[1] http://www.drproject.org/
[2] See a summary of our preliminary findings at http://www.cesca.es/promocio/congressos/tsiuc2007/FerranJorba.pdf
[3] This .info file contains the md5sum and the output of ImageMagick's `identify -verbose' (http://www.imagemagick.org/script/identify.php).
[4] http://en.wikipedia.org/wiki/Content-addressable_storage
[5] For example, the first comment at http://lwn.net/Articles/274823/
[6] http://freehg.org/u/agriggio/ahg/
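To illustrate the content-addressable idea mentioned in this message, here is a minimal Python sketch. `git_blob_id` follows the well-known way Git names a blob (SHA-1 over a `blob <size>\0` header plus the content). `tree_digest` is a hypothetical helper invented for this note: it is not Git's real tree format, only a simplified way to show how one digest over all (path, blob id) pairs lets two copies of a file set be compared.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git assigns to a blob holding `content`."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def tree_digest(files: dict) -> str:
    """Hypothetical, simplified digest over a whole file set.

    `files` maps a path (str) to its content (bytes). Paths are sorted so
    that the result does not depend on insertion order.
    """
    h = hashlib.sha1()
    for path in sorted(files):
        blob_id = git_blob_id(files[path])
        h.update(path.encode("utf-8") + b"\x00" + blob_id.encode("ascii") + b"\n")
    return h.hexdigest()


if __name__ == "__main__":
    copy_a = {"scan/0001.tif": b"page one", "scan/0001.info": b"md5 ..."}
    copy_b = {"scan/0001.info": b"md5 ...", "scan/0001.tif": b"page one"}
    # Same paths and contents give the same digest, whatever the order.
    print(tree_digest(copy_a) == tree_digest(copy_b))
```

The point of the design is that identity is derived from content alone, so comparing two large stores reduces to comparing one short digest.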
Index browsing doesn't work when default language is not English
Hi again, here is my report. When Invenio (0.92.1) is installed with a default language other than English, index browsing doesn't work well; specifically, when going to next page. Example: http://ddd.uab.cat/search?p=smithf=authoraction_browse=Llistac=sf=so=drm=rg=10sc=0of=hb and click 'Següent' (= next). BTW, you can also see that some texts are in Catalan even when changing language. Author browsing works fine in other 0.92.1 installations: http://cdsweb.cern.ch/search?p=smithf=authoraction_browse=Browsec=sf=so=drm=rg=10sc=1of=hb http://romdoc.upb.ro/search?sc=1p=smithf=authoraction_browse=Browsec=Articles+%26+Preprintsc=Books+%26+Reportsc=Conferencesc=Multimedia+%26+Artsc=Periodicalsc=Presentationsc=UPB+Museumc=Workshops http://sysdoc.com.dtu.dk/search?sc=1p=smithf=authoraction_browse=Browsec=Journal+and+Conference+Articles+c=Thesesc=Books+and+Book+Chaptersc=Teaching+Materialc=Reportsc=Multimedia etc. Thanks, Ferran
Re: Index browsing doesn't work when default language is not English
Pomoc Tibor, The CVS version has the same problem, but it is easy to fix. I hope to get to it in the afternoon... Done. I fixed a couple of other browse issues at the same time too. Vďaka, Ferran
Re: Index browsing doesn't work when default language is not English
Hi Tibor, here is my report. When Invenio (0.92.1) is installed with a default language other than English, index browsing doesn't work well; specifically, when going to next page. The CVS version has the same problem, but it is easy to fix. I hope to get to it in the afternoon... Oh, great! Thanks so much, Ferran
Re: RFC: Invenio/Indico release numbering
Hello Tibor.

We have not yet proof-read English messages, so they may still change, so better to do nothing still. Possibly we shall release 0.99.0 with old PO files, and incorporate translations into 0.99.1 only. Now that our large codebase changes are mostly over, we do plan on releasing more frequently. ;-)

Ok with that 0.99.1.

I have a bug report/enhancement request that I'd really like to see fixed before the next release. Where should I post it? -users or -developers list?

Better to -developers. If the patches are big, then just send them to my personal address.

No, sorry, I don't have any patch. I'll write another mail to the -developers list, anyway.

Also, since no one responded to your concrete ADMINEMAIL vs SUPPORTEMAIL scenario, please feel free to adapt every send_email() call to your needs. ;-)

No problem, thanks,
Ferran
Re: RFC: Invenio/Indico release numbering
Hello Tibor,

[...]

4) If agreed, I'll change our docs accordingly.

Agreed.

We plan to release CDS Invenio 0.99.0 pretty soon, possibly in a week or two.

Ooops! Please give us (the translators) a little more margin. I do that part of my job as a volunteer from home (like now), but I'm really short of time. Moreover, a couple of weeks ago you wrote that we (the translators) should do nothing about some change I forgot ;-)

Please tell us what you think.

I have a bug report/enhancement request that I'd really like to see fixed before the next release. Where should I post it? -users or -developers list?

Thanks,
Ferran
Re: CVS Commit Overview for 2008-02-15
[...] Moved important runtime parameters (URLs, DB credentials) from configure options into invenio.conf. This permits to install Invenio without thinking too much about them in advance, and to change them later after initial demo tests are successful, without having to reinstall. Great! Thank you! Ferran
FYI: Slope One Collaborative filtering algorithm
Hi, as I still don't have 'People who viewed this page also viewed' in my site due to being intolerably slow (http://cdsware.cern.ch/lists/project-cdsware-users/archive/msg00742.shtml), I've accidentally found this entry that might help: http://en.wikipedia.org/wiki/Slope_One (Not me, unfortunately. I'm overwhelmed with the fascinating work of migrating our ILS...) Ferran
Re: CVS Commit Overview for 2008-01-16
Hello Samuele,

2008-01-16 Samuele Kaplun samuele.kap...@cern.ch
* modules/websubmit/lib/bibdocfile.py: Updated normalize format to transform '.jpg' into '.jpeg'. Does anybody know of any similar aliases?

Hmmm. What's the point? Why this 'normalisation' but not, for example, turning any uppercase to lowercase? And why only for jpeg files and not PDFs, png, or any other one? As far as I know, .jpg is as correct as .jpeg.

For me, please don't. Or at least, don't hardcode it. I don't think it should be part of Invenio code to rename a single case of discrepancy.

And, I know it is a sort of hack, but I have in the oven a visual revamp of our installation (http://ddd.uab.cat/pub/demo/hispania/) where I use different extensions, or have some of them in lowercase or uppercase, to get different kinds of thumbnails. Let us enjoy this flexibility, please.

My 2 cents,
Ferran
Re: RFC: new Invenio config file
Hi Tibor,

The next step in the process of removing the dependency on WML is to replace the config.wml configuration technique we've been using so far. Here is a brief outline of a schema I've been thinking about. [...]

I like it; I especially like this magic inveniocfg tool! I have nothing to add, really. My only gripe is rather aesthetic: I'm getting tired of so much XML everywhere. And yes, I know that Invenio is a heavy consumer and producer of XML, so... But if the config file were written in something simpler (for example JSON, see http://en.wikipedia.org/wiki/JSON) instead, I think it would be easier both for humans and programs. But again, I'd be more than happy if this roadmap gets implemented.

Thanks for asking,
Ferran
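As a rough illustration of the JSON suggestion, here is a minimal Python sketch of reading such a config file. Every key name below (`site_url`, `database`, `languages`) is invented for this note; the schema actually proposed in the thread is elided above and is not reproduced here.

```python
import json

# Hypothetical config content; none of these keys come from Invenio itself.
EXAMPLE_CONFIG = """
{
  "site_url": "http://localhost",
  "database": {"host": "localhost", "name": "invenio", "user": "invenio"},
  "languages": ["en", "ca"]
}
"""

REQUIRED_KEYS = ("site_url", "database")


def load_config(text):
    """Parse JSON config text and check that the required keys are present."""
    cfg = json.loads(text)
    for key in REQUIRED_KEYS:
        if key not in cfg:
            raise KeyError("missing required setting: %s" % key)
    return cfg


config = load_config(EXAMPLE_CONFIG)
print(config["database"]["host"])
```

The same file is readable by a person in a text editor and by a program in one call, which is the point being made in the message.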
Re: test, sorry [3] -- cannot send with Gnus
Hello again,

Sorry for that test; I'm having problems posting to the -users list from gnus+local-exim-config-mail-send-by-smarthost (it used to work before holidays); trying to see if the -developers works. take two. Does anybody receive this?

take three, now with thunderbird (oh, yes, icedove).

After several months happily feeling more comfortable with Gnus, now I cannot send to the cdsware lists with Gnus; the mails seem to get lost (I only get my own BCC copy). This problem doesn't happen (a) using my Gnus to other destinations (inside or outside UAB) or (b) sending mails from Icedove to the cdsware lists.

Tibor, may I ask for some ideas to help a Gnus newbie?

Thanks,
Ferran
Re: test, sorry [3] -- cannot send with Gnus
[Writing from Icedove, as otherwise I'm unsure it will reach you]

Thank you Tibor, and sorry for this unwanted noise.

Hmm, was your user-mail-address generated as ferran.jo...@uab.cat which is the identity that the CERN listserver knows you under? Did you send both mails from the same computer via the same SMTP MTA chain? (CERN blacklists some public outgoing mail servers...)

I didn't modify my Gnus setup. Relevant .gnus.el lines are:

(setq gnus-select-method '(nntp "news.uab.cat"))
(add-to-list 'gnus-secondary-select-methods
             '(nnimap "uab"
                      (nnimap-address "imap.uab.cat")
                      (imap-username 141)
                      (nnimap-port 143)
                      (nnimap-list-pattern ("INBOX" "*/*"))
                      )
             '(nntp "news.gmane.org")
             )
(setq user-mail-address "ferran.jo...@uab.cat")
(setq user-full-name "Ferran Jorba")
(setq message-default-mail-headers "Bcc: ferran.jo...@uab.cat\n")
(setq smtpmail-smtp-server "smtp.uab.cat")

Unless it is due to my own backport of Emacs22 to Etch, and using the standard Gnus (User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)). But again, it works everywhere except to your CERN lists.

Tibor, may I ask for some ideas to help a Gnus newbie?

Sure. Please forward me the full message (all headers included) of an email sent from Gnus that did not pass through.

I attach my own BCC of my previous mail.
Thanks,
Ferran

Received: from istanbul.uab.es (localhost [127.0.0.1])
 by istanbul.uab.es (Sun Java System Messaging Server 6.1 HotFix 0.10 (built Jan 6 2005))
 with ESMTP id 0joi009blnpvz...@istanbul.uab.es for ferran.jo...@uab.cat;
 Mon, 17 Sep 2007 16:17:56 +0200 (CEST)
Received: from pfff.si.uab.es ([158.109.165.130])
 by istanbul.uab.es (Sun Java System Messaging Server 6.1 HotFix 0.10 (built Jan 6 2005))
 with ESMTPS id 0joi009hrnpvf...@istanbul.uab.es for ferran.jo...@uab.cat;
 Mon, 17 Sep 2007 16:17:55 +0200 (CEST)
Received: from fjorba by pfff.si.uab.es with local (Exim 4.63)
 (envelope-from fjorba@localhost.localdomain) id 1IXHMf-0006Uc-NH;
 Mon, 17 Sep 2007 16:14:17 +0200
List-Post: project-invenio-devel@cern.ch
Date: Mon, 17 Sep 2007 16:14:17 +0200
From: Ferran Jorba ferran.jo...@uab.cat
Subject: Re: test, sorry [2]
In-reply-to: 87ejgx3h3s@pfff.si.uab.es
To: project-cdsware-developers project-cdsware-develop...@cern.ch
Message-id: 87odg1z8ee@pfff.si.uab.es
Organization: Universitat Autonoma de Barcelona
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
References: 87ejgx3h3s@pfff.si.uab.es
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

Hi,

Sorry for that test; I'm having problems posting to the -users list from gnus+local-exim-config-mail-send-by-smarthost (it used to work before holidays); trying to see if the -developers works. take two. Does anybody receive this?

Thanks,
Ferran
Re: test, sorry [3] -- cannot send with Gnus
Vďaka! (Ah, those helpful .po files!) Ferran
Re: CVS Commit Overview for 2007-07-11
Hello Samuele, 2007-07-11 Samuele Kaplun samuele.kap...@cern.ch * modules/websubmit/lib/websubmit_config.py: Added png to the list of recognized extension. Are there any other extensions missing? Do we need this list? I don't know about this one, but I have to maintain a parallel one about graphic files (currently jpg, gif and png), so I can apply a different logic for URLs if they are images (typically for icons) or fulltext; see an early implementation at http://ddd.uab.cat/record/17654. Thanks, Ferran
Re: Make test errors + config.wml settings disregarded after upgradingfrom 0.7.1 to 0.92.1
Hi Torger,

I have an update to Problem 2 below: The default style only appears on the Search, Submit, Personalize, and Administration pages. For some strange reason, the Help pages appear in my custom style. Default style appears here: http://sysdoc.com.dtu.dk/ My custom style appears here: http://sysdoc.com.dtu.dk/help/index.en.html

If you are talking about the logo, we have put in our own logo, but using Invenio's default filename (logo.gif). I see that this file does not exist in your installation (http://sysdoc.com.dtu.dk/img/logo.gif) as it does in ours (http://ddd.uab.cat/img/logo.gif). Your own logo is in a different path (http://sysdoc.com.dtu.dk/img/SYSDOC/COMDTU.png). Try copying it to the expected place (or symlinking it).

Hope it helps,
Ferran
Re: Implementing METS and PREMIS in Invenio. Ideas?
Hello Jerome,

Unfortunately we have no resource available for implementing PREMIS or METS in CDS Invenio this year. Still we have been discussing internally about support for these standards in the past, and would be interested to collaborate as much as we can if you are willing to implement these standards on your side.

I'll be happy to help, although I cannot be a full-time developer; that is not my job here at UAB. I'm glad to provide fixes, ideas, testing and small patches, as always (I do the translations at home, during my copious free time), but most of my time goes into making things work, not into developing large projects.

The implementation in CDS Invenio does seem feasible without big changes in the software, although a deeper analysis would be necessary.

I agree with you. I'd like to see more real-world examples of how it is implemented, because I cannot imagine how to handle a METS record with descriptive metadata (MARCXML), some rights and origin information, strong structural information and MIX detail for each scanned page of a medium-size journal (e.g., our http://ddd.uab.cat/record/17654). Retrieving such a METS record could bring the system to its knees.

Do you have some news about this project/petition since your last email?

Nope. Yours is the first.

Thanks,
Ferran
Sorting with diacritics patch
Hi all,

please take this fix into consideration. It fixes this case: http://ddd.uab.cat/search?ln=enp=Enginyer+Qu%C3%ADmicf=titulacioaction_search=Searchc=sf=titleso=arm=rg=10sc=0of=hb Now Àlgebra is the first in the list; otherwise, it was the last. This patch is against current CVS; in 0.92.1 it is around line 2243.

Thanks,
Ferran

This fixes sorting of words that have diacritics, especially in the first word, like Àlgebra, because otherwise they appear at the end of the list.
---
 modules/websearch/lib/search_engine.py |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: cds-invenio/modules/websearch/lib/search_engine.py
===================================================================
--- cds-invenio.orig/modules/websearch/lib/search_engine.py 2007-06-11 12:52:08.352732646 +0200
+++ cds-invenio/modules/websearch/lib/search_engine.py 2007-06-11 12:58:37.810073415 +0200
@@ -2415,7 +2415,7 @@
             else:
                 # no sort pattern defined, so join them all together
                 val = string.join(vals)
-            val = val.lower()
+            val = strip_accents(val.lower())
             if recIDs_dict.has_key(val):
                 recIDs_dict[val].append(recID)
             else:
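The effect of the one-line change can be sketched outside Invenio as follows. This is only an illustration in Python 3: `strip_accents_sketch` is a stand-in written for this note using the standard `unicodedata` module, and is assumed (not verified) to behave like Invenio's own `strip_accents` for simple cases such as this one.

```python
import unicodedata


def strip_accents_sketch(text):
    """Drop combining marks after Unicode decomposition (stand-in helper)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(title):
    """Lowercase first, then strip accents, mirroring the patched line."""
    return strip_accents_sketch(title.lower())


titles = ["Zoologia", "Àlgebra", "Biologia"]

# Plain code-point ordering places the accented capital after "Z".
print(sorted(titles))
# With the accent-insensitive key, "Àlgebra" sorts next to plain "a" words.
print(sorted(titles, key=sort_key))
```

Only the sort key changes; the titles themselves keep their diacritics when displayed.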
Should I install v0.92.1.20070412?
Hi CERNers,

I'm writing to the internal developers list because I've read some info in the CVS messages. I'm planning to upgrade our public system on Monday April 30 because the 1st of May is a holiday, so it will impact fewer users. I was planning to install 0.92.1, but this 'internal release' has intrigued me. Should I ignore it and go for the official one?

Ferran
Re: Should I install v0.92.1.20070412?
Hello Tibor, Putting Invenio v0.92.1 into production should be fine. There have been some fixes since that release, but nothing particularly show-stopping. Thank you for your advice, Ferran
Re: [patch] collection-management-wording.patch
Hello Tibor (and Nick),

Thanks for the suggestion. We have now changed the wording of this box with Nick. It does not go exactly in the way you suggested, but the wording should be clearer than it was before. Hope you will like it. ;-)

Of course I won't even try to discuss proper English usage with a native speaker! And yes, I've just seen it and I like it. Now I don't think new users will find themselves as lost as I was when faced with the original dialog texts (http://cdsware.cern.ch/lists/project-cdsware-users/archive/msg00138.shtml). And those texts are especially important for new users.

[...] Do not hesitate to keep alerting us if you find something especially striking.

I'll do it. Thanks again,
Ferran
[patch] collection-management-wording.patch
Hi Tibor, please consider this tiny patch for inclusion. Thanks, Ferran I hope those changes in the wording makes collection management a little bit more comprehensible. --- modules/websearch/lib/websearchadminlib.py |8 1 files changed, 4 insertions(+), 4 deletions(-) Index: cds-invenio/modules/websearch/lib/websearchadminlib.py === --- cds-invenio.orig/modules/websearch/lib/websearchadminlib.py 2007-02-15 17:25:52.0 +0100 +++ cds-invenio/modules/websearch/lib/websearchadminlib.py 2007-04-11 14:26:20.931396148 +0200 @@ -255,7 +255,7 @@ output = show_coll_not_in_tree(colID, ln, col_dict) text = -span class=adminlabelAttach which/span +span class=adminlabelCollection name/span select name=add_son class=admin_w200 option value=- select collection -/option @@ -264,9 +264,9 @@ text += option value=%s %s%s/option % (id, str(id)==str(add_son) and 'selected=selected' or '', name) text += /selectbr -span class=adminlabelAttach to/span +span class=adminlabelAttached to/span select name=add_dad class=admin_w200 -option value=- select parent collection -/option +option value=- choose parent collection -/option for (id, name) in col_list: @@ -278,7 +278,7 @@ text += span class=adminlabelRelationship/span select name=rtype class=admin_w200 -option value=- select relationship -/option +option value=- choose relationship -/option option value=r %sRegular (Narrow by...)/option option value=v %sVirtual (Focus on...)/option /select
[patch] portable-ps.patch
Hi Tibor,

I've updated the portable patch implementation I posted some months ago to the users list.

Ferran

A more portable implementation that should work on Linux, Solaris and Digital Unix. Not tested on FreeBSD. It should also be faster than the CDS Invenio 0.92.0 implementation, because it calls only one external command (ps) instead of three (ps, grep twice, and sed).
---
 modules/bibsched/lib/bibsched.py |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

Index: cds-invenio/modules/bibsched/lib/bibsched.py
===================================================================
--- cds-invenio.orig/modules/bibsched/lib/bibsched.py 2007-03-01 15:41:38.0 +0100
+++ cds-invenio/modules/bibsched/lib/bibsched.py 2007-04-11 14:26:46.301060324 +0200
@@ -44,6 +44,7 @@
 import curses.panel
 from curses.wrapper import wrapper
 import signal
+import pwd
 
 from invenio.config import \
      CFG_PREFIX, \
@@ -75,16 +76,14 @@
     return None
 
 def get_my_pid(process, args=''):
-    if sys.platform.startswith('freebsd'):
-        COMMAND = "ps -o pid,args | grep '%s %s' | grep -v 'grep' | sed -n 1p" % (process, args)
-    else:
-        COMMAND = "ps -C %s o '%%p%%a' | grep '%s %s' | grep -v 'grep' | sed -n 1p" % (process, process, args)
-    answer = string.strip(os.popen(COMMAND).read())
-    if answer == '':
-        answer = 0
-    else:
-        answer = answer[:string.find(answer,' ')]
-    return int(answer)
+    COMMAND = "ps -fu %s" % (pwd.getpwuid(os.getuid())[0])
+    pslist = os.popen(COMMAND).readlines()
+    str = string.join([process,args])
+    answer = 0
+    for ps in pslist:
+        if ps.find(str) > 0:
+            answer = int(ps.split()[1])
+    return answer
 
 def get_output_channelnames(task_id):
     """Construct and return filename for stdout and stderr for the task 'task_id'."""
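The approach of the patch (one `ps -fu <user>` call, then filtering its lines in Python) can be sketched in present-day Python as below. This is an illustration only, not the bibsched code: `find_pid` and `get_pid_via_ps` are names invented for this note, the sample `ps` lines and paths are made up, and the column layout (PID in the second field) is assumed to match what `ps -f` prints on the platforms listed in the message.

```python
import getpass
import subprocess


def find_pid(ps_lines, process, args=""):
    """Return the PID of the last `ps -f` line mentioning 'process args', or 0."""
    needle = " ".join([process, args]).strip()
    pid = 0
    for line in ps_lines:
        fields = line.split()
        # Skip the header line, whose second field is the text "PID".
        if needle in line and len(fields) > 1 and fields[1].isdigit():
            pid = int(fields[1])
    return pid


def get_pid_via_ps(process, args=""):
    """Run a single external command (ps) and do the filtering in Python."""
    result = subprocess.run(
        ["ps", "-fu", getpass.getuser()],
        capture_output=True, text=True, check=False,
    )
    return find_pid(result.stdout.splitlines(), process, args)


# Made-up sample output, used so the parsing can be tried without running ps.
SAMPLE = [
    "UID        PID  PPID  C STIME TTY          TIME CMD",
    "apache    4321     1  0 10:02 ?        00:00:01 /usr/bin/python /opt/cds-invenio/bin/bibindex 12",
    "apache    4400  4321  0 10:05 pts/0    00:00:00 ps -fu apache",
]
print(find_pid(SAMPLE, "bibindex", "12"))
```

Keeping the parsing in a separate function from the `ps` call makes the matching logic testable on any machine, including ones where `ps` behaves differently.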
[patch] zero-record-collections-deserve-a-zero.patch
Hi Tibor,

my shortest patch?

Ferran

When a collection has zero items and this zero is not displayed, there is no clear separator between this collection name and the next, and it confuses users because the two look like a single name.
---
 modules/websearch/lib/websearch_templates.py |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

Index: cds-invenio/modules/websearch/lib/websearch_templates.py
===================================================================
--- cds-invenio.orig/modules/websearch/lib/websearch_templates.py 2007-04-11 14:24:21.057158259 +0200
+++ cds-invenio/modules/websearch/lib/websearch_templates.py 2007-04-11 14:26:24.861879334 +0200
@@ -809,7 +809,8 @@
 ')</small>')
-        if number is None: return ''
+        if number is None:
+            number = 0
         if prolog is None:
             prolog = '''&nbsp;<small class="nbdoccoll">('''
[patch] Add missing semicolon after some nbsp;
Hi Tibor, subject says it all. Thanks, Ferran Trivial: Add ; to some nbsp. I noted them when debugging some of my pages. Index: cds-invenio-0.92.0/modules/bibformat/lib/bibformat_templates.py === --- cds-invenio-0.92.0.orig/modules/bibformat/lib/bibformat_templates.py 2007-01-26 10:22:41.636461075 +0100 +++ cds-invenio-0.92.0/modules/bibformat/lib/bibformat_templates.py 2007-01-26 10:25:29.688622760 +0100 @@ -1455,19 +1455,19 @@ name = format_template['name'] filename = format_template['filename'] out += '''trtda href=format_template_show?bft=%(filename)samp;ln=%(ln)s%(name)s/a/td -tdnbsp/tdtdnbsp/td/tr''' % {'filename':filename, -'name':name, -'ln':ln} +tdnbsp;/tdtdnbsp;/td/tr''' % {'filename':filename, + 'name':name, + 'ln':ln} for format_element in format_template['elements']: name = format_element['name'] filename = format_element['filename'] -out += '''trtdnbsp/td +out += '''trtdnbsp;/td tda href=format_elements_doc?ln=%(ln)s#%(anchor)s%(name)s/a/td -tdnbsp/td/tr''' % {'anchor':name.upper(), - 'name':name, - 'ln':ln} +tdnbsp;/td/tr''' % {'anchor':name.upper(), + 'name':name, + 'ln':ln} for tag in format_element['tags']: -out += '''trtdnbsp/tdtdnbsp/td +out += '''trtdnbsp;/tdtdnbsp/td td%(tag)s/td/tr''' % {'tag':tag} out += ''' Index: cds-invenio-0.92.0/modules/bibharvest/lib/bibharvest_templates.py === --- cds-invenio-0.92.0.orig/modules/bibharvest/lib/bibharvest_templates.py 2007-01-26 10:22:41.663457567 +0100 +++ cds-invenio-0.92.0/modules/bibharvest/lib/bibharvest_templates.py 2007-01-26 10:26:35.580059372 +0100 @@ -73,7 +73,7 @@ guidetitle = _(See Guide) titlebar = a name=%s % title -titlebar += /a%snbspnbspnbspsmall % subtitle +titlebar += /a%snbsp;nbsp;nbsp;small % subtitle titlebar += [a title=%s href=%s/%s?/a]/small % (guidetitle, weburl, guideurl) return titlebar Index: cds-invenio-0.92.0/modules/bibindex/lib/bibindexadminlib.py === --- cds-invenio-0.92.0.orig/modules/bibindex/lib/bibindexadminlib.py 2007-01-26 10:22:41.690454060 +0100 +++ 
cds-invenio-0.92.0/modules/bibindex/lib/bibindexadminlib.py 2007-01-26 10:27:51.548185829 +0100 @@ -277,7 +277,7 @@ def perform_editindexes(ln=cdslang, callback='yes', content='', confirm=-1): show a list of indexes that can be edited. -subtitle = a name=2/a2. Edit indexnbspnbspnbspsmall[a title=See guide href=%s/admin/bibindex/guide.html?/a]/small % (weburl) +subtitle = a name=2/a2. Edit indexnbsp;nbsp;nbsp;small[a title=See guide href=%s/admin/bibindex/guide.html?/a]/small % (weburl) fin_output = '' idx = get_idx() @@ -311,7 +311,7 @@ def perform_editfields(ln=cdslang, callback='yes', content='', confirm=-1): show a list of all logical fields that can be edited. -subtitle = a name=5/a5. Edit logical fieldnbspnbspnbspsmall[a title=See guide href=%s/admin/bibindex/guide.html?/a]/small % (weburl) +subtitle = a name=5/a5. Edit logical fieldnbsp;nbsp;nbsp;small[a title=See guide href=%s/admin/bibindex/guide.html?/a]/small % (weburl) fin_output = '' @@ -385,7 +385,7 @@ idx_dict = dict(get_def_name('', idxINDEX)) if idxID and idx_dict.has_key(int(idxID)): idxID = int(idxID) -subtitle = a name=2/a2. Modify translations for index.nbspnbspnbspsmall[a title=See guide href=%s/admin/bibindex/guide.html?/a]/small % weburl +subtitle = a name=2/a2. Modify translations for index.nbsp;nbsp;nbsp;small[a title=See guide href=%s/admin/bibindex/guide.html?/a]/small % weburl if type(trans) is str: trans = [trans] @@ -464,7 +464,7 @@ fld_dict = dict(get_def_name('', field)) if fldID and fld_dict.has_key(int(fldID)): fldID = int(fldID) -subtitle = a name=3/a3. Modify translations for logical field '%s'nbspnbspnbspsmall[a title=See guide href=%s/admin/bibindex/guide.html?/a]/small % (fld_dict[fldID], weburl) +subtitle = a name=3/a3. Modify translations for logical field '%s'nbsp;nbsp;nbsp;small[a title=See guide href=%s/admin/bibindex/guide.html?/a]/small % (fld_dict[fldID], weburl) if
Re: [patch] Respect i18n collection names in search boxes
Hi Tibor, Committed, thanks. (I have modified your patch a bit in order to calculate international collection names only when needed.) Brilliant! Thanks! Ferran
Re: [patch] Respect i18n collection names in search boxes
Hi Tibor, I've seen that you still haven't applied my tiny patch below, and 0.92 is approaching. Could you consider it, please? Thanks, Ferran Display i18n collection names when an error message is shown. I've also added a cosmetic em/em in the list of collection names, because I feel it is easier to read, but you decide. Index: cds-invenio/modules/websearch/lib/search_engine.py === --- cds-invenio.orig/modules/websearch/lib/search_engine.py 2006-12-15 08:55:25.875425044 +0100 +++ cds-invenio/modules/websearch/lib/search_engine.py 2006-12-15 08:55:42.098305099 +0100 @@ -1562,11 +1562,13 @@ t1 = os.times()[4] results = {} results_nbhits = 0 +colls_printable = [] for coll in colls: results[coll] = HitSet() results[coll]._set = Numeric.bitwise_and(hitset_in_any_collection._set, get_collection_reclist(coll)._set) results[coll].calculate_nbhits() results_nbhits += results[coll]._nbhits +colls_printable.append(get_coll_i18nname(coll, ln)) if results_nbhits == 0: # no hits found, try to search in Home: results_in_Home = HitSet() @@ -1577,7 +1579,7 @@ if of.startswith(h): url = websearch_templates.build_search_url(req.argd, cc=cdsname, c=[]) print_warning(req, _(No match found in collection %(x_collection)s. Other public collections gave %(x_url_open)s%(x_nb_hits)d hits%(x_url_close)s.) %\ - {'x_collection': string.join(colls, ','), + {'x_collection': 'em' + string.join(colls_printable, ',') + '/em', 'x_url_open': 'a class=nearestterms href=%s' % (url), 'x_nb_hits': results_in_Home._nbhits, 'x_url_close': '/a'})
Re: [patch] Respect i18n collection names in search boxes
Hi Tibor, Fixed in CVS. Thanks for the patch (note that I modified it, because you should not really replace c and c_name, only c_printable) [...] Great! Your fix was of course better. Testing it, I found we forgot another snippet, when an error message is displayed. I've also added a cosmetic em/em in the list of collection names, because I feel it is easier to read, but you decide. Thanks again, Ferran Index: cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py === --- cds-invenio-0.91.0.20061116.orig/modules/websearch/lib/search_engine.py 2006-12-01 10:52:10.0 +0100 +++ cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py 2006-12-01 10:59:55.0 +0100 @@ -1562,11 +1562,13 @@ t1 = os.times()[4] results = {} results_nbhits = 0 +colls_printable = [] for coll in colls: results[coll] = HitSet() results[coll]._set = Numeric.bitwise_and(hitset_in_any_collection._set, get_collection_reclist(coll)._set) results[coll].calculate_nbhits() results_nbhits += results[coll]._nbhits +colls_printable.append(get_coll_i18nname(coll, ln)) if results_nbhits == 0: # no hits found, try to search in Home: results_in_Home = HitSet() @@ -1577,7 +1579,7 @@ if of.startswith(h): url = websearch_templates.build_search_url(req.argd, cc=cdsname, c=[]) print_warning(req, _(No match found in collection %(x_collection)s. Other public collections gave %(x_url_open)s%(x_nb_hits)d hits%(x_url_close)s.) %\ - {'x_collection': string.join(colls, ','), + {'x_collection': 'em' + string.join(colls_printable, ',') + '/em', 'x_url_open': 'a class=nearestterms href=%s' % (url), 'x_nb_hits': results_in_Home._nbhits, 'x_url_close': '/a'})
[patch] Respect i18n collection names in search boxes
Hi Tibor, may you consider this patch for inclusion for next release? You can see what I'm trying to fix looking at our Search collections: pull down menu in, for example: http://ddd.uab.es/search.py?sc=1ln=enp=f=action=Searchcc=rcao Thanks, Ferran With this patch (tested on 0.91.0.20061116) i18n collection names are honored. It is specially important if default collection names are coded, because they appear short and cryptic for the end user. Index: cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py === --- cds-invenio-0.91.0.20061116.orig/modules/websearch/lib/search_engine.py 2006-11-10 12:22:39.0 +0100 +++ cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py 2006-11-30 11:30:19.0 +0100 @@ -125,7 +125,7 @@ sre_unicode_uppercase_c = sre.compile(unicode(r(?u)[??], utf-8)) sre_unicode_uppercase_n = sre.compile(unicode(r(?u)[?], utf-8)) -def get_alphabetically_ordered_collection_list(level=0): +def get_alphabetically_ordered_collection_list(ln, level=0): Returns nicely ordered (score respected) list of collections, more exactly list of tuples (collection name, printable collection name). Suitable for create_search_box(). @@ -133,7 +133,8 @@ query = SELECT id,name FROM collection ORDER BY name ASC res = run_sql(query) for c_id, c_name in res: -# make a nice printable name (e.g. truncate c_printable for for long collection names): +# make a nice printable name (e.g. truncate c_printable for for long collection names in language ln): +c_name = get_coll_i18nname(c_id, ln) if len(c_name)30: c_printable = c_name[:30] + ... else: @@ -143,7 +144,7 @@ out.append([c_name, c_printable]) return out -def get_nicely_ordered_collection_list(collid=1, level=0): +def get_nicely_ordered_collection_list(ln, collid=1, level=0): Returns nicely ordered (score respected) list of collections, more exactly list of tuples (collection name, printable collection name). Suitable for create_search_box(). 
@@ -450,9 +451,9 @@ colls_nicely_ordered = [] if cfg_nicely_ordered_collection_list: -colls_nicely_ordered = get_nicely_ordered_collection_list() +colls_nicely_ordered = get_nicely_ordered_collection_list(ln=ln) else: -colls_nicely_ordered = get_alphabetically_ordered_collection_list() +colls_nicely_ordered = get_alphabetically_ordered_collection_list(ln=ln) colls_nice = [] for (cx, cx_printable) in colls_nicely_ordered:
Re: [patch] Respect i18n collection names in search boxes
Sorry, Tibor, forget the previous patch, it is old; I forgot a 'quilt refresh' ;-( may you consider this patch for inclusion for next release? You can see what I'm trying to fix looking at our Search collections: pull down menu in, for example: http://ddd.uab.es/search.py?sc=1ln=enp=f=action=Searchcc=rcao New one is attached. Ferran With this patch (tested on 0.91.0.20061116) i18n collection names are honored. It is specially important if default collection names are coded, because they appear short and cryptic for the end user. Index: cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py === --- cds-invenio-0.91.0.20061116.orig/modules/websearch/lib/search_engine.py 2006-11-10 12:22:39.0 +0100 +++ cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py 2006-11-30 12:16:47.0 +0100 @@ -125,15 +125,16 @@ sre_unicode_uppercase_c = sre.compile(unicode(r(?u)[??], utf-8)) sre_unicode_uppercase_n = sre.compile(unicode(r(?u)[?], utf-8)) -def get_alphabetically_ordered_collection_list(level=0): +def get_alphabetically_ordered_collection_list(ln, level=0): Returns nicely ordered (score respected) list of collections, more exactly list of tuples (collection name, printable collection name). Suitable for create_search_box(). out = [] -query = SELECT id,name FROM collection ORDER BY name ASC +query = SELECT name FROM collection ORDER BY name ASC res = run_sql(query) -for c_id, c_name in res: -# make a nice printable name (e.g. truncate c_printable for for long collection names): +for name in res: +# make a nice printable name (e.g. truncate c_printable for long collection names in language ln): +c_name = get_coll_i18nname(name, ln) if len(c_name)30: c_printable = c_name[:30] + ... 
else: @@ -143,7 +144,7 @@ out.append([c_name, c_printable]) return out -def get_nicely_ordered_collection_list(collid=1, level=0): +def get_nicely_ordered_collection_list(ln, collid=1, level=0): Returns nicely ordered (score respected) list of collections, more exactly list of tuples (collection name, printable collection name). Suitable for create_search_box(). @@ -152,7 +153,8 @@ WHERE c.id=cc.id_son AND cc.id_dad='%s' ORDER BY score DESC % collid res = run_sql(query) for c, cid in res: -# make a nice printable name (e.g. truncate c_printable for for long collection names): +# make a nice printable name (e.g. truncate c_printable for long collection names in language ln): +c = get_coll_i18nname(c, ln) if len(c)30: c_printable = c[:30] + ... else: @@ -450,9 +452,9 @@ colls_nicely_ordered = [] if cfg_nicely_ordered_collection_list: -colls_nicely_ordered = get_nicely_ordered_collection_list() +colls_nicely_ordered = get_nicely_ordered_collection_list(ln=ln) else: -colls_nicely_ordered = get_alphabetically_ordered_collection_list() +colls_nicely_ordered = get_alphabetically_ordered_collection_list(ln=ln) colls_nice = [] for (cx, cx_printable) in colls_nicely_ordered:
Could we (the translators) get a pre-announcement of each new release?
Hi Tibor, as I try to catch up with 0.90.1 translations and I watch your CVS Commit messages, I was wondering whether it would be possible to know a few days in advance about new releases, so we can do something about it. Thanks, Ferran
[translation issues] Gender in messages
Hi developers,

I'd like to point out that in the next release there are some code optimisations that clash with languages (like Catalan or Spanish) that have grammatical gender (masculine or feminine) for nouns. For example (webcomment_templates.py):

    out = _("Your %s was successfully added") + '<br /><br />'
    out += '<a href="%s">' % link
    out += _('Back to record') + '</a>'
    out %= (reviews==1 and _('review') or _('comment'))
    return out

It turns out that, for example in Catalan:

- review: la crítica (feminine)
- comment: el comentari (masculine)

So, in my ca.po, I had to choose:

    #: modules/webcomment/lib/webcomment_templates.py:901
    #, fuzzy, python-format
    msgid "Your %s was successfully added"
    msgstr "El vostre %s ha estat afegit"

But it could well be:

    msgstr "La vostra %s ha estat afegida"

That is, 3 word changes (El - La; vostre - vostra; afegit - afegida); and similarly in Spanish, at least. I know that gettext has sophisticated plural handling, but I don't remember reading anything about gender. Unless there is a solution, I'm afraid this message optimisation should be avoided.

Thanks,
Ferran
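One way to avoid the problem, sketched here in Python, is to keep one complete sentence per case instead of interpolating the noun, so each msgid can be translated with its own gender agreement. This is only an illustration: `make_translator` is a toy stand-in for gettext's `_()` written for this note, `added_message` is a hypothetical function name, and the two Catalan strings are assembled from the forms quoted in the message above.

```python
# Toy catalogue standing in for ca.po; real code would use gettext.
CATALAN = {
    "Your review was successfully added": "La vostra crítica ha estat afegida",
    "Your comment was successfully added": "El vostre comentari ha estat afegit",
}


def make_translator(catalog):
    """Return a minimal `_` function that looks msgids up in `catalog`."""
    def _(msgid):
        return catalog.get(msgid, msgid)
    return _


def added_message(_, reviews):
    """Pick a whole sentence per case rather than substituting a noun."""
    if reviews == 1:
        return _("Your review was successfully added")
    return _("Your comment was successfully added")


if __name__ == "__main__":
    _ = make_translator(CATALAN)
    print(added_message(_, 1))
    print(added_message(_, 0))
```

The cost is one extra msgid per variant, which is small compared with a sentence that cannot be translated correctly.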