Re: Please allow any indicator in any field

2014-03-31 Thread Ferran Jorba
Hello Samuele,
 
 Dear Ferran, Alexander,

 I will try to explain you how Invenio is evolving and let's see if my
 understanding is correct, and if this will satisfy your MARC needs
 (Tibor, Esteban, please correct me anytime I am wrong).

Sorry, I disagree with the expression «your MARC needs».  I think that
the correct expression would be «Marc21 compliance».
Marc21 is a common agreement among a large, world-wide library
community.  UAB is a tiny, tiny fraction of this community, and UAB
chose to change from CATMARC to Marc21 some years ago, together with all
Catalan libraries, because we were following a general, world-wide
movement towards a global standard; it was a strategic decision.  We could
perfectly disappear and the Marc21 community wouldn't notice at all.  We
don't have any special needs.  Marc21 allows for local fields and/or
subfields and we find some uses for them.

[...]
 Take a look at:
 https://github.com/inveniosoftware/invenio-demosite/blob/pu/invenio_demosite/recordext/fields/atlantis.cfg

 this is the default BibField configuration of Atlantis Demo
 Site. Note: this is not the default of Invenio, just Atlantis. And in
 Invenio core? Well no configuration is enforced there. It's up to
 you. You can start off Atlantis configuration and encode the whole
 MARC21 (or the part of MARC21 that you need), and Invenio will speak
 MARC21.

I still remember my first Invenio installation, back when it was called
CDSware.  I still remember the confusion of trying a large and complex
piece of software (and back then it wasn't as large as it is now) that I
didn't fully understand.  I have helped some other local institutions
interested in trying Invenio, and I have seen that they are as confused
as I was.

What you call «Atlantis Demo Site» is the starting point for those
prospective new Invenio users.  This is what they install and this is what
they expect to work.  My opinion is that you cannot expect prospective new
Invenio users to fiddle with Jinja2 templates just to make a 245 title with
indicators appear as a title, be indexed as a title and be exported
in whatever format as a title.  The default values (what you call
«Atlantis Demo Site») should comply with Marc21 as much as possible.  If
they don't, (a) the barriers for newcomers to adopt Invenio are
unnecessarily difficult to overcome and (b) you are asking each of them
to repeat the same exercise just to load a few records and see the
result.

If you just correct the current, sub-standard records in your Atlantis
Demo Site to more realistic ones, and make the default configuration
recognize them, the goal will be accomplished.  I sincerely think that
it is not difficult.  It only requires that you are interested in making
the Invenio community grow.

Are you?

If there is some magical parameter to change it to something else, like
Unimarc, so much the better.  But it should be easy, trivial and clear.
Please help newcomers, or Invenio will always be something small, exotic
and marginal in the digital libraries and repositories landscape.

Best regards,

Ferran


Re: Please allow any indicator in any field

2014-03-29 Thread Ferran Jorba
Hello Rob,

  CERN may use a subset of Marc21 where most of the indicators are __.
  Ok, that's CERN's library decision.  But if the Invenio developers
  claim that Invenio supports Marc21, you *must* allow other
  indicators there, and consider them valid.
 
 Then don't say it supports MARC21.  Simple solution.

Sorry, they do:

  Invenio complies with standards such as the Open Archives
  metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its
  underlying bibliographic format.
  http://invenio-software.org/

  Flexible metadata: Standard metadata format (MARC) 
  http://invenio-software.org/wiki/General/Features

  MARC format is the standard in the library world. It is well
  established and has been used since 1960s. [...]
  http://invenio-demo.cern.ch/help/admin/howto-marc

And those reasons were the ones we used to choose Invenio over the
other alternatives quite some years ago.

 The primary goal of invenio should be to meet the needs of its
 original institution(s).

«Primary goal» should not exclude others, especially when an easy
compatible solution exists: take any indicator as valid.  Not perfect,
but *much* better than now.

 If marc indicators are not necessary in the
 database functions of the originating institution, then feel free to
 ignore them. Avoid getting dragged back 40 years to the days of
 library catalogs by any mandates to follow every rule to the letter.
 Those rules may have made sense in 1970 but they don't always now.
 And MARC development has been under the control of library
 association committees, made up of librarians, who make decisions
 based on cataloging rules for description of items (as typed on paper
 cards) and not contemporary technology.

That's why some institutions, like CERN or FNAL, may take a subset of
Marc21.  But it turns out that there are millions, hundreds of millions
of Marc21 records out there, outside those HEP institutions, where
those indicators exist.  And if the library community has decided that
indicators are useful, please respect its decisions.  That's not
what I'm discussing here.

What I'm proposing is that, just by changing the developers' mindset,
taking into account that indicators may have values other than __,
and changing the code to use %% or whatever wildcard character is
relevant, those existing records will be recognized by any
institution testing Invenio with default values, without having to
patch it as some of us have done.  I have discouraged more than one
institution from adopting Invenio for those reasons.

 It is important to remember that marc, was originally a U.S. Library
 of Congress file format designed for large main-frame machines in a
 day of top down programming, and magnetic tape reel storage.  Access
 was entirely sequential which explains some of the record
 architecture.  the format was intended to be used to generate paper
 file cards.

Yes.  For similar technological limitations Unix has those cryptic
abbreviations.  But both Marc and Unix, curiously both born at the
end of the 1960s, have proven much better than the alternatives, which
is the reason they are still in use.  And backwards compatibility with
the existing heritage is one of the reasons for their current value.
 
 Modern computing should have freedom to use marc in any way that
 makes it as suitable as possible for the job and not be hindered by
 any marc or cataloging rules that are no longer really applicable.

Yes, of course.  But as compatibility is not so difficult, I argue that
it should be a goal for Invenio.  Even more as the current users have
records *with* indicators.  And let the librarians decide whether those
cataloging rules are applicable or not.  Most of the times they are
right.

Best regards,

Ferran


Re: Please allow any indicator in any field

2014-03-29 Thread Ferran Jorba
Hello Alexander,

[...]
  I know Marc21 reasonably well, and I can't remember any case
  where different indicators mean something so different that
  the field has to be treated differently.
 
 Here I would be more careful. Basically, I would treat Marc fields and
 indicators not as 3 digits plus two other funny chars but consider the
 whole bunch as a 5 character wide field designation. I think, here I'm
 in fact a bit more in line with Esteban's approach. At least if I
 understand it correctly. (Though I agree with you that one might not
 come up with a complete bibfield list, but just with a set of most
 common usages.)

But «most common usages» won't cover them all, and so you cannot load
arbitrary records coming from unknown sources and expect Invenio to do
the expected thing with them.  So, what I'm proposing is: let «any» be
the rule, and let's cover the exceptions later.  What I'm proposing is
to follow Postel's rule:

 Be conservative in what you do, be liberal in what you accept from
 others.
 http://en.wikipedia.org/wiki/Robustness_principle

[...]
  The JSON structure that we create from Marc21 (or anything else)
  contains as much information from the master format as you want
  (even the indicators).  Meaning that if your data model is well
  written it is a lossless conversion and there is a one to one
  mapping that makes possible doing Marc21 to JSON to Marc21.
 
  I hope that.
 
 +1
 
 I share some concerns about this with Ferran and Martin and some
 others, and I'm very sure it's quite a task...

I don't think it is so difficult if the code just accepts 245%% for
title, 100%% for first author, etc.  With a 10% effort we could cover
more than 95% of the cases.
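
The wildcard idea can be sketched in a few lines of Python.  This is
only an illustration, not Invenio code: the function name `tag_matches`
and the choice of treating tag plus indicators as one five-character
string are assumptions made for the example; `%` is the wildcard
character mentioned above.

```python
def tag_matches(field, pattern, wildcard="%"):
    """Return True if a 5-character field designation (3-digit tag plus
    two indicators) matches a pattern in which `wildcard` accepts any
    single character.  Hypothetical helper, for illustration only."""
    if len(field) != 5 or len(pattern) != 5:
        raise ValueError("field and pattern must be 5 characters long")
    return all(p == wildcard or p == f for f, p in zip(field, pattern))


# A title is recognized whatever its indicators are ...
print(tag_matches("24510", "245%%"))
print(tag_matches("245__", "245%%"))
# ... whereas a strict pattern only accepts blank indicators.
print(tag_matches("24510", "245__"))
```

With a pattern such as `245%%` as the default, records coming from other
catalogues would be picked up as they are, and stricter patterns could
still be written for the exceptions.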

Alexander, would you agree to replace the current Invenio default
behaviour with the default I'm proposing?  Knowing that it would not be
perfect, do you think that it would be better?

Best regards,

Ferran


Re: [pu jsonalchemy] Aggregation of several fields into now

2014-03-27 Thread Ferran Jorba
Hello Alexander, Esteban et al,

Alexander Wagner a.wag...@fz-juelich.de wrote:
 
 On 27.03.2014 15:38, Esteban Gabancho wrote:

 I think the second solution is the closest one to reality, the `None`
 express that the record doesn’t have a first author. And I also think
 that we could apply this solution for other cases where we have this
 kind of situation (like with the `110__` and `710__`).

 What do you think?

 If I may: as a librarian you have a 100. You may not have a 700, but
 in case you have only one author it is 100 by definition.

[to the non librarians in the crowd; Alexander knows it already]

Or a 110, or a 111, I'd like to remind you.  An author is not only a 100
(personal author); it may also be a corporate one (110) or a conference
(111).  And please, don't forget that any of those tags may have
arbitrary values as indicators.
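
As a rough sketch of what this means for code that looks for the «first
author», assuming a record is represented as a list of (tag, ind1, ind2,
subfields) tuples (a representation invented for this example, not the
BibField or JSONAlchemy one):

```python
MAIN_ENTRY_TAGS = ("100", "110", "111")  # personal, corporate, conference


def main_entry(fields):
    """Return (tag, name) of the first main-entry field, or None.

    The indicators are deliberately ignored: any value is accepted.
    Hypothetical helper, for illustration only."""
    for tag, _ind1, _ind2, subfields in fields:
        if tag in MAIN_ENTRY_TAGS:
            return tag, subfields.get("a")
    return None


record = [
    ("245", "1", "0", {"a": "Annual report"}),
    ("110", "2", " ", {"a": "Universitat Autonoma de Barcelona"}),
]
print(main_entry(record))
```

A record without any of the three tags yields `None`, which corresponds
to the «no first author» case discussed above.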

Best regards,

Ferran


Re: RFC unifying phrase search behaviour

2014-02-25 Thread Ferran Jorba
Hi,

Alexander Wagner a.wag...@fz-juelich.de wrote:
 
 On 24.02.2014 11:30, Tibor Simko wrote:

 Hi!

 People don't easily distinguish between the following queries:

 title:'some phrase'

 substring

 title:some phrase

 exact search

 [...]

Once more, I agree with Alexander.  The whole reply.

Danke,

Ferran


Re: Invenio i18n in crowdin?

2013-12-19 Thread Ferran Jorba
Hello Theodoros and Tibor,
 
 On Wed, 18 Dec 2013, th...@physics.auth.gr wrote:
 Crowdin seems nice with lots of features. For 'open' opensource
 projects (and also for academic/research(?) institutions) its
 completely free.

[...]
 Personally I prefer Emacs's po-mode.  Offline, ultra fast,
 auto-completion, one-key access to the phrase context, language syntax
 checking, etc.

I also appreciate Emacs po-mode: besides the reasons Tibor mentions, it
lets me switch between the four or five languages I use.  I translate into
two, but, after the English original, I often read the French and Italian
translations to look for inspiration or alternatives, just by pressing the
'a' key.  Match that!

Ferran


Re: RFC: enable bibsched scripting capabilities [draft patch]

2013-06-28 Thread Ferran Jorba
Hi Samuele,

El Fri, 28 Jun 2013 09:26:03 +0200 Samuele Kaplun samuele.kap...@cern.ch 
escrigué:

[...]
  As I feel that it may be useful to somebody else, so I submit it for
  your consideration.
 
 Indeed it could be useful to manipulate tasks in the queue
 programmatically... so thanks for your idea! The only thing is that
 we are about to integrate into main Invenio a refactored bibsched
 which is the result of several improvements we have introduced for
 the INSPIRE use case...
 
 We can see how easy it is to apply your suggested CLI improvements.

Good.

[...]
 You can have a peek at it here:
 
 http://invenio-software.org/repo/personal/invenio-adeiana/log/?h=bibsched-refactoring

I'll take a look in the next few days, thank you.

Ferran


Re: RFC: enable bibsched scripting capabilities [draft patch]

2013-06-28 Thread Ferran Jorba
Hi Samuele,

El Fri, 28 Jun 2013 13:38:00 +0200 Samuele Kaplun samuele.kap...@cern.ch 
escrigué:

 On Friday 28 June 2013 13:32:01 Alexander Wagner wrote:
  Or, the other way round: I had expected, that if the job as such
  has a problem it wouldn't reach running status /regardless/ if I
  hit R interactively, or the scheduler launches the same thing
  automatically.
 
 Maybe if Ferran would hit manually R at midnight then the task would
 actually not run? There is indeed this bug addressed here:
 
 http://invenio-software.org/ticket/1432

Except that, and this is valid also for the date parse hypothesis,
in my case it happens only in one Invenio for each server.  The other
one runs flawlessly.  And there are no code differences, I've checked
several times.

Ferran


RFC: enable bibsched scripting capabilities [draft patch]

2013-06-27 Thread Ferran Jorba
Hi all,

since our 1.1 migration we've had some mysterious behaviours with
bibsched that, apparently, nobody else has.  It may be related to having
more than one Invenio in a single system, but I haven't been able to
prove it.

One very curious behaviour is that the first task after midnight
switches to SCHEDULED state but doesn't run.  On the test server it
happens to Traces, and on the production it happens to DDD.  No matter
how many hours I spend (and I have spent many!) finding why and how, the
mystery continues.

I get the friendly daily mail as such:

 Emergency from http://ddd.uab.cat: BibSched halted: Process bibsort
 (task_id: 140997) was launched but seems not to be able to reach
 RUNNING status.

Anyhow, I needed a mechanism to automate my daily manual task to put
bibsched into manual mode, know which is the task in SCHEDULED state,
run it and put bibsched back to automatic mode.

I have been patching bibsched to allow, at least, those basic scripting
capabilities.  I don't know how many more commands I need (acKnowledge
errors, maybe?).  I'm unsure about the names I have chosen.

As I feel that it may be useful to somebody else, I submit it for
your consideration.

Comments welcome,

Ferran
BibSched: enable scripting commands [DRAFT]

BibSched is a curses program with only a few command-line options.  This
first patch adds a new command (appropriately called command) with two
options:

 --mode=[automatic, manual]
 --key=k:task_id

The first one allows switching between manual and automatic modes, and the
second allows applying commands to tasks.  Currently only one is
implemented: R for run.
---
 modules/bibsched/lib/bibsched.py |  155 +
 1 files changed, 138 insertions(+), 17 deletions(-)

diff --git a/modules/bibsched/lib/bibsched.py b/modules/bibsched/lib/bibsched.py
index 01314ae..dfba9fb 100644
--- a/modules/bibsched/lib/bibsched.py
+++ b/modules/bibsched/lib/bibsched.py
@@ -31,6 +31,7 @@ import getopt
 from itertools import chain
 from socket import gethostname
 from subprocess import Popen
+from cStringIO import StringIO
 import signal
 
 from invenio.bibtask_config import \
@@ -281,9 +282,78 @@ def bibsched_send_signal(proc, task_id, sig):
 return False
 return False
 
+def parse_report_queue_status():
+    '''Get queue status by parsing the output of report_queue_status.
+
+    Returns: a dictionary with the numeric task_id as key and a
+    dictionary for each value
+    '''
+    # print calling report_queue_status...
+    out = StringIO()
+    report_queue_status(verbose=True, status=('WAITING', 'RUNNING', 'SCHEDULED'), stream=out)
+    report = out.getvalue()
+    tasks = {}
+    for line in report.split('\n'):
+        fields = {}
+        if '"' in line:
+            words = line.split('"')
+            while words:
+                word = words.pop(0)
+                if word.endswith('='):
+                    key = word[:-1].split()[-1]
+                    value = words.pop(0)
+                    fields[key] = value
+            key = int(fields['ID'])
+            tasks[key] = fields
+    return tasks
+
+
+def command(opt="", arg=""):
+    '''Check command parameters and call Manager with the appropriate values'''
+
+    print "opt = [%s] arg = [%s]" % (opt, arg)
+    if opt in ('-m', '--mode'):
+        arg = arg.upper()
+        if 'AUTOMATIC'.startswith(arg):
+            mode = 'A'
+        elif 'MANUAL'.startswith(arg):
+            mode = 'M'
+        else:
+            mode = None
+            print >> sys.stderr, 'Unknown mode: %s' % (arg)
+            sys.exit(1)
+        if mode:
+            print 'Manager, mode = %s' % (mode)
+            print 'redirect...'
+            # old_stdout, old_stderr = redirect_stdout_and_stderr()
+            old_stdout = sys.stdout
+            Manager(old_stdout, mode)
+    elif opt in ('-k', '--key'):
+        if arg.count(':') != 1:
+            print >> sys.stderr, "Error: syntax: K:task_id"
+            sys.exit(1)
+        else:
+            (cmd, task_id) = arg.split(':')
+            if len(cmd) == 1:
+                cmd = cmd.upper()
+            else:
+                print >> sys.stderr, "Error: Key must be single character"
+                sys.exit(1)
+            if not task_id.isdigit():
+                print >> sys.stderr, "Error: task id not numeric"
+                sys.exit(1)
+            print 'Manager, command = %s. [%s] [%s]' % (arg, cmd, task_id)
+            print 'redirect...'
+            # old_stdout, old_stderr = redirect_stdout_and_stderr()
+            old_stdout = sys.stdout
+            task_id = int(task_id)
+            if cmd == "R":
+                Manager(old_stdout, cmd, task_id)
+
+
 
 class Manager(object):
-    def __init__(self, old_stdout):
+    def __init__(self, old_stdout, key='', task_id=0):
         import curses
         import curses.panel
         from curses.wrapper import wrapper
@@ -316,8 +386,40 @@ class Manager(object):
 self.header_lines = 3
 except IOError:

Re: Exceptions due to attacks

2013-04-25 Thread Ferran Jorba
Hi Theodoros,
 
 Hello Ferran,

 My dev 1.0.1.1218 and latest maint-1.1 sites correctly display a 404
 not found page for either
 /record/xxx/files/wp-whatever
 /record/xxx/wp-whatever
 /record/wp-whatever

 without sending me an exception error
 The same applies if wp-whatever is replaced by ../../etc/passwd and
 the likes.

 I tried the same with cds.lib.auth.gr and it also displays a 404 error
 (i don't know if an error is logged)

but try an index.php or any other missing hit at http://cds.cern.ch.  It
is effectively handled by Invenio.

 So, I understand that we need a general solution to provide an (a)
 404 not found to the attacker, and/or (b) a digested summary to the
 admin.

 Aren't the other sites having this flood of attacks?  I doubt we are
 the only ones.

Ferran


Re: Exceptions due to attacks

2013-04-25 Thread Ferran Jorba
Hello Theodoros,
 
 On 25/4/2013 12:37 μμ, Ferran Jorba wrote:
 but try an index.php or any other missing hit at http://cds.cern.ch.  It
 is effectively handled by Invenio.
 My point exactly. I see that both my installations and CERN's
 correctly handle those 'attacks'. I even tried with .php and .py files
 and there is no exception raised and sent to the admin even if you set
 CFG_SITE_ADMIN_EMAIL_EXCEPTIONS = 2 in invenio(-local).conf

To tell you the truth, I wasn't aware of this variable.  I have the
default value:

 ## CFG_SITE_ADMIN_EMAIL_EXCEPTIONS -- set this to 0 if you do not want
 ## to receive any captured exception via email to CFG_SITE_ADMIN_EMAIL
 ## address.  Captured exceptions will still be available in
 ## var/log/invenio.err file.  Set this to 1 if you want to receive
 ## some of the captured exceptions (this depends on the actual place
 ## where the exception is captured).  Set this to 2 if you want to
 ## receive all captured exceptions.
 CFG_SITE_ADMIN_EMAIL_EXCEPTIONS = 1

 Unless I'm missing something here, I suspect something weird happening
 only with your installation...

Watching more carefully (thanks!) my installation, I see that it returns
a 404 Not found (http://traces.uab.cat/abc), even in the HTTP headers.

So, now, with a better understanding of what is happening, what I'd like
is a value for CFG_SITE_ADMIN_EMAIL_EXCEPTIONS that doesn't send me an
email for a 404 status.

I'll try to do a local fix and provide it upstream if there is interest.

Thanks for your feedback,

Ferran


Re: Exceptions due to attacks

2013-04-25 Thread Ferran Jorba
Hi Samuele,

[...]
 wait :-) This is already implemented, as Theodoros reported ;-) Just
 have a look in maint-1.1 to:

 commit 0aeb9fa7e8a6b6809be5d586bcdcf0e7a9784e05
 Author: Samuele Kaplun samuele.kap...@cern.ch
 Date:   Tue Oct 27 16:48:22 2009 +0100

 WebStyle: configurable alerts for HTTP errors
[...]
 Isn’t this commit available in your repo and already doing what you are 
 looking for?

I have it in my 1.1.0 production system!  I've just modified the value
(removed 404r), executed inveniocfg --update-config-py and no more
mailbombs!
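
For readers wondering what removing «404r» from such a list amounts to,
here is a minimal sketch of that kind of filter.  It is not the WebStyle
code: the function name `should_alert`, the comma-separated format with
`*` wildcards, and the reading of a trailing `r` as «only when a referer
is present» are assumptions made for the example.

```python
from fnmatch import fnmatch


def should_alert(status, referer, alert_list):
    """Decide whether an HTTP status should trigger an admin e-mail.

    `alert_list` is a comma-separated string such as "404r,400,5*".
    An entry ending in "r" only matches when `referer` is non-empty.
    Hypothetical helper, for illustration only."""
    for entry in alert_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        needs_referer = entry.endswith("r")
        pattern = entry[:-1] if needs_referer else entry
        if fnmatch(str(status), pattern):
            if needs_referer and not referer:
                continue
            return True
    return False


# A probe for a missing page without referer stays silent either way;
# once "404r" is removed, no 404 at all reaches the admin mailbox.
print(should_alert(404, None, "404r,400,5*"))
print(should_alert(404, "http://example.org/", "400,5*"))
print(should_alert(500, None, "400,5*"))
```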

Thanks to all,

Ferran


Re: Exceptions due to attacks

2013-04-24 Thread Ferran Jorba
Hello,

 In data mercoledì 20 marzo 2013 08:19:27, Johnny Mariéthoz ha scritto:
 every day I have some exceptions due to attacks such as: IOError:
 request data read error (webinterface_handler_wsgi.py:377:readline)
 an example of request is:
 /record/17041/files/wp-content/plugins/mm-forms-community/includes/doajaxfil
 eupload.php
 
 Is it possible to return a 404 status for such as request?
 
 which version of Invenio are you running? Depending on it this is
 indeed the default configuration. I will check the commit log, and
 point you out the missing patches...

Is there any progress on this issue?  Under 1.1 the missing pages
produce much more noise than the old mod_python.

Thanks,

Ferran


Re: Celery integration for next

2013-03-20 Thread Ferran Jorba
Hello Lars,

 I've finished initial integration of Invenio in Celery for next:
 http://invenio-software.org/repo/personal/invenio-lnielsen/commit/?h=next-celeryid=6d09ef545f03edfa6d7c77cd3a2447873b16c87e

 It basically follows what we discussed in DevForum
 (https://invenio-software.org/wiki/Tools/Celery/InvenioIntegration). Take
 a look if you have a minute and let me know if there's issues.
[...]

Yes, please, I have a question: at UAB, we have more than one Invenio
installation on the same host, installed as plain users (neither root nor
www-data), and served by Apache (specifically, apache-itk) with virtual
hosts.  Will this celery integration be compatible with our setup?

Thanks,

Ferran

PS And congratulations for your zenodo branch.  It looks gorgeous!  We
   are constantly looking at it for inspiration, and we are taking some
   ideas for our forthcoming 1.1 upgrade.


Re: Celery integration for next

2013-03-20 Thread Ferran Jorba
Hello Lars,

 I think there should be no problem in your case. On your host, you
 would install one RabbitMQ server (the broker) - on the broker you
 would create 1 RabbitMQ virtual host per Apache virtual host. For each
 invenio installation you would start 1 worker.

Great, good to know it's so simple!

[...]
 Do you install each Invenio installation in virtual environments? If
 not, this might be the only issue, however I think at most a
 worker-start-script per invenio installation would need to be created.

No, we are not using virtual environments; I'm trying to keep it as flat
as possible, and I'm not using any extra layer that I don't really need.

 Alternatively, we are also thinking of a lite-solution, so you won't
 even need to install a broker (RabbitMQ) and start the Celery
 workers. Celery has a flag so that it can run tasks synchronously
 instead of asynchronously (so the lite version would seem slower, but
 still do the job in the end).

In which situations does it «seem slower»?  In the end-user front-end or
in the librarians' and system back-end?

 Currently there's an overlap between bibsched and Celery, which we
 haven't completely sorted out what goes where. For now, bibsched is
 still the master of bibupload and friends. In the short term it seems
 most natural that Celery would take over bibtasklets + new
 territories. On the long run, we'll have to get some experiences
 first.

Sure.  Thanks for answering so fast,

Ferran

 Cheers,
 Lars


Re: Celery integration for next

2013-03-20 Thread Ferran Jorba
Hello Lars,

[...]
 So definitely for most cases you do want to run celery, but for small
 installations without big requirements this can be an easy way to get
 up and running.

Sure, thanks again,

Ferran


Re: Websearch Collections

2013-01-22 Thread Ferran Jorba
Hello Yigit,

 I have a question about web search and collections. Is it possible to
 remove/add some collections from/into the list under "Search
 Collections" so that users are not confused with so many collection
 names? For instance, what should I do if I want to erase the parent
 collections, but keep all the children there. What I actually want is
 to be able to edit the select field in a flexible manner. Might we be
 able to do that via Invenio's Admin Interface? If not, I will edit
 websearch_webinterface.py and websearch_templates.py, but I'm looking
 for the easiest way to do that. Does anyone have an idea on that?

That would be a nice idea, which I'd like you to share if you can
implement it.  I made a more general proposal five years ago that
got some support but is now sleeping in the Invenio ticket database, and
I suspect that it addresses the same problem that you are talking about.
I suggested a method to restrict further searches to the current
collection (and thus, its subcollections):

 http://invenio-software.org/ticket/450

I still think that it can be done with little effort.  The problem is
that I don't have the time to do it myself.

Sigh,

Ferran


Re: Results of Advanced Search: Missing Message

2013-01-16 Thread Ferran Jorba
Hello Alexander,

[...]
 Unfortunately, I fear I have to start the year with a bug report. One
 of our users just noted that if you use the advanced search and enter
 a query without any results you do not get any notification in certain
 circumstances. E.g. http://goo.gl/gsJ3p (searching müller and wert in
 the author index of JuSER) triggers this behaviour. However, if I fill
 in only one field it seems to report properly. I think this should be
 fixed.

We have also been bitten by this bug on our test 1.1 installation: the
same arguments give different results on simple and advanced search.  I
searched my mail and I've seen that there is no answer.  Yet ;-)

Thanks,

Ferran


Re: Add all search results to a basket?

2012-11-30 Thread Ferran Jorba
Hello Alexander,

Alexander Wagner a.wag...@fz-juelich.de wrote:
 
 Hi!

 Is there currently a way to add all records retrieved by a search to a
 basket? I mean without hooking on every hit and then add per page?
 Probably even some hook all records on this page function?

 I just got this question from our users and at the moment have to
 admit that the only answer I know would be to do it per record. (Note:
 I do not want to store the query in the basket but really the
 resulting records.)

+1

Ferran


Re: Add all search results to a basket?

2012-11-30 Thread Ferran Jorba
Hi Lars and Alexander,

 Would it be all records for an entire search or just all records
 displayed on one result page? In case of the latter, there's a toggle
 all button in the next-branch (see example on:
 http://invenio-demo-next.cern.ch/search?p=action_search=).

Wow, I like this next demo!  Congratulations!

For me it would be good enough, given that there is this «Search
settings» where the number of results can be increased.  Probably it
would be insufficient for some specific cases, but I'd say it is a great
solution for most users.

Thanks for your great work,

Ferran


Translations, branches, and a plea for a Translator's Corner

2012-11-29 Thread Ferran Jorba
Hello Tibor,

branches have made the translators' job a little more confusing,
especially because there is no reference page for us to read.  Probably
you have written a mail to a list about which branch we should be working
with, but, frankly, I cannot find it.

One of the (many) great things about Debian is the Debian Developers'
Corner (http://www.debian.org/devel/), which you surely know much better
than me.  It would be very useful for us, translators, to have a
Translators' Corner in the Trac pages where we could find reference
information.

I'd be satisfied even if it starts humble and small, with something like
which branch (or branches?) we should work with.  There is a lot more to
be added, like .po and webdoc and specific hints, etc., but as Laozi said,
even the longest journey begins with a single step :)

OTOH, my reformatting of the websearch webdoc pages is still waiting
for your approval, and decisions like how to organize sections and
languages should go to this page as well.

Thanks,

Ferran


Re: Formatting numbers for display

2012-04-20 Thread Ferran Jorba
Hello,

[...]
 We used websearch_templates.tmpl_nice_number instead. Shouldn't the
 locale aware version be used?
 I need to use it outside of websearch_templates. It seems like this
 functionality should be in miscutils instead.

Yes, please!
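
A minimal sketch of what such a shared helper could look like, assuming
only that the caller passes the separator explicitly (the name
`nice_number` and its signature are invented here; this is not the
existing tmpl_nice_number):

```python
def nice_number(number, thousands_separator=","):
    """Format an integer with the given thousands separator.

    Hypothetical helper, for illustration only; a locale-aware
    version would pick the separator from the user's language."""
    grouped = format(int(number), ",")
    return grouped.replace(",", thousands_separator)


print(nice_number(1234567))
print(nice_number(1234567, "."))
```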

Ferran


Re: [INSPIRE-ADMIN] Queue blocked

2012-03-05 Thread Ferran Jorba
Hello Thorsten and Tibor,
 
 On Fri, 02 Mar 2012, Thorsten S wrote:
 however, why are individual bad input records not simply skipped *and
 logged as such for later inspection* and the queue continues otherwise
 unaffected? a single bad edit shouldn't stall everything afterwards,
 unless there is a convincing reason that I am not aware of

 Briefly put, upload jobs are usually run sequentially, since in general
 a future upload job may depend on the result of a previous upload job.
 So, if an upload job ends in an error, manual inspection and resolution
 may be required.  Historically, in this case, the upload queue was
 stopped and an SMS alert was sent to the admin support person, who then
 logged in and inspected the problem and unblocked the queue.  This `easy
 technical solution' works, but requires human intervention, which is
 definitely sub-optimal.

After several years of checking that, in my installations, I never found
an error that required, so to say, a general stop-panic-and-see
situation, I wrote a crude yes-I-know script that I run every 20 minutes
and leaves a couple of logs for later inspection.  I can leave the
office for whatever reason with confidence that the systems will keep
running.

You may have to adjust some Invenio paths, but it may help you to decide
to roll your own.

Hope it helps,

Ferran
# -*- coding: utf-8 -*-
# Time-stamp: 2010.07.19 08:59:38 error2ack.sh d...@homs.uab.es

# This script acknowledges bibtask errors and resumes the scheduler

what=error2ack

cd ~/tmp

filerotate() {
    if [ -s $filename ]; then
        test -s $filename.8 && mv $filename.8 $filename.9
        test -s $filename.7 && mv $filename.7 $filename.8
        test -s $filename.6 && mv $filename.6 $filename.7
        test -s $filename.5 && mv $filename.5 $filename.6
        test -s $filename.4 && mv $filename.4 $filename.5
        test -s $filename.3 && mv $filename.3 $filename.4
        test -s $filename.2 && mv $filename.2 $filename.3
        test -s $filename.1 && mv $filename.1 $filename.2
        test -s $filename.0 && mv $filename.0 $filename.1
        mv $filename $filename.0
    fi
}

for filename in $what.log $what.sql; do
filerotate
done


echo "select id,  status, runtime, proc, arguments
from schTASK
   where status in ('ERROR', 'DONE WITH ERRORS');" | \
~/invenio/bin/dbexec > $what.log

if [ -s $what.log ]; then
    awk 'NR > 1 { q = sprintf("%c",39); \
      printf("update schTASK set status=%s%s%s where id=%s;\n", \
      q, "ACK ERROR", q, $1) }' $what.log > $what.sql
    cat $what.sql | ~/invenio/bin/dbexec
    ~/invenio/bin/bibsched restart
fi
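
For those who prefer Python, the log rotation step of the script above
could be sketched as follows.  The function name `rotate` and its `keep`
parameter are made up for this example; the behaviour is meant to mirror
filerotate (non-empty files only, .8 becomes .9 down to .0 becomes .1,
then the current file becomes .0).

```python
import os


def rotate(filename, keep=9):
    """Rotate filename -> filename.0 -> filename.1 ... -> filename.<keep>.

    Does nothing if `filename` is missing or empty, like `test -s`.
    Hypothetical helper, for illustration only."""
    if not (os.path.exists(filename) and os.path.getsize(filename) > 0):
        return
    # Shift the oldest copies first so nothing is overwritten too early.
    for n in range(keep - 1, -1, -1):
        src = "%s.%d" % (filename, n)
        if os.path.exists(src) and os.path.getsize(src) > 0:
            os.replace(src, "%s.%d" % (filename, n + 1))
    os.replace(filename, filename + ".0")
```

Calling `rotate("error2ack.log")` and `rotate("error2ack.sql")` before
each run would keep the last ten non-empty logs for later inspection.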


Re: Time based po snapshots?

2012-01-09 Thread Ferran Jorba
Hello Tibor,

Tibor Simko tibor.si...@cern.ch wrote:
 
 On Wed, 21 Dec 2011, Ferran Jorba wrote:
 How often and, especially, *when* do you plan to update them?  Have you
 thought about a periodicity, as I suggested (e.g. first day of each month)?

 I thought about 4 times a year.  Considering how many updates the PO
 files currently get, 12 times a year would probably be an overkill.
 Anyway, the update frequency should be rather tightly coupled with
 release frequency model, so we'll see later this year.

Fair enough.  But please let us know a few days (a week, perhaps?) in
advance so we can do the final sprint.

Thanks,

Ferran


Re: Slow MySQL queries with large data structures in memory

2011-12-22 Thread Ferran Jorba
Hello Benoit,

[...]
 In [4]: %time res = run_sql("SELECT id_bibrec FROM bibrec_bib03x LIMIT 100")
 CPU times: user 1.96 s, sys: 0.06 s, total: 2.02 s
 Wall time: 2.30 s

 Any idea about why we're seeing this and how we can fix it? It is
 quite a big problem for us as our citation dictionaries are so big.

I have noticed in more than one case that, for some minimally complex
(?!) operations, the bottleneck is MySQL, not Python, so if you can move
part of the manipulation from one to the other you may get surprises.  I
cannot remember the exact case, but the equivalent in yours would be
changing:

 res = run_sql("SELECT id_bibrec FROM bibrec_bib03x LIMIT 100")
 
to:

 res = run_sql("SELECT id_bibrec FROM bibrec_bib03x")
 res = res[:100]

I remember gains of 10x.  YMMV, but you can try it.
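
Whether the trick pays off is easy to measure before committing to it.
A minimal timing sketch follows; it assumes an in-memory SQLite table
as a stand-in for MySQL, and its run_sql() is a hypothetical
replacement for Invenio's, so the numbers only illustrate the method,
not the MySQL behaviour discussed here:

```python
import sqlite3
import time

# Stand-in database (assumption: SQLite instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bibrec_bib03x (id_bibrec INTEGER)")
conn.executemany("INSERT INTO bibrec_bib03x VALUES (?)",
                 ((i,) for i in range(100000)))


def run_sql(query):
    """Hypothetical stand-in for Invenio's run_sql(): return all rows."""
    return conn.execute(query).fetchall()


def timed(label, func):
    """Call func(), print the elapsed wall time and return the result."""
    start = time.perf_counter()
    result = func()
    print("%-16s %.4f s" % (label, time.perf_counter() - start))
    return result


# Variant 1: let the database cut the result set.
limited = timed("LIMIT in SQL", lambda: run_sql(
    "SELECT id_bibrec FROM bibrec_bib03x LIMIT 100"))

# Variant 2: fetch everything and slice in Python.
sliced = timed("slice in Python", lambda: run_sql(
    "SELECT id_bibrec FROM bibrec_bib03x")[:100])
```

Run it against the real table and the real database before drawing
conclusions; which variant wins depends on the engine and on what else
is held in memory.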

Ferran


Re: Time based po snapshots?

2011-12-21 Thread Ferran Jorba
Hello Tibor,

a couple of more questions:

 I've updated PO files not only for the maintenance branch (v0.99.4), but
 also in the master and next branches, see for e.g. commit b85da374b1e.

How often and, especially, *when* do you plan to update them?  Have you
thought about a periodicity, as I suggested (e.g. first day of each month)?
This way we, the translators, would know when to send you the updated
files before they are re-synchronized with the source code, without
worrying about working on an obsolete file.

 You can checkout latest Invenio master and work on the PO files there
 and send me patches as described in:

So, do you recommend that we work only on the master branch, and forget
about next or maint?

Thanks again,

Ferran


Re: Time based po snapshots?

2011-12-20 Thread Ferran Jorba
Hello Tibor,

 On Mon, 31 Oct 2011, Tibor Simko wrote:
 We plan to enter a soft feature freeze for the v1.0 branch by next
 week, so this will be a perfect opportunity to release updated PO
 files at the same time.  So stay tuned.

 Took a while, but the new PO files are available now.

Ok, I see the new files for the 0.99.4 release.  But how are they
updated on the next or maint branches?  How do I get and patch them?
Have you thought about a workflow?  Please assume no deep git mastery on
my side; I have no experience yet with remote branches, sorry ;-(

Thanks,

Ferran


Re: Author handling bfe_authors.py et al

2011-11-25 Thread Ferran Jorba
Hello Alexander,

 Currently, bfe_authors.py uses

 authors = []
 authors_1 = bfo.fields('100__')
 authors_2 = bfo.fields('700__')

 This enforces authors to be considered only if both indicators are
 blank. However, you may notice that from the authoritative definition at

http://www.loc.gov/marc/bibliographic/bd100.html

this is an example of Invenio's default values not being Marc21
compliant.  I've complained about this a few times, and I should have
filed a task about those defaults and sent a few patches, although I
haven't yet ;-(.

The incorrect defaults have consequences: if you import records from
another catalog, those values misbehave in Invenio.  Librarians (and
library-educated computer people) expect those records to behave as in
the other system.  So it is a matter of interoperability and economy.

The problem arises not only in those Python bibformat snippets, but also
in the bibindex definitions and in all export formats.  For example, my
bfe_author.py has those fields:
 
    if (authors_type in ['','personal']):
        authors.extend(bfo.fields('100%%'))
    if (authors_type in ['','corporate']):
        authors.extend(bfo.fields('110%%'))
    if (authors_type in ['','meeting']):
        authors.extend(bfo.fields('111%%'))
    if (authors_type in ['','personal']):
        authors.extend(bfo.fields('700%%'))
    if (authors_type in ['','corporate']):
        authors.extend(bfo.fields('710%%'))
    if (authors_type in ['','meeting']):
        authors.extend(bfo.fields('711%%'))
    if (authors_type in ['','personal','corporate','meeting']):
        authors.extend(bfo.fields('720%%'))

(In my case I sometimes need to show personal or corporate authors,
depending on the collection; I understand that this is not always
the case).
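
To see why the indicators matter, here is a small sketch of the
matching rule.  The tag_matches() helper is hypothetical, not Invenio
code, and it assumes that '%' matches any run of characters while '_'
stands for a blank indicator:

```python
import re


def tag_matches(pattern, tag):
    """Return True if a tag+indicators string matches the pattern.

    Hypothetical helper: '%' matches any run of characters and every
    other character, including '_' (blank indicator), matches itself.
    """
    regex = "".join(".*" if ch == "%" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, tag) is not None


# Tag plus two indicators, as found in imported Marc21 records.
fields = ["100__", "1001_", "7001_", "7102_"]

only_blank = [t for t in fields if tag_matches("100__", t)]
any_indicator = [t for t in fields if tag_matches("100%%", t)]
print(only_blank)      # the '100__' default misses 1001_
print(any_indicator)   # the wildcard form catches both
```

With the blank-indicator default, a personal author stored as 100 1_
(surname first) never reaches the output; with the wildcard form it
does.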

In bibindex (/admin/bibindex/bibindexadmin.py/field), you have also to
add those fields:

 author 100%, 110%, 111%, 700%, 710%, 711%, 720%

In the MARCXML to DC XSL as well:

  <xsl:for-each select="datafield[(@tag=100 or @tag=110 or @tag=111)]">
   <dc:creator>
    <xsl:call-template name="subfieldSelect">
     <xsl:with-param name="codes">ab</xsl:with-param>
    </xsl:call-template>
   </dc:creator>
  </xsl:for-each>

  <xsl:for-each select="datafield[(@tag=700 or @tag=710 or @tag=711 or @tag=720)]">
   <dc:contributor>
    <xsl:call-template name="subfieldSelect">
     <xsl:with-param name="codes">ab</xsl:with-param>
    </xsl:call-template>
   </dc:contributor>
  </xsl:for-each>

I borrowed the following XSL template from somewhere (LC, I think):

  <!-- Added FJ 5-feb-2010 to resolve template -->
  <xsl:template name="subfieldSelect">
    <xsl:param name="codes">abcdefghijklmnopqrstuvwxyz</xsl:param>
    <xsl:param name="delimeter">
      <xsl:text> </xsl:text>
    </xsl:param>
    <xsl:variable name="str">
      <xsl:for-each select="subfield">
        <xsl:if test="contains($codes, @code)">
          <xsl:value-of select="text()"/>
          <xsl:value-of select="$delimeter"/>
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <xsl:value-of select="substring($str,1,string-length($str)-string-length($delimeter))"/>
  </xsl:template>
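
For readers less used to XSL, the subfieldSelect logic can be sketched
in Python.  The function name and the sample subfield values are made
up for illustration:

```python
def subfield_select(subfields, codes="abcdefghijklmnopqrstuvwxyz",
                    delimiter=" "):
    """Join the values of the subfields whose code appears in `codes`.

    `subfields` is a list of (code, value) pairs in record order.  The
    XSL version appends the delimiter after every value and trims the
    last one; str.join() gives the same result directly.
    """
    return delimiter.join(value for code, value in subfields
                          if code in codes)


# Made-up author field: $a name, $d dates, $b numeration.
author = [("a", "Doe, Jane"), ("d", "1900-1980"), ("b", "II")]
creator = subfield_select(author, codes="ab")
print(creator)
```

Called with codes "ab", as in the dc:creator rule, it keeps $a and $b
and drops everything else.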

And so on.  It is a major task, but much needed.  Newcomers are likely
to feel frustrated when the system does not behave as expected.

Ferran


Re: Author handling bfe_authors.py et al

2011-11-25 Thread Ferran Jorba
Hello Alexander,

 this is an example of the default values of Invenio not being Marc21
 compliant.

 Right. And then these are bad defaults.

 I've complained about this a few times, and I should have
 filed a task about those defaults, and sent a few patches, although I
 haven't yet ;-(.

The reasons why I haven't done it myself, besides lack of time (a bad
excuse), are these: on my instances I have a mix of
better-than-default values and local ones; I don't have (or don't have
the resources to have) a reasonably recent Invenio instance running
anywhere (we are still at 0.99.1), so I'd be patching something old;
and, even with those restrictions, when I tried, I ran into the example
records (modules/miscutil/sql/tabfill.sql) and the testing
infrastructure, which I didn't know how to handle.  So I feel
overwhelmed each time I try ;-(

But ideally one should be able to go, for example, to
http://www.archive.org/details/ol_data, get the whole University of
Toronto Library catalog, load it into the local Invenio and use it,
maybe just adjusting some valid collection field value.

That is not the case now.  And it is a pity, because after suitable
adjustments Invenio is perfectly able to handle them.  It is even
possible to have something like authority records in it (at least we
have them more-or-less working at http://traces.uab.cat/).

Best regards,

Ferran


Re: Author handling bfe_authors.py et al

2011-11-25 Thread Ferran Jorba
Hi again,

 The reasons why I haven't done it myself, besides the lack-of-time issue
 (bad excuse) are that on my instances I have a mix of
 better-than-default values and local ones;

 Well, it would be great if you could drop me some sort of list in case
 your previous post was not complete. We're about to roll out some
 installation here based on recent Invenio so we might work that in if
 it's not already done.

Let me publish my logical fields list here on the list, because it is
easy and likely to be useful to most readers (I've left off a few local
fields):

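 4. Logical fields overview

 +------------+-------------------------------+---------------------------------+
 | Field      | MARC Tags                     | Translations                    |
 +------------+-------------------------------+---------------------------------+
 | any_field  | 00%, 01%, 02%, 03%, 04%, 05%, | ca, cs, de, el, en, es, fr, it, |
 |            | 06%, 07%, 08%, 09%, 10%, 11%, | no, pt, ru, sk, sv, uk          |
 |            | 12%, 13%, 14%, 15%, 16%, 17%, |                                 |
 |            | 18%, 19%, 20%, 21%, 22%, 23%, |                                 |
 |            | 24%, 25%, 26%, 27%, 28%, 29%, |                                 |
 |            | 30%, 31%, 32%, 33%, 34%, 35%, |                                 |
 |            | 36%, 37%, 38%, 39%, 40%, 41%, |                                 |
 |            | 42%, 43%, 44%, 45%, 46%, 47%, |                                 |
 |            | 48%, 49%, 50%, 51%, 52%, 53%, |                                 |
 |            | 54%, 55%, 56%, 57%, 58%, 59%, |                                 |
 |            | 60%, 61%, 62%, 63%, 64%, 65%, |                                 |
 |            | 66%, 67%, 68%, 69%, 70%, 71%, |                                 |
 |            | 72%, 73%, 74%, 75%, 76%, 77%, |                                 |
 |            | 78%, 79%, 80%, 81%, 82%, 83%, |                                 |
 |            | 84%, 85%, 86%, 87%, 88%, 89%, |                                 |
 |            | 90%, 91%, 92%, 93%, 94%, 95%, |                                 |
 |            | 96%, 97%, 98%                 |                                 |
 +------------+-------------------------------+---------------------------------+
 | title      | 130%, 210%, 222%, 240%, 245%, | ca, cs, de, el, en, es, fr, it, |
 |            | 246%, 247%, 730%, 740%        | no, pt, ru, sk, sv, uk          |
 +------------+-------------------------------+---------------------------------+
 | author     | 100%, 110%, 111%, 700%, 710%, | ca, cs, de, el, en, es, fr, it, |
 |            | 711%, 720%                    | no, pt, ru, sk, sv, uk          |
 +------------+-------------------------------+---------------------------------+
 | abstract   | 520%                          | ca, cs, de, el, en, es, fr, it, |
 |            |                               | no, pt, ru, sk, sv, uk          |
 +------------+-------------------------------+---------------------------------+
 | keyword    | 653%                          | ca, cs, de, el, en, es, fr, it, |
 |            |                               | no, pt, ru, sk, sv, uk          |
 +------------+-------------------------------+---------------------------------+
 | series     | 830%, 440%, 490%              | ca, en, es                      |
 +------------+-------------------------------+---------------------------------+
 | subject    | 600%, 610%, 611%, 650%, 651%  | ca, cs, de, el, en, es, fr, it, |
 |            |                               | no, pt, ru, sk, sv, uk          |
 +------------+-------------------------------+---------------------------------+
 | fulltext   | 8564%u                        | ca, cs, de, el, en, es, fr, it, |
 |            |                               | no, pt, ru, sk, sv, uk          |
 +------------+-------------------------------+---------------------------------+
 | collection | 980%                          | ca, cs, de, el, en, es, fr, it, |
 |            |                               | no, pt, ru, sk, sv, uk          |
 +------------+-------------------------------+---------------------------------+
 | year       | 260%c, 973%y                  | ca, cs, de, el, en, es, fr, it, |
 |            |                               | no, pt, ru, sk, sv, uk          |
 +------------+-------------------------------+---------------------------------+
 | record_ID  | 001                           | ca, cs, de, el, en, es, fr, it, |
 |            |                               | no, pt, ru, sk, sv, uk          |
 +------------+-------------------------------+---------------------------------+
 | issn       | 773%x, 022%a                  | ca, en, es, fr                  |
 +------------+-------------------------------+---------------------------------+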


The indexes are one-to-one with this list, *except* for keyword.  What
we have done is to keep the proper subject tags in the official 600,
610, 611, 650 and 651, and keyword in 653, but merge them as *indexes*,
so the index for keyword (Pàgina inicial > Admin Area > Manage Indexes)
covers both the subject and the keyword fields.  That's the solution
we've come up with.
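
As a data structure, the arrangement looks roughly like this.  The two
dictionaries are only an illustration taken from the table above, not
the way Invenio stores its configuration:

```python
# Logical fields, as shown in the overview table (subset).
LOGICAL_FIELDS = {
    "author":  ["100%", "110%", "111%", "700%", "710%", "711%", "720%"],
    "subject": ["600%", "610%", "611%", "650%", "651%"],
    "keyword": ["653%"],
}

# Indexes start as a one-to-one copy of the logical fields ...
INDEXES = {name: list(tags) for name, tags in LOGICAL_FIELDS.items()}

# ... except that the keyword index also covers the subject tags.
INDEXES["keyword"] = LOGICAL_FIELDS["subject"] + LOGICAL_FIELDS["keyword"]

for name in sorted(INDEXES):
    print("%-8s %s" % (name, ", ".join(INDEXES[name])))
```

The logical field "keyword" still means 653 only, so display and
export stay correct; only the search index is widened.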

And about the bibformat and friends, that is:

 lib/python/invenio/bibformat_elements/
 etc/bibformat/format_templates/
 etc/bibformat/output_formats/

I keep them under guilt patches (http://repo.or.cz/w/guilt.git or
http://packages.debian.org/guilt), but they would only apply to a 0.99.1
release.  I can happily send you a tarball for each; but please
understand there is a mix of better, worse and bad solutions, as I have
been learning to tame the beast over those years.

I'll get back to you in a while.

Cheers,

Ferran


Re: Time based po snapshots?

2011-10-31 Thread Ferran Jorba
Hello Tibor,

 On Tue, 25 Oct 2011, Ferran Jorba wrote:
 Could the project create official updated po files?  Monthly (better),
 bimonthly or quarterly (worse) would be fine for me.

 We plan to enter a soft feature freeze for the v1.0 branch by next
 week, so this will be a perfect opportunity to release updated PO
 files at the same time.  So stay tuned.

Sorry, I wrote my other mail while you were writing yours.

 As for the future periodical updates, I agree with you that we can
 indeed be updating PO files more frequently.  Many services are still
 being run off master, but with the maint-vX.Y/master/next branch
 policy now firmly in place, the whole release model of Invenio can be
 turned more towards time-oriented, not feature-oriented, model.  A bit
 like the Django guys are doing.  This is where we are heading with
 stable/unstable features, so PO files can come along.

Good.  Thanks,

Ferran


Re: Re-implementation of OAI repository in Invenio

2011-10-05 Thread Ferran Jorba
Hello Alexander,

 Good morning!

[I think that your intention was to respond to the whole list, so let me
continue the discussion here.]

 I agree with Ferran that 024 should be taken into account. From a
 marcish perspective it seems better suited than 035. You may want to

I was suggesting 024 instead of a 909, not 035.  035 is intended for
external identifiers, so an external OAI id, a handle, an LCCN, a local
library catalog number or any other identifier should definitely go
to 035.

 http://www.loc.gov/marc/bibliographic/bd035.html

The question is where we store our own OAI id when it is generated by
the oaiarchive daemon, and we think that 024 is correct.

 note the marc mechanism using 024 7_ $2src. If I get your reqs
 correctly, this is exactly what is required, as you can store
 provenance together with the Id in question.  (Cf. storage of doi
 according to LoC.)

The use of a $2src seems to be allowed only if your src belongs to an
approved list maintained by the Library of Congress.  Or at least this
is what we understood following the links from

 http://www.loc.gov/marc/bibliographic/bd024.html
 http://www.loc.gov/standards/sourcelist/standard-identifier.html
 http://www.loc.gov/standards/sourcelist/index.html

so we thought that a first indicator 8 (Unspecified type of standard
number or code) was a better choice.

 Unfortunately, and I think this is a reason for its `misuse', dupe
 checking is currently bound to 035. For compatibility with foreign
 data this should, however, be expanded to 024.

I don't follow you here.  If I harvest your
http://example.de/record/1234, I think that it should have a 024.8x $a
oai:example.de:1234, value, and I'll be storing your oai:example.de:1234
in my 035 $a, with an $9 of my own choice.  And I can re-expose it as
oai:example.cat:2345, keeping your oai:example.de:1234 as dc:identifier
(in our Marc21 to DC XSL we export them both).  Example:

 http://ddd.uab.cat/record/77021/export/hm
 http://ddd.uab.cat/record/77021/export/xd
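
To make the layout concrete, here is a sketch that builds such a
record fragment with the Python standard library.  The datafield()
helper is made up for this example, and the $9 value "example.de" is
only a placeholder for whatever provenance string one chooses:

```python
import xml.etree.ElementTree as ET


def datafield(record, tag, ind1, ind2, subfields):
    """Append a MARCXML-style datafield with (code, value) subfields."""
    field = ET.SubElement(record, "datafield", tag=tag, ind1=ind1, ind2=ind2)
    for code, value in subfields:
        ET.SubElement(field, "subfield", code=code).text = value
    return field


record = ET.Element("record")

# Our own OAI id, generated locally: 024 with first indicator 8.
datafield(record, "024", "8", " ", [("a", "oai:example.cat:2345")])

# The id of the harvested source: 035 $a, with a $9 of our own choice
# ("example.de" is a placeholder provenance value).
datafield(record, "035", " ", " ", [("a", "oai:example.de:1234"),
                                    ("9", "example.de")])

print(ET.tostring(record, encoding="unicode"))
```

Both identifiers survive in the same record, so both can be exported
as dc:identifier.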

Best regards,

Ferran


Re: Re-implementation of OAI repository in Invenio

2011-10-05 Thread Ferran Jorba
Hello Samuele,

[...]
 I don't follow you here.  If I harvest your
 http://example.de/record/1234, I think that it should have a 024.8x $a
 oai:example.de:1234, value, and I'll be storing your oai:example.de:1234
 in my 035 $a, with an $9 of my own choice.  And I can re-expose it as
 oai:example.cat:2345, keeping your oai:example.de:1234 as dc:identifier
 (in our Marc21 to DC XSL we export them both).  Example:
 
  http://ddd.uab.cat/record/77021/export/hm
  http://ddd.uab.cat/record/77021/export/xd

 And this is precisely what is finally supported by the new branch that
 will soon be merged. OTOH you use the default shipped invenio.conf

Well, no, sorry for the poor example.  What happens is that this
particular record is not re-exposed, and I didn't find a quick example
of a record with an external OAI id *and* a local 024.  This one is
better, because it has a few 035 (but no OAI ids, due to a recent
external migration) and a local 024:

 http://ddd.uab.cat/record/70053/export/hm

 (which is trying to be as much backward compatible as possible, and
 therefore keeping on using 909CO as default), but with the more OAI-PMH

I'm afraid I'll challenge you here ;-) Soon I'll be opening a task in
the invenio-software.org trac requesting that the default Marc values
of Invenio match the standard, for the benefit of all, especially
newcomers.  Experienced Invenio admins already know how to tune, change,
etc.  But we should be more friendly to newcomers, and if we say that
we follow Marc21, we should comply much better than we do now, don't
you think?

Thanks,

Ferran


Re: Re-implementation of OAI repository in Invenio

2011-10-03 Thread Ferran Jorba
Hello Samuele,

[...]
 In Invenio there is a special treatment for this 035 field, namely that
 the couple OAIID_TAG + OAIID_PROVENANCE_TAG is used to identify uniquely
 a record.

 So shall I simply add by default to 035 the above mentioned attributes?

 E.g. 

 * baseURL - $u (different than $9 which is a semantic string. The
 baseURL might change because of technical reasons, and therefore the $9
 subfield, when present will receive priority in identify a record).
 * identifier - $a (as per CFG_BIBUPLOAD_EXTERNAL_OAIID_TAG)
 * datestamp - $d
 * metadataNamespace - $m
 * originDescription - $o
 * harvestDate - $h
 * altered - $a

 Is there anyone in the Invenio community whose system is harvesting
 record (putting external IDs in 035) and is trying to expose them?

We also do it.  After backporting your
2fb7275849e83f5afbb7915000e208a3e053889a patch to 0.99.1, now we store
in 035 $a all kinds of external identifiers, including external OAI ids,
with $9 acting as a kind of «namespace identifier» to avoid conflicts.

As for our own OAI id, after some time we concluded that, instead of a
local 9XX field, it should go to 024.8_:

 http://www.loc.gov/marc/bibliographic/bd024.html

 CFG_OAI_SET_FIELD = 0248_9
 CFG_OAI_ID_FIELD = 0248_a

So, even if we are re-exposing a small part of our harvested holdings,
at this moment we don't reuse the same tag for both uses.

I understand the need for your suggested fields ($d, $m, etc.), but
please don't rush into adding non-standard subfields to 035.  The more
your default values depart from the Marc21 standard, the harder it
becomes to interchange records with other databases, and the more
trouble you cause to potential Invenio newcomers.  I don't have the
solution right now, but your fields don't appear in the standard:

 http://www.loc.gov/marc/bibliographic/bd035.html

Maybe you can ask a librarian before deciding on them.

Thanks,

Ferran


Re: How are Invenio Trac tickets priorised and assigned?

2011-09-02 Thread Ferran Jorba
Hi Samuele,

[...]
 Is there anything I can do to increase the interest for some of them?

 your mail is helping a lot, in this sense :-)

Glad to hear it, thanks,

Ferran


Re: Invenio and INSPIRE code swarm movie

2011-04-29 Thread Ferran Jorba
Hello Tibor and Samuele,

 Il giorno lun, 18/04/2011 alle 13.46 +0200, Tibor Simko ha scritto: 
 As we have discussed some weeks ago, here are two short animations with
 visual representation of Invenio and INSPIRE commit history:
 
http://invenio-software.org/download/invenio-code-swarm.avi
http://invenio-software.org/download/inspire-code-swarm.avi

 what tool have you used to generate these? Is it Gource?
 http://code.google.com/p/gource/

It seems to me that it is code_swarm, right?

 http://code.google.com/p/codeswarm/

Thanks for sharing it, Tibor.  I've been imagining many possibilities
since: processing Apache logs, with the names being either the
collections or the submitters; collection growth (with different
variations, using data from either the sbmSUBMISSIONS or bibrec tables,
maybe crossing it with guess_primary_collection_of_a_record), etc. etc.

But wandering around the http://www.michaelogawa.com/research/ site, I
feel that the evolines (or storylines) visualisation prototype is also
fascinating:

 http://www.michaelogawa.com/research/storylines/

Ars longa, vita brevis,

Ferran


Is it enough to submit a ticket and leave as new? (re #425)

2011-01-11 Thread Ferran Jorba
Hello Tibor,

I'm really interested in ticket http://invenio-software.org/ticket/425,
because I hope it can speed up our indexing time, as well as provide a
more general way to remove diacritics.  As we have a lot of
digitised old material, with OCR produced by different
software, we have lots of funny combinations of characters.
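
Just to illustrate what I mean by a general way (this is not
necessarily what the ticket proposes), the Python standard library can
already do it by decomposing each character and dropping the combining
marks:

```python
import unicodedata


def strip_diacritics(text):
    """Return text without accents, using Unicode decomposition.

    NFKD splits a character such as 'é' into 'e' plus a combining
    accent; the combining marks are then filtered out.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch))


plain = strip_diacritics("numériques")
print(plain)
```

It needs no table of known characters, so odd combinations coming
from OCR are handled the same way as common ones.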

I'm sorry I don't have the time or infrastructure to build and test a
patch for it, but given the unit tests that Invenio provides, it should
not be difficult to test.

Can I do anything else?  Is it up to you to assign it to somebody else?
Do you think it is feasible to get it into the 1.0 release?

Thanks,

Ferran


Re: BibClassify with RDF and MySQL store

2010-12-20 Thread Ferran Jorba
Hello,

 On Sat, Dec 18, 2010 at 12:33 PM, Samuele Kaplun samuele.kap...@cern.ch 
 wrote:
 Hi Roman,

 Il giorno sab, 18/12/2010 alle 12.17 +0100, Roman Chyla ha scritto:
 I agree this is cool, but something doesn't fit, at least I don't
 understand how this could be used for the task of bibclassify, the
 dict is good if you know (more or less) what you are looking for, but
 the task of bibclassify is to find entities inside the fulltext - and
 to find that out, bibclassify has to search for it - and it is not
 exactly the same thing as the spell checking. I must be missing
 something, could you explain to me what advantage at all there would
 be in using the dict? As a fast cache of single level entries? I could
 see how it would be more useful for the cache, citation links etc.,
 but not for bibclassify.

I suggested looking at dict for these reasons:

1. It doesn't need 24 GB of RAM to start working, regardless of the size
   of the corpora. ;-)
2. It easily permits shared and reused corpora.
3. The protocol itself is easy to understand, not unlike HTTP.
4. The *meaning* of the returned value is up to the client, not unlike
   the Unix way of doing things.  In dict, you just return data, it is
   up to you to interpret it.  You can tag relations, codes, etc.
5. Integrating a dict client in the Python Invenio code has a small
   cost: dicoclient.py from the dico client (see
   http://packages.debian.org/dico or
   http://puszcza.gnu.org.ua/software/dico/) is only 13 KB and has no
   dependencies other than the standard Python libs.

 I am not that aware of how BibClassify works right now, but if its final
 goal is to look for the most frequent keywords (from a controlled set)
 inside a fulltext, then, postponing the issue of the grammar (plural,
 genders, conjugations :-S), I think that it would be indeed possible to
 use dictd in a way orthogonal to what we currently do with ontologies.

 Currently for each word in the ontology (correct me if I am wrong) we
 look how many times it appears in the text.

 On the other hand with dict, we might simply take all the words in the
 text, and filter them against the dictionary (which is built after the
 ontology), and then sum up the occurrences of repeated words.

 OK, I see what you mean - could work, but would work mean 'improved'?

 If you take an average of 3000 words times the real time reported for
 lookup above:

 3000 * 0.004 = 12s
 or
 3000 * 0.006 = 18s

 that is two to three times slower than the current bibclassify
 implementation (in case of HEP).

 It could be faster for bigger dictionaries, like Eurovoc, because
 bibclassify will slow down -- or if we manage to cut down the lookup
 time (by making it local process?)

With dict you can use a stateless connection (like HTTP), but you can
also reuse an already opened session, so the latency should be better.
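
A rough sketch of what reusing a session means, written directly
against the protocol (RFC 2229) rather than with dicoclient.py.  The
class and function names are made up, and it ignores details such as
dot-stuffed lines, so take it as an illustration only:

```python
import socket


def parse_status(line):
    """Split a DICT status line such as '552 no match' into (code, text)."""
    code, _, text = line.strip().partition(" ")
    return int(code), text


class DictSession:
    """Keep one connection to a dict server open for many lookups."""

    def __init__(self, host="localhost", port=2628):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile("r", encoding="utf-8")
        parse_status(self.reader.readline())      # 220 banner

    def define(self, word, database="*"):
        """Return the definition texts for word (empty list if none)."""
        command = 'DEFINE %s "%s"\r\n' % (database, word)
        self.sock.sendall(command.encode("utf-8"))
        definitions = []
        while True:
            code, _ = parse_status(self.reader.readline())
            if code == 150:        # "n definitions retrieved", more follows
                continue
            if code == 151:        # one definition, ended by a lone "."
                lines = []
                for line in self.reader:
                    line = line.rstrip("\r\n")
                    if line == ".":
                        break
                    lines.append(line)
                definitions.append("\n".join(lines))
                continue
            return definitions     # 250 ok, 552 no match, or an error

    def close(self):
        self.sock.sendall(b"QUIT\r\n")
        self.sock.close()
```

With a server running, one would create DictSession() once, call
define() for every word of the document, and close() at the end, so
that the connection setup is paid only once.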

 The two methods should accomplish the same goal (if I am not wrong on
 BibClassify algorithm) but the latter should be in principle extremely
 fast, unless the grammar issue is the bottleneck.

 in principle, direct lookups must be replaced by some approximate
 lookups (btw, I think dictd could handle grammar variations better
 than the current regex patterns, so that would be a gain) - but it
 will return more entries in many cases, then it is necessary to choose
 the right one. Might be easy for limited domains - for Eurovoc, you
 will need some sort of disambiguation

 Another interesting problem is the single keyword made of several
 tokens, like 'search engine' in the sentence:

 Invenio comes with its own search engine implementation?

 will you ask for:
 1. invenio
 2. comes
 
 6 search
 7 engine
 8 implementation

  -- somehow combine 6+7 based on the responses?

 or create collocations and ask for them (will double the number of
 lookups, and does not skip inserted words)
 Invenio comes
 comes with
 ...
 search engine


 Don't get me wrong, dictd is cool. I am just saying it is tiny bit
 more complicated.

Maybe.  I don't know the details of BibClassify, sorry, and I wasn't
advocating rewriting all of BibClassify using dict, of course.  What I'm
suggesting is that the dictionaries, ontologies, etc. that you need for
BibClassify to work could be served and easily reused with dict, in a
cheaper, faster and maybe better way than with SQL, Solr or whatever
other alternative.
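
The counting idea discussed in this thread (filter the words of the
text against the vocabulary, trying two-word collocations first) can
be sketched as follows.  A Python set stands in for the dictionary
server, and count_keywords() is not BibClassify code:

```python
import re
from collections import Counter


def count_keywords(text, vocabulary):
    """Count vocabulary entries in text, preferring two-word matches."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    i = 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in vocabulary:
            counts[bigram] += 1
            i += 2                 # both tokens are consumed
        else:
            if tokens[i] in vocabulary:
                counts[tokens[i]] += 1
            i += 1
    return counts


# Stand-in vocabulary; with dict, each membership test is a lookup.
vocabulary = {"invenio", "engine", "search engine"}
counts = count_keywords(
    "Invenio comes with its own search engine implementation", vocabulary)
print(counts)
```

As noted in the thread, this roughly doubles the number of lookups
and does not handle inserted words or grammar variations.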

Thanks,

Ferran

PS Eric Lease Morgan did some experiments a while ago using dict for
   serving LC Authorities Catalog, see
   http://serials.infomotions.com/code4lib/archive/2008/200803/0557.html


Re: BibClassify with RDF and MySQL store

2010-12-17 Thread Ferran Jorba
Hello Samuele,

[warning: I may be way off-road]

 I am starting to play a bit with the EuroVoc 

 http://eurovoc.europa.eu/

 ontology in order to integrate it into OpenAIRE Orphan Record
 Repository, for automatic keyword extraction for EU documents.

 This ontology is *big*! and multilingual. I can't even load it with
 RDFLIB on my laptop (4GB of RAM).
[...]

Blame XML bloat (again).

For dictionaries and such, that is, a large corpus of data that doesn't
change much (in other words, that is not transactional), why don't
you use specialised software?

Enter http://dict.org, a protocol (http://www.dict.org/rfc2229.txt) and
a canonical implementation for dictionaries, blazingly fast, veteran and
well known (see for example http://packages.debian.org/dictd and
http://packages.debian.org/dict), plus several other implementations
(http://www.dict.org/w/software/start), some of them in Python (even
curl is a dict client).

Creating and indexing a dict server of about half a million entries
using the standard dict.org utilities takes less than a minute, and the
searches are resolved in milliseconds, for positive, negative or
approximate answers, for example:

 $ time dict -h localhost 00075743be0748c4965848c62c2f5a70
 1 definition found

 From unknown [md5sums]:

  00075743be0748c4965848c62c2f5a70
 00075743be0748c4965848c62c2f5a70  /mnt/VOLUM-I/3-12/ddd/veterinaria/revhigsanvet/tif/revhigsanvet_a1915m11t5n8/revhigsanvet_a1915m11t5n8_21.tif
 00075743be0748c4965848c62c2f5a70  /mnt/VOLUM-Ib/3-12/ddd/veterinaria/revhigsanvet/tif/revhigsanvet_a1915m11t5n8/revhigsanvet_a1915m11t5n8_21.tif

 real    0m0.004s
 user    0m0.000s
 sys     0m0.000s

 $ time dict -h localhost 00075743be0748c4
 No definitions found for 00075743be0748c4

 real    0m0.004s
 user    0m0.000s
 sys     0m0.000s

 $ time dict -h localhost 00075743be0748c4965848c62c2f5a7
 No definitions found for 00075743be0748c4965848c62c2f5a7, perhaps you mean:
 md5sums:  00075743be0748c4965848c62c2f5a70

 real    0m0.006s
 user    0m0.000s
 sys     0m0.000s

 $ dict -h localhost -I
  dictd 1.10.11/rf on Linux 2.6.26-2-amd64
  On nuix.uab.es: up 21+03:06:13, 813 forks (1.6/hour)

  Database      Headwords         Index          Data  Uncompressed
  md5sums          580225         23 MB         32 MB        165 MB


 $ dict -h dict.org -I
  dictd 1.9.15/rf on Linux 2.6.30-bpo.1-686
  On miranda.org: up 51+17:54:33, 16914217 forks (13619.5/hour)

  Database      Headwords         Index          Data  Uncompressed
  gcide            203645       3859 kB         12 MB         38 MB
  wn               154563       3089 kB       8744 kB         26 MB
  moby-thes         30263        528 kB         10 MB         28 MB
  elements            130          2 kB         14 kB         45 kB
  vera               9203        103 kB        160 kB        558 kB
  jargon             2374         42 kB        621 kB       1430 kB
  [...]


Part of the reason for this speed is that the input file for creating
the dictionary is sorted, so the server can do binary searches on a
mmapped file.
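
The principle can be sketched in a few lines.  This version searches
an in-memory sorted list instead of a mmapped file, and the sample
keys and paths are made up:

```python
import bisect


def build_index(entries):
    """Sort (headword, definition) pairs once, at build time."""
    return sorted(entries)


def lookup(index, headword):
    """Return all definitions for headword using binary search."""
    # (headword,) sorts before any (headword, definition) pair, so
    # bisect_left lands on the first entry for that headword.
    pos = bisect.bisect_left(index, (headword,))
    found = []
    while pos < len(index) and index[pos][0] == headword:
        found.append(index[pos][1])
        pos += 1
    return found


index = build_index([
    ("c3", "/path/three.tif"),
    ("a1", "/path/one.tif"),
    ("b2", "/path/two.tif"),
    ("a1", "/path/one-copy.tif"),
])
print(lookup(index, "a1"))
print(lookup(index, "zz"))
```

Each lookup needs about log2(n) comparisons, which is around 20 for
the half a million entries of my md5sums database, and a miss costs
the same as a hit.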

As the protocol is inherently client-server, the same ontology
(dictionary) can be (re-)used among different Invenio instances.  It is
not a toy.  I haven't been able to cause any noticeable load on my
instance, even when querying it massively.  You can follow part of my
experiments here:
http://news.gmane.org/gmane.network.protocols.dict.user

Sorry, I had to say it,

Ferran


[patch] BibHarvest: sort remote set names

2010-12-09 Thread Ferran Jorba
Hello,

please consider this tiny patch for inclusion.  It applies cleanly to
current git tree.

Thanks,

Ferran

PS Is it ok to send it to Jerome CCing the list, or should I address it
   to Tibor instead?
BibHarvest: sort remote set names

* When remote OAI site has a large number of sets, showing them in random
  order makes very difficult to make a sensible selection.  Sort them.
---
 modules/bibharvest/lib/oai_harvest_admin.py |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/modules/bibharvest/lib/oai_harvest_admin.py b/modules/bibharvest/lib/oai_harvest_admin.py
index 2684122..4a06472 100644
--- a/modules/bibharvest/lib/oai_harvest_admin.py
+++ b/modules/bibharvest/lib/oai_harvest_admin.py
@@ -208,6 +208,7 @@ def perform_request_editsource(oai_src_id=None, oai_src_name='',
 
     sets = findSets(oai_src_baseurl)
     if sets:
+        sets.sort()
         # Show available sets to users
         sets_specs = [set[0] for set in sets]
         sets_names = [set[1] for set in sets]


Re: [patch] BibHarvest: sort remote set names

2010-12-09 Thread Ferran Jorba
Hi Tibor,

 Thanks; I have added sort() to one more place so that the order would
 be the same also when adding new OAI sources, and committed.

You are right, I've been bitten myself afterwards.  And also, I see that
probably the .sort() lines should be *after* the '# Show available sets
to users' comment line, not before.  Sorry!
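
For the record, a small sketch of what the sort does.  It assumes, as
the patched code suggests, that each set is a (spec, name) pair; the
sample sets are made up:

```python
# Made-up sets, in the arbitrary order a remote site might return them.
sets = [
    ("physics:hep", "High Energy Physics"),
    ("cs", "Computer Science"),
    ("math", "Mathematics"),
]

# list.sort() compares tuples element by element, so this orders the
# sets by spec first and by name only when two specs are equal.
sets.sort()
sets_specs = [s[0] for s in sets]
sets_names = [s[1] for s in sets]

# Alternative: order by the displayed name, ignoring case.
by_name = sorted(sets, key=lambda s: s[1].lower())

print(sets_specs)
print([s[1] for s in by_name])
```

Sorting by spec keeps hierarchical specs such as "physics:hep"
grouped together, which is what helps when there are many sets.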

 PS Is it ok to send it to Jerome CCing the list, or should I address
 it to Tibor instead?

 For smaller patches of this kind, it is perfectly OK to use the list or
 to send them to us privately.

 For bigger patches, it would be better to make a ticket at
 http://invenio-software.org/ and attach the patch there.

Understood.

Thanks again,

Ferran


Freshness of po files (was: Re: [patch + tar.gz] I18N: updates to Catalan and Spanish translations)

2010-10-14 Thread Ferran Jorba
Hello Tibor,

[CCing the list, as somebody else may be interested in my question]

 On Wed, 13 Oct 2010, Ferran Jorba wrote:
 a few more translations.

 Thanks, committed.

Fine.

 So I'm sending the patch and tar.gz as safety measure.

 Typically I just use tar.gz, the PO files there are always perfect :)

Good to know.  However, I have a question: how fresh are the PO files I
get from git?  Sometimes I hit S in emacs po-mode to see the context and
I don't see the string in the code snippet.

Does anybody, either you in the repository or me at home, have to do
anything to update them?

Thanks,

Ferran


Where to create a new ticket (was: Recommend lynx instead of html2text)

2010-05-27 Thread Ferran Jorba
Hello Tibor et al,

a few weeks ago, I found out that html2text is inadequate to create
plain text from HTML, because it only knows about iso-8859-1.  I suggest
lynx instead, as I explained in this mail (attached fragment).

As I haven't seen any news about this, maybe I should create a ticket
for it.  But I don't know where, because I'm not up to date about your
trac migration.

Should I do it myself?  Your guidance will be appreciated.

Ferran

---BeginMessage---
[...]
A second issue I'm having is that, on our site, we have a lot of HTML
documents, and a bunch of them are in a non-utf8 charset (mostly
iso-8859-1 and windows-1251).  I have been watching and debugging it the
whole morning.  In a word, bibindex_engine expects everything in utf8,
and when it is not, it complains loudly.  Adding the exception to the
message, I got:


 2010-03-15 09:47:05 -- Error: Cannot put word num??riques with sign 1 for recID 10 (exception: 'utf8' codec can't decode bytes in position 9-11: invalid data).


How to get utf8 clean text from any HTML document, in any charset?
html2text has the -ascii option to output unaccented text, but it didn't
do anything good with my files.  Fortunately, lynx does it cleanly.  This
quick-and-dirty patch allows me to make some progress:


@@ -417,6 +417,8 @@ def get_words_from_fulltext(url_direct_or_indirect, stemming_language=None):
             elif os.path.basename(conv_program) == "html2text":
                 cmd = "%s %s > %s" % \
                       (conv_program, tmp_name, tmp_dst_name)
+                cmd = "lynx -dump -display_charset=utf8 %s > %s" % \
+                    (tmp_name, tmp_dst_name)
             else:
                 write_message("Error: Do not know how to handle %s conversion program." % conv_program, sys.stderr)
             # try to run it:
[...]
---End Message---
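
The re-encoding step that the patch delegates to lynx can also be sketched in Python.  This is only an illustration, not what bibindex_engine does: the helper name to_utf8 and the default candidate list are assumptions, and a single-byte charset such as iso-8859-1 accepts any byte sequence, so the order of the candidates decides the result.

```python
def to_utf8(raw, candidates=("utf-8", "iso-8859-1")):
    """Return the bytes in `raw` re-encoded as UTF-8.

    Each candidate charset is tried in order; the first one that decodes
    without error wins.  (Hypothetical helper, not part of Invenio.)
    """
    for charset in candidates:
        try:
            text = raw.decode(charset)
        except UnicodeDecodeError:
            continue  # not valid in this charset, try the next one
        return text.encode("utf-8")
    raise ValueError("none of the candidate charsets could decode the input")
```

For documents known to be Cyrillic, one would pass candidates=("utf-8", "windows-1251") instead; the charset cannot be guessed reliably from the bytes alone.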


Re: The invasion of the XML entities

2010-05-05 Thread Ferran Jorba
Hello Benoit,

[...]
 However in the example attached, we have XML entities both in the
 title and in the abstract. The abstract seems to be correctly
 unescaped but the title remains escaped leading to some bad results
 (Title is
 &#1069;&#1083;&#1077;&#1082;&#1090;&#1088;&#1086;&#1085;&#1085;&#1072;
 &#1103;j&#1090;&#1077;&#1086;&#1088;&#1080;&#1103; &#1084; ...).

 I don't seem to be able to find the cause of this weird
 behavior. Maybe one of you can?

Maybe this record was imported from somewhere?  I've had similar cases
from sources with a mix of different encodings.  Recode is your friend.
I've included this step in my problematic workflow:

 $ recode --diacritics html..utf8 < %s > %s

In your case, it produces this Cyrillic result:

 Электронна
 яjтеория м ...

Hope it helps,

Ferran
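
For those without recode at hand, the same unescaping of numeric character references can be sketched with the Python standard library.  The function name decode_entities is only an illustration; html.unescape is the real standard-library call (Python 3.4 or later).

```python
import html


def decode_entities(text):
    """Replace HTML/XML character references (e.g. &#1069;) by the characters."""
    return html.unescape(text)


# The first word of the escaped title quoted above.
escaped = "&#1069;&#1083;&#1077;&#1082;&#1090;&#1088;&#1086;&#1085;&#1085;&#1072;"
print(decode_entities(escaped))
```

This only handles the entities; it does not fix a record whose bytes are in the wrong charset to begin with.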


Re: The Multivio project

2010-05-03 Thread Ferran Jorba
Hello Miguel,

 I hope you'll excuse me for using the list, but I have an announcement
 that might be of interest to you.

 There's a project going on here at RERO called Multivio whose goal
 is to provide a presentation layer for archives of digital documents:

 https://www.multivio.org/

[...]
 Please don't hesitate to take a look at the project site
 https://www.multivio.org/, try some examples, try it with your own
 documents and send us some feedback at i...@multivio.org.

It certainly looks interesting!  I've tested a couple of PDF files from
our site.  The first one happened to weigh 11 MB (an old scanned
journal, from http://ddd.uab.cat/record/53804), and it took so long that
I had to abort it:

 
http://demo.multivio.org/client/#geturl=http://ddd.uab.cat/pub/garbanzo/garbanzo_a1873n47.pdf

The second one, a modern native PDF (from http://ddd.uab.cat/record/5),
was slightly better:

 
http://demo.multivio.org/client/#geturl=http://ddd.uab.cat/pub/autonoma/autonoma_a2010m3n233.pdf

I'd certainly choose Multivio instead of our Flash based equivalent (no,
it's not my fault; I give it to you so you can see a proprietary
alternative):

 http://www.uab.es/revista-autonoma/

However, I'd say that the quality of the thumbnails can be improved.
Which tool are you using?  In my case, I've found that, by far, the
fastest and best results are using a combination of Xpdf's pdftoppm and
Imagemagick's convert.  We create the thumbnails of the first page of
our PDFs this way (simplified):

 $ pdftoppm -f $page -l $page $file.pdf $file
 $ convert -thumbnail 85 $file-0$page.ppm $file.png
 $ rm $file-0$page.ppm

pdftoppm converts all pages to ppm if no -f or -l parameters are given.
ImageMagick's own PDF to PNG (or any other graphic format) conversion
goes through Ghostscript, which brings any system to its knees, and the
quality is worse.

Hope it helps,

Ferran
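
The three shell steps above could be wrapped in a small Python helper.  This is a sketch under assumptions: the function names are made up, and the intermediate file name follows the "$file-0$page.ppm" pattern shown above, although the zero padding that pdftoppm applies may differ with the page count and the tool version.

```python
import subprocess


def thumbnail_commands(pdf_path, root, page=1, width=85):
    """Build the pdftoppm / convert / rm command lines for one page."""
    # Intermediate file name as in the shell example; the padding is an assumption.
    ppm = "%s-0%d.ppm" % (root, page)
    return [
        ["pdftoppm", "-f", str(page), "-l", str(page), pdf_path, root],
        ["convert", "-thumbnail", str(width), ppm, root + ".png"],
        ["rm", ppm],
    ]


def make_thumbnail(pdf_path, root, page=1, width=85):
    """Run the three commands in order, stopping at the first failure."""
    for argv in thumbnail_commands(pdf_path, root, page, width):
        subprocess.run(argv, check=True)
```

Passing the arguments as lists avoids shell quoting problems with odd file names.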


Re: The Multivio project

2010-05-03 Thread Ferran Jorba
Hello Samuele,

 On Monday 3 May 2010 11:19:23, Ferran Jorba wrote:
 It certainly looks interesting!  I've tested a couple of PDF files from
 our site.  The first one happened to weight 11 MB (an old scanned
 journal, from http://ddd.uab.cat/record/53804), and it took so long that
 I had to abort it:

 just for reference, in case it's needed also by other users, the
 pdfopt utils (from ghostscript) can transform any PDF into a
 linearized PDF (also called fast web view mode), that will add hints
 to the PDF to reference single pages without downloading the full
 file. I guess this would make the multivio able to open your 11Mb
 scanned document without any problem.

Thanks for the suggestion.  I've tried it on one of our 100 MB+
monsters and what I've seen is that the size barely changes.  But
Xpdf's pdfinfo certainly notes the change in the «Optimized» field:

                 before             after pdfopt

 Pages:          294                294
 Encrypted:      no                 no
 File size:      112858067 bytes    112838493 bytes
 Optimized:      no                 yes
 PDF version:    1.5                1.5

Another task in our TODO list...

Thanks,

Ferran
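
To check a whole batch of files, the pdfinfo output can be parsed with a few lines of Python.  A minimal sketch, assuming the "Key: value" layout shown above; the function names are hypothetical.

```python
def parse_pdfinfo(output):
    """Turn the 'Key: value' lines printed by pdfinfo into a dictionary."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def is_linearized(output):
    """True when pdfinfo reports the file as optimized (linearized)."""
    return parse_pdfinfo(output).get("Optimized") == "yes"
```

The text passed in would be the standard output of "pdfinfo file.pdf", captured for example with subprocess.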


Re: Fwd: Trac

2010-04-21 Thread Ferran Jorba
Hello Tibor, Travis et al,
 
 On Tue, 20 Apr 2010, Brooks, Travis C. wrote:
 So, shall we set up a parallel INSPIRE Trac instance?

 Yes, I think so.  Let's use the Invenio Trac instance for a few days,
 configure it to the will, and I'll then clone it for INSPIRE.

Sorry to jump in without being asked.  A couple of years ago we
evaluated Trac for internal use for our DDD site and other library
related projectS.  The last S is important: we, like many others, are
involved in more than one project.

Doing that in a single Trac was not possible at that time; instead, you
had to set up several Trac instances and duplicate users, permissions,
and so on.  Then it is not possible to see all tickets (tasks, or
whatever) assigned to a specific person, or to prioritise them across
projects.

There are a couple of pages in the Trac site about this:
http://trac.edgewall.org/wiki/TracMultipleProjects and the infamous
http://trac.edgewall.org/ticket/130.  In this last ticket
(http://trac.edgewall.org/ticket/130#comment:52) I learned about
DrProject (https://www.drproject.org/), a Trac fork with multiproject
support, that we are very pleased to use here at UAB.

The trouble is that DrProject is now a dead project.  The students at
the University of Toronto who wrote it under the leadership of Dr
Greg Wilson are rewriting it under the Django framework and it will be
called Basie (https://basieproject.org/), but it is not finished yet.
However, there are a couple of alternatives since then: Retrospectiva
(http://retrospectiva.org/) and Redmine (http://www.redmine.org/) both
in Ruby.  Redmine seems to be more mature and it is already packaged for
Debian, a big plus for us (http://packages.debian.org/redmine).  It is
also multidatabase, multilingual, multi DVCS (including git), etc. etc.

We have not decided yet which one we will migrate to, since we are not
in a hurry, but my question is: are you sure you want to migrate to an
isolated tool where there will be no relation between your (at least)
two different instances?

A couple of posts about Trac vs Redmine:

 http://changelog.complete.org/archives/696-thoughts-on-redmine
 
http://changelog.complete.org/archives/701-at-long-last-softwarecompleteorg-migrated-to-redmine

Again, please excuse my 0.2 cents here,

Ferran


Re: Fwd: Trac

2010-04-21 Thread Ferran Jorba
Hello Roman,

 Thank you for the links, it was very interesting reading. Your
 experience also seems to caution against similar situations. In my
 previous job, we were also using Trac for multiple projects (namely
 4), but they were really a separate projects so I can't say a lot
 about MultiProjects settings - but on the other hand, I can easily
 imagine it.

 But the basic question IMHO is whether CDS and INSPIRE need to be
 separate - whether they are INDEPENDENT - I don't think anyone can
 answer yes to that question, at least for inspire. Or if they are
 SEPARATE - answer might be yes, but separate in which way?

In our case, the centre of gravity was people rather than projects.  *I*
want to know which high priority tickets *I* have, regardless of how
(un)related the projects are.  We want to know how many tickets have
been open since whenever, regardless of the project.  You can't do that
with multiple Trac instances.

 If not technically, then is there a real need for MultiProjects
 setup? By this simple consideration array of technical nightmares
 might be gone, and if that is solved, there is no need to solve other
 stuff.  Especially when 7 days ago Trac rolled out their
 MultiRepository support:
 http://trac.edgewall.org/ticket/130#comment:145

I didn't know about this new development, but they insist so much that
their target is single-instance-single-project, and the #130 ticket is
so old, that I fear that it might not be very clean.

There is something more about Trac that confused me and my coworkers,
and it is shared with Retrospectiva, I think (we haven't evaluated them
thoroughly): this confusion between Login (which is not needed to file
tickets) and Preferences.  In our internal working scenario, we thought
it was clearer to force a login before adding tickets or editing wiki
pages.  In both DrProject and Redmine you have to log in and then you can
set your preferences.  I don't know enough about Basie.

Ferran


Re: What does 'holding pen' mean?

2010-03-02 Thread Ferran Jorba
Hello all,

 sorry to bother, but I don't know what 'holding pen' means, and how
 to translate it into Catalan and Spanish.  Does anybody have a
 tentative translation into similar languages like French, Italian or
 Portuguese?

 In French: File des notices en attente?

I like this one.  And a free translation could be: «Registres a revisar»
(Catalan) or «Registros para revisar» (Spanish)?

Funnily enough, we accomplish this function using a couple (or more, it
depends) of collections not attached to the main collection.  Librarians
can visit them, modify records, decide, and then we move either the
records or the whole collection to the public tree.

They are not really secret, as the pull-down list of collections shows
them, but it does the job well.

Thanks,

Ferran


What does 'holding pen' mean?

2010-03-02 Thread Ferran Jorba
Hello,

sorry to bother, but I don't know what 'holding pen' means, and how to
translate it into Catalan and Spanish.  Does anybody have a tentative
translation into similar languages like French, Italian or Portuguese?

Thanks,

Ferran


Thank you for 0.99.1

2010-03-02 Thread Ferran Jorba
Hello developers team,

after our long overdue 0.99.1 migration, we would like to thank you
Invenio developers for your fine work.

Some of the improvements I (we) have appreciated the most during this
migration are:

- CFG_ACCESS_CONTROL_LEVEL_SITE, to put the old system readonly,
- the newer, faster, faster! bibindex,
- debugging bibformat information from bibformatadmin,
- reuse of old bibsched numbers,
- etc, etc.

Again, our deepest appreciation for your wonderful job,

Ferran


[patch] Enable fulltext indexing for 8564% (was: Re: Config option (or hack) to disable fulltext indexing?)

2009-12-03 Thread Ferran Jorba
Hello Tibor,

this thread has put more pressure on me to fix the non-working fulltext
indexing in my sites.  Our new 0.99.1 is in beta this week
(http://traces.uab.cat/).  We'll announce it later on.

I have seen that fulltext does not work in our case because the second
indicator of 856 is hardcoded to be _.  This is not necessarily so,
according to the LC (http://www.loc.gov/marc/bibliographic/bd856.html).

The following patch fixes it.

What I don't understand is why the engine stops after (several) not
found URLs.  How can I convince it to ignore those URLs and carry on
with the other ones?

Thanks,

Ferran
BibIndex: allow second indicator of 856 to be any value

 * Second indicator of 856 can have several values, not just _
   (http://www.loc.gov/marc/bibliographic/bd856.html)
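
To illustrate what the wildcard buys, here is a small sketch of how a tag pattern with % can be matched against concrete tag strings.  It assumes SQL LIKE-style semantics, where % stands for any run of characters and every other character (including _) is taken literally; the function name tag_matches is hypothetical and this is not the code used by bibindex_engine.

```python
import re


def tag_matches(pattern, tag):
    """True if `tag` matches `pattern`, where % matches any run of characters."""
    # Escape the literal pieces and join them with a "match anything" regex.
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.fullmatch(".*".join(parts), tag) is not None
```

With the hardcoded "8564_u" only records whose second indicator is blank are picked up, while "8564%u" also accepts tags such as 85641u or 85642u.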



Re: [patch] Enable fulltext indexing for 8564% (was: Re: Config option (or hack) to disable fulltext indexing?)

2009-12-03 Thread Ferran Jorba
Oops,

I forgot to refresh it.  Here it comes in full.

Ferran


Re: [patch] Enable fulltext indexing for 8564% (was: Re: Config option (or hack) to disable fulltext indexing?)

2009-12-03 Thread Ferran Jorba
I'm sorry about the noise.

Here it comes the patch.

Ferran
BibIndex: allow second indicator of 856 to be any value

 * Second indicator of 856 can have several values, not just _
   (http://www.loc.gov/marc/bibliographic/bd856.html)

---
 modules/bibindex/lib/bibindex_engine.py |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: cds-invenio/modules/bibindex/lib/bibindex_engine.py
===
--- cds-invenio.orig/modules/bibindex/lib/bibindex_engine.py	2009-12-03 13:09:53.0 +0100
+++ cds-invenio/modules/bibindex/lib/bibindex_engine.py	2009-12-03 13:10:24.0 +0100
@@ -463,7 +463,7 @@
 def get_nothing_from_phrase(phrase, stemming_language=None):
  A dump implementation of get_words_from_phrase to be used when
 when a tag should not be indexed (such as when trying to extract phrases from
-8564_u).
+8565%u).
 return []
 
 
@@ -768,7 +768,7 @@
 @param table_name_pattern: i.e. idxWORD%02dF or idxPHRASE%02dF
 @parm default_get_words_fnc: the default function called to extract words from a metadata
 @param tag_to_words_fnc_map: a mapping to specify particular function to
-extract words from particular metdata (such as 8564_u)
+extract words from particular metdata (such as 8565%u)
 
 self.index_id = index_id
 self.tablename = table_name_pattern % index_id
@@ -1475,7 +1475,7 @@
 if task_get_option(cmd) == check:
 wordTables = get_word_tables(task_get_option(windex))
 for index_id, index_tags in wordTables:
-wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
+wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8565%u': get_words_from_fulltext})
 _last_word_table = wordTable
 wordTable.report_on_table_consistency()
 task_sleep_now_if_required(can_stop_too=True)
@@ -1485,7 +1485,7 @@
 if task_get_option(cmd) == check:
 wordTables = get_word_tables(task_get_option(windex))
 for index_id, index_tags in wordTables:
-wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
+wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8565%u': get_nothing_from_phrase}, False)
 _last_word_table = wordTable
 wordTable.report_on_table_consistency()
 task_sleep_now_if_required(can_stop_too=True)
@@ -1499,7 +1499,7 @@
 if task_get_option(reindex):
 reindex_prefix = tmp_
 init_temporary_reindex_tables(index_id, reindex_prefix)
-wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
+wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxWORD%02dF', get_words_from_phrase, {'8565%u': get_words_from_fulltext})
 _last_word_table = wordTable
 wordTable.report_on_table_consistency()
 try:
@@ -1555,7 +1555,7 @@
 task_sleep_now_if_required(can_stop_too=True)
 
 # Let's work on phrases now
-wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
+wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPHRASE%02dF', get_phrases_from_phrase, {'8565%u': get_nothing_from_phrase}, False)
 _last_word_table = wordTable
 wordTable.report_on_table_consistency()
 try:


Re: [patch] Enable fulltext indexing for 8564% (was: Re: Config option (or hack) to disable fulltext indexing?)

2009-12-03 Thread Ferran Jorba
Well, well, well.  8565%u should be 8564%u, of course.

This mess is happening because I tested it on an installed 0.99.1 system
and then ported it to my git updated tree.

Again, sorry for that.

Ferran
BibIndex: allow second indicator of 856 to be any value

 * Second indicator of 856 can have several values, not just _
   (http://www.loc.gov/marc/bibliographic/bd856.html)

---
 modules/bibindex/lib/bibindex_engine.py |   12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: cds-invenio/modules/bibindex/lib/bibindex_engine.py
===
--- cds-invenio.orig/modules/bibindex/lib/bibindex_engine.py	2009-12-03 13:27:27.0 +0100
+++ cds-invenio/modules/bibindex/lib/bibindex_engine.py	2009-12-03 13:28:10.0 +0100
@@ -463,7 +463,7 @@
 def get_nothing_from_phrase(phrase, stemming_language=None):
  A dump implementation of get_words_from_phrase to be used when
 when a tag should not be indexed (such as when trying to extract phrases from
-8564_u).
+8564%u).
 return []
 
 
@@ -768,7 +768,7 @@
 @param table_name_pattern: i.e. idxWORD%02dF or idxPHRASE%02dF
 @parm default_get_words_fnc: the default function called to extract words from a metadata
 @param tag_to_words_fnc_map: a mapping to specify particular function to
-extract words from particular metdata (such as 8564_u)
+extract words from particular metdata (such as 8564%u)
 
 self.index_id = index_id
 self.tablename = table_name_pattern % index_id
@@ -1476,7 +1476,7 @@
 if task_get_option(cmd) == check:
 wordTables = get_word_tables(task_get_option(windex))
 for index_id, index_tags in wordTables:
-wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
+wordTable = WordTable(index_id, index_tags, 'idxWORD%02dF', get_words_from_phrase, {'8564%u': get_words_from_fulltext})
 _last_word_table = wordTable
 wordTable.report_on_table_consistency()
 task_sleep_now_if_required(can_stop_too=True)
@@ -1486,7 +1486,7 @@
 if task_get_option(cmd) == check:
 wordTables = get_word_tables(task_get_option(windex))
 for index_id, index_tags in wordTables:
-wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
+wordTable = WordTable(index_id, index_tags, 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564%u': get_nothing_from_phrase}, False)
 _last_word_table = wordTable
 wordTable.report_on_table_consistency()
 task_sleep_now_if_required(can_stop_too=True)
@@ -1500,7 +1500,7 @@
 if task_get_option(reindex):
 reindex_prefix = tmp_
 init_temporary_reindex_tables(index_id, reindex_prefix)
-wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxWORD%02dF', get_words_from_phrase, {'8564_u': get_words_from_fulltext})
+wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxWORD%02dF', get_words_from_phrase, {'8564%u': get_words_from_fulltext})
 _last_word_table = wordTable
 wordTable.report_on_table_consistency()
 try:
@@ -1556,7 +1556,7 @@
 task_sleep_now_if_required(can_stop_too=True)
 
 # Let's work on phrases now
-wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564_u': get_nothing_from_phrase}, False)
+wordTable = WordTable(index_id, index_tags, reindex_prefix + 'idxPHRASE%02dF', get_phrases_from_phrase, {'8564%u': get_nothing_from_phrase}, False)
 _last_word_table = wordTable
 wordTable.report_on_table_consistency()
 try:


[patch] Capture WebSubmit bibconvert errors

2009-11-18 Thread Ferran Jorba
Hello Tibor,

please adapt or add this patch if you think it could help others as it
helped me.  It applies cleanly to current git sources.

Thanks,

Ferran
WebSubmit: capture stderr bibconvert messages

 * After submitting a form, errors go silent unless captured.

---
 modules/websubmit/lib/functions/Make_Record.py |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: cds-invenio/modules/websubmit/lib/functions/Make_Record.py
===
--- cds-invenio.orig/modules/websubmit/lib/functions/Make_Record.py	2009-11-18 15:18:52.0 +0100
+++ cds-invenio/modules/websubmit/lib/functions/Make_Record.py	2009-11-18 15:19:29.0 +0100
@@ -39,7 +39,7 @@
 source = parameters['sourceTemplate'].replace(" ","")
 create = parameters['createTemplate'].replace(" ","")
 # We use bibconvert to create the xml record
-call_uploader_txt = "%s/bibconvert -l1 -d'%s'  -Cs'%s/%s' -Ct'%s/%s' > %s/recmysql" % (CFG_BINDIR,curdir,CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR,source,CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR,create,curdir)
+call_uploader_txt = "%s/bibconvert -l1 -d'%s'  -Cs'%s/%s' -Ct'%s/%s' > %s/recmysql 2>%s/recmysql.err" % (CFG_BINDIR,curdir,CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR,source,CFG_WEBSUBMIT_BIBCONVERTCONFIGDIR,create,curdir,curdir)
 os.system(call_uploader_txt)
 # Then we have to format this record (turn & into &amp; and < into &lt;
 # After all we know nothing about the text entered by the users at submission time
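
The same idea, keeping the error output of an external command in a file next to its normal output, can be sketched with the subprocess module instead of os.system and shell redirection.  The helper name run_capturing is an assumption for illustration, not something in WebSubmit.

```python
import subprocess


def run_capturing(argv, out_path, err_path):
    """Run argv, writing its stdout to out_path and its stderr to err_path.

    Returns the exit status so that the caller can decide what to do
    when the command fails.
    """
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        return subprocess.run(argv, stdout=out, stderr=err).returncode
```

In the Make_Record case the two paths would be curdir + "/recmysql" and curdir + "/recmysql.err", and a non-empty .err file tells the administrator what bibconvert complained about.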


[task #12167] Allow the selection of full and brief format from the WebSearch collection interface

2009-10-27 Thread noreply [Ferran Jorba]

This is an automated notification sent by LCG Savannah.
It relates to:
task #12167, project CDS Invenio

==
 OVERVIEW of task #12167:
==

URL:
  http://savannah.cern.ch/task/?12167

 Summary: Allow the selection of full and brief format from
the WebSearch collection interface
 Project: CDS Invenio
Submitted by: fjorba
Submitted on: 2009-10-27 11:06
 Should Start On: 2009-10-27 00:00
   Should be Finished on: 2009-10-27 00:00
Category: BibFormat
Priority: 5 - Normal
  Status: None
 Privacy: Public
Percent Complete: 0%
 Assigned to: None
 Open/Closed: Open
 Discussion Lock: Any
  Effort: 0.00

___


The tools to create formats (.bft and .py) are powerful and programmer
oriented.

However, there is no easy way to assign those brief and full formats to a
given collection.  It would be nice to integrate those options into the
WebSearch collection definition, like portalboxes, search options and other
collection specific preferences.  Maybe this could make .bfo files obsolete?

I understand that HB and HD should be the default, good enough for most
collections and sites, and they should remain in effect unless a specific
choice has been made.

Please note that this preference should be taken into account also when
creating the first, cached page
(http://cdsware.cern.ch/repo/?p=cds-invenio.git;a=search;h=HEAD;st=grep;s=def+create_latest_additions_info)
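
The requested behaviour amounts to a per-collection lookup that falls back to HB and HD.  A minimal sketch, in which the names DEFAULT_FORMATS, format_for and the shape of the overrides mapping are all hypothetical and not part of WebSearch:

```python
# Site-wide defaults: HTML brief and HTML detailed.
DEFAULT_FORMATS = {"brief": "hb", "full": "hd"}


def format_for(collection, kind, overrides):
    """Return the output format for a collection.

    `overrides` maps a collection name to a dict such as {"brief": "hx"};
    anything not overridden falls back to the site-wide default.
    """
    return overrides.get(collection, {}).get(kind, DEFAULT_FORMATS[kind])
```

The cached "latest additions" page would have to call the same lookup so that both views agree.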






[patch] Websearch: add a newline before printing Marc tags

2009-10-15 Thread Ferran Jorba
Hi Tibor,

could you please apply this tiny patch?  I've refreshed it to apply
cleanly to current git tree.

Thanks,

Ferran
Websearch: add a newline before printing Marc tags

 The output of Marc format in search_engine.py is useful for offline
 processing.  Adding a newline before <pre> simplifies it.
---

Index: cds-invenio/modules/websearch/lib/search_engine.py
===
--- cds-invenio.orig/modules/websearch/lib/search_engine.py	2009-10-15 10:22:58.0 +0200
+++ cds-invenio/modules/websearch/lib/search_engine.py	2009-10-15 10:29:38.0 +0200
@@ -3371,16 +3371,16 @@
 
 elif format == "hm":
 if record_exist_p == -1:
-out += "<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"])) + "</pre>"
+out += "\n<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"])) + "</pre>"
 else:
-out += "<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ot)) + "</pre>"
+out += "\n<pre>" + cgi.escape(get_fieldvalues_alephseq_like(recID, ot)) + "</pre>"
 
 elif format.startswith("h") and ot:
 ## user directly asked for some tags to be displayed only
 if record_exist_p == -1:
-out += "<pre>" + get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"]) + "</pre>"
+out += "\n<pre>" + get_fieldvalues_alephseq_like(recID, ["001", CFG_OAI_ID_FIELD, "980"]) + "</pre>"
 else:
-out += "<pre>" + get_fieldvalues_alephseq_like(recID, ot) + "</pre>"
+out += "\n<pre>" + get_fieldvalues_alephseq_like(recID, ot) + "</pre>"
 
 elif format == "hd":
 # HTML detailed format


Re: [patch] Websearch: add a newline before printing Marc tags

2009-10-15 Thread Ferran Jorba
Hello Tibor,

 Please note that for offline non-XML MARC processing, you may want to
 prefer the `Text MARC' output format (tm) rather than the `HTML MARC'
 (hm) one.

 $ wget -O z.txt 'http://invenio-demo.cern.ch/search?p=ellis&of=tm'

Oh, that's great!  I didn't know this one.  I see that it even works
with 0.92.1.

Thanks again,

Ferran
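
When the query comes from a script, building the URL with the standard library avoids quoting mistakes.  A small sketch: the function name search_url is made up, while p (pattern) and of (output format) are the parameters used in the wget example above.

```python
from urllib.parse import urlencode


def search_url(base, pattern, output_format="tm"):
    """Build a search URL asking for the given output format."""
    query = urlencode({"p": pattern, "of": output_format})
    return "%s/search?%s" % (base.rstrip("/"), query)


print(search_url("http://invenio-demo.cern.ch", "ellis"))
```

urlencode also takes care of patterns with spaces or accented characters.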


Re: bibupload permission issue

2009-07-27 Thread Ferran Jorba
Hello Tibor,

 On Thu, 09 Jul 2009, Theodoropoulos Theodoros wrote:
 I uploaded several files (with bibupload, using FFT syntax) and I 
 realized that the actual files/directories were created with root:root 
 permissions (probably because it was root user that run bibupload). This 
 is OK in general, but later web-submitted actions with SRV/bibdocfile 
 for that record's docfile produce errors (permission denied).

 Yes; the dirs had better be owned by (or made writable by) Apache.  In
 the git/master version of CDS Invenio, we have improved user checking so
 as to strictly enforce that bibsched tasks (including bibupload) would
 run under the same user identity as the Apache application.  (See also
 CFG_BIBSCHED_PROCESS_USER.)

 Maybe we could go one step further and, as part of the installation
 process, instruct how to create a new user called `invenio' and to run
 the whole shebang under its identity...

That's what I have been doing since I started with Invenio a while ago
[1], following your suggestion [2].  Otherwise the whole issue of file
and directory ownership becomes a mess.  And now, while (still) working
on two different instances of Invenio on a single server, this solution
has proved even clearer.  Today we are working on how to run two
instances of Apache as two different users while still keeping as much
of the original Debian configuration as possible.

If you are thinking about this `invenio' user, please don't make it
hardcoded, because in my case I prefer having the specific name of each
of my instances.

Thanks,

Ferran

[1] http://cdsware.cern.ch/lists/project-cdsware-users/archive/msg00355.shtml
[2] http://cdsware.cern.ch/lists/project-cdsware-users/archive/msg00109.shtml
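
A quick way to spot the kind of ownership mess described in this thread is to walk the installation tree and list whatever does not belong to the expected user.  This is a Unix-only sketch and the function name is hypothetical; the numeric uid of a named user can be obtained with pwd.getpwnam(name).pw_uid.

```python
import os


def files_not_owned_by(root, uid):
    """Return the paths under `root` whose owner is not the given numeric uid."""
    wrong = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat: report the owner of a symlink itself, not of its target
            if os.lstat(path).st_uid != uid:
                wrong.append(path)
    return wrong
```

Running it with the uid of each instance's own user keeps the check independent of any hardcoded `invenio' account name.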


[patch] Allow no apache password and group files

2009-05-20 Thread Ferran Jorba
Hello,

am I the only one?  If I don't configure
CFG_APACHE_{PASSWORD,GROUP}_FILE, I'm getting this error:


$ ~/invenio/bin/inveniocfg  --create-tables
 Going to create and fill tables...
Testing DB connection... ok
Testing Python/MySQL/MySQLdb UTF-8 chain... ok
 Going to reset CFG_SITE_NAME and CFG_SITE_NAME_INTL...
You may want to restart Apache now.
 CFG_SITE_NAME and CFG_SITE_NAME_INTL* reset successfully.
 Going to reset CFG_SITE_ADMIN_EMAIL...
You may want to restart Apache now.
 CFG_SITE_ADMIN_EMAIL reset successfully.
 Going to reset I18N field names...
 I18N field names reset successfully.
Traceback (most recent call last):
  File "/home/invenio/invenio/bin/webaccessadmin", line 28, in <module>
    from invenio.webaccessadmin_lib import main
  File "/usr/local/lib/python2.5/site-packages/invenio/webaccessadmin_lib.py", line 48, in <module>
    import invenio.access_control_engine as acce
  File "/usr/local/lib/python2.5/site-packages/invenio/access_control_engine.py", line 31, in <module>
    from invenio import webuser
  File "/usr/local/lib/python2.5/site-packages/invenio/webuser.py", line 1049, in <module>
    _apache_passwords = _load_apache_password_file()
  File "/usr/local/lib/python2.5/site-packages/invenio/webuser.py", line 1044, in _load_apache_password_file
    for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
IOError: [Errno 21] Is a directory


I attach a trivial and obvious solution, unless I'm missing something.
(BTW, it is relative to 0.99.1)

Best regards,

Ferran
WebUser: allow no apache password and group files

* Check whether apache password and group file exists before trying to
  open the file to prevent an error when creating tables.

---
 lib/python/traces/webuser.py |   30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

Index: invenio/lib/python/traces/webuser.py
===
--- invenio.orig/lib/python/traces/webuser.py	2009-05-20 16:21:34.0 +0200
+++ invenio/lib/python/traces/webuser.py	2009-05-20 16:29:03.0 +0200
@@ -1041,10 +1041,11 @@
 
 def _load_apache_password_file(apache_password_file=CFG_APACHE_PASSWORD_FILE):
 ret = {}
-for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
-row = row.split(':')
-if len(row) == 2:
-ret[row[0].strip()] = row[1].strip()
+if apache_password_file:
+for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
+row = row.split(':')
+if len(row) == 2:
+ret[row[0].strip()] = row[1].strip()
 return ret
 _apache_passwords = _load_apache_password_file()
 
@@ -1060,16 +1061,17 @@
 
 def _load_apache_group_file(apache_group_file=CFG_APACHE_GROUP_FILE):
 ret = {}
-for row in open(os.path.join(CFG_TMPDIR, apache_group_file)):
-row = row.split(':')
-if len(row) == 2:
-group = row[0].strip()
-users = row[1].strip().split(' ')
-for user in users:
-user = user.strip()
-if user not in ret:
-ret[user] = []
-ret[user].append(group)
+if apache_group_file:
+for row in open(os.path.join(CFG_TMPDIR, apache_group_file)):
+row = row.split(':')
+if len(row) == 2:
+group = row[0].strip()
+users = row[1].strip().split(' ')
+for user in users:
+user = user.strip()
+if user not in ret:
+ret[user] = []
+ret[user].append(group)
 return ret
 _apache_groups = _load_apache_group_file()
 


Re: [patch] Allow no apache password and group files

2009-05-20 Thread Ferran Jorba
Replying to myself,

there was a leftover from my test paths, so the patch could not be
applied.  Please discard the previous patch.  I attach my correction.

Sorry,

Ferran
WebUser: allow no apache password and group files

* Check whether apache password and group file exists before trying to
  open the file to prevent an error when creating tables.

---
 lib/python/invenio/webuser.py |   30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

Index: invenio/lib/python/invenio/webuser.py
===
--- invenio.orig/lib/python/invenio/webuser.py	2009-05-20 16:21:34.0 +0200
+++ invenio/lib/python/invenio/webuser.py	2009-05-20 16:29:03.0 +0200
@@ -1041,10 +1041,11 @@
 
 def _load_apache_password_file(apache_password_file=CFG_APACHE_PASSWORD_FILE):
 ret = {}
-for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
-row = row.split(':')
-if len(row) == 2:
-ret[row[0].strip()] = row[1].strip()
+if apache_password_file:
+for row in open(os.path.join(CFG_TMPDIR, apache_password_file)):
+row = row.split(':')
+if len(row) == 2:
+ret[row[0].strip()] = row[1].strip()
 return ret
 _apache_passwords = _load_apache_password_file()
 
@@ -1060,16 +1061,17 @@
 
 def _load_apache_group_file(apache_group_file=CFG_APACHE_GROUP_FILE):
 ret = {}
-for row in open(os.path.join(CFG_TMPDIR, apache_group_file)):
-row = row.split(':')
-if len(row) == 2:
-group = row[0].strip()
-users = row[1].strip().split(' ')
-for user in users:
-user = user.strip()
-if user not in ret:
-ret[user] = []
-ret[user].append(group)
+if apache_group_file:
+for row in open(os.path.join(CFG_TMPDIR, apache_group_file)):
+row = row.split(':')
+if len(row) == 2:
+group = row[0].strip()
+users = row[1].strip().split(' ')
+for user in users:
+user = user.strip()
+if user not in ret:
+ret[user] = []
+ret[user].append(group)
 return ret
 _apache_groups = _load_apache_group_file()
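
A slightly more defensive variant of the same guard, shown only as a sketch: besides an empty setting it also tolerates a path that does not point to a regular file, which is the "Is a directory" case from the traceback.  The function name load_password_file is hypothetical and this is not the code in webuser.py.

```python
import os


def load_password_file(path):
    """Parse 'user:hash' lines into a dict; return {} if there is no usable file."""
    entries = {}
    if not path or not os.path.isfile(path):
        return entries  # nothing configured, or the path is a directory
    with open(path) as handle:
        for row in handle:
            parts = row.split(":")
            if len(parts) == 2:
                entries[parts[0].strip()] = parts[1].strip()
    return entries
```

The group file loader could be guarded in the same way.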
 


[patch] draft: add xpdf and ghostscript as alternatives to Acrobat

2009-05-20 Thread Ferran Jorba
Hi Tibor,

when configuring 0.99.1, I've noticed that you only consider Adobe
Reader and Distiller as converters to/from PDF and PS.  Fortunately,
both xpdf (with pdftops and pstopdf) and ghostscript (pdf2ps and ps2pdf)
are valid alternatives.

I started to draft a patch, but I'm ignorant enough about autotools and
friends that at this moment I cannot spend the time I need to complete
it.  So I've done the trivial part of the easy half, leaving the
rest to you ;-)

BTW, do you prefer the patches sent as attachments or inline?

Thanks,

Ferran
WebSubmit: add xpdf and ghostscript as alternatives to Acrobat

* This is just an early draft
* Both xpdf and ghostscript provide converters from/to PDF and PS.
* TODO: add them in the INSTALL, configure, etc.

---
 modules/websubmit/lib/functions/Shared_Functions.py |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

Index: cds-invenio-0.99.1/modules/websubmit/lib/functions/Shared_Functions.py
===
--- cds-invenio-0.99.1.orig/modules/websubmit/lib/functions/Shared_Functions.py	2009-05-18 11:58:05.0 +0200
+++ cds-invenio-0.99.1/modules/websubmit/lib/functions/Shared_Functions.py	2009-05-18 12:01:18.0 +0200
@@ -41,8 +41,16 @@
     filename, extension = os.path.splitext(filename)
     extension = extension.lower()
     if extension == ".pdf":
+        conversion = ""
         # Create PostScript
-        os.system("%s -toPostScript %s" % (CFG_PATH_ACROREAD, fullpath))
+        if (CFG_PATH_PDFTOPS):
+            conversion = "%s %s" % (CFG_PATH_PDFTOPS, fullpath)
+        elif (CFG_PATH_PDF2PS):
+            conversion = "%s %s" % (CFG_PATH_PDF2PS, fullpath)
+        elif (CFG_PATH_ACROREAD):
+            conversion = "%s -toPostScript %s" % (CFG_PATH_ACROREAD, fullpath)
+        if conversion:
+            os.system(conversion)
         if os.path.exists("%s/%s.ps" % (basedir, filename)):
             os.system("%s %s/%s.ps" % (CFG_PATH_GZIP, basedir, filename))
             createdpaths.append("%s/%s.ps.gz" % (basedir, filename))
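
The fallback order used in the patch above (pdftops first, then pdf2ps, then Acrobat Reader) can be sketched in isolation.  This is only a Python 3 illustration, not Invenio code: the function name build_ps_command and its keyword arguments are hypothetical stand-ins for the CFG_PATH_* settings, and it quotes the file path, which the draft patch does not do.

```python
import shlex


def build_ps_command(fullpath, pdftops=None, pdf2ps=None, acroread=None):
    """Return the shell command of the first configured converter, or None.

    The three keyword arguments are hypothetical stand-ins for
    CFG_PATH_PDFTOPS, CFG_PATH_PDF2PS and CFG_PATH_ACROREAD.
    """
    quoted = shlex.quote(fullpath)  # protect paths that contain spaces
    if pdftops:
        return "%s %s" % (pdftops, quoted)
    if pdf2ps:
        return "%s %s" % (pdf2ps, quoted)
    if acroread:
        return "%s -toPostScript %s" % (acroread, quoted)
    return None  # no converter configured: the caller should skip the step


if __name__ == "__main__":
    print(build_ps_command("/tmp/thesis.pdf", pdf2ps="/usr/bin/pdf2ps"))
```

Returning None when nothing is configured mirrors the "if conversion:" guard of the patch, so a site without any converter simply skips the PostScript step.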


Support for Canonical Link in Invenio?

2009-05-05 Thread Ferran Jorba
Hello,

I just came across this information, new to me, and before I forget, I
thought to share it with you:

http://www.dullest.com/blog/canonical-link-tag/

It'd say Invenio would benefit from it.

Best regards,

Ferran


[patch] bibupload --correct documentation fix

2009-04-29 Thread Ferran Jorba
Hi,

yesterday I ran 'bibupload --correct' for the first time in my test
machine, and it works great *except* that it is not stated in the public
documentation that both the tags *and* the indicators must be identical
in order to be replaced.  It is documented in the source code:

http://cdsware.cern.ch/repo/?p=cds-invenio.git;a=blob;f=modules/bibupload/lib/bibupload.py;h=47c88a583f73462d5200479b132ac7ac5e8d8964;hb=bbd312365602dcee3ba2904b1b3851c6b3f8e604#l1668

But not in the documentation
http://invenio-demo.cern.ch/help/admin/bibupload-admin-guide#3.2

This patch (relative to current git sources) may fix it.

Thanks,

Ferran
BibUpload: improve --correct documentation

Make clear that both the tags and the indicators must match when the
--correct option is used.

---
 modules/bibupload/doc/admin/bibupload-admin-guide.webdoc |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: cds-invenio/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc
===
--- cds-invenio.orig/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc	2009-04-29 08:51:21.0 +0200
+++ cds-invenio/modules/bibupload/doc/admin/bibupload-admin-guide.webdoc	2009-04-29 08:58:04.0 +0200
@@ -151,8 +151,9 @@
 
 -c, --correct Correct fields of existing records by those
   from XML MARC file.  The original record
-  content is modified only in the fields from
-  the XML MARC file: the original fields are
+  content is modified only on those fields from
+  the XML MARC file where both the tags and the
+  indicators match: the original fields are
   removed and replaced by those from the XML
   MARC file.  Fields not present in XML MARC
   file are not changed (unlike the -r option).


Mailing list public archives not updated since June

2008-12-02 Thread Ferran Jorba
Hello,

please note that the public archives haven't been updated since June:

 http://cdsware.cern.ch/lists/project-cdsware-users/archive/date.shtml#01220

Thanks,

Ferran



Re: post-release fun, episode 1

2008-04-02 Thread Ferran Jorba
Hello Tibor et al,

guess you have also found this link, courtesy of a Google alert for 'cds
invenio':

http://www.ohloh.net/projects/invenio

Enjoy,

Ferran



Re: post-release fun, episode 1

2008-04-02 Thread Ferran Jorba
Hi Samuele,

 was me who have added Invenio to Ohloh for experimenting (since we're an 
 OpenSource project) :-) 

Good, then

 But it Ohloh has good features in order to discover
 implicit links between CDS Invenio and the surrounding software
 environment...  Bye!  Samuele

Or comparing it with others:

http://www.ohloh.net/projects/compare?metric=Codebase&project_0=EPrints&project_1=CDS+Invenio&project_2=DSpace
http://www.ohloh.net/projects/compare?metric=Activity&project_0=EPrints&project_1=CDS+Invenio&project_2=DSpace
http://www.ohloh.net/projects/compare?metric=Contributors&project_0=EPrints&project_1=CDS+Invenio&project_2=DSpace

So much for Java verbosity...  Thanks, Guido :-) !

Ferran



Re: post-release fun, episode 1

2008-04-02 Thread Ferran Jorba
Hello Tibor,

[...]
 P.S. I do agree with your Java remark; my favourite example is
  anonymous functions:

As someone with some white hairs in his beard, I find that Java
verbosity reminds me so much of this:


 001000$control optimize
 001100 identification division.
 001200 program-id.
 001300   staglist.
 001400 author.
 001500   Ferran Jorba.
 001600 installation.
 001700   UAB.
 001800 date-written.
 001900   January 1991.
 002000   Rewritten and translated into English, August, 1992.
 002100   Change default-tags, 31 Jan 1994
 002200 remarks.
 002300   This program takes the output of MARC records
 002400   from MARCPRT or HOLDPRT and reformats the output
 002500   to one physical record per field or tag, for wide-
 002600   paper printer (132 chars).
 002700   The output record consists on the control number
 002800   (BIB-ID, AUTH-ID or HOLDINGS-ID), level information,
 002900   tag number, indicators and the textual data.
 003000   It also allows the selection of a the set of tags
 003100   to be printed.
 003200
 003300
 003400 environment division.
 003500
 003600 configuration section.
 003700
 003800 source-computer.
 003900   HP-3000.
 004000 object-computer.
 004100   HP-3000 sequence is hpascii.
 004200
 004300 special-names.
 004400 hpascii is standard-1.


etc..., before the meat begins.  For me, Java is the new Cobol, necktie
included ;-(

Ferran



Re: RFC: replacing CVS

2008-03-27 Thread Ferran Jorba
Hello Tibor et al,

I know that my contribution to this discussion is absolutely marginal,
because I'm not a core contributor, and for my little patches, I'm
perfectly happy with quilt.

But, like you, I have been following all this SCM-DSCM business during
the last few years, and I have a need myself to keep track of my
scripts and utilities.  My only real experience (read+write) with SCM
was with CVS many years ago, and I have used Subversion for
downloading a couple of software packages and have run some tests with
tla, Git and Mercurial.

I can subscribe to your original conclusions, and I have not much to
add to what the others have said, except some random thoughts.

As we have settled on DrProject[1], a trac fork that can handle
multiple projects, for our wiki and task handling, and both trac and
DrProject have good integration with Subversion, in principle I was
favouring this option.

But my read-only experience with Subversion has been quite
disappointing: I saw that, when downloading software, it cannot
keep the original timestamps of the files, something that CVS does
perfectly.  This annoys me so much!

My tests with Git and Mercurial have been driven by a quest to find a
nice tool for distributed storage for digital preservation[2].  We
stress-tested both by ingesting 500 GB of fat (25-75 MB) TIFF
files + some technical metadata[3].  What we found is that both are
similarly capable and both handled this load with similar timings and
overhead.  We were especially interested in Git's
content-addressable-filesystem[4] concept and the hash-tree ability to
check if two repositories are identical.  We haven't concluded
anything yet except that, compared with the mighty Git, Mercurial is
no toy either.
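
To make the content-addressable idea concrete: Git names each file (blob) by the SHA-1 of a short header plus the file content, so identical content always gets the identical name.  The sketch below is Python 3 and only an illustration; the helper same_content and its {path: bytes} input are assumptions of mine, and real Git goes further by hashing trees and commits on top of the blobs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)  # "blob <size>" followed by a NUL byte
    return hashlib.sha1(header + content).hexdigest()


def same_content(files_a, files_b):
    """Compare two {path: bytes} mappings by blob id (hypothetical helper)."""
    ids_a = {path: git_blob_id(data) for path, data in files_a.items()}
    ids_b = {path: git_blob_id(data) for path, data in files_b.items()}
    return ids_a == ids_b


if __name__ == "__main__":
    print(git_blob_id(b""))  # the id of the empty blob
```

For a preservation store this means that a silent change in a single TIFF changes its id, and comparing ids is enough to tell whether two copies still hold the same files.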

What else?  As there is no attractive (to me) centralised SCM option,
and being myself a low-profile developer, I'd be happy to go to a DSCM
(it can be fun), but the easier the better.  I'd be more than happy
with Mercurial.

Moreover, Mercurial has reached 1.0 this week, it has enthusiastic
followers[5], and has an Emacs frontend[6].

My cent,

Ferran


[1] http://www.drproject.org/
[2] See a summary of our preliminary findings at
http://www.cesca.es/promocio/congressos/tsiuc2007/FerranJorba.pdf
[3] This .info file contains the md5sum and the output of
ImageMagick's `identify -verbose'
(http://www.imagemagick.org/script/identify.php).
[4] http://en.wikipedia.org/wiki/Content-addressable_storage
[5] For example, the first comment at http://lwn.net/Articles/274823/
[6] http://freehg.org/u/agriggio/ahg/



Index browsing doesn't work when default language is not English

2008-02-28 Thread Ferran Jorba
Hi again,

here is my report.  When Invenio (0.92.1) is installed with a default
language other than English, index browsing doesn't work well;
specifically, when going to next page.

Example:

 
http://ddd.uab.cat/search?p=smith&f=author&action_browse=Llista&c=&sf=&so=d&rm=&rg=10&sc=0&of=hb

and click 'Següent' (= next).  BTW, you can also see that some texts are
in Catalan even when changing language.

Author browsing works fine in other 0.92.1 installations:

 
http://cdsweb.cern.ch/search?p=smith&f=author&action_browse=Browse&c=&sf=&so=d&rm=&rg=10&sc=1&of=hb

 
http://romdoc.upb.ro/search?sc=1&p=smith&f=author&action_browse=Browse&c=Articles+%26+Preprints&c=Books+%26+Reports&c=Conferences&c=Multimedia+%26+Arts&c=Periodicals&c=Presentations&c=UPB+Museum&c=Workshops

 
http://sysdoc.com.dtu.dk/search?sc=1&p=smith&f=author&action_browse=Browse&c=Journal+and+Conference+Articles+&c=Theses&c=Books+and+Book+Chapters&c=Teaching+Material&c=Reports&c=Multimedia

etc.

Thanks,

Ferran



Re: Index browsing doesn't work when default language is not English

2008-02-28 Thread Ferran Jorba
Pomoc Tibor,

 The CVS version has the same problem, but it is easy to fix.  I hope
 to get to it in the afternoon...

 Done.  I fixed a couple of other browse issues at the same time too.

Vďaka,

Ferran



Re: Index browsing doesn't work when default language is not English

2008-02-28 Thread Ferran Jorba
Hi Tibor,

 here is my report.  When Invenio (0.92.1) is installed with a
 default language other than English, index browsing doesn't work
 well; specifically, when going to next page.

 The CVS version has the same problem, but it is easy to fix.  I hope
 to get to it in the afternoon...

Oh, great!

Thanks so much,

Ferran



Re: RFC: Invenio/Indico release numbering

2008-02-28 Thread Ferran Jorba
Hello Tibor.

 We have not yet proof-read English messages, so they may still change,
 so better to do nothing still.  Possibly we shall release 0.99.0 with
 old PO files, and incorporate translations into 0.99.1 only.  Now that
 our large codebase changes are mostly over, we do plan on releasing
 more frequently. ;-)

Ok with that 0.99.1.

 I have a bug report/enhancement request that I'd really like to see
 fixed before the next release.  Where should I post it?  -users or
 -developers list?

 Better to -developers.  If the patches are big, then just send them to
 my personal address.

No, sorry I don't have any patch.  I'll write another mail to the
-developers list, anyway.

 Also, since no one responded to your concrete ADMINEMAIL vs
 SUPPORTEMAIL scenario, please feel free to adapt every send_email()
 call to your needs. ;-)

No problem, thanks,

Ferran



Re: RFC: Invenio/Indico release numbering

2008-02-27 Thread Ferran Jorba

Hello Tibor,

[...]

4) If agreed, I'll change our docs accordingly.  We plan to release


Agreed.


CDS Invenio 0.99.0 pretty soon, possibly in a week or two.


Ooops!  Please give us (the translators) a little more margin.  I do that
part of my job as a volunteer, from home (like now), but I'm really short of time.
Moreover, a couple of weeks ago you wrote that we (the translators) should 
do nothing about some change I forgot ;-)



Please tell us what you think.


I have a bug report/enhancement request that I'd really like to see fixed
before the next release.  Where should I post it?  -users or -developers list?

Thanks,

Ferran



Re: CVS Commit Overview for 2008-02-15

2008-02-18 Thread Ferran Jorba

[...]
  Moved
   important runtime parameters (URLs, DB credentials) from configure
   options into invenio.conf.  This permits to install Invenio without
   thinking too much about them in advance, and to change them later
   after initial demo tests are successful, without having to
   reinstall.

Great!  Thank you!

Ferran



FYI: Slope One Collaborative filtering algorithm

2008-02-12 Thread Ferran Jorba
Hi,

as I still don't have 'People who viewed this page also viewed' on my
site because it is intolerably slow
(http://cdsware.cern.ch/lists/project-cdsware-users/archive/msg00742.shtml),
I've accidentally found this entry that might help:

http://en.wikipedia.org/wiki/Slope_One

(Not me, unfortunately.  I'm overwhelmed with the fascinating work of
migrating our ILS...)
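
For whoever picks this up, here is a minimal sketch of weighted Slope One as described on that Wikipedia page.  It is plain Python 3 and not tied to Invenio in any way.  Note that Slope One works on ratings; how page views would be turned into ratings is an open question that this sketch does not address, and the data layout {user: {item: rating}} is my own assumption.

```python
def train(ratings):
    """Return (dev, count): dev[i][j] is the average of r_i - r_j over the
    users who rated both items, count[i][j] is the number of such users."""
    total = {}
    count = {}
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i == j:
                    continue
                total.setdefault(i, {}).setdefault(j, 0.0)
                count.setdefault(i, {}).setdefault(j, 0)
                total[i][j] += r_i - r_j
                count[i][j] += 1
    dev = {i: {j: total[i][j] / count[i][j] for j in total[i]} for i in total}
    return dev, count


def predict(user_ratings, item, dev, count):
    """Weighted Slope One prediction of `item` for one user, or None."""
    numerator = 0.0
    denominator = 0
    for j, r_j in user_ratings.items():
        n = count.get(item, {}).get(j, 0)
        if n == 0 or j == item:
            continue  # nobody rated both items, so there is no evidence
        numerator += (dev[item][j] + r_j) * n
        denominator += n
    if denominator == 0:
        return None
    return numerator / denominator


if __name__ == "__main__":
    data = {
        "john": {"A": 5, "B": 3, "C": 2},
        "mark": {"A": 3, "B": 4},
        "lucy": {"B": 2, "C": 5},
    }
    dev, count = train(data)
    print(predict(data["lucy"], "A", dev, count))
```

The attraction for our case is that the expensive part (train) can be precomputed offline, and predict only touches the items one user has seen.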

Ferran



Re: CVS Commit Overview for 2008-01-16

2008-01-17 Thread Ferran Jorba
Hello Samuele,

 2008-01-16  Samuele Kaplun samuele.kap...@cern.ch

   * modules/websubmit/lib/bibdocfile.py: Updated normalize format to
   transform '.jpg' into '.jpeg'. Does anybody know of any similar aliases?

Hmmm.  What's the point?  Why this 'normalisation' but not, for example,
turning any uppercase to lowercase?  And why only for jpeg files and not
PDFs, png, or any other one?  As far as I know, .jpg is as correct as
.jpeg.

For me, please don't.  Or at least, don't hardcode it.  I don't think it
should be part of the Invenio code to rename a single case of discrepancy.
And, I know it is a sort of hack, but I have in the oven a visual revamp
of our installation (http://ddd.uab.cat/pub/demo/hispania/) where I use
different extensions, some of them lowercase and some uppercase, to
get different kinds of thumbnails.

Let us enjoy this flexibility, please.

My 2 cents,

Ferran



Re: RFC: new Invenio config file

2007-12-11 Thread Ferran Jorba
Hi Tibor,

 The next step in the process of removing the dependency on WML is to
 replace the config.wml configuration technique we've been using so
 far.  Here is a brief outline of a schema I've been thinking about.
[...]

I like it; I especially like this magic inveniocfg tool!  I have nothing
to add, really.

My only gripe is rather aesthetic; I'm getting tired of so much XML
everywhere.  And yes, I know that Invenio is a heavy consumer and producer
of XML, so...  But if the config file were written in something
simpler (for example JSON, see http://en.wikipedia.org/wiki/JSON)
instead, I think it would be easier both for humans and programs.
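
Just to show what I mean, here is a sketch of how little code a JSON config needs on the program side.  It is Python 3, and every key name in it (site_url, default_language, database) is invented for the example; none of them is a real Invenio setting.

```python
import json

# A hypothetical config file; the key names are illustrative only.
EXAMPLE_CONFIG = """
{
  "site_url": "http://localhost",
  "default_language": "ca",
  "database": {"host": "localhost", "name": "invenio"}
}
"""


def load_config(text, defaults=None):
    """Parse a JSON config and lay it over a dictionary of defaults."""
    config = dict(defaults or {})
    config.update(json.loads(text))
    return config


if __name__ == "__main__":
    config = load_config(EXAMPLE_CONFIG, {"default_language": "en"})
    print(config["default_language"])
```

The same file stays readable for a human editing it by hand, which is the point of my remark.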

But again, I'd be more than happy if this roadmap gets implemented.

Thanks for asking,

Ferran



Re: test, sorry [3] -- cannot send with Gnus

2007-09-18 Thread Ferran Jorba

Hello again,


Sorry for that test; I'm having problems posting to the -users list from
gnus+local-exim-config-mail-send-by-smarthost (it used to work before
holidays); trying to see if the -developers works.


take two.  Does anybody receive this?


take three, now with thunderbird (oh, yes, icedove).

After several months happily feeling more comfortable with Gnus, now I 
cannot send to the cdsware lists with Gnus; the mails seem to get lost 
(I only get my own BCC copy).  This problem doesn't happen (a) using my 
Gnus to other destinations (inside or outside UAB) or (b) sending mails 
from Icedove to the cdsware lists.


Tibor, may I ask for some ideas to help a Gnus newbie?

Thanks,

Ferran



Re: test, sorry [3] -- cannot send with Gnus

2007-09-18 Thread Ferran Jorba

[Writing from Icedove, as otherwise I'm unsure it will reach you]

Thank you Tibor, and sorry for this unwanted noise.


Hmm, was your user-mail-address generated as ferran.jo...@uab.cat
which is the identity that the CERN listserver knows you under?  Did
you send both mails from the same computer via the same SMTP MTA
chain?  (CERN blacklists some public outgoing mail servers...)


I didn't modify my Gnus setup.  Relevant .gnus.el lines are:


(setq gnus-select-method
  '(nntp "news.uab.cat"))

(add-to-list 'gnus-secondary-select-methods
 '(nnimap "uab"
  (nnimap-address "imap.uab.cat")
  (imap-username 141)
  (nnimap-port 143)
  (nnimap-list-pattern ("INBOX" "*/*"))
  )
 '(nntp "news.gmane.org")
 )

(setq user-mail-address "ferran.jo...@uab.cat")
(setq user-full-name "Ferran Jorba")
(setq message-default-mail-headers "Bcc: ferran.jo...@uab.cat\n")
(setq smtpmail-smtp-server "smtp.uab.cat")


Unless it is due to my own backport of Emacs22 to Etch, and using the 
standard Gnus (User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 
(gnu/linux)).  But again, it works everywhere except to your CERN lists.



Tibor, may I ask for some ideas to help a Gnus newbie?


Sure.  Please forward me the full message (all headers included) of an
email sent from Gnus that did not pass through.


I attach my own BCC of my previous mail.

Thanks,

Ferran


Received: from istanbul.uab.es (localhost [127.0.0.1])
 by istanbul.uab.es (Sun Java System Messaging Server 6.1 HotFix 0.10 (built
 Jan  6 2005)) with ESMTP id 0joi009blnpvz...@istanbul.uab.es for
 ferran.jo...@uab.cat; Mon, 17 Sep 2007 16:17:56 +0200 (CEST)
Received: from pfff.si.uab.es ([158.109.165.130])
 by istanbul.uab.es (Sun Java System Messaging Server 6.1 HotFix 0.10 (built
 Jan  6 2005)) with ESMTPS id 0joi009hrnpvf...@istanbul.uab.es for
 ferran.jo...@uab.cat; Mon, 17 Sep 2007 16:17:55 +0200 (CEST)
Received: from fjorba by pfff.si.uab.es with local (Exim 4.63)
 (envelope-from fjorba@localhost.localdomain)   id 1IXHMf-0006Uc-NH; Mon,
 17 Sep 2007 16:14:17 +0200
List-Post: project-invenio-devel@cern.ch
Date: Mon, 17 Sep 2007 16:14:17 +0200
From: Ferran Jorba ferran.jo...@uab.cat
Subject: Re: test, sorry [2]
In-reply-to: 87ejgx3h3s@pfff.si.uab.es
To: project-cdsware-developers project-cdsware-develop...@cern.ch
Message-id: 87odg1z8ee@pfff.si.uab.es
Organization: Universitat Autonoma de Barcelona
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
References: 87ejgx3h3s@pfff.si.uab.es
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (gnu/linux)

Hi,

 Sorry for that test; I'm having problems posting to the -users list from
 gnus+local-exim-config-mail-send-by-smarthost (it used to work before
 holidays); trying to see if the -developers works.

take two.  Does anybody receive this?

Thanks,

Ferran




Re: test, sorry [3] -- cannot send with Gnus

2007-09-18 Thread Ferran Jorba
Vďaka!

(Ah, those helpful .po files!)

Ferran



Re: CVS Commit Overview for 2007-07-11

2007-07-23 Thread Ferran Jorba
Hello Samuele,

 2007-07-11  Samuele Kaplun samuele.kap...@cern.ch

   * modules/websubmit/lib/websubmit_config.py: Added png to the list
   of recognized extension. Are there any other extensions missing? Do
   we need this list?

I don't know about this one, but I have to maintain a parallel one about
graphic files (currently jpg, gif and png), so I can apply a different
logic for URLs if they are images (typically for icons) or fulltext; see
an early implementation at http://ddd.uab.cat/record/17654.

Thanks,

Ferran



Re: Make test errors + config.wml settings disregarded after upgradingfrom 0.7.1 to 0.92.1

2007-07-04 Thread Ferran Jorba
Hi Torger,

 I have an update to Problem 2 below: The default style only appears on
 the Search, Submit, Personalize, and Administration pages. For
 some strange reason, the Help pages appear in my custom style.

 Default style appears here: http://sysdoc.com.dtu.dk/
 My custom style appears here:
 http://sysdoc.com.dtu.dk/help/index.en.html

If you are talking about the logo, we have put in our own logo but
kept Invenio's default filename (logo.gif).  I see that this file does
not exist in your installation (http://sysdoc.com.dtu.dk/img/logo.gif)
as it does in ours (http://ddd.uab.cat/img/logo.gif).  Your own logo is
in a different path (http://sysdoc.com.dtu.dk/img/SYSDOC/COMDTU.png).
Try copying it to the expected place (or symlinking it).

Hope it helps,

Ferran



Re: Implementing METS and PREMIS in Invenio. Ideas?

2007-06-28 Thread Ferran Jorba
Hello Jerome,

 Unfortunately we have no resource available for implementing PREMIS or
 METS in CDS Invenio this year.

 Still we have been discussing internally about support for these
 standards in the past, and would be interested to collaborate as much
 as we can if you are willing to implement these standards on your
 side.

I'll be happy to help, although I cannot be a full-time developer, that
is not my job here at UAB.  I'm glad to provide fixes, ideas, testing,
small patches, as always (I do the translations at home, during my
copious free time), but most of my time is trying to make things work,
not to develop large projects.

 The implementation in CDS Invenio does seem feasible without big
 changes in the software, although a deeper analysis would be
 necessary.

I agree with you.  I'd like to see more real-world examples of how it
is implemented, because I cannot imagine how a METS record with
descriptive metadata (MARCXML), some rights and origin information,
strong structural information and MIX detail for each scanned page of a
medium-size journal (e.g., our http://ddd.uab.cat/record/17654) would be
handled.  Retrieving such a METS record could bring the system to its knees.

 Do you have some news about this project/petition since your last
 email?

Nope.  Yours is the first.

Thanks,

Ferran



Sorting with diacritics patch

2007-06-11 Thread Ferran Jorba
Hi all,

please take this fix into consideration.  It fixes this case:

http://ddd.uab.cat/search?ln=en&p=Enginyer+Qu%C3%ADmic&f=titulacio&action_search=Search&c=&sf=title&so=a&rm=&rg=10&sc=0&of=hb

With the patch, Àlgebra is the first in the list; before, it was the last.  This
patch is against current CVS; in 0.92.1 it is around line 2243.

Thanks,

Ferran
This fixes sorting words that have diacritics, especially in the first word,
like Àlgebra, because otherwise they appear at the end of the list.

---
 modules/websearch/lib/search_engine.py |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

Index: cds-invenio/modules/websearch/lib/search_engine.py
===
--- cds-invenio.orig/modules/websearch/lib/search_engine.py	2007-06-11 12:52:08.352732646 +0200
+++ cds-invenio/modules/websearch/lib/search_engine.py	2007-06-11 12:58:37.810073415 +0200
@@ -2415,7 +2415,7 @@
     else:
         # no sort pattern defined, so join them all together
         val = string.join(vals)
-    val = val.lower()
+    val = strip_accents(val.lower())
     if recIDs_dict.has_key(val):
         recIDs_dict[val].append(recID)
     else:
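
To show the effect outside Invenio, here is a small Python 3 sketch.  The strip_accents below is my own stand-in built on the standard unicodedata module; it is not the strip_accents function that the patch calls, and the helper name sort_key is hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks, so that 'À' becomes 'A' (illustrative stand-in)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(title):
    """Sort regardless of case and accents, as the patch does for `val`."""
    return strip_accents(title.lower())


if __name__ == "__main__":
    titles = ["Física", "Àlgebra", "Biologia"]
    print(sorted(titles))                # plain sort: Àlgebra goes last
    print(sorted(titles, key=sort_key))  # accent-insensitive: Àlgebra goes first
```

A plain sort compares code points, and À (U+00C0) comes after every unaccented letter, which is why Àlgebra ended up at the bottom of the list.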


Should I install v0.92.1.20070412?

2007-04-13 Thread Ferran Jorba
Hi CERNers,

I'm writing to the internal developers list because I've read some
info in the CVS messages.

I'm planning to upgrade our public system on Monday April 30, because
the 1st of May is a holiday, so it will impact fewer users.  I was planning
to install 0.92.1, but this 'internal release' has intrigued me.  Should
I rather ignore it and go for the official one?

Ferran



Re: Should I install v0.92.1.20070412?

2007-04-13 Thread Ferran Jorba
Hello Tibor,

 Putting Invenio v0.92.1 into production should be fine.  There have
 been some fixes since that release, but nothing particularly
 show-stopping.

Thank you for your advice,

Ferran



Re: [patch] collection-management-wording.patch

2007-04-13 Thread Ferran Jorba
Hello Tibor (and Nick),

 Thanks for the suggestion.  We have now changed the wording of this
 box with Nick.  It does not go exactly in the way you suggested, but
 the wording should be clearer than it was before.  Hope you will like
 it. ;-)

Of course I won't even try to discuss proper English usage with a native
speaker!  And yes, I've just seen it and I like it.  Now I don't think new
users will find themselves as lost as I was when faced with the original
dialog texts
(http://cdsware.cern.ch/lists/project-cdsware-users/archive/msg00138.shtml).
And those texts are especially important for new users.

[...]
 Do not hesitate to keep alerting us if you find something especially
 striking.

I'll do it.  Thanks again,

Ferran



[patch] collection-management-wording.patch

2007-04-11 Thread Ferran Jorba
Hi Tibor,

please consider this tiny patch for inclusion.

Thanks,

Ferran
I hope these changes in the wording make collection management a little bit
more comprehensible.

---
 modules/websearch/lib/websearchadminlib.py |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

Index: cds-invenio/modules/websearch/lib/websearchadminlib.py
===
--- cds-invenio.orig/modules/websearch/lib/websearchadminlib.py	2007-02-15 17:25:52.0 +0100
+++ cds-invenio/modules/websearch/lib/websearchadminlib.py	2007-04-11 14:26:20.931396148 +0200
@@ -255,7 +255,7 @@
 
 output = show_coll_not_in_tree(colID, ln, col_dict)
 text = """
-<span class="adminlabel">Attach which</span>
+<span class="adminlabel">Collection name</span>
 <select name="add_son" class="admin_w200">
 <option value="">- select collection -</option>
 
@@ -264,9 +264,9 @@
 text += """<option value="%s" %s>%s</option>""" % (id, str(id)==str(add_son) and 'selected="selected"' or '', name)
 text += """
 </select><br>
-<span class="adminlabel">Attach to</span>
+<span class="adminlabel">Attached to</span>
 <select name="add_dad" class="admin_w200">
-<option value="">- select parent collection -</option>
+<option value="">- choose parent collection -</option>
 
 
 for (id, name) in col_list:
@@ -278,7 +278,7 @@
 text += """
 <span class="adminlabel">Relationship</span>
 <select name="rtype" class="admin_w200">
-<option value="">- select relationship -</option>
+<option value="">- choose relationship -</option>
 <option value="r" %s>Regular (Narrow by...)</option>
 <option value="v" %s>Virtual (Focus on...)</option>
 </select>


[patch] portable-ps.patch

2007-04-11 Thread Ferran Jorba
Hi Tibor,

I've updated my portable patch implementation I posted some months ago
to the users list.

Ferran
A more portable implementation that should work on Linux, Solaris and
Digital Unix.  Not tested on FreeBSD.

It should also be faster than the CDS Invenio 0.92.0 implementation, because
it calls only one external command (ps), and not 3 (ps, grep --twice-- and
sed).

---
 modules/bibsched/lib/bibsched.py |   19 +--
 1 files changed, 9 insertions(+), 10 deletions(-)

Index: cds-invenio/modules/bibsched/lib/bibsched.py
===
--- cds-invenio.orig/modules/bibsched/lib/bibsched.py	2007-03-01 15:41:38.0 +0100
+++ cds-invenio/modules/bibsched/lib/bibsched.py	2007-04-11 14:26:46.301060324 +0200
@@ -44,6 +44,7 @@
 import curses.panel
 from curses.wrapper import wrapper
 import signal
+import pwd
 
 from invenio.config import \
  CFG_PREFIX, \
@@ -75,16 +76,14 @@
 return None
 
 def get_my_pid(process, args=''):
-    if sys.platform.startswith('freebsd'):
-        COMMAND = "ps -o pid,args | grep '%s %s' | grep -v 'grep' | sed -n 1p" % (process, args)
-    else:
-        COMMAND = "ps -C %s o '%%p%%a' | grep '%s %s' | grep -v 'grep' | sed -n 1p" % (process, process, args)
-    answer = string.strip(os.popen(COMMAND).read())
-    if answer == '':
-        answer = 0
-    else:
-        answer = answer[:string.find(answer,' ')]
-    return int(answer)
+    COMMAND = "ps -fu %s" % (pwd.getpwuid(os.getuid())[0])
+    pslist = os.popen(COMMAND).readlines()
+    str = string.join([process,args])
+    answer = 0
+    for ps in pslist:
+        if ps.find(str) > 0:
+            answer = int(ps.split()[1])
+    return answer
 
 def get_output_channelnames(task_id):
 Construct and return filename for stdout and stderr for the task 'task_id'.
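
The same idea, restated as a Python 3 sketch with the parsing separated from the ps call so that it can be tried without a running bibsched.  The function find_pid is a hypothetical helper of mine, and the sketch assumes the usual "ps -f" layout where the PID is the second column; it is not the code that ended up in Invenio.

```python
import os
import pwd
import subprocess


def find_pid(ps_lines, process, args=""):
    """Return the PID of the last 'ps -f' line mentioning the command, or 0."""
    needle = " ".join([process, args]).strip()
    pid = 0
    for line in ps_lines:
        fields = line.split()
        # In 'ps -f' output the second column is the PID.
        if needle in line and len(fields) > 1 and fields[1].isdigit():
            pid = int(fields[1])
    return pid


def get_my_pid(process, args=""):
    """Look for the process among those owned by the current user."""
    user = pwd.getpwuid(os.getuid())[0]
    result = subprocess.run(["ps", "-fu", user],
                            capture_output=True, text=True, check=False)
    return find_pid(result.stdout.splitlines(), process, args)


if __name__ == "__main__":
    print(get_my_pid("bibsched", "start"))
```

Like the patch, it calls only one external command (ps) and does the filtering in Python.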


[patch] zero-record-collections-deserve-a-zero.patch

2007-04-11 Thread Ferran Jorba
Hi Tibor,

my shortest patch?

Ferran
When a collection has zero items and this zero is not displayed, there is no
clear separator between this collection name and the next, and it confuses
users because the two names seem to be a single one.


---
 modules/websearch/lib/websearch_templates.py |3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

Index: cds-invenio/modules/websearch/lib/websearch_templates.py
===
--- cds-invenio.orig/modules/websearch/lib/websearch_templates.py	2007-04-11 14:24:21.057158259 +0200
+++ cds-invenio/modules/websearch/lib/websearch_templates.py	2007-04-11 14:26:24.861879334 +0200
@@ -809,7 +809,8 @@
 ')</small>')
 
 
-    if number is None: return ''
+    if number is None:
+        number = 0
 
     if prolog is None:
         prolog = '''&nbsp;<small class="nbdoccoll">('''


[patch] Add missing semicolon after some &nbsp;

2007-01-26 Thread Ferran Jorba

Hi Tibor,

subject says it all.

Thanks,

Ferran
Trivial: Add ; to some &nbsp.  I noted them when debugging some of my pages.

Index: cds-invenio-0.92.0/modules/bibformat/lib/bibformat_templates.py
===
--- cds-invenio-0.92.0.orig/modules/bibformat/lib/bibformat_templates.py	2007-01-26 10:22:41.636461075 +0100
+++ cds-invenio-0.92.0/modules/bibformat/lib/bibformat_templates.py	2007-01-26 10:25:29.688622760 +0100
@@ -1455,19 +1455,19 @@
 name = format_template['name']
 filename = format_template['filename']
 out += '''<tr><td><a href="format_template_show?bft=%(filename)s&amp;ln=%(ln)s">%(name)s</a></td>
-<td>&nbsp</td><td>&nbsp</td></tr>''' % {'filename':filename,
-'name':name,
-'ln':ln}
+<td>&nbsp;</td><td>&nbsp;</td></tr>''' % {'filename':filename,
+  'name':name,
+  'ln':ln}
 for format_element in format_template['elements']:
 name = format_element['name']
 filename = format_element['filename']
-out += '''<tr><td>&nbsp</td>
+out += '''<tr><td>&nbsp;</td>
 <td><a href="format_elements_doc?ln=%(ln)s#%(anchor)s">%(name)s</a></td>
-<td>&nbsp</td></tr>''' % {'anchor':name.upper(),
-  'name':name,
-  'ln':ln}
+<td>&nbsp;</td></tr>''' % {'anchor':name.upper(),
+   'name':name,
+   'ln':ln}
 for tag in format_element['tags']:
-out += '''<tr><td>&nbsp</td><td>&nbsp</td>
+out += '''<tr><td>&nbsp;</td><td>&nbsp</td>
 <td>%(tag)s</td></tr>''' % {'tag':tag}
 
 out += '''
Index: cds-invenio-0.92.0/modules/bibharvest/lib/bibharvest_templates.py
===
--- cds-invenio-0.92.0.orig/modules/bibharvest/lib/bibharvest_templates.py	2007-01-26 10:22:41.663457567 +0100
+++ cds-invenio-0.92.0/modules/bibharvest/lib/bibharvest_templates.py	2007-01-26 10:26:35.580059372 +0100
@@ -73,7 +73,7 @@
 guidetitle = _("See Guide")
 
 titlebar = """<a name="%s">""" % title
-titlebar += """ </a>%s&nbsp&nbsp&nbsp<small>""" % subtitle
+titlebar += """ </a>%s&nbsp;&nbsp;&nbsp;<small>""" % subtitle
 titlebar += """ [<a title="%s" href="%s/%s">?</a>]</small>""" % (guidetitle, weburl, guideurl)
 return titlebar
 
Index: cds-invenio-0.92.0/modules/bibindex/lib/bibindexadminlib.py
===
--- cds-invenio-0.92.0.orig/modules/bibindex/lib/bibindexadminlib.py	2007-01-26 10:22:41.690454060 +0100
+++ cds-invenio-0.92.0/modules/bibindex/lib/bibindexadminlib.py	2007-01-26 10:27:51.548185829 +0100
@@ -277,7 +277,7 @@
 def perform_editindexes(ln=cdslang, callback='yes', content='', confirm=-1):
 show a list of indexes that can be edited.
 
-subtitle = """<a name="2"></a>2. Edit index&nbsp&nbsp&nbsp<small>[<a title="See guide" href="%s/admin/bibindex/guide.html">?</a>]</small>""" % (weburl)
+subtitle = """<a name="2"></a>2. Edit index&nbsp;&nbsp;&nbsp;<small>[<a title="See guide" href="%s/admin/bibindex/guide.html">?</a>]</small>""" % (weburl)
 
 fin_output = ''
 idx = get_idx()
@@ -311,7 +311,7 @@
 def perform_editfields(ln=cdslang, callback='yes', content='', confirm=-1):
 show a list of all logical fields that can be edited.
 
-subtitle = """<a name="5"></a>5. Edit logical field&nbsp&nbsp&nbsp<small>[<a title="See guide" href="%s/admin/bibindex/guide.html">?</a>]</small>""" % (weburl)
+subtitle = """<a name="5"></a>5. Edit logical field&nbsp;&nbsp;&nbsp;<small>[<a title="See guide" href="%s/admin/bibindex/guide.html">?</a>]</small>""" % (weburl)
 
 fin_output = ''
 
@@ -385,7 +385,7 @@
 idx_dict = dict(get_def_name('', idxINDEX))
 if idxID and idx_dict.has_key(int(idxID)):
 idxID = int(idxID)
-subtitle = """<a name="2"></a>2. Modify translations for index.&nbsp&nbsp&nbsp<small>[<a title="See guide" href="%s/admin/bibindex/guide.html">?</a>]</small>""" % weburl
+subtitle = """<a name="2"></a>2. Modify translations for index.&nbsp;&nbsp;&nbsp;<small>[<a title="See guide" href="%s/admin/bibindex/guide.html">?</a>]</small>""" % weburl
 
 if type(trans) is str:
 trans = [trans]
@@ -464,7 +464,7 @@
 fld_dict = dict(get_def_name('', field))
 if fldID and fld_dict.has_key(int(fldID)):
 fldID = int(fldID)
-subtitle = """<a name="3"></a>3. Modify translations for logical field '%s'&nbsp&nbsp&nbsp<small>[<a title="See guide" href="%s/admin/bibindex/guide.html">?</a>]</small>""" % (fld_dict[fldID], weburl)
+subtitle = """<a name="3"></a>3. Modify translations for logical field '%s'&nbsp;&nbsp;&nbsp;<small>[<a title="See guide" href="%s/admin/bibindex/guide.html">?</a>]</small>""" % (fld_dict[fldID], weburl)
 
 if 

Re: [patch] Respect i18n collection names in search boxes

2006-12-18 Thread Ferran Jorba

Hi Tibor,


Committed, thanks.  (I have modified your patch a bit in order to
calculate international collection names only when needed.)


Brilliant!  Thanks!

Ferran



Re: [patch] Respect i18n collection names in search boxes

2006-12-15 Thread Ferran Jorba

Hi Tibor,

I've seen that you still haven't applied my tiny patch below, and 0.92
is approaching.  Could you consider it, please?

Thanks,

Ferran
Display i18n collection names when an error message is shown.  I've also
added a cosmetic <em></em> in the list of collection names, because I feel
it is easier to read, but you decide.

Index: cds-invenio/modules/websearch/lib/search_engine.py
===
--- cds-invenio.orig/modules/websearch/lib/search_engine.py	2006-12-15 08:55:25.875425044 +0100
+++ cds-invenio/modules/websearch/lib/search_engine.py	2006-12-15 08:55:42.098305099 +0100
@@ -1562,11 +1562,13 @@
 t1 = os.times()[4]
 results = {}
 results_nbhits = 0
+colls_printable = []
 for coll in colls:
 results[coll] = HitSet()
 results[coll]._set = Numeric.bitwise_and(hitset_in_any_collection._set, get_collection_reclist(coll)._set)
 results[coll].calculate_nbhits()
 results_nbhits += results[coll]._nbhits
+colls_printable.append(get_coll_i18nname(coll, ln))
 if results_nbhits == 0:
 # no hits found, try to search in Home:
 results_in_Home = HitSet()
@@ -1577,7 +1579,7 @@
 if of.startswith(h):
 url = websearch_templates.build_search_url(req.argd, cc=cdsname, c=[])
 print_warning(req, _("No match found in collection %(x_collection)s. Other public collections gave %(x_url_open)s%(x_nb_hits)d hits%(x_url_close)s.") %\
-  {'x_collection': string.join(colls, ','), 
+  {'x_collection': '<em>' + string.join(colls_printable, ',') + '</em>',
 'x_url_open': '<a class="nearestterms" href="%s">' % (url),
 'x_nb_hits': results_in_Home._nbhits,
 'x_url_close': '</a>'})


Re: [patch] Respect i18n collection names in search boxes

2006-12-01 Thread Ferran Jorba

Hi Tibor,


Fixed in CVS.  Thanks for the patch (note that I modified it, because
you should not really replace c and c_name, only c_printable)
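
The distinction drawn here (keep the internal collection name as the value, translate only the label shown to the user) can be sketched as follows.  This is only an illustration in modern Python, not the code that went into CVS: the I18N_NAMES table, its entries, and both function names are made up for the example, and the real get_coll_i18nname() looks its names up in the database.

```python
# Hypothetical translation table standing in for the database lookup
# performed by the real get_coll_i18nname(); the entries are invented.
I18N_NAMES = {
    ("THESES", "ca"): "Tesis doctorals de la Universitat Autònoma",
    ("ART", "ca"): "Articles",
}


def coll_i18nname(name, ln):
    """Return the name of collection `name` in language `ln`,
    falling back to the internal name when no translation exists."""
    return I18N_NAMES.get((name, ln), name)


def ordered_collection_list(names, ln, max_len=30):
    """Return [internal name, printable name] pairs, sorted by internal name.

    Only the printable name is translated and truncated; the internal
    name is left alone so that it can still be used as the form value.
    """
    out = []
    for c_name in sorted(names):
        label = coll_i18nname(c_name, ln)
        if len(label) > max_len:
            c_printable = label[:max_len] + "..."
        else:
            c_printable = label
        out.append([c_name, c_printable])
    return out
```

Calling ordered_collection_list(["THESES", "ART"], "ca") keeps "THESES" and "ART" as the first element of each pair, so the search form still submits the internal names while the menu shows the Catalan labels.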

[...]

Great!  Your fix was of course better.  Testing it, I found we forgot
another snippet, when an error message is displayed.  I've also added a
cosmetic <em></em> around the list of collection names, because I feel it
is easier to read, but you decide.

Thanks again,

Ferran
Index: cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py
===
--- cds-invenio-0.91.0.20061116.orig/modules/websearch/lib/search_engine.py	2006-12-01 10:52:10.0 +0100
+++ cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py	2006-12-01 10:59:55.0 +0100
@@ -1562,11 +1562,13 @@
 t1 = os.times()[4]
 results = {}
 results_nbhits = 0
+colls_printable = []
 for coll in colls:
 results[coll] = HitSet()
 results[coll]._set = Numeric.bitwise_and(hitset_in_any_collection._set, get_collection_reclist(coll)._set)
 results[coll].calculate_nbhits()
 results_nbhits += results[coll]._nbhits
+colls_printable.append(get_coll_i18nname(coll, ln))
 if results_nbhits == 0:
 # no hits found, try to search in Home:
 results_in_Home = HitSet()
@@ -1577,7 +1579,7 @@
 if of.startswith(h):
 url = websearch_templates.build_search_url(req.argd, cc=cdsname, c=[])
 print_warning(req, _("No match found in collection %(x_collection)s. Other public collections gave %(x_url_open)s%(x_nb_hits)d hits%(x_url_close)s.") %\
-  {'x_collection': string.join(colls, ','), 
+  {'x_collection': '<em>' + string.join(colls_printable, ',') + '</em>',
 'x_url_open': '<a class="nearestterms" href="%s">' % (url),
 'x_nb_hits': results_in_Home._nbhits,
 'x_url_close': '</a>'})
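
For readers who want to try the effect of this hunk outside Invenio, here is a minimal stand-alone sketch in modern Python.  The function name no_match_warning and its arguments are hypothetical; the real code passes the string through _() and print_warning(), which are left out here.

```python
def no_match_warning(colls_printable, url, nb_hits):
    """Build the "no match" warning from already translated collection names.

    The named placeholders mirror those in the patch, so a translator can
    reorder them freely inside the translated sentence.
    """
    template = ("No match found in collection %(x_collection)s. "
                "Other public collections gave "
                "%(x_url_open)s%(x_nb_hits)d hits%(x_url_close)s.")
    return template % {
        "x_collection": "<em>" + ",".join(colls_printable) + "</em>",
        "x_url_open": '<a class="nearestterms" href="%s">' % url,
        "x_nb_hits": nb_hits,
        "x_url_close": "</a>",
    }
```

The point of collecting colls_printable in the loop is that the user sees the same translated names in the warning as in the rest of the page, instead of the internal codes.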


[patch] Respect i18n collection names in search boxes

2006-11-30 Thread Ferran Jorba

Hi Tibor,

could you consider this patch for inclusion in the next release?
You can see what I'm trying to fix by looking at our "Search
collections:" pull-down menu in, for example:

http://ddd.uab.es/search.py?sc=1&ln=en&p=&f=&action=Search&cc=rcao

Thanks,

Ferran
With this patch (tested on 0.91.0.20061116) i18n collection names
are honored.  It is especially important when the default collection
names are codes, because they look short and cryptic to the end user.


Index: cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py
===
--- cds-invenio-0.91.0.20061116.orig/modules/websearch/lib/search_engine.py	2006-11-10 12:22:39.0 +0100
+++ cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py	2006-11-30 11:30:19.0 +0100
@@ -125,7 +125,7 @@
 sre_unicode_uppercase_c = sre.compile(unicode(r(?u)[??], utf-8))
 sre_unicode_uppercase_n = sre.compile(unicode(r(?u)[?], utf-8))
 
-def get_alphabetically_ordered_collection_list(level=0):
+def get_alphabetically_ordered_collection_list(ln, level=0):
 """Returns nicely ordered (score respected) list of collections, more exactly list of tuples
 (collection name, printable collection name).
 Suitable for create_search_box()."""
@@ -133,7 +133,8 @@
 query = "SELECT id,name FROM collection ORDER BY name ASC"
 res = run_sql(query)
 for c_id, c_name in res:
-# make a nice printable name (e.g. truncate c_printable for for long collection names):
+# make a nice printable name (e.g. truncate c_printable for for long collection names in language ln):
+c_name = get_coll_i18nname(c_id, ln)
 if len(c_name)>30:
 c_printable = c_name[:30] + "..."
 else:
@@ -143,7 +144,7 @@
 out.append([c_name, c_printable])
 return out
 
-def get_nicely_ordered_collection_list(collid=1, level=0):
+def get_nicely_ordered_collection_list(ln, collid=1, level=0):
 """Returns nicely ordered (score respected) list of collections, more exactly list of tuples
 (collection name, printable collection name).
 Suitable for create_search_box()."""
@@ -450,9 +451,9 @@
 
 colls_nicely_ordered = []
 if cfg_nicely_ordered_collection_list:
-colls_nicely_ordered = get_nicely_ordered_collection_list()
+colls_nicely_ordered = get_nicely_ordered_collection_list(ln=ln)
 else:
-colls_nicely_ordered = get_alphabetically_ordered_collection_list()
+colls_nicely_ordered = get_alphabetically_ordered_collection_list(ln=ln)
 
 colls_nice = []
 for (cx, cx_printable) in colls_nicely_ordered:


Re: [patch] Respect i18n collection names in search boxes

2006-11-30 Thread Ferran Jorba

Sorry, Tibor,

forget the previous patch, it is old; I forgot a 'quilt refresh' ;-(


could you consider this patch for inclusion in the next release?
You can see what I'm trying to fix by looking at our "Search
collections:" pull-down menu in, for example:

http://ddd.uab.es/search.py?sc=1&ln=en&p=&f=&action=Search&cc=rcao


New one is attached.

Ferran
With this patch (tested on 0.91.0.20061116) i18n collection names
are honored.  It is especially important when the default collection
names are codes, because they look short and cryptic to the end user.


Index: cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py
===
--- cds-invenio-0.91.0.20061116.orig/modules/websearch/lib/search_engine.py	2006-11-10 12:22:39.0 +0100
+++ cds-invenio-0.91.0.20061116/modules/websearch/lib/search_engine.py	2006-11-30 12:16:47.0 +0100
@@ -125,15 +125,16 @@
 sre_unicode_uppercase_c = sre.compile(unicode(r(?u)[??], utf-8))
 sre_unicode_uppercase_n = sre.compile(unicode(r(?u)[?], utf-8))
 
-def get_alphabetically_ordered_collection_list(level=0):
+def get_alphabetically_ordered_collection_list(ln, level=0):
 """Returns nicely ordered (score respected) list of collections, more exactly list of tuples
 (collection name, printable collection name).
 Suitable for create_search_box()."""
 out = []
-query = "SELECT id,name FROM collection ORDER BY name ASC"
+query = "SELECT name FROM collection ORDER BY name ASC"
 res = run_sql(query)
-for c_id, c_name in res:
-# make a nice printable name (e.g. truncate c_printable for for long collection names):
+for name in res:
+# make a nice printable name (e.g. truncate c_printable for long collection names in language ln):
+c_name = get_coll_i18nname(name, ln)
 if len(c_name)>30:
 c_printable = c_name[:30] + "..."
 else:
@@ -143,7 +144,7 @@
 out.append([c_name, c_printable])
 return out
 
-def get_nicely_ordered_collection_list(collid=1, level=0):
+def get_nicely_ordered_collection_list(ln, collid=1, level=0):
 """Returns nicely ordered (score respected) list of collections, more exactly list of tuples
 (collection name, printable collection name).
 Suitable for create_search_box()."""
@@ -152,7 +153,8 @@
  WHERE c.id=cc.id_son AND cc.id_dad='%s' ORDER BY score DESC % collid
 res = run_sql(query)
 for c, cid in res:
-# make a nice printable name (e.g. truncate c_printable for for long collection names):
+# make a nice printable name (e.g. truncate c_printable for long collection names in language ln):
+c = get_coll_i18nname(c, ln)
 if len(c)>30:
 c_printable = c[:30] + "..."
 else:
@@ -450,9 +452,9 @@
 
 colls_nicely_ordered = []
 if cfg_nicely_ordered_collection_list:
-colls_nicely_ordered = get_nicely_ordered_collection_list()
+colls_nicely_ordered = get_nicely_ordered_collection_list(ln=ln)
 else:
-colls_nicely_ordered = get_alphabetically_ordered_collection_list()
+colls_nicely_ordered = get_alphabetically_ordered_collection_list(ln=ln)
 
 colls_nice = []
 for (cx, cx_printable) in colls_nicely_ordered:


Could we (the translators) get a pre-announcement of each new release?

2006-09-26 Thread Ferran Jorba

Hi Tibor,

as I try to catch up with the 0.90.1 translations and watch your
CVS commit messages, I was wondering whether it would be possible
to know about new releases a few days in advance, so that we can
update the translations in time.

Thanks,

Ferran



[translation issues] Gender in messages

2006-06-27 Thread Ferran Jorba

Hi developers,

I'd like to point out that in the next release there are some code
optimisations that clash with languages (like Catalan or Spanish)
in which nouns have a grammatical gender (masculine or feminine).
For example (webcomment_templates.py):

out = _("Your %s was successfully added") + '<br /><br />'
out += '<a href="%s">' % link
out += _('Back to record') + '</a>'
out %= (reviews==1 and _('review') or _('comment'))
return out

It turns out that, for example in Catalan:

- review: la crítica (feminine)
- comment: el comentari (masculine)

So, in my ca.po, I had to choose:

 #: modules/webcomment/lib/webcomment_templates.py:901
 #, fuzzy, python-format
 msgid "Your %s was successfully added"
 msgstr "El vostre %s ha estat afegit"

But it could well be:

 msgstr "La vostra %s ha estat afegida"

That is, three words change (El -> La; vostre -> vostra; afegit -> afegida);
and similarly in Spanish, at least.

I know that gettext has sophisticated plural handling, but I don't
remember reading anything about gender.

Unless there is a solution, I'm afraid this kind of message
optimisation should be avoided.

Thanks,

Ferran


