IUGW 2017

2017-03-07 Thread Wagner, Alexander
Hi!


For the upcoming IUGW 2017 Connie and @lnielsen are finalizing the program 
right now.

Please note: if you want to attend the workshop and have NOT booked a hotel yet, 
our pre-booked room contingents expire tomorrow. Please book now!

Garching is very close to Munich, the capital of Bavaria and a city with a lot of 
events, so these contingents are there for a pretty good reason. (No camping on 
the TU campus near the reactor ;) The hotels told us that they cannot extend 
the pre-booking.

-- 
Kind regards,

Alexander Wagner

Deutsches Elektronen-Synchrotron DESY
Library and Documentation

Building 01d Room OG1.444
Notkestr. 85
22607 Hamburg

phone:  +49-40-8998-1758
fax:    +49-40-8994-1758
e-mail: alexander.wag...@desy.de



Re: TypeError when calling oaiharvest from CLI

2015-08-07 Thread Wagner, Alexander
Hi!

 The URL I tried was http://repositorium-dev.uni-muenster.de/oai/miami.
 This is our staging system, which is so far the only one to produce
 marcxml output.

If it's only about getting Marc records, you may try one of
the join2 repos, e.g.

https://bib-pubdb1.desy.de/oai2d
https://impulse.mlz-garching.de/oai2d
https://juser.fz-juelich.de/oai2d
https://publications.rwth-aachen.de/oai2d
https://repository.gsi.de/oai2d

However, I'd suggest trying the openaire or vdb sets. If you
harvest blindly you'll get our authority records. They are
nice as well, but probably not what you're looking for.
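
Restricting a harvest to one set could look like the sketch below. It only
builds the request URL using the standard OAI-PMH arguments (verb,
metadataPrefix, set). The helper name `list_records_url` is hypothetical, and
the metadata prefix "marcxml" is an assumption that should be checked against
the repository's ListMetadataFormats response.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec=None, metadata_prefix="marcxml"):
    """Build an OAI-PMH ListRecords URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Without a set the whole repository is harvested,
        # authority records included.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Harvest only the openaire set of one of the repositories listed above.
print(list_records_url("https://bib-pubdb1.desy.de/oai2d", set_spec="openaire"))
```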

-- 
Kind regards,

Alexander Wagner

Deutsches Elektronen-Synchrotron DESY
Library and Documentation

Building 01d Room OG1.444
Notkestr. 85
22607 Hamburg

phone:  +49-40-8998-1758
fax:    +49-40-8994-1758
e-mail: alexander.wag...@desy.de


RE: Please allow any indicator in any field

2014-03-30 Thread Wagner, Alexander
 as possible for the job and not be hindered by
 any marc or cataloging rules that are no longer really applicable.

You miss some very crucial points here. Just two examples.

- We have billions of records in this format.
- We have thousands of people /extremely/ skilled in this format, some
  trained for years. I have colleagues who don't even think in terms
  like title or author but in 2450have to look at and 1001_.
  I do this myself if I want to refer to something unambiguously. We
  reintroduced this in our project as words were not good enough.
  So even non-librarians learned Marc so we were able to talk about the
  same entities efficiently.

Especially the latter point gets slightly underestimated if you take
the pure IT point of view. Anyway, just consider that if I can do
something in Marc I can easily find, say, 20 people in my (small)
library (~50% of its staff!) who could handle it. Take /any/ other
format and I have probably 2 or 3 left with some IT background. And I
will need to invest huge amounts of time in training to get
something similar.  But I'll never get the same quality I get from
people trained in and used to something they have done for a decade or
so. You might call this unfortunate, but it is just a fact.

Thus, it could even be an economic argument. Sad to say, but
cataloguers, even skilled cataloguers, are just cheaper than IT
professionals. And they are fast at a very low error rate with the
stuff they are used to. Though I'm no cataloguer, I am myself much
faster in text Marc than in MarcXML, because it's not that chatty. Also
my error rate is lower.  I'm even much faster jotting down a record
in text Marc from scratch than with any fancy form-based interface.
Depending on what you have to do, this is a crucial factor. If I need
to correct an error in, say, 1000 records it is usually more cost
effective to just give them to our cataloguers than to write a program
that finds all exceptions and handles them well. (For 1000 records
I usually don't consider programming, if it is a one-time issue.)

This however requires that Marc is Marc as it should be and not only
some subset of it.

In sum: libraries are built for a slightly longer time frame than
current technology. Thus we change slower, but we also work for a
much longer time. Most likely, even IT development will get much
slower, in, say 450 years from now, when it reaches the age of our
institutions. Probably, we could then use contemporary technology ;)

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


From: Rob Atkinson [atkin...@fnal.gov]
Sent: Friday, March 28, 2014 17:08
To: Ferran Jorba; Esteban Gabancho
Cc: Wagner, Alexander; project-invenio-devel (Invenio developers mailing-list)
Subject: Re: Please allow any indicator in any field

On 03/28/2014 04:39 AM, Ferran Jorba wrote:

 the problem is more general in Invenio.  You can take Marc21 and use a
 subset of it.  Or just use a subset of the tags.  Or interpret more
 strictly, more loosely, use more local tags or subfields.  Marc21 is
 like a legal code (or a programming language, if you feel more
 comfortable with that), where there is flexibility.

 CERN may use a subset of Marc21 where most of the indicators are __.
 Ok, that's CERN's library decision.  But if the Invenio developers claim
 that Invenio supports Marc21, you *must* allow other indicators there,
 and consider it valid.

Then don't say it supports MARC21.  Simple solution.
The primary goal of invenio should be to meet the needs of its original
institution(s).  If marc indicators are not necessary in the database
functions of the originating institution, then feel free to ignore them.
  Avoid getting dragged back 40 years to the days of library catalogs by
any mandates to follow every rule to the letter.  Those rules may have
made sense in 1970 but they don't always now.  And MARC development has
been under the control of library association committees, made up of
librarians, who make decisions based on cataloging rules for description
of items (as typed on paper cards) and not contemporary technology.

It is important to remember that marc was originally a U.S. Library of
Congress file format designed for large mainframe machines in the days of
top-down programming and magnetic tape reel storage.  Access was
entirely sequential, which explains some of the record architecture.  The
format was intended to be used to generate paper file cards.

Modern computing should have freedom to use marc in any way that makes
it as suitable as possible for the job and not be hindered by any marc
or cataloging rules that are no longer really applicable.

Rob

RE: Please allow any indicator in any field

2014-03-30 Thread Wagner, Alexander
Hello Ferran!

[...]
  I know Marc21 reasonably well, and I don't remember now any case
  where having different indicators means something so different that
  they have to be treated differently.

 Here I would be more careful. Basically, I would treat Marc fields and
 indicators not as 3 digits plus two other funny chars but consider the
 whole bunch as a 5 character wide field designation. I think, here I'm
 in fact a bit more in line with Esteban's approach. At least if I
 understand it correctly. (Though I agree with you that one might not
 come up with a complete bibfield list, but just with a set of most
 common usages.)

But «most common usages» won't cover them all, and so, you cannot load
arbitrary records coming from unknown sources and expect Invenio to do
the expected thing with them.

I'm not sure, but I think it's basically a misunderstanding and we
generally agree. As I said, for indexing/display I perfectly agree with you.
In the definition of the fields as such, i.e. when telling Invenio what
the input for e.g. an author should be, one could, and probably should,
be more explicit.

 Be conservative in what you do, be liberal in what you accept from
 others.

Perfectly agree.

I'd add the famous Einstein here: as simple as possible, but not
simpler. ;)

 I share some concerns about this with Ferran and Martin and some
 others, and I'm very sure it's quite a task...

I don't think it is so difficult if the code just accepts 245%% for
title, 100%% for first author, etc.  With a 10% effort we could cover
more than 95% of the cases.

Alexander, would you accept to exchange the current Invenio default
behaviour with the default I'm proposing?  Knowing that it would not be
perfect, do you think that it would be better?

I think in general, yes. As said above, I feel we perfectly agree about
how Invenio's default indexing and even display should be set up, and
there %% is better than __ in all cases I see.

If one defines a field, let's stick to author, I would however suggest
that the definition says:

- Author should be 1001_ and stored as lastname, firstname
- alternatively 1002_ and stored as firstname lastname (note: deprecated)
- as fallback 100%% is treated as author in the index in case we have
foreign data (note: very deprecated)
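
The three rules above could be sketched roughly like this. It is an
illustration only, not Invenio's actual field configuration: the names
`AUTHOR_RULES`, `matches` and `classify_author_field` are hypothetical, a field
is written as the 5 character designation (tag plus two indicators, `_` for
blank), and `%` matches any single character.

```python
# Ordered from most specific to least specific; the first match wins.
AUTHOR_RULES = [
    ("1001_", "preferred"),        # lastname, firstname
    ("1002_", "deprecated"),       # alternative form
    ("100%%", "very deprecated"),  # fallback for foreign data
]


def matches(pattern, designation):
    """True if a 5 character designation fits the pattern ('%' is a wildcard)."""
    if len(pattern) != 5 or len(designation) != 5:
        return False
    return all(p in ("%", d) for p, d in zip(pattern, designation))


def classify_author_field(designation):
    """Return the status of the first matching rule, or None if not an author."""
    for pattern, status in AUTHOR_RULES:
        if matches(pattern, designation):
            return status
    return None


print(classify_author_field("1001_"))  # preferred
print(classify_author_field("1000_"))  # very deprecated
print(classify_author_field("24510"))  # None
```

Listing the wildcard rule last keeps the explicit definitions authoritative
while foreign records still end up in the author index.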

Your 245 example is quite telling, and people without much
library background might miss the point here a bit, simply because title
as such is quite a simple field in the sense that it is only a string,
from the IT point of view.

The point missed here, and it is really an important point, is that if
you get foreign data you /never/ get 245__ in your Marc, you'll
/always/ get indicators. So if I pull e.g. 10.000 records from our
latest ebook package into stock Invenio, they will have no titles. I
consider this indeed a bug, not an inconvenience.

Even our modernists who consider Marc ancient should agree that data
exchange is quite important. And no, I do not know /any/ format that
can transport the richness of Marc in a standardized, accepted manner,
nor do I know any format that is used for such a host of data as
Marc21. (If you consider it ancient, please note that currently all German
library catalogues are being migrated to it from our own invention
MAB.) Hey, journal literature from the sciences is really quite
trivial. Book literature from the humanities is quite a different
story. If you don't believe it, have some fun with
http://gso.gbv.de/DB=2.1/PPNSET?PPN=741186039 and its friends.

Additionally, those indicators contain a strong meaning. I agree that
the first indicator might be considered superfluous in databases (it is
probably /not/ if you consider that you have to create a bibliography,
and this is a /very/ common request).

Disregarding the second indicator is another story. If you have a
large bibliography it isn't too sensible to use a blind string
sorting. You'll get so many entries in T that you don't find your
stuff anymore. Yes, I know, these are offline bibliographies. Yes, I
hate to print a database. I feel it is completely senseless. Yes, I
have been working for several years now on converting all our peers to
accept a URL instead of 400 pages of paper.

BUT, IRL they often don't like trees and insist on silly printed
lists. Yes, ignoring "The" in sorting would be simple, but just to
add German: we have 3 definite articles and two indefinite articles, of
course of different lengths, and if you add other languages,
well.
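
A minimal sketch of what the second indicator buys you when sorting: in
Marc21 the second indicator of 245 gives the number of nonfiling characters
(0-9) to skip. The function name `filing_key` and the sample titles are made
up for illustration.

```python
def filing_key(title, ind2):
    """Sort key that skips the nonfiling characters given by 245 ind2."""
    skip = int(ind2) if ind2.isdigit() else 0  # blank or '_' means: skip nothing
    return title[skip:].casefold()


# (title, second indicator) pairs as they might arrive in foreign data
records = [
    ("The Art of Cataloguing", "4"),
    ("Der Zauberberg", "4"),
    ("Eine kleine Nachtmusik", "5"),
    ("Bibliographies", "0"),
]

for title, ind2 in sorted(records, key=lambda r: filing_key(*r)):
    print(title)
# The Art of Cataloguing
# Bibliographies
# Eine kleine Nachtmusik
# Der Zauberberg
```

No per-language article list is needed: the cataloguer already encoded the
article length in the record.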
--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi




Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: 

Re: [pu jsonalchemy] Aggregation of several fields into now

2014-03-27 Thread Wagner, Alexander
Hi Lars!

Be careful with this general simplified notion.

In some areas of research author ordering can cause a sort of religious war. And 
I may also add that, depending on the area of research, the question "I want all 
my papers, but only those where I am the first author" is /very/ common. Up to 
the notion "my papers" == "only those where I am the first author" == "only those 
count at all".

So, a distinction of 1xx/7xx (of course Ferran is right about the other fields) 
is really important, as is the ordering field $b.

--
Kind regards,

Alexander Wagner

Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


- Reply message -
From: Lars Holm Nielsen lars.holm.niel...@cern.ch
Date: Thu, Mar 27, 2014 17:08
Subject: [pu jsonalchemy] Aggregation of several fields into now
To: Esteban Gabancho esteban.jose.garcia.gaban...@cern.ch
Cc: project-invenio-devel (Invenio developers mailing-list) 
project-invenio-devel@cern.ch

On 27.03.2014 15:38, Esteban Gabancho wrote:
 Hey guys!

 I have a question about the aggregation of several fields into one.

 Taking the example of the authors, let's say I have two fields `_first_author` 
 and `_additional_authors` and I want to aggregate them into `authors`.
 The common case, and the easiest, is when I have one `_first_author` and zero 
 or more `_additional_authors`, in which case I just put a list with the 
 authors (what else right? :-)
 The problem, or the question, comes when I don’t have a `_first_author`, in 
 which case I’m not sure about the content of the `authors` field. It could be 
 i) only the list of `_additional_authors` or ii) `None` followed by the 
 list of `_additional_authors`.

 I think the second solution is the closest one to reality: the `None` expresses 
 that the record doesn’t have a first author. And I also think that we could 
 apply this solution for other cases where we have this kind of situation 
 (like with the `110__` and `710__`).

In Zenodo I'm just interested in the list of authors, and the first in
the list is by definition the first author. The first author/additional
authors split is somehow an artifact of having to store in MARC as master
format. I'm not sure what would be the most appropriate solution.
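
The two options discussed above, side by side, as a minimal sketch. This is
not the jsonalchemy implementation; the function name `aggregate_authors` and
the `keep_placeholder` switch are hypothetical.

```python
def aggregate_authors(first_author, additional_authors, keep_placeholder=True):
    """Merge `_first_author` and `_additional_authors` into one `authors` list.

    With keep_placeholder=True a missing first author is kept as None
    (option ii), so position 0 always means "first author". With
    keep_placeholder=False only the additional authors are returned (option i).
    """
    additional = list(additional_authors or [])
    if first_author is not None:
        return [first_author] + additional
    return [None] + additional if keep_placeholder else additional


print(aggregate_authors("Doe, J.", ["Roe, R."]))                    # ['Doe, J.', 'Roe, R.']
print(aggregate_authors(None, ["Roe, R."]))                         # [None, 'Roe, R.']
print(aggregate_authors(None, ["Roe, R."], keep_placeholder=False)) # ['Roe, R.']
```

Option ii keeps the 1xx/7xx distinction recoverable from the list alone;
option i matches the "first in the list is the first author" view.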

Cheers,
Lars



 What do you think?
 Lars, as you have already pu in production, how do you deal with this problem?

 Cheers,
 --
 Esteban J. G. Gabancho



--
Lars Holm Nielsen
CERN, IT Department, Collaboration & Information Services
http://zenodo.org | Tel: +41 22 76 79182 | Cel: +41 76 672 8927






Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt





Re: Search Bracketing

2014-02-27 Thread Wagner, Alexander
Hi!

About bracket reduction: this is what I added to my code. It is not that simple, 
as I generate them from other ops.

I agree that it would be a good idea if the query parser removed unnecessary 
brackets.  It would avoid trouble.

I'll try to build a sample against the demo site.

--
Kind regards,

Alexander Wagner

Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


- Reply message -
From: Tibor Simko tibor.si...@cern.ch
Date: Thu, Feb 27, 2014 14:28
Subject: Search  Bracketing
To: Wagner, Alexander a.wag...@fz-juelich.de
Cc: project-invenio-devel@cern.ch project-invenio-devel@cern.ch

On Thu, 27 Feb 2014, Wagner, Alexander wrote:
 The ind:val-example I gave, where you noticed it is perfectly ok, is
 actually broken on JuSER.

Would you have an example that fails on our demo site?

 (ind:val1 and ind:val2) and ((ind:val3 or ind:val4) or
 ind:val5)

If you use this query literally, then please note that several pairs of
parentheses are not needed here, as they are chaining within the same
AND or OR sequence.  You can reduce the above to:

  ind:val1 and ind:val2 and (ind:val3 or ind:val4 or ind:val5)

The logic being:

  (A and B) and ((C or D) or E) == A and B and (C or D or E)

It may be helpful to the query parser to always reduce the number of
incoming parentheses.
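
The reduction above can be sketched on a small expression tree. This is an
illustration of the associativity argument only, not the Invenio query parser;
the tuple representation and the names `flatten` and `render` are assumptions.

```python
ASSOCIATIVE = ("and", "or")


def flatten(expr):
    """Merge nested nodes of the same associative operator into one node.

    An expression is either a term (str) or a tuple (operator, arg1, arg2, ...).
    """
    if isinstance(expr, str):
        return expr
    op, *args = expr
    flat = []
    for arg in map(flatten, args):
        if op in ASSOCIATIVE and not isinstance(arg, str) and arg[0] == op:
            flat.extend(arg[1:])  # (A and B) and C  ->  A and B and C
        else:
            flat.append(arg)
    return (op, *flat)


def render(expr, top=True):
    """Write the tree back as a query string, bracketing only nested nodes."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    text = f" {op} ".join(render(arg, top=False) for arg in args)
    return text if top else f"({text})"


query = ("and",
         ("and", "ind:val1", "ind:val2"),
         ("or", ("or", "ind:val3", "ind:val4"), "ind:val5"))
print(render(flatten(query)))
# ind:val1 and ind:val2 and (ind:val3 or ind:val4 or ind:val5)
```

Brackets between different operators are kept, so this only removes the
pairs that chain within the same AND or OR sequence.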

Best regards
--
Tibor Simko









Re: Search Bracketing

2014-02-26 Thread Wagner, Alexander
Hello Tibor!

Just shortly, heading for a train:

The ind:val example I gave, which you noted is perfectly OK, is actually 
broken on JuSER. To get it working properly I have to remove the redundant 
brackets in the second "or" expression. One idea of Martin's is to re-order 
and/or/not, but in general I can't drop all brackets, and it's not trivial 
either.

I see the physics case. I even tried adding spaces, but I think those that I was 
able to add got stripped. It felt a bit like I wanted to add another sort of 
brackets like in math: (x+y) { a [ (b+c) (d+e) +f]}... :S

--
Kind regards,

Alexander Wagner

Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


- Reply message -
From: Tibor Simko tibor.si...@cern.ch
Date: Wed, Feb 26, 2014 21:29
Subject: Search  Bracketing
To: Wagner, Alexander a.wag...@fz-juelich.de
Cc: project-invenio-devel@cern.ch project-invenio-devel@cern.ch

On Wed, 26 Feb 2014, Alexander Wagner wrote:
 http://invenio-software.org/ticket/131

There are some more tickets that are open in this regard, notably:

   http://invenio-software.org/ticket/453

041__a:eng

 vs.

(041__a:eng)

Note that CDS also emits the following warning in the 2nd case:

   No exact match found for (041__a:eng), using 041 a: eng instead...

This substitute query is wrongly guessed, which leads to wrong results.

The troubles stem from the following.  There are physics terms such as
'SU(1)' that we don't want to interpret as a parenthesised search, but
rather do literal match.  Upon seeing '(041__a:eng)', the system
interprets it similarly, i.e. not as a composed search, but as a math
search, so to speak.  This is mostly because there is no blank within
parenthesised expression.  Adding something tautological to create a
Boolean expression would overcome this interpretation, for example:

  (041__a:eng eng)

would return the same number of hits as 041__a:eng.

In summary, the best way to use parentheses in order to express
composed searches is not to use parentheses around singletons, but
always around Boolean expressions, e.g. things containing at least
some white space.
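
The rule of thumb above, as a rough sketch. It only illustrates the heuristic
as described in this mail (white space inside the parentheses means composed
search, otherwise literal match); it is not the actual Invenio parser, and the
function name `looks_like_composed_search` is hypothetical.

```python
import re


def looks_like_composed_search(query):
    """True if a fully parenthesised query would be read as a Boolean expression."""
    q = query.strip()
    if not (q.startswith("(") and q.endswith(")")):
        return False
    # A singleton such as (041__a:eng) or (p,q) contains no white space
    # and is therefore taken literally, like a physics term.
    return re.search(r"\s", q[1:-1].strip()) is not None


for q in ["(041__a:eng)", "(041__a:eng eng)", "SU(1)", "(p,q)"]:
    print(q, looks_like_composed_search(q))
# (041__a:eng) False
# (041__a:eng eng) True
# SU(1) False
# (p,q) False
```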

 (ind:val1 and ind:val2) and ((ind:val3 or ind:val4) or
 ind:val5)

This use is perfectly OK.

 I fear there's still a bug in the bracket handling.

Yes, e.g. see the above ticket #453.

We may try to improve parenthesised expression check for word boundaries
in order to behave more properly for queries like (xy:zzy), e.g. to
give preference to composed search interpretation.  Though there are
situations like (p,q) where one wants to retain math search
interpretation we are favouring now...

Best regards
--
Tibor Simko









RE: chatroom future?

2013-10-28 Thread Wagner, Alexander
Hi!

From my point of view it would be nice if more people just connected to the 
chatroom. Also, I do not see why one would opt for the closed world if one can 
have the open one ;)

BTW: on our own hgf-project we use a private chat room quite extensively and it 
is in fact the fastest communication channel there. However, our whole gang 
just added pidgin/licq/whathaveyou to their autostart mechanism and set it to 
autologin / stay connected. ;) I do the same with the CERN room.

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


From: Tibor Simko [tibor.si...@cern.ch]
Sent: Monday, October 28, 2013 15:42
To: project-invenio-devel (Invenio developers mailing-list)
Subject: RFC: chatroom future?

Hi gang:

For many months now, there does not seem to have been much presence in our
developer chatroom.  As a reminder, here are the instructions:

   http://invenio-software.org/wiki/Community/ChatRooms

People being absent may make some incoming traffic to be missed and/or
unnoticed, moreover we are using various private communication channels
on the side.  All this may make wider communication channels less
optimal in times of need.

There are basically two options:

  a) We can revive the current chatroom in its state and use it more
 widely.

  b) We can switch to other popular alternatives we are using anyway,
 such as Google Hangout or Skype from the closed source world.

What would you prefer?  Shall we give these alternatives a parallel try
for a few weeks and see which of them takes root?  Right now we have the
usual Jabber chatroom and an experimental Google group hangout.  (Please
raise hand if you would like to be added to the latter.)

Best regards
--
Tibor Simko









RE: Copyright notice per docfile

2013-06-03 Thread Wagner, Alexander
Hi!

I think there is / should be a close connection between knowledge bases and 
authority. It could well be that real authority records end up being a 
librarian-accessible version of a knowledge base, at least for simple auth 
records.

One idea is that you add an auth record and some bibsched tasklet condenses 
them to suitable kb entries. This will work in some cases.

Still, I think there are some beasts among auth-controlled things. E.g. our 
journals are of this kind. We have such information as well when we're talking 
about authors and institutes, and I'm not sure that this maps to kbs /in 
general/. At least, I do not see any way my librarians could handle the 
necessary changes to kbs at the moment.

On the other hand, I think kbs cannot handle connected entries. Say you have a 
logical chain of subsequent entries: Institute A gets renamed to Alpha and 
later on is called A again, and you have to track this history. These would be 
horizontal connections. Same for vertical ones: A is top of a is top of alpha. 
And then connect this with a horizontal logic on each level. We have this for 
the institutes mentioned, but grant-like entities show this as well. (Project B 
has several children and is the successor of A, not all children of A have 
successors in B and so on.)

I think all this gets really important if you live on stuff that is used for 
some evaluation of projects or institutes and other kinds of bean counting. 
How much was accomplished by A split by time and so on.

Invenio can do a lot here but it just lacks some infrastructure to enforce auth 
control. Our main problem right now is, however, that support for this is 
lacking in the search engine. But this is due to the fact that we added most of 
the input methods in websubmit.

IMHO we should discuss this in a bit more detail, but it gets a bit difficult 
without good examples and I think hands-on will help a lot. @Tibor I'd really 
like to discuss this when I'm @cern and, as Theodoros mentions, IUGM should have 
a topic on this. We already did that at the start of our project, as you will 
remember. But right now I think I can come up with quite a lot more details 
than back then and also some solutions and pending issues.

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


From: Tibor Simko [tibor.si...@cern.ch]
Sent: Thursday, May 30, 2013 10:18
To: Theodoros Theodoropoulos
Cc: Wagner, Alexander; project-invenio-devel
Subject: Re: Copyright notice per docfile

On Wed, 29 May 2013, Theodoros Theodoropoulos wrote:
 Please tell me that you're thinking seriously to revive BibAuthority!

Yes, I confirm.  It was in the plans for the past year already, but the
revival got sidetracked due to shift in priorities.  Now it
progressively surfaces back to higher positions in the priority list...

Best regards
--
Tibor Simko








RE: Copyright notice per docfile

2013-06-03 Thread Wagner, Alexander
Theodoros, this is exactly the functionality that we are lacking at the moment 
in search. For our stats it is not an issue, we simply don't search silly 
things like real names but only IDs. For our users it would indeed be helpful 
if they could get a handle on the name without the detour through 
Authorities/People.

Anyway, even if this were possible via a kb, my librarian would still have to 
fill it in. Though she could add a 400 field to the auth set, I 
fear kb editing is out of the question. So even then, I fear there is a part 
missing at the moment.

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


From: Theodoros Theodoropoulos [th...@physics.auth.gr]
Sent: Monday, June 03, 2013 12:03
To: Tibor Simko
Cc: Wagner, Alexander; project-invenio-devel
Subject: Re: Copyright notice per docfile

Although I vaguely remember hearing something about it in the past, I
have a simple question related to this topic[1]:
Can we get 'See/See also' related results using KBs?

For example, I have:
100 $0 ID:1 $a Lastname, Firstname
and in another record:
100 $0 ID:1 $a SimilarLastname, SimilarFirstname
(note the SAME ID for the two names!)

What I want is to get results for SimilarLastname too when I search for
Lastname.

[btw, mapping: 1---Lastname and Lastname---SimilarLastname
Seems not a proper option because we lose the 'connection' that
SimilarLastname has also ID1 and maybe other IDs may have
SimilarLastname. Also, even with that I'm not sure how I can apply both
these KBs in a search query]

If this is impossible with current implementation using KBs, then I
believe it should be noted as another VERY useful thing that will come
along with Authorities in Invenio.
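
What such a 'See also' expansion would have to do can be sketched as follows.
This is not an existing KB or BibAuthority feature; the names `build_indexes`
and `expand_name` and the sample data are hypothetical. The ID stays the link
between the name forms, so a name that belongs to several IDs is handled too.

```python
from collections import defaultdict


def build_indexes(pairs):
    """Index (authority ID, name) pairs in both directions."""
    ids_by_name = defaultdict(set)
    names_by_id = defaultdict(set)
    for auth_id, name in pairs:
        ids_by_name[name].add(auth_id)
        names_by_id[auth_id].add(name)
    return ids_by_name, names_by_id


def expand_name(name, ids_by_name, names_by_id):
    """Return the name plus every other name form sharing one of its IDs."""
    names = {name}
    for auth_id in ids_by_name.get(name, ()):
        names |= names_by_id[auth_id]
    return sorted(names)


pairs = [
    ("ID:1", "Lastname, Firstname"),
    ("ID:1", "SimilarLastname, SimilarFirstname"),
    ("ID:2", "Other, Person"),
]
ids_by_name, names_by_id = build_indexes(pairs)
print(expand_name("Lastname, Firstname", ids_by_name, names_by_id))
# ['Lastname, Firstname', 'SimilarLastname, SimilarFirstname']
```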

Cheers,
Theodoros


[1] You realize that we keep discussing this very important issue of
Authorities in a thread with an irrelevant subject :) Let's hope that
we'll manage to dig these replies up in a future search for this topic...








RE: Copyright notice per docfile

2013-06-03 Thread Wagner, Alexander
Hello Tibor!

Perfectly agree that complete Authority is better than KBs on steroids. :)

I came to the idea of doping a kb automagically from an auth record from 
another direction.

If you consider a large set of authorities (JuSER holds some 64k of them), and 
take into account that the improvement of authorities is shared authorities, 
you may arrive at the point that for X only a certain subset of auth records is 
applicable. For an average user it's best to show only those applicable, to 
avoid wrong entries. We have this case already as well and currently have a 
working, but not too fast solution.

First of all, we have horizontal and vertical connections and thus a real tree. 
We need to recurse through the auth records to get only the leaves of such a 
tree, especially if we want to allow a top-level criterion for input (say: 
someone types "energy" and she should get "solar cells" as a proposal, as this 
is the child of a child of energy). Then we need to intersect the leaves found 
with some criterion like "not ignored at X" (this is an intellectual criterion, 
so it needs to be tagged manually; e.g. 
http://juser.fz-juelich.de/record/92322/export/hm, marc 751_7 marks this record 
not to show up at DESY).

Finally, we have to do all this on the fly and return a JSON in websubmit. So, 
it also has to be fast.

Currently, we really do the recursion and the intersection on the backend on 
real data with cached JS output formats. It works, it's even fast enough at the 
moment, but one of my ideas to speed it up was to just condense what's needed 
at X in a specific kb. This would drop the recursion and most likely a bunch of 
the intersections (though they are not that critical). The change rate of those 
datasets is something up to once in 5 years, so a "once every day" type of 
job would work.
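
The recursion and intersection described above, condensed into a small
sketch. It is not the join2 code; the dictionary layout, the names
`leaves_below` and `proposals`, and all sample terms except "energy" and
"solar cells" are made up for illustration.

```python
def leaves_below(children, node, seen=None):
    """Collect the leaves of the tree below `node` (depth first)."""
    seen = set() if seen is None else seen
    if node in seen:  # guard against accidental cycles in the data
        return []
    seen.add(node)
    kids = children.get(node, [])
    if not kids:
        return [node]
    found = []
    for kid in kids:
        found.extend(leaves_below(children, kid, seen))
    return found


def proposals(children, ignored_at, top_term, site):
    """Leaves below `top_term` that are not tagged as ignored at `site`."""
    ignored = ignored_at.get(site, set())
    return [leaf for leaf in leaves_below(children, top_term) if leaf not in ignored]


children = {
    "energy": ["renewables", "nuclear"],
    "renewables": ["solar cells", "wind turbines"],
}
ignored_at = {"DESY": {"wind turbines"}}

print(proposals(children, ignored_at, "energy", "DESY"))
# ['solar cells', 'nuclear']
```

A daily job could run `proposals` once per site and top-level term and store
the result as a flat kb, so the recursion is no longer needed at input time.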

At this point it might be worthwhile to consider kb doping.

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


From: Tibor Simko [tibor.si...@cern.ch]
Sent: Monday, June 03, 2013 10:51
To: Wagner, Alexander
Cc: Theodoros Theodoropoulos; project-invenio-devel
Subject: Re: Copyright notice per docfile

On Mon, 03 Jun 2013, Wagner, Alexander wrote:
 I think there is / should be a close connection between knowledge
 bases and authority. It could well be that real authority records end
 up being a librarian-accessible version of a knowledge base, at
 least for simple auth records.

One could consider BibAuthority as a BibKnowledge on steroids.  KBs
allow for simple ``good - bad'' substitutions, while authority records
offer much more additional values, comments, material on the target
side, that one can take advantage of.  Kind of `str - str' vs `str -
dict' thing.

 One idea is that you add an auth record and some bibsched tasklet
 condenses them to suitable kb entries. This will work in some cases.

May be cleaner to keep the two modules independent, and let people
choose which one to use depending on the given need.

 At least, I do not see a way how my librarians can handle necessary
 changes to kbs at the moment at all.

Yes, that's the steroid use case.  (Or a regular MARC world use case for
authority records, one could say.)  It is better not to force KBs to
handle these cases, rather to improve BibAuthority.

 Invenio can do a lot here but it just lacks some infrastructure to
 enforce auth control. Our main problem right now is, however, that
 support for this is lacking in the search engine.

ChrisD was working on this.  So the planned revival of BibAuthority
shall add appropriate indexing support, search support, deletion/editing
support...

 @Tibor I'd really like to discuss this when I'm @cern

+1

Best regards
--
Tibor Simko








Re: Copyright notice per docfile

2013-05-29 Thread Wagner, Alexander
Hi!

BibAuthority.

Point is, it can be more complex than a simple string. An auth record can 
easily hold a pdf or the like with full license description.

As we already discussed, when I am @cern I'll give some idea what we do with 
auth records. :)

--
Kind regards,


Alexander Wagner

Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi

- Reply message -
From: Tibor Simko tibor.si...@cern.ch
Date: Wed, May 29, 2013 16:12
Subject: Copyright notice per docfile
To: Wagner, Alexander a.wag...@fz-juelich.de
Cc: Theodoros Theodoropoulos th...@physics.auth.gr, project-invenio-devel 
project-invenio-devel@cern.ch


On Wed, 29 May 2013, Alexander Wagner wrote:
 I would say that a cleaner way would be to have some link to a
 license.  Some sort of license as authority where I can add a link
 to a persistent ID within the system that refers to this very
 license. Makes it easier to maintain, as you need to keep the relevant
 licences only in one place and you don't need individual texts per
 record.

This points either to reviving BibAuthority, or else the license string
could be a pointer to a knowledge base of licences containing more
information.
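
A minimal sketch of the pointer idea, assuming each docfile carries only a license identifier that is resolved against one central table. The identifiers, field names and the `copyright_notice` helper are hypothetical and do not reflect an existing Invenio API.

```python
# Central place for license details; records only store the identifier.
LICENSES = {
    "LIC-CC-BY-4.0": {
        "name": "CC BY 4.0",
        "url": "https://creativecommons.org/licenses/by/4.0/",
    },
}

# Two docfiles pointing at the same license (hypothetical structure).
DOCFILES = [
    {"name": "paper.pdf", "license_id": "LIC-CC-BY-4.0"},
    {"name": "slides.pdf", "license_id": "LIC-CC-BY-4.0"},
    {"name": "notes.txt", "license_id": None},
]


def copyright_notice(docfile, licenses):
    """Build the notice for one docfile by following its license pointer."""
    license_record = licenses.get(docfile.get("license_id"))
    if license_record is None:
        return "No license information available"
    return "{name} ({url})".format(**license_record)
```

Because the text lives in one place, correcting an entry in `LICENSES` changes the notice of every docfile that points to it, with no per-record edits.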

Best regards
--
Tibor Simko








RE: Invenio 1.0.2 @ FreeBSD 9.1

2013-04-16 Thread Wagner, Alexander
Hello Sam!

Thanks for your fix.

But actually the main source of trouble seems to have been wsgi 3.4. We 
reverted that back to 3.3 and it seems to work now as expected.

Concerning the release: JuSER calls itself 1.0.2.11-2232d, so I understand we 
are a bit beyond 1.0.2.

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


From: Samuele Kaplun [samuele.kap...@cern.ch]
Sent: Monday, April 15, 2013 21:53
To: Wagner, Alexander
Cc: project-invenio-devel (Invenio developers mailing-list); 
hgf-invenio-...@desy.de
Subject: Re: Invenio 1.0.2 @ FreeBSD 9.1

Hi Alex,

On Monday 15 April 2013 21:10:59, Samuele Kaplun wrote:
 Have you enabled HTTPS? There might be a further bug fix to back port...
 Cheers,
   Sam

here it is:

commit ce280f87aba05d73bf706b4ce19fb0ab42a7f617
Author: Samuele Kaplun samuele.kap...@cern.ch
Date:   Tue Oct 30 16:30:25 2012 +0100

    WebStyle: req.is_https() fix

    * Fixes the implementation of SimulatedModPythonRequest.is_https()
      to use wsgi.url_scheme as per PEP333, rather than using
      wsgiref.util.guess_scheme().

diff --git a/modules/webstyle/lib/webinterface_handler_wsgi.py b/modules/webstyle/lib/webinterface_handler_wsgi.py
index 7437d04..7c221a0 100644
--- a/modules/webstyle/lib/webinterface_handler_wsgi.py
+++ b/modules/webstyle/lib/webinterface_handler_wsgi.py
@@ -25,7 +25,7 @@ import inspect
 from fnmatch import fnmatch

 from wsgiref.validate import validator
-from wsgiref.util import FileWrapper, guess_scheme
+from wsgiref.util import FileWrapper

 if __name__ != "__main__":
     # Chances are that we are inside mod_wsgi.
@@ -237,7 +237,7 @@ class SimulatedModPythonRequest(object):
             del self.__headers['content-length']

     def is_https(self):
-        return int(guess_scheme(self.__environ) == 'https')
+        return self.__environ.get('wsgi.url_scheme') == 'https'

     def get_method(self):
         return self.__environ['REQUEST_METHOD']
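
For readers who want to try the check outside Invenio, here is a small standalone sketch of the same idea: read the `wsgi.url_scheme` key that the WSGI server places in the environ dict. The function `is_https` and the toy `application` below are illustrative names only, not the Invenio code.

```python
def is_https(environ):
    """Return True when the WSGI server reports that the request used HTTPS."""
    return environ.get("wsgi.url_scheme") == "https"


def application(environ, start_response):
    """Tiny WSGI app that answers with the scheme the server reported."""
    body = b"https" if is_https(environ) else b"http"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Using `.get()` means a missing key is treated as plain HTTP instead of raising `KeyError`, which mirrors the patched line above.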

Cheers!
Sam

P.S. but actually... if you are sticking with 1.0.x, then what about
installing 1.0.3 rather than 1.0.2? It includes this and many more
bug fixes out of the box ;-)
--
Samuele Kaplun
Invenio Developer ** http://invenio-software.org/
INSPIRE Service Manager ** http://inspirehep.net/








Invenio 1.0.2 @ FreeBSD 9.1

2013-04-15 Thread Wagner, Alexander
Hi!

We are currently trying to install Invenio 1.0.2 on a FreeBSD 9.1 box. We have 
exactly the same setup as on our other Linux boxes, where we do not experience 
any problems. However, on the BSD box we see the database locked against access 
as soon as we fire up the Apache web server. We see several connections to the 
DB, and Invenio does not respond anymore as soon as its subprocesses try to 
access any of the tables locked by the WSGI process, i.e. Apache. Killing the 
web application gives access to the DB again, and shell access allows us to set 
up the demo site, load records and so on. Until we fire up the web server.

On the web frontend the immediately visible result is an infinite loop as soon 
as we try to log in. We tracked that down to the fact that Invenio is unable to 
write to the session tables, so we get an invalid session cookie and never get a 
valid one, as we can't write a valid one to the database.

Our environment is a stock FreeBSD 9.1 with the following modules installed 
from the ports/.

mysql 5.5.30
wsgi  3.4
apache 2.2.24-prefork
python 2.7.3
python-mysqldb-1.2.3

Any help would be much appreciated, as we have been fiddling with this for a 
day and wanted to get this running already this morning ;)


--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi








RE: Invenio 1.0.2 @ FreeBSD 9.1

2013-04-15 Thread Wagner, Alexander
Hello Sam!

We tried it but our initial problem persists. Upon login we always get:

-
The page isn't redirecting properly
  Iceweasel has detected that the server is redirecting the request for this 
address in a way that will never complete.
This problem can sometimes be caused by disabling or refusing to accept cookies.
--

As far as we have tracked it, this happens because Invenio cannot write to its 
sessions table. :S

Some open bugs in wsgi beyond 3.3? Any other ideas?

--

Kind regards,

Alexander Wagner
Subject Specialist
Central Library
52425 Juelich

mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax  : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi


From: Samuele Kaplun [samuele.kap...@cern.ch]
Sent: Monday, April 15, 2013 16:44
To: Wagner, Alexander
Cc: project-invenio-devel [project-invenio-devel@cern.ch]; 
hgf-invenio-...@desy.de
Subject: Re: Invenio 1.0.2 @ FreeBSD 9.1

Hi Alex,

On Monday 15 April 2013 14:17:11, Wagner, Alexander wrote:
 mysql 5.5.30

this sounds like the 5.5.3+ autocommit bug that we have fixed in Invenio 1.1.x.

You might want to amend your dbquery.py module with the following patch:
[...]
@@ -122,24 +116,26 @@ def _db_login(dbhost=CFG_DATABASE_HOST, relogin=0):
     else:
         thread_ident = (os.getpid(), get_ident())
     if relogin:
-        _DB_CONN[dbhost][thread_ident] = connect(host=dbhost,
+        connection = _DB_CONN[dbhost][thread_ident] = connect(host=dbhost,
                                                  port=int(CFG_DATABASE_PORT),
                                                  db=CFG_DATABASE_NAME,
                                                  user=CFG_DATABASE_USER,
                                                  passwd=CFG_DATABASE_PASS,
                                                  use_unicode=False, charset='utf8')
-        return _DB_CONN[dbhost][thread_ident]
+        connection.autocommit(True)
+        return connection
     else:
         if _DB_CONN[dbhost].has_key(thread_ident):
             return _DB_CONN[dbhost][thread_ident]
         else:
-            _DB_CONN[dbhost][thread_ident] = connect(host=dbhost,
+            connection = _DB_CONN[dbhost][thread_ident] = connect(host=dbhost,
                                                      port=int(CFG_DATABASE_PORT),
                                                      db=CFG_DATABASE_NAME,
                                                      user=CFG_DATABASE_USER,
                                                      passwd=CFG_DATABASE_PASS,
                                                      use_unicode=False, charset='utf8')
-            return _DB_CONN[dbhost][thread_ident]
+            connection.autocommit(True)
+            return connection
 [...]

I.e. the trick is to add connection.autocommit(True) before the return 
statement.
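
To see why autocommit matters for something like a session table, here is a rough sketch that can be run without a MySQL server. It deliberately swaps in `sqlite3` from the Python standard library as a stand-in for MySQLdb, so it only illustrates the general visibility effect and not the MySQL 5.5 behaviour itself. In `sqlite3`, `isolation_level=None` selects autocommit mode; the table name `session` and the helper name are made up for the example.

```python
import os
import sqlite3
import tempfile


def rows_seen_by_other_connection(autocommit):
    """Insert one row on a writer connection, never call commit(), and
    return how many rows a second connection can see."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE session (session_key TEXT)")
        setup.commit()
        setup.close()

        # isolation_level=None is sqlite3's autocommit mode; "DEFERRED"
        # opens an implicit transaction before the INSERT.
        level = None if autocommit else "DEFERRED"
        writer = sqlite3.connect(path, isolation_level=level)
        reader = sqlite3.connect(path)
        try:
            writer.execute("INSERT INTO session VALUES ('abc')")
            cursor = reader.execute("SELECT COUNT(*) FROM session")
            return cursor.fetchone()[0]
        finally:
            reader.close()
            writer.close()
    finally:
        os.remove(path)
```

Without autocommit the second connection does not see the inserted row, because the write is still sitting in an open transaction; with autocommit it does. With MySQLdb the corresponding switch is the `connection.autocommit(True)` call shown in the patch.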

Let me know if this fixes your issue, and we might work on officially 
backporting the patch on maint-1.0

Cheers!
Sam
--
Samuele Kaplun
Invenio Developer ** http://invenio-software.org/
INSPIRE Service Manager ** http://inspirehep.net/



