IUGW 2017
Hi! For the upcoming IUGW 2017, Connie and @lnielsen are finalizing the program right now. Please note: if you want to attend the workshop and have NOT booked a hotel yet, our pre-booked contingents will end tomorrow. Please book now! Garching is very close to Munich, the capital of Bavaria and a city with a lot of events, so these contingents are there for a pretty good reason. (No camping on the TU campus near the reactor ;) The hotels told us that they cannot extend the pre-booking.

-- Kind regards, Alexander Wagner Deutsches Elektronen-Synchrotron DESY Library and Documentation Building 01d Room OG1.444 Notkestr. 85 22607 Hamburg phone: +49-40-8998-1758 fax:+49-40-8994-1758 e-mail: alexander.wag...@desy.de
Re: TypeError when calling oaiharvest from CLI
Hi! The URL I tried was http://repositorium-dev.uni-muenster.de/oai/miami. This is our staging system, which is so far the only one to produce marcxml output.

If it's only about getting Marc records, you may try one of the join2 repos, e.g.

https://bib-pubdb1.desy.de/oai2d
https://impulse.mlz-garching.de/oai2d
https://juser.fz-juelich.de/oai2d
https://publications.rwth-aachen.de/oai2d
https://repository.gsi.de/oai2d

However, I'd suggest trying the openaire or vdb sets. If you harvest blindly you'll get our authority records. They are nice as well, but probably not what you're striving for.

-- Kind regards, Alexander Wagner Deutsches Elektronen-Synchrotron DESY Library and Documentation Building 01d Room OG1.444 Notkestr. 85 22607 Hamburg phone: +49-40-8998-1758 fax:+49-40-8994-1758 e-mail: alexander.wag...@desy.de
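For anyone who wants to try the selective harvest suggested in this mail, the request can be sketched as a plain OAI-PMH ListRecords URL. This is only an illustration: the helper name list_records_url is made up here, and both the marcxml metadata prefix and the openaire set name are assumptions taken from the wording of the mail, so the repository's ListMetadataFormats and ListSets responses should be checked for the real values.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Restricting to a set avoids harvesting everything blindly,
        # e.g. the authority records mentioned in the mail.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Assumed values: "marcxml" and "openaire" may be spelled differently
# on a given repository.
url = list_records_url("https://bib-pubdb1.desy.de/oai2d", "marcxml", "openaire")
print(url)
```

The sketch only builds the URL and does no network access, so it can be pasted into a browser or handed to whatever harvester is in use.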
RE: Please allow any indicator in any field
as possible for the job and not be hindered by any marc or cataloging rules that are no longer really applicable.

You miss some very crucial points here. Just two examples.

- We have billions of records in this format.
- We have thousands of people /extremely/ skilled in this format, some trained for years.

I have colleagues who don't even think in terms like title or author but in 2450have to look at and 1001_. I do this myself if I want to refer to something unambiguously. We reintroduced this in our project as words were not good enough. So even non-librarians learned marc so we were able to talk about the same entities efficiently. Especially the latter point gets slightly underestimated if you take the pure IT point of view.

Anyway, just consider that if I can do something in Marc I can easily find say 20 people in my (small) library (~50% of its staff!) who could handle it. Take /any/ other format and I have probably 2 or 3 left with some IT background. And I will need to invest huge amounts of time in training to get something similar. But I'll never get the same quality I get from people trained and used to something they have been doing for a decade or so. You might call this unfortunate, but it is just a fact.

Thus, it could even be an economic argument. Sad to say, but cataloguers, even skilled cataloguers, are just cheaper than IT professionals. And they are fast at a very low error rate with the stuff they are used to. Though I'm no cataloguer, I am myself much faster in text marc than in MarcXML, because it's not that chatty. Also my error rate is lower. I'm much faster to even jot down the record in text marc from scratch than with any fancy form-based interface. Depending on what you have to do this is a crucial factor. If I need to correct an error in say 1000 records it is usually more cost effective to just give them to our cataloguers than to write a program that finds all exceptions and handles them well.
(For 1000 records usually I don't consider programming, if it is a one time issue.) This however requires that Marc is Marc as it should be and not only some subset of it.

In sum: libraries are built for a slightly longer time frame than current technology. Thus we change slower, but we also work for a much longer time. Most likely, even IT development will get much slower in, say, 450 years from now, when it reaches the age of our institutions. Probably, we could then use contemporary technology ;)

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

From: Rob Atkinson [atkin...@fnal.gov] Sent: Friday, March 28, 2014 17:08 To: Ferran Jorba; Esteban Gabancho Cc: Wagner, Alexander; project-invenio-devel (Invenio developers mailing-list) Subject: Re: Please allow any indicator in any field

On 03/28/2014 04:39 AM, Ferran Jorba wrote: the problem is more general in Invenio. You can take Marc21 and use a subset of it. Or just use a subset of the tags. Or interpret more strictly, more loosely, use more local tags or subfields. Marc21 is like a legal code (or a programming language, if you feel more comfortable with that), where there is flexibility. CERN may use a subset of Marc21 where most of the indicators are __. Ok, that's CERN's library decision. But if the Invenio developers claim that Invenio supports Marc21, you *must* allow other indicators there, and consider it valid.

Then don't say it supports MARC21. Simple solution. The primary goal of invenio should be to meet the needs of its original institution(s). If marc indicators are not necessary in the database functions of the originating institution, then feel free to ignore them. Avoid getting dragged back 40 years to the days of library catalogs by any mandates to follow every rule to the letter. Those rules may have made sense in 1970 but they don't always now.

And MARC development has been under the control of library association committees, made up of librarians, who make decisions based on cataloging rules for description of items (as typed on paper cards) and not contemporary technology. It is important to remember that marc was originally a U.S. Library of Congress file format designed for large main-frame machines in a day of top down programming and magnetic tape reel storage. Access was entirely sequential, which explains some of the record architecture. The format was intended to be used to generate paper file cards. Modern computing should have freedom to use marc in any way that makes it as suitable as possible for the job and not be hindered by any marc or cataloging rules that are no longer really applicable.

Rob
RE: Please allow any indicator in any field
Hello Ferran!

[...] I know Marc21 reasonably well, and I don't remember now any case where having different indicators means something so different that it has to be treated differently.

Here I would be more careful. Basically, I would treat Marc fields and indicators not as 3 digits plus two other funny chars but consider the whole bunch as a 5 character wide field designation. I think here I'm in fact a bit more in line with Esteban's approach. At least if I understand it correctly. (Though I agree with you that one might not come up with a complete bibfield list, but just with a set of most common usages.)

But «most common usages» won't cover them all, and so, you cannot load arbitrary records coming from unknown sources and expect Invenio to do the expected thing with them.

I'm not sure, but I think it's basically a misunderstanding and we generally agree. As I said, for indexing/display I perfectly agree with you. In the definition of the fields as such, telling invenio what e.g. an author should be in input, one could, and probably should, be more explicit.

Be conservative in what you do, be liberal in what you accept from others.

Perfectly agree. I'd add the famous Einstein here. As simple as possible, but not simpler. ;)

I share some concerns about this with Ferran and Martin and some others, and I'm very sure it's quite a task...

I don't think it is so difficult if the code just accepts 245%% for title, 100%% for first author, etc. With a 10% effort we could cover more than 95% of the cases. Alexander, would you accept to exchange the current Invenio default behaviour with the default I'm proposing? Knowing that it would not be perfect, do you think that it would be better?

I think in general, yes. As said above, I feel we perfectly agree about how Invenio's default indexing and even display should be set up, and there %% is in all cases I see better than __.
If one defines a field, let's stick to author, I would however suggest that the definition says:

- Author should be 1001_ and stored as lastname, firstname
- alternatively 1002_ and sorted firstname lastname (note: deprecated)
- as fallback 100%% is treated as author in the index in case we have foreign data (note: very deprecated)

Your 245 example is quite telling, and people without much library background might miss the point here a bit, simply because title as such is a quite simple field in the sense that it is only a string, from the IT point of view. The point missed here, and it is really an important point, is that if you get foreign data you /never/ get 245__ in your Marc, you'll /always/ get indicators. So if I pull, e.g., 10.000 records from our latest ebook package into stock invenio, they will have no titles. I consider this indeed a bug, not an inconvenience.

Even our modernists who consider Marc ancient should agree that data exchange is quite important. And no, I do not know /any/ format that can transport the richness of Marc in a standardized, accepted manner, nor do I know any format that is used for such a host of data as Marc21. (If you consider it ancient, please note that currently all German library catalogues are being migrated to use it instead of our own invention MAB.) Hey, journal literature from the sciences is really quite trivial. Book literature from the humanities is quite a different story. If you don't believe it have some fun with http://gso.gbv.de/DB=2.1/PPNSET?PPN=741186039 and its friends.

Additionally, those indicators carry a strong meaning. I agree that the first indicator might be considered superfluous in databases (it is probably /not/ if you consider that you have to create a bibliography, and this is a /very/ common request). Disregarding the second indicator is another story. If you have a large bibliography it isn't too sensible to use a blind string sorting.
You'll get so many entries in T that you don't find your stuff anymore. Yes, I know, these are offline bibliographies. Yes, I hate to print a database. I feel it is completely senseless. Yes, I have been working for several years now on converting all our peers to accept a URL instead of 400 pages of paper. BUT, IRL they often don't like trees and insist on silly printed lists. Yes, ignoring "The" in sorting would be simple, but just to add German: we have 3 definite articles and two indefinite articles, of course they have different lengths, and if you add other languages, well.

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

Forschungszentrum Juelich GmbH 52425 Juelich Registered office: Juelich Registered in the commercial register of the district court of Dueren, No. HR B 3498 Chairman of the Supervisory Board:
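To make the two points of this mail concrete, here is a minimal Python sketch, not Invenio code. It assumes that a field is addressed by a 5-character designation (tag plus two indicators), that % in a pattern stands for exactly one arbitrary character, and that the second indicator of 245 holds the number of non-filing characters, as in MARC 21. The function names matches and filing_key and the sample titles are made up for illustration.

```python
def matches(pattern, designation):
    """Match a 5-character designation (tag + 2 indicators) against a
    pattern in which '%' stands for any single character (assumption)."""
    if len(pattern) != 5 or len(designation) != 5:
        return False
    return all(p == "%" or p == d for p, d in zip(pattern, designation))


def filing_key(title, ind2):
    """Sort key that skips the non-filing characters counted by the
    second indicator of 245; a non-digit indicator means 'skip none'."""
    skip = int(ind2) if ind2.isdigit() else 0
    return title[skip:].casefold()


# Foreign data arrives with indicators, so only the wildcard form finds it.
print(matches("245__", "24514"))  # exact blank indicators: no match
print(matches("245%%", "24514"))  # wildcard indicators: match

# Illustrative titles with their second indicator (length of the article
# plus the following blank).
titles = [
    ("The Art of Computer Programming", "4"),
    ("Der Zauberberg", "4"),
    ("Eine kurze Geschichte der Zeit", "5"),
    ("Physics Reports", "0"),
]
ordered = [t for t, _ in sorted(titles, key=lambda t: filing_key(*t))]
print(ordered)
```

The indicator does the work that a hard-coded list of articles per language would otherwise have to do, which is why dropping it on import loses information that cannot be recovered from the title string alone.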
Re: [pu jsonalchemy] Aggregation of several fields into now
Hi Lars! Be careful with this general simplified notion. In some areas of research author ordering can cause sort of religious wars. And I may also add that, depending on the area of research, the question "I want all my papers, but only those where I am the first author" is /very/ common. Up to the notion "my papers" == "only those where I am the first author" == "only those count at all". So, a distinction of 1xx/7xx (of course Ferran is right about the other fields) is really important, as is the ordering field $b.

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

- Reply message - From: Lars Holm Nielsen lars.holm.niel...@cern.ch Date: Thu, Mar 27, 2014 17:08 Subject: [pu jsonalchemy] Aggregation of several fields into now To: Esteban Gabancho esteban.jose.garcia.gaban...@cern.ch Cc: project-invenio-devel (Invenio developers mailing-list) project-invenio-devel@cern.ch

On 27.03.2014 15:38, Esteban Gabancho wrote: Hey guys! I have a question about the aggregation of several fields into one. Taking the example of the authors, let's say I have two fields `_first_author` and `_additional_authors` and I want to aggregate them into `authors`. The common case, and the easiest, is when I have one `_first_author` and zero or more `_additional_authors`, in which case I just put a list with the authors (what else, right? :-) The problem, or the question, comes when I don’t have a `_first_author`, in which case I’m not sure about the content of the `authors` field: it could be i) only the list of `_additional_authors` or ii) `None` followed by the list of `_additional_authors`. I think the second solution is the closest one to reality; the `None` expresses that the record doesn’t have a first author. And I also think that we could apply this solution for other cases where we have this kind of situation (like with the `110__` and `710__`).
In Zenodo I'm just interested in the list of authors, and the first in the list is by definition the first author. The first author/additional authors are somehow an artifact of having to store in MARC as master format. I'm not sure what would be the most appropriate solution. Cheers, Lars

What do you think? Lars, as you have already pu in production, how do you deal with this problem? Cheers, -- Esteban J. G. Gabancho

-- Lars Holm Nielsen CERN, IT Department, Collaboration Information Services http://zenodo.org | Tel: +41 22 76 79182 | Cel: +41 76 672 8927

Forschungszentrum Juelich GmbH 52425 Juelich Registered office: Juelich Registered in the commercial register of the district court of Dueren, No. HR B 3498 Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher Board of Directors: Prof. Dr. Achim Bachem (Chairman), Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt, Prof. Dr. Sebastian M. Schmidt
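The two options discussed in this thread can be put side by side in a small Python sketch. It is not the jsonalchemy implementation; the function name aggregate_authors and the keep_placeholder switch are hypothetical and only serve to show how option i) and option ii) differ when the first author is missing.

```python
def aggregate_authors(first_author, additional_authors, keep_placeholder=True):
    """Merge `_first_author` and `_additional_authors` into one list.

    keep_placeholder=True  -> option ii): a leading None marks the
                              missing first author (1xx absent).
    keep_placeholder=False -> option i): only the additional authors,
                              so position 0 is no longer "the 1xx author".
    """
    authors = list(additional_authors)
    if first_author is not None:
        return [first_author] + authors
    if keep_placeholder:
        return [None] + authors
    return authors


print(aggregate_authors("Doe, J.", ["Roe, R."]))
print(aggregate_authors(None, ["Roe, R."]))
print(aggregate_authors(None, ["Roe, R."], keep_placeholder=False))
```

With the placeholder, a search restricted to "first author only" can still rely on the first list position; without it, the 1xx/7xx distinction stressed in the reply is lost.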
Re: Search Bracketing
Hi! About bracket reduction: this is what I added to my code. It is not that simple, as I generate them from other ops. I agree that it would be a good idea if the query parser removed unnecessary brackets. Would avoid trouble. I'll try to build a sample against the demo site.

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

- Reply message - From: Tibor Simko tibor.si...@cern.ch Date: Thu, Feb 27, 2014 14:28 Subject: Search Bracketing To: Wagner, Alexander a.wag...@fz-juelich.de Cc: project-invenio-devel@cern.ch project-invenio-devel@cern.ch

On Thu, 27 Feb 2014, Wagner, Alexander wrote: The ind:val-example I gave, where you noticed it is perfectly ok, is actually broken on JuSER.

Would you have an example that fails on our demo site?

(ind:val1 and ind:val2) and ((ind:val3 or ind:val4) or ind:val5)

If you use this query literally, then please note that several pairs of parentheses are not needed here, as they are chaining within the same AND or OR sequence. You can reduce the above to:

ind:val1 and ind:val2 and (ind:val3 or ind:val4 or ind:val5)

The logic being:

(A and B) and ((C or D) or E) == A and B and (C or D or E)

It may be helpful to the query parser to always reduce the number of incoming parentheses.

Best regards -- Tibor Simko
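The reduction described in this mail can be sketched in a few lines of Python. This is not the Invenio query parser: it assumes the query has already been parsed into nested tuples of the form (operator, operand, ...), and the names flatten and render are made up. Only and/or are merged, since only those are associative.

```python
ASSOCIATIVE = {"and", "or"}


def flatten(expr):
    """Merge a nested group into its parent when both use the same
    associative operator: (A and B) and C -> A and B and C."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    flat = [op]
    for arg in map(flatten, args):
        if op in ASSOCIATIVE and not isinstance(arg, str) and arg[0] == op:
            flat.extend(arg[1:])  # same operator: drop the inner brackets
        else:
            flat.append(arg)      # different operator: brackets are needed
    return tuple(flat)


def render(expr, top=True):
    """Turn the tuple form back into a query string with minimal brackets."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    text = f" {op} ".join(render(a, top=False) for a in args)
    return text if top else f"({text})"


query = ("and",
         ("and", "ind:val1", "ind:val2"),
         ("or", ("or", "ind:val3", "ind:val4"), "ind:val5"))
print(render(flatten(query)))
```

Run on the example from the thread, the sketch yields the reduced form given above; an or-group inside an and-sequence keeps its brackets because dropping them would change the meaning.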
Re: Search Bracketing
Hello Tibor! Just shortly, heading for a train: The ind:val-example I gave, where you noticed it is perfectly ok, is actually broken on JuSER. To get it working properly I have to remove the redundant brackets in the second or expression. One idea of Martin's is to re-order and/or/not, but in general I can't drop all brackets, and it's not trivial either. I see the physics case. I even tried adding spaces, but I think those that I was able to add got stripped. It felt a bit like I wanted to add another sort of brackets like in math: (x+y) { a [ (b+c) (d+e) +f]}... :S

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

- Reply message - From: Tibor Simko tibor.si...@cern.ch Date: Wed, Feb 26, 2014 21:29 Subject: Search Bracketing To: Wagner, Alexander a.wag...@fz-juelich.de Cc: project-invenio-devel@cern.ch project-invenio-devel@cern.ch

On Wed, 26 Feb 2014, Alexander Wagner wrote: http://invenio-software.org/ticket/131

There are some more tickets that are open in this regard, notably: http://invenio-software.org/ticket/453

041__a:eng vs. (041__a:eng)

Note that CDS also emits the following warning in the 2nd case: No exact match found for (041__a:eng), using 041 a: eng instead... This substitute query is wrongly guessed, which leads to wrong results.

The troubles stem from the following. There are physics terms such as 'SU(1)' that we don't want to interpret as a parenthesised search, but rather do a literal match. Upon seeing '(041__a:eng)', the system interprets it similarly, i.e. not as a composed search, but as a math search, so to speak. This is mostly because there is no blank within the parenthesised expression. Adding something tautological to create a Boolean expression would overcome this interpretation, for example: (041__a:eng eng) would return the same number of hits as 041__a:eng.
In summary, the best way to use parentheses in order to express composed searches is not to use parentheses around singletons, but always around Boolean expressions, e.g. things containing at least some white space.

(ind:val1 and ind:val2) and ((ind:val3 or ind:val4) or ind:val5)

This use is perfectly OK.

I fear there's still a bug in the bracket handling.

Yes, e.g. see the above ticket #453. We may try to improve the parenthesised expression check for word boundaries in order to behave more properly for queries like (xy:zzy), e.g. to give preference to the composed search interpretation. Though there are situations like (p,q) where one wants to retain the math search interpretation we are favouring now...

Best regards -- Tibor Simko
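The rule of thumb from this mail (a parenthesised group counts as a composed search only if it contains white space) can be written down as a tiny Python check. This is an illustration of the heuristic as described in the thread, not the actual parser logic; the function name is_composed_group is hypothetical and the input is assumed to be one group including its outer parentheses.

```python
def is_composed_group(group):
    """Return True if a parenthesised group would be read as a composed
    (Boolean) search under the white-space heuristic, False if it would
    be matched literally ("math search")."""
    text = group.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected a parenthesised group")
    inner = text[1:-1]
    return any(ch.isspace() for ch in inner)


for example in ["(041__a:eng)", "(041__a:eng eng)", "(p,q)",
                "(ind:val3 or ind:val4)"]:
    print(example, is_composed_group(example))
```

The first example is the problem case from ticket #453: it is a singleton, so under this heuristic it falls on the literal side even though the user meant a field search.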
RE: chatroom future?
Hi! From my point of view it would be nice if more people just connected to the chatroom. Also I do not see why to opt for the closed world if one can have the open one ;) BTW: on our own hgf-project we use a private chat room quite extensively and it is in fact the fastest communication channel there. However, our whole gang just added pidgin/licq/whathaveyou to their autostart mechanism and set it to autologin / stay connected. ;) I do the same with the CERN room.

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

From: Tibor Simko [tibor.si...@cern.ch] Sent: Monday, October 28, 2013 15:42 To: project-invenio-devel (Invenio developers mailing-list) Subject: RFC: chatroom future?

Hi gang: For many months now, there does not seem to be much presence in our developer chatroom anymore. As a reminder, here are instructions: http://invenio-software.org/wiki/Community/ChatRooms

People being absent may cause some incoming traffic to be missed and/or go unnoticed; moreover, we are using various private communication channels on the side. All this may make wider communication channels less optimal in times of need. There are basically two options:

a) We can revive the current chatroom in its state and use it more widely.
b) We can switch to other popular alternatives we are using anyway, such as Google Hangout or Skype from the closed source world.

What would you prefer? Shall we give these alternatives a parallel try for a few weeks and see which of them takes root? Right now we have the usual Jabber chatroom and an experimental Google group hangout. (Please raise hand if you would like to be added to the latter.)

Best regards -- Tibor Simko
RE: Copyright notice per docfile
Hi! I think there is / should be a close connection between knowledge bases and authority. It could well be that real authority records end up being a librarian accessible version of a knowledge base, at least for simple auth records. One idea is that you can add an auth record and some bibsched tasklet condenses them to suitable kb entries. This will work in some cases.

Still, I think there are some beasts among auth controlled things. E.g. our journals are of this kind. We have such information as well when we're talking about authors and institutes, and I'm not sure that this maps to kbs /in general/. At least, I do not see a way how my librarians can handle necessary changes to kbs at the moment at all.

On the other hand I think kbs cannot handle connected entries. Say you have a logical chain of subsequent entries. Institute A gets renamed to Alpha and later on is called A again and you have to track this history. These would be horizontal connections. Same for vertical ones: A is on top of a, which is on top of alpha. And then connect this with a horizontal logic on each level. We have this for the institutes mentioned, but grant like entities show this too. (Project B has several children and is the successor of A, not all children of A have successors in B and so on.)

I think all this gets really important if you live on stuff that is used for some evaluation of projects or institutes and other kinds of bean counting. How much was accomplished by A, split by time, and so on.

Invenio can do a lot here but it just lacks some infrastructure to enforce auth control. Our main problem right now is, however, that support for this is lacking in the search engine. But this is due to the fact that we added most of the input methods in websubmit.

IMHO we should discuss this a bit more in detail, but it gets a bit difficult without good examples and I think hands on will help a lot. @Tibor I'd really like to discuss this when I'm @cern, and as Theodoros mentions, IUGM should have a topic on this.
We already did that at the start of our project, as you will remember. But right now I think I can come up with quite a lot more details than back then and also some solutions and pending issues.

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

From: Tibor Simko [tibor.si...@cern.ch] Sent: Thursday, May 30, 2013 10:18 To: Theodoros Theodoropoulos Cc: Wagner, Alexander; project-invenio-devel Subject: Re: Copyright notice per docfile

On Wed, 29 May 2013, Theodoros Theodoropoulos wrote: Please tell me that you're thinking seriously to revive BibAuthority!

Yes, I confirm. It was in the plans for the past year already, but the revival got sidetracked due to shift in priorities. Now it progressively surfaces back to higher positions in the priority list...

Best regards -- Tibor Simko
RE: Copyright notice per docfile
Theodoros, this is exactly the functionality that we are lacking at the moment in search. For our stats it is not an issue, we simply don't search silly things like real names but only IDs. For our users it would indeed be helpful if they could get a handle on the name without the detour through Authorities/People. Anyway, if this were possible via a kb it would still be important for my librarian to be able to fill it in. Though she could add a 400 field to the auth set, I fear kb editing is out of the question. Even so, I fear there is a part missing at the moment.

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

From: Theodoros Theodoropoulos [th...@physics.auth.gr] Sent: Monday, June 03, 2013 12:03 To: Tibor Simko Cc: Wagner, Alexander; project-invenio-devel Subject: Re: Copyright notice per docfile

Although I vaguely remember hearing something about it in the past, I have a simple question related to this topic[1]: Can we get 'See/See also' related results using KBs? For example, I have:

100 $0 ID:1 $a Lastname, Firstname

and in another record:

100 $0 ID:1 $a SimilarLastname, SimilarFirstname

(note the SAME ID for the two names!) What I want is to get results for SimilarLastname too when I search for Lastname.

[btw, mapping:

1---Lastname
and
Lastname---SimilarLastname

seems not a proper option because we lose the 'connection' that SimilarLastname also has ID1, and maybe other IDs may have SimilarLastname. Also, even with that I'm not sure how I can apply both these KBs in a search query]

If this is impossible with the current implementation using KBs, then I believe it should be noted as another VERY useful thing that will come along with Authorities in Invenio.
Cheers, Theodoros

[1] You realize that we keep discussing this very important issue of Authorities in a thread with an irrelevant subject :) Let's hope that we'll manage to dig these replies up in a future search for this topic...
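What Theodoros asks for can be illustrated with two lookup tables that keep the ID as the link between name forms, instead of two flat string-to-string KBs. This is a minimal Python sketch, not an Invenio or BibKnowledge API: the functions build_index and expand_name are hypothetical, and the (ID, name) pairs stand in for the 100 $0 / $a subfields from the example.

```python
from collections import defaultdict


def build_index(entries):
    """entries: iterable of (identifier, name) pairs, e.g. taken from
    100 $0 and 100 $a. Returns both lookup directions."""
    names_by_id = defaultdict(set)
    ids_by_name = defaultdict(set)
    for ident, name in entries:
        names_by_id[ident].add(name)
        ids_by_name[name].add(ident)
    return names_by_id, ids_by_name


def expand_name(name, names_by_id, ids_by_name):
    """All name forms that share at least one ID with `name` (one hop)."""
    forms = {name}
    for ident in ids_by_name.get(name, ()):
        forms |= names_by_id[ident]
    return forms


entries = [
    ("ID:1", "Lastname, Firstname"),
    ("ID:1", "SimilarLastname, SimilarFirstname"),
    ("ID:2", "SimilarLastname, SimilarFirstname"),
    ("ID:2", "Other, Name"),
]
names_by_id, ids_by_name = build_index(entries)
print(sorted(expand_name("Lastname, Firstname", names_by_id, ids_by_name)))
```

A search for one name form would then be rewritten as an OR over the expanded set. Because the expansion goes through the ID, a name form that belongs to several IDs keeps all of its connections, which is exactly what the flat mapping in the mail loses.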
RE: Copyright notice per docfile
Hello Tibor! Perfectly agree that complete Authority is better than KBs on steroids. :)

I came to the idea of doping a kb automagically from an auth record from another direction. If you consider a large set of authorities (JuSER holds some 64k of them), and take into account that the improvement of authorities is shared authorities, you may arrive at the point that for X only a certain subset of auth records is applicable. For an average user it's best only to show those applicable to avoid wrong entries. We have this case already as well and currently have a working, but not too fast solution.

First of all, we have horizontal and vertical connections and thus a real tree. We need to recurse through the auth records to get the leaves of such a tree only, especially if we want to allow a top level criterion for input (say: someone types energy, she should get solar cells as a proposal as this is the child of a child of energy). Then we need to intersect the leaves found with some criterion like not ignored at X (this is an intellectual criterion, so it needs to be tagged manually; e.g. http://juser.fz-juelich.de/record/92322/export/hm, marc 751_7 marks this record not to show up at DESY). Finally, we have to do all this on the fly and return a JSON in websubmit. So, it also has to be fast.

Currently, we really do the recursion and the intersection on the backend on real data with cached JS output formats. It works, it's even fast enough at the moment, but one of my ideas to speed it up was to just condense what's needed at X in a specific kb. This would drop the recursion and most likely a bunch of the intersections (though they are not that critical). Change rate of those datasets is something up to once in 5 years, so a once every day type of job would work. At this point it might be worthwhile to consider kb doping.
-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

From: Tibor Simko [tibor.si...@cern.ch] Sent: Monday, June 03, 2013 10:51 To: Wagner, Alexander Cc: Theodoros Theodoropoulos; project-invenio-devel Subject: Re: Copyright notice per docfile

On Mon, 03 Jun 2013, Wagner, Alexander wrote: I think there is / should be a close connection between knowledge bases and authority. It could well be that real authority records end up being a librarian accessible version of a knowledge base, at least for simple auth records.

One could consider BibAuthority as a BibKnowledge on steroids. KBs allow for simple ``good - bad'' substitutions, while authority records offer much more additional values, comments, material on the target side, that one can take advantage of. Kind of `str - str' vs `str - dict' thing.

One idea is that you can add an auth record and some bibsched tasklet condenses them to suitable kb entries. This will work in some cases.

May be cleaner to keep the two modules independent, and let people choose which one to use depending on the given need.

At least, I do not see a way how my librarians can handle necessary changes to kbs at the moment at all.

Yes, that's the steroid use case. (Or a regular MARC world use case for authority records, one could say.) It is better not to force KBs to handle these cases, rather to improve BibAuthority.

Invenio can do a lot here but it just lacks some infrastructure to enforce auth control. Our main problem right now is, however, that support for this is lacking in the search engine.

ChrisD was working on this. So the planned revival of BibAuthority shall add appropriate indexing support, search support, deletion/editing support...
@Tibor I'd really like to discuss this when I'm @cern

+1

Best regards -- Tibor Simko
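The "kb doping" idea from this thread (recurse once through the authority tree, keep only the leaves, drop what is tagged as ignored at a site, and store the result for fast lookup) can be sketched in Python as follows. This is an illustration under stated assumptions, not the JuSER code: the tree is given as a plain dict of parent to children, it is assumed to be acyclic, and the names leaves and condensed_kb as well as the sample terms are made up.

```python
def leaves(children, node):
    """Collect the leaf terms below `node` in an (assumed acyclic) tree
    given as a dict mapping each parent to a list of its children."""
    kids = children.get(node, [])
    if not kids:
        return {node}
    found = set()
    for kid in kids:
        found |= leaves(children, kid)
    return found


def condensed_kb(children, top_terms, ignored_at_site):
    """Precompute, per top-level term, the applicable leaves for one
    site: recursion and intersection are done once, e.g. by a daily job,
    instead of on every websubmit request."""
    return {term: sorted(leaves(children, term) - ignored_at_site)
            for term in top_terms}


# Illustrative data only.
children = {
    "energy": ["renewables", "nuclear"],
    "renewables": ["solar cells", "wind turbines"],
}
kb = condensed_kb(children, ["energy"], ignored_at_site={"nuclear"})
print(kb)
```

Since the underlying data changes only rarely, the precomputed mapping can be regenerated by a scheduled job, and the submission form then needs nothing more than a dictionary lookup to propose "solar cells" for "energy".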
Re: Copyright notice per docfile
Hi! BibAuthority. Point is, it can be more complex than a simple string. An auth record can easily hold a pdf or the like with the full license description. As we already discussed, when I am @cern I'll give some idea what we do with auth records. :)

-- Kind regards, Alexander Wagner Subject Specialist Central Library 52425 Juelich mail : a.wag...@fz-juelich.de phone: +49 2461 61-1586 Fax : +49 2461 61-6103 www.fz-juelich.de/zb/DE/zb-fi

- Reply message - From: Tibor Simko tibor.si...@cern.ch Date: Wed, May 29, 2013 16:12 Subject: Copyright notice per docfile To: Wagner, Alexander a.wag...@fz-juelich.de Cc: Theodoros Theodoropoulos th...@physics.auth.gr, project-invenio-devel project-invenio-devel@cern.ch

On Wed, 29 May 2013, Alexander Wagner wrote: I would say that a cleaner way would be to have some link to a license. Some sort of license as authority where I can add a link to a persistent ID within the system that refers to this very license. Makes it easier to maintain, as you need to keep the relevant licences only in one place and you don't need individual texts per record.

This points either to reviving BibAuthority, or else the license string could be a pointer to a knowledge base of licences containing more information.

Best regards -- Tibor Simko
RE: Invenio 1.0.2 @ FreeBSD 9.1
Hello Sam!

Thanks for your fix. But actually the main source of trouble seems to have been wsgi 3.4. We reverted to 3.3 and it now seems to work as expected.

Concerning the release: JuSER calls itself 1.0.2.11-2232d, so I understand we are a bit beyond 1.0.2.

--
Kind regards,
Alexander Wagner
Subject Specialist
Central Library
52425 Juelich
mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi

From: Samuele Kaplun [samuele.kap...@cern.ch]
Sent: Monday, April 15, 2013 21:53
To: Wagner, Alexander
Cc: project-invenio-devel (Invenio developers mailing-list); hgf-invenio-...@desy.de
Subject: Re: Invenio 1.0.2 @ FreeBSD 9.1

Hi Alex,

On Monday, 15 April 2013 21:10:59, Samuele Kaplun wrote:
> Have you enabled HTTPS? There might be a further bug fix to back port...
> Cheers, Sam

here it is:

commit ce280f87aba05d73bf706b4ce19fb0ab42a7f617
Author: Samuele Kaplun samuele.kap...@cern.ch
Date: Tue Oct 30 16:30:25 2012 +0100

    WebStyle: req.is_https() fix

    * Fixes the implementation of SimulatedModPythonRequest.is_https()
      to use wsgi.url_scheme as per PEP333, rather than using
      wsgiref.util.guess_scheme().

diff --git a/modules/webstyle/lib/webinterface_handler_wsgi.py b/modules/webstyle/lib/webinterface_handler_wsgi.py
index 7437d04..7c221a0 100644
--- a/modules/webstyle/lib/webinterface_handler_wsgi.py
+++ b/modules/webstyle/lib/webinterface_handler_wsgi.py
@@ -25,7 +25,7 @@
 import inspect
 from fnmatch import fnmatch
 from wsgiref.validate import validator
-from wsgiref.util import FileWrapper, guess_scheme
+from wsgiref.util import FileWrapper
 
 if __name__ != "__main__":
     # Chances are that we are inside mod_wsgi.
@@ -237,7 +237,7 @@ class SimulatedModPythonRequest(object):
             del self.__headers['content-length']
 
     def is_https(self):
-        return int(guess_scheme(self.__environ) == 'https')
+        return self.__environ.get('wsgi.url_scheme') == 'https'
 
     def get_method(self):
         return self.__environ['REQUEST_METHOD']

Cheers!
Sam

P.S. but actually... if you are stuck with 1.0.x, then what about installing 1.0.3 rather than 1.0.2? It includes this and many more bug fixes out of the box. ;-)

--
Samuele Kaplun
Invenio Developer ** http://invenio-software.org/
INSPIRE Service Manager ** http://inspirehep.net/
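The difference between the two is_https() implementations in the commit quoted in this thread can be illustrated with a small standalone sketch. The environ dictionary below is a hypothetical one, chosen so that the two approaches disagree; whether this exact situation occurred on the FreeBSD box is not stated in the thread.

```python
from wsgiref.util import guess_scheme


def is_https(environ):
    """Report HTTPS based on wsgi.url_scheme, as the patched method does."""
    return environ.get("wsgi.url_scheme") == "https"


# Hypothetical environ: the server declares https via wsgi.url_scheme
# but does not set the CGI-style HTTPS variable.
environ = {"wsgi.url_scheme": "https"}

# New behaviour: trust the scheme the WSGI server reports.
new_result = is_https(environ)

# Old behaviour: guess_scheme() only inspects the HTTPS variable,
# so it falls back to "http" for this environ.
old_result = guess_scheme(environ) == "https"
```

With this environ the patched check reports HTTPS while the guess_scheme() based one does not, which is the kind of mismatch the fix is meant to avoid.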
Invenio 1.0.2 @ FreeBSD 9.1
Hi!

We are currently trying to install Invenio 1.0.2 on a FreeBSD 9.1 box. We have exactly the same setup as on our other Linux boxes, where we do not experience any problems. However, on the BSD box we see the database locked away for access as soon as we fire up the Apache web server. We see several connections to the DB, and Invenio does not respond anymore as soon as its subprocesses try to access any of the tables locked by the wsgi process, i.e. Apache.

Killing the web application gives access to the DB again, and shell access allows us to set up the demo site, load records and so on. Until we fire up the web server.

On the web frontend the immediately visible result is an infinite loop as soon as we try to log in. We tracked that down to the fact that Invenio is unable to write to the session tables, so we get an invalid session cookie and never get a valid one, as we can't write a valid one to the database.

Our environment is a stock FreeBSD 9.1 with the following modules installed from ports:

    mysql 5.5.30
    wsgi 3.4
    apache 2.2.24-prefork
    python 2.7.3
    python-mysqldb-1.2.3

Any help would be much appreciated, as we have been fiddling with this for a day and wanted to have it running this morning already ;)

--
Kind regards,
Alexander Wagner
Subject Specialist
Central Library
52425 Juelich
mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi
RE: Invenio 1.0.2 @ FreeBSD 9.1
Hello Sam!

We tried it but our initial problem persists. Upon login we always get:

    The page isn't redirecting properly

    Iceweasel has detected that the server is redirecting the request for
    this address in a way that will never complete. This problem can
    sometimes be caused by disabling or refusing to accept cookies.

Which, as far as we tracked it, happens because Invenio cannot write to its sessions table. :S

Some open bugs in wsgi beyond 3.3? Any other ideas?

--
Kind regards,
Alexander Wagner
Subject Specialist
Central Library
52425 Juelich
mail : a.wag...@fz-juelich.de
phone: +49 2461 61-1586
Fax : +49 2461 61-6103
www.fz-juelich.de/zb/DE/zb-fi

From: Samuele Kaplun [samuele.kap...@cern.ch]
Sent: Monday, April 15, 2013 16:44
To: Wagner, Alexander
Cc: "project-invenio-devel [project-invenio-devel@cern.ch]"; hgf-invenio-...@desy.de
Subject: Re: Invenio 1.0.2 @ FreeBSD 9.1

Hi Alex,

On Monday, 15 April 2013 14:17:11, Wagner, Alexander wrote:
> mysql 5.5.30

this sounds like the 5.5.3+ autocommit bug that we have fixed in Invenio 1.1.x. You might want to amend your dbquery.py module with the following patch:

[...]
@@ -122,24 +116,26 @@ def _db_login(dbhost=CFG_DATABASE_HOST, relogin=0):
     else:
         thread_ident = (os.getpid(), get_ident())
     if relogin:
-        _DB_CONN[dbhost][thread_ident] = connect(host=dbhost,
+        connection = _DB_CONN[dbhost][thread_ident] = connect(host=dbhost,
                                                  port=int(CFG_DATABASE_PORT),
                                                  db=CFG_DATABASE_NAME,
                                                  user=CFG_DATABASE_USER,
                                                  passwd=CFG_DATABASE_PASS,
                                                  use_unicode=False, charset='utf8')
-        return _DB_CONN[dbhost][thread_ident]
+        connection.autocommit(True)
+        return connection
     else:
         if _DB_CONN[dbhost].has_key(thread_ident):
             return _DB_CONN[dbhost][thread_ident]
         else:
-            _DB_CONN[dbhost][thread_ident] = connect(host=dbhost,
+            connection = _DB_CONN[dbhost][thread_ident] = connect(host=dbhost,
                                                      port=int(CFG_DATABASE_PORT),
                                                      db=CFG_DATABASE_NAME,
                                                      user=CFG_DATABASE_USER,
                                                      passwd=CFG_DATABASE_PASS,
                                                      use_unicode=False, charset='utf8')
-            return _DB_CONN[dbhost][thread_ident]
+            connection.autocommit(True)
+            return connection
[...]

I.e. the trick is to add connection.autocommit(True) before the return statement.

Let me know if this fixes your issue, and we might work on officially backporting the patch to maint-1.0.

Cheers!
Sam

--
Samuele Kaplun
Invenio Developer ** http://invenio-software.org/
INSPIRE Service Manager ** http://inspirehep.net/
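The pattern used by the patch in this thread (cache one connection per process and thread, and switch on autocommit whenever a new one is opened) can be tried out without a database using the simplified sketch below. It is not the Invenio code: db_login(), the flat _DB_CONN cache and the FakeConnection stand-in are hypothetical names for this example. The only thing assumed about a real connection is that it offers autocommit(bool), which is what the patch itself calls.

```python
import os
import threading

# Hypothetical cache: (pid, thread id) -> connection.
_DB_CONN = {}


class FakeConnection(object):
    """Stand-in for a real DB connection; it only records the setting."""

    def __init__(self):
        self.autocommit_enabled = False

    def autocommit(self, flag):
        self.autocommit_enabled = bool(flag)


def db_login(connect, relogin=False):
    """Return the cached connection for this thread, or open a new one.

    `connect` is any zero-argument callable returning a connection
    object that offers autocommit(bool).
    """
    thread_ident = (os.getpid(), threading.get_ident())
    if not relogin and thread_ident in _DB_CONN:
        return _DB_CONN[thread_ident]
    connection = _DB_CONN[thread_ident] = connect()
    # Enable autocommit on every newly opened connection, so that
    # statements do not stay inside an open transaction until an
    # explicit commit.
    connection.autocommit(True)
    return connection


conn = db_login(FakeConnection)
```

A second call from the same thread returns the cached object, while relogin=True opens a fresh connection that also has autocommit enabled.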