Re: [Dspace-tech] scientific formulas do not appear as they are when uploaded

2009-08-17 Thread stuart yeates
Morupisi, Maitumelo wrote:
> We have a problem uploading documents with scientific formulas. They do
> not appear as they are in the original.

If your formulas are in the metadata fields and they contain unicode 
supplementary characters, you may be out of luck. This is a 
long-standing issue that also affects some more obscure natural 
languages / scripts (we run into it with Linear B).
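
A quick way to check whether a given metadata value is affected is to look for code points above U+FFFF. This is only a sketch of that check in Python; the function name is made up for illustration and nothing here is DSpace code.

```python
def supplementary_characters(text):
    """Return the characters in text that lie outside the Basic Multilingual Plane."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# U+1D465 MATHEMATICAL ITALIC SMALL X and U+10000 LINEAR B SYLLABLE B008 A
# are both supplementary; the macron vowel in "Māori" (U+0101) is not.
for value in ["x squared", "\U0001D465\u00B2", "\U00010000", "Māori"]:
    found = supplementary_characters(value)
    print(repr(value), [f"U+{ord(ch):05X}" for ch in found])
```

Values that come back with an empty list should not be hitting this particular issue, so the cause would have to be something else.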


Stuart Yeates   New Zealand Electronic Text Centre Institutional Repository

DSpace-tech mailing list

[Dspace-tech] Embargo in 1.6

2009-12-07 Thread stuart yeates
I have some questions about the Embargo plugin in 1.6. I'm basing this 
on and trolling through the 
subversion repository (dspace-1.6.0-rc1 tag).

We'd like to have a drop-down box in our self-deposit which allows 
users to select an embargo period (probably 3, 6, 12, 18 or 24 months). 
This then gets put in the metadata field pointed to by 
embargo.field.terms (probably 'VUW.embargo'), and the date of uplift 
calculated and stored in that pointed to by embargo.field.lift (probably 

The DefaultEmbargoSetter automatically sets the default permissions so 
that while the item metadata for an embargoed item is globally readable, 
the bitstreams are inaccessible to everyone but admins. [This could be 
overridden to a less strict lockdown by overriding the setEmbargo 
method; we haven't thrashed this out yet.] DefaultEmbargoSetter also 
calculates the embargo.field.lift date, from the embargo.field.terms and 
the current time/date.
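
To make the date arithmetic concrete, here is a rough Python sketch of turning an embargo period in months into a lift date. It assumes the terms field holds a whole number of months, which is our plan rather than anything the plugin requires, and it is not how DefaultEmbargoSetter itself parses terms; lift_date is a hypothetical helper.

```python
import calendar
from datetime import date


def lift_date(start, months):
    """Return the date `months` calendar months after `start`.

    The day of the month is clamped, so 31 August plus 6 months
    gives the last day of February.
    """
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# The periods from the proposed drop-down, applied to one deposit date.
deposited = date(2009, 12, 7)
for period in (3, 6, 12, 18, 24):
    print(period, lift_date(deposited, period).isoformat())
```

The clamping matters for deposits made at the end of a month; without it an 18-month embargo starting on 31 August would land on a date that does not exist.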

Once a day the EmbargoManager runs and uplifts items whose embargo has 
expired. Uplifting involves setting the permissions to whatever the 
default permissions are for the collection it's in, making the item's 
bitstreams public.

My questions are:
[1] Does the above sound sane?
[2] Is there any way to generate notifications of lifting? The easiest 
thing I can see would be to do a search for the embargo.field.terms 
field, sorting on the availability, and supply that as an RSS feed.
[3] We're considering how to direct users to other sources for the item 
when it's currently embargoed. This would probably involve displaying 
a block of text which might be inviting them to login and giving them 
alternative access routes to the item. Has anyone done this?
[4] In the wiki at, if I am 
reading things correctly, the second to last option in the config file 
snippet is missing the relevant default option and the last option has 
it truncated. Am I reading it correctly?

Due to my technical skills, I'd prefer options that involve XSLT to 
those involving Java :)

Stuart Yeates   New Zealand Electronic Text Centre Institutional Repository


[Dspace-tech] DSpace 1.6 upgrade advice

2010-03-03 Thread stuart yeates

I've read (and believe I understand) the upgrade notes at however, 
I'm concerned that my upgrade won't be as smooth as it might be. Mainly 
because I suspect that changes have been applied outside the approved 
update mechanism and I don't want to lose those changes. I'm planning a 
more conservative upgrade path (outlined below), which I hope will allow 
me to catch any problems before I take our production instance off line. 
I'd greatly appreciate feedback as to whether my approach is flawed, 
and/or whether there are significant classes of issues that I won't 
catch in this manner.
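
One way to find changes made outside the approved mechanism before upgrading is to compare the live tree against a pristine copy of the same release. The following is a sketch only: the two directory arguments and the function names are assumptions for illustration, and it reads every file in full, so it is best run on a copy that excludes the assets directory.

```python
import hashlib
import os


def file_digests(root):
    """Map each file's path relative to root to the SHA-256 of its contents."""
    digests = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def local_changes(pristine_root, live_root):
    """Return (modified, added, missing) lists of relative paths."""
    pristine = file_digests(pristine_root)
    live = file_digests(live_root)
    modified = sorted(p for p in pristine if p in live and pristine[p] != live[p])
    added = sorted(p for p in live if p not in pristine)
    missing = sorted(p for p in pristine if p not in live)
    return modified, added, missing
```

Anything in the modified or added lists is a candidate for a change that the upgrade could overwrite.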

My approach relies on the fact that I have a smaller dev machine that I 
can use for my purposes (alas it is too small to hold the entire assets 
directory). My work is also complicated by the fact that I'm running a 
dual-instance server. Both servers are running the redhat family of 
linux operating systems.

My plan:

0) Do a full backup of live

1) Using a database snapshot and file copy of [dspace-source] and 
[dspace], duplicate both dspace instances on the dev machine (with the exception 
of the assets directory), leaving the live server running.

2) Upgrade the dev server and see what issues fall out of the woodwork.

3) Fix those issues in the dev server.

4) Do a full backup of live

5) Stop the live server.

6) Upgrade the live server

7) Transfer any necessary fixes from the dev server to the live server

8) Restart live server

Does that sound like a sane approach? Is there anything I can't test / 
debug / fix in this way?

I'm already aware that:
* handles will be broken (because they'll point to the live server 
rather than the dev site)
* search will be broken (because I'll not have the assets directory)

Stuart Yeates   New Zealand Electronic Text Centre Institutional Repository


[Dspace-tech] stop lists for Māori language for use in search

2010-03-04 Thread stuart yeates
After some discussion in the #dspace channel, here are word frequency 
lists for the Māori language for configuring Māori language support when 

The first set of words is derived ultimately from the Māori Niupepa 
Collection at These are mainly 19th Century 
newspapers in traditional orthography (= no macrons). The commandline 
used to generate them is:

cat [0-9]*/doc.xml | sed 's/]*>[^>]*>//' | sed 
's/<[^>]*>//g' | sed 's/<[^;]*;//g' | grep -vi '[qysdflbvxzc]' | tr 
-cs '[^a-zA-Z]' '\012' |tr '[A-Z]' '[a-z]' |sort | uniq -c | sort -n

9534 ata
9673 wha
9734 kino
9975 motu
   10025 kahore
   10064 katahi
   10083 tahi
   10086 marama
   10120 whai
   10492 ingoa
   10532 wahine
   10548 wa
   10610 kau
   10685 muri
   10740 heoi
   10748 mau
   10864 pa
   10941 kawanatanga
   11182 kaha
   11299 rangatira
   11308 whaka
   11518 ahua
   11560 taha
   11592 tamariki
   11619 rongo
   11717 hui
   12014 mana
   12016 mohio
   12244 ora
   12353 rua
   12697 take
   13280 puta
   13977 engari
   14079 taku
   14103 wahi
   14201 ona
   14417 ahau
   14789 raua
   15247 ture
   15407 kotahi
   15688 utu
   15813 reira
   16081 kite
   16249 noho
   16332 moni
   16586 tera
   16993 whakaaro
   17184 tino
   17299 iho
   17416 tika
   17721 enei
   18284 ara
   18680 tena
   19048 ranei
   19086 tana
   19758 hoa
   20478 koe
   21323 tikanga
   21592 tatou
   21719 aua
   21987 noa
   22317 tae
   22356 whare
   22374 tu
   22400 etahi
   22415 matou
   22916 kaore
   23346 ake
   23887 rawa
   25629 au
   25816 mate
   26329 pakeha
   27063 tau
   27326 ta
   27380 kore
   27961 koutou
   27977 tonu
   28943 kupu
   31213 tona
   31711 pai
   31881 runga
   31903 korero
   32158 roto
   34556 whenua
   34991 tetahi
   35540 katoa
   36485 no
   36609 nui
   36674 kai
   37706 haere
   38645 iwi
   38904 to
   39048 kei
   39090 ma
   42886 ra
   45527 mahi
   48027 hei
   48219 taua
   50332 na
   50620 ratou
   55736 maori
   56889 kua
   58158 hoki
   61591 ano
   65470 ia
   69170 tenei
   72397 tangata
   72615 mea
   72836 ai
   75123 nei
   77790 atu
   80224 mo
   82101 mai
   90419 me
   98716 kia
  117543 ana
  141715 ka
  147326 ko
  156732 he
  193002 a
  283013 nga
  302250 o
  306391 e
  310758 ki
  474233 i
  833949 te

The second set of words is derived from a private corpus (not 
distributable for copyright reasons). This is modern text (20th and 21st 
Century), primarily in modern orthography (= macrons are used) and 
primarily from government and official channels.  The commandline used 
to generate them is:

cat *.xml | sed 's/<[^ ]* xml:lang="en">[^>]*>//' | sed 's/<[^>]*>//g' | 
tr  ' \(\)\{\}\[\];:",.0-9-' '\012' |grep -vi '[qysdflbvxzc]' |tr 
'[A-Z]' '[a-z]' |sort | uniq -c | sort -n

3043 ota
3096 mō
3136 ara
3204 kaunihera
3206 kore
3303 £
3346 taha
3387 tu
3403 rohe
3406 iho
3432 noho
3462 the
3588 riihi
3728 tae
3787 whakahaere
3841 nui
3860 koe
3887 aua
3892 etahi
4055 mau
4057 tona
4063 iwi
4078 tika
4168 utu
4209 pukapuka
4258 poraka
4278 take
4391 reira
4424 wahi
4533 tekau
4623 tekiona
4634 whai
4636 tonu
4702 haere
4721 ā
4770 tuku
4935 no
5099 takiwa
5102 tono
5331 ano
5359 nama
5383 ingoa
5560 na
5586 kupu
5917 to
6164 mana
6295 ake
6348 mea
6712 katoa
7111 mahi
7113 moni
7161 kooti
7233 ratou
7288 tau
7704 tikanga
7845 raro
7945 kei
7956 ma
8562 ranei
8733 kai
9005 hoki
9856 ra
   10239 hei
   10271 tetahi
   10695 ai
   11492 roto
   11612 tenei
   11654 tangata
   11664 runga
   11855 ture
   12186 mai
   12527 ia
   13468 kua
   14336 taua
   15843 nei
   16143 maori
   18231 mo
   18411 ngā
   18833 kia
   21203 atu
   22526 he
   22921 ana
   25268 whenua
   26545 me
   28052 ka
   32916 ko
   44994 a
   52515 nga
   60488 e
   69607 ki
   92490 o
  128901 i
  208905 te
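
For anyone who would rather not use the shell, the word-level filtering can be sketched in Python. This follows the second command line (split into words, drop any word containing one of the letters q y s d f l b v x z c, lower-case, count); it does not reproduce the sed steps that strip markup, and the function names are invented for illustration.

```python
import re
import unicodedata
from collections import Counter

# Letters the original grep treats as evidence of a non-Māori word.
NON_MAORI_LETTERS = set("qysdflbvxzc")


def word_counts(text):
    """Count lower-cased words, skipping any that contain an excluded letter."""
    # NFC keeps a macron vowel as one code point so it stays inside its word.
    normalised = unicodedata.normalize("NFC", text).lower()
    words = re.findall(r"[^\W\d_]+", normalised)
    return Counter(w for w in words if not NON_MAORI_LETTERS & set(w))


def stop_words(counts, size):
    """Return the `size` most frequent words as a candidate stop list."""
    return [word for word, _count in counts.most_common(size)]


sample = "Ko te whare o te iwi. The shop is closed. Ngā whare o te iwi."
counts = word_counts(sample)
print(stop_words(counts, 3))
```

As in the lists above, an English word such as "the" survives the filter because none of its letters are excluded, so the resulting list still wants a manual check before it is used as a stop list.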

Stuart Yeates   New Zealand Electronic Text Centre Institutional Repository


[Dspace-tech] current status of UTF-8 in dspace and XML-UI / Manakin

2008-10-08 Thread stuart yeates
We're having issues with UTF-8 in our UI.

A great many of the metadata fields and controlled vocabularies contain 
characters that aren't in the ISO-latin character sets, principally 
vowels with macrons.

I've looked fairly closely at our install and I don't think we've done 
anything too stupid. Looking at the mailing lists, there's a post 
blaming the issue on the DatabaseManager and suggesting a fix

Is this still the current status?

Stuart Yeates
Te Pātaka Kōrero o Te Whare Wānanga o te Ūpoko o te Ika a Māui   New Zealand Electronic Text Centre Institutional Repository


Re: [Dspace-tech] current status of UTF-8 in dspace and XML-UI / Manakin

2008-10-09 Thread stuart yeates
> Hi Stuart,
> I'm replying to the list in case this helps anyone else, or, more 
> probably, in case someone can point out the stupidity of my workaround 
> when importing items.
> What problems are you suffering exactly? Funky characters showing up 
> instead?

We get error messages during submission, and bad characters show up.

> If you take a look at, the 
> dc.identifier.citation field uses macrons (eg. Māori). We've had no real 
> problem with this. That's Dspace 1.4.2, JSPUI.

This is with 1.5.* and Manakin / XML/UI.

> I didn't have a lot to do with content on the server I've linked to 
> there, but I've recently been importing large numbers of theses and 
> dissertations from an ADT repository to a test Dspace server, and have 
> had to convert a lot of UTF-8 characters like vowels with macrons and 
> umlauts into their HTML equivalents to satisfy 
> A Maori macron "a" is ā, for 
> example.
> This might have very bad side effects when any non-HTML manipulation or 
> indexing of metadata is going on, so I welcome any criticism or pointers 
> there.
> (For instance, authors' names with umlaut-u in them display fine when 
> browsing by title or after a search, but click that author's name to 
> start a new 'browse by author', and the character becomes garbled and no 
> search results are found.)

I'd check that the connector was handling the URLs correctly. Check

Alternatively, you could write an update for the "text_value" column of 
the "metadatavalue" table, which is where all these fields are stored.
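
To illustrate what such a conversion looks like in both directions, here is a small Python sketch using only the standard library. It shows the string transformation only; it does not touch the database, and the function names are hypothetical.

```python
import html


def to_char_refs(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_char_refs(text):
    """Turn character references back into the characters they stand for."""
    return html.unescape(text)


encoded = to_char_refs("Māori")
print(encoded)                  # M&#257;ori
print(from_char_refs(encoded))  # Māori
```

Note that html.unescape also decodes named references such as &amp;amp;, so running it over values that are meant to contain literal markup would change them.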

[I didn't see your response on the list, even though you cc'd the list, so I'm 
  following up to the list in case others are having these problems.]

Stuart Yeates
Te Pātaka Kōrero o Te Whare Wānanga o te Ūpoko o te Ika a Māui   New Zealand Electronic Text Centre Institutional Repository


Re: [Dspace-tech] two handles???

2008-10-23 Thread stuart yeates
Kyle Kaliebe wrote:
> Hello all,
> One of our users had Dspace open in two browsers on the same machine, 
> one in IE and the other in Firefox. She was logged in on the same 
> account. She then went to the last step of submission for an item. At 
> that point, she claims that her session jumped from Firefox to IE 

Most browsers have a call-back mechanism that allows URLs and file types not 
handled natively by the browser to be handled elsewhere (this is how, for 
example, Flash and most audio formats work, as well as links from PDFs to 
HTML files).

Somewhere along the line, a URL must have been opened that invoked this 
mechanism which would have resulted in the default browser being invoked 
to open that URL. The mechanism is very platform specific.  In this case 
I'm guessing that the default browser was I.E.. I.E. had a separate 
session open (based on a separate login and separate session cookie) and 
the user ended up editing the same record simultaneously from two 
sessions and this caused badness.

I'm not a microsoft windows person, but I suspect this could be avoided 
by ensuring that you have the same default browser for http and https 
and using that browser for all the editing actions (as opposed to the 
view and verification actions, which is what I'm assuming you're using 
the second browser for).

How do you determine which is your default browser? Click on links in 
Word, OOo and PDF documents (opened by double clicking the files, rather 
than from within the browser). If they all go to the same browser, 
that's your default, if they don't, consult someone who understands 
application settings on your platform.


Stuart Yeates
Te Pātaka Kōrero o Te Whare Wānanga o te Ūpoko o te Ika a Māui   New Zealand Electronic Text Centre Institutional Repository


Re: [Dspace-tech] filter-media problem - question on size limit

2008-10-27 Thread stuart yeates
There are a number of different versions of PDF and a number of 
applications that generate PDFs. Some combinations of version and 
application generate PDFs that are subtly misunderstood by some 
applications that read PDFs.

I suggest that you try to narrow down which application was used to 
generate the PDFs you're having difficulty with.

If you can isolate a set of versions and applications that give you 
trouble you can then open and re-save the PDFs in a tool that doesn't 
have the problem. This can potentially be automated too, if you have 
many PDFs.
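
As a starting point for narrowing things down, the PDF version and the producing application can often be read straight from the file. The sketch below is deliberately naive and is an assumption-laden illustration rather than a PDF parser: it only finds a Producer entry when the Info dictionary is stored unencrypted and uncompressed.

```python
import re


def pdf_fingerprint(data):
    """Return (version, producer) guessed from the raw bytes of a PDF.

    Either value is None when it cannot be found. The Producer search is
    naive: it only sees an unencrypted, uncompressed Info dictionary whose
    string has no nested parentheses.
    """
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    version = header.group(1).decode("ascii") if header else None
    match = re.search(rb"/Producer\s*\(([^)]*)\)", data)
    producer = match.group(1).decode("latin-1") if match else None
    return version, producer
```

Tallying the fingerprints of the files that fail to filter against those that succeed should show whether one version and application combination stands out.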

We have found, for example, that PDFCreator (the windows-based PDF 
program that works like a print-driver) strips out the full-text when 
used to concatenate documents together. Once we discovered this it was a 
relatively simple matter to adjust our workflow to compensate for the 
problem and catch the few bad PDFs that had already made it through into 
the collection.


Thornton, Susan M. (LARC-B702)[NCI INFORMATION SYSTEMS] wrote:
> I found out something very interesting this weekend.  I took a .pdf file 
> that was "unfilterable"; in other words filter-media displayed an error 
> like this:
>  "ERROR filtering, skipping bitstream #21220 Error: 
> value is not an integer type actual='--20'"
> On a hunch, I looked at the document and found it had several pages of 
> graphics/images in it.  I deleted all pages in the document, which 
> contained images and guess what?  It filtered just fine.
> Hmmm…we have to be able to upload documents that contain images.  NASA 
> has a LOT of images in their documents.  Now what??
> Sue Walker-Thornton
> NASA Langley Research Center
> (757) 224-4074
> -Original Message-
> From: Graham Triggs [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 24, 2008 3:13 PM
> To:
> Subject: Re: [Dspace-tech] filter-media problem - question on size limit
> If anyone has example PDFs that cause the text extraction to fail
> (smaller PDFs preferably!) that they are able to share, please send them
> - or a link to retrieve them - to me.
> Thanks,
> G
> Mark H. Wood wrote:
>>  I found this:
>>  PJX and PDF Jester look, at first glance, as though they might be
>>  worth considering.
>>  OTOH it looks like PDFBox might be getting more attention in its new
>>  home, and if so, then it makes sense to stick with it and help to
>>  improve it.

Re: [Dspace-tech] Dspace Metadata

2008-10-28 Thread stuart yeates
pavan krishnamurthy wrote:
> Hi all,
> This may be stupid question , but please bare with me .
> I added a type called journalrt in the metadata registry. Now when i
> try to submit a document , i am not able to see the journalrt as a
> type .
> What are the changes i need to make to add a type , when we submit a document.

That will largely depend on whether you're using the .jsp or the 
XML/UI/Manakin interface, since they do this differently. The version of 
dspace you're using is also relevant.


Stuart Yeates
Te Pātaka Kōrero o Te Whare Wānanga o te Ūpoko o te Ika a Māui   New Zealand Electronic Text Centre Institutional Repository


Re: [Dspace-tech] Federating a number of DSpace instances

2009-02-10 Thread stuart yeates
We have a situation here in New Zealand where each University has a 
repository (mainly dspace) and the National Library maintains an 
OAI-powered search engine that runs across them.

OAI doesn't include the full text, so searching is significantly limited.

If you search for a relatively rare surname in both, you'd expect to 
find more in the cross-site search, but this isn't the case, because 
most personal names occur in the acknowledgements and bibliography, 
which don't make it into the OAI metadata. Compare

The second URL is the poorer search over the larger body of documents.

The has huge benefits, but exhaustive searching isn't 
really one of them.


Walker, David wrote:
> We have that use case here, John -- we are implementing separate DSpace 
> instances for our 23 campuses, but we also need a separate interface that can 
> search them all.
> We're just going to harvest the data using OAI-PMH.  I'm sure you've thought 
> of that, too.
> --Dave
> ==
> David Walker
> Library Web Services Manager
> California State University
> From: John Preston []
> Sent: Tuesday, February 10, 2009 8:53 AM
> To:
> Subject: [Dspace-tech] Federating a number of DSpace instances
> Does anyone know if any work is going on regarding federating a number
> of DSpace instances so that the group could be considered as a single
> DSpace instance for searching say. My use case has a number of DSpace
> instances that are operated and maintained as individual instances.
> When a user wishes to search for some information, then the search is
> performed across all instances, and returns links to where the info
> was found on the individual instance.
> John

Stuart Yeates   New Zealand Electronic Text Centre Institutional Repository


[Dspace-tech] script to validate all PDFs ?

2009-02-24 Thread stuart yeates
Does anyone have a script that checks all of the previously uploaded 
PDFs, finds the ones that are malformed, and reports their URLs/record IDs?

I can see how to write a script that uses the unix command line 'file' 
and 'pdftops' tools to check that every file that looks like a PDF is a 
good and valid PDF. I'm less sure how to go from a file on the disk to a 
database record.
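
For the file-checking half, something along these lines would do as a first pass. It is a sketch in Python rather than a wrapper around 'file' and 'pdftops', and it is a cheap structural check (header at the start, %%EOF near the end), not real validation. It makes no attempt at the second half, matching a file back to a record.

```python
import os


def looks_like_pdf(path):
    """Cheap structural check: PDF header at the start, %%EOF near the end."""
    with open(path, "rb") as handle:
        head = handle.read(1024)
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - 1024))
        tail = handle.read()
    return b"%PDF-" in head and b"%%EOF" in tail


def suspect_files(root):
    """Walk root and return paths that claim to be PDFs but fail the check."""
    suspects = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                claims_pdf = handle.read(5) == b"%PDF-"
            if claims_pdf and not looks_like_pdf(path):
                suspects.append(path)
    return sorted(suspects)
```

The check goes by file contents rather than file extension, and mainly catches truncated uploads; a file that passes could still be unreadable to a particular PDF tool.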

Stuart Yeates   New Zealand Electronic Text Centre Institutional Repository


Re: [Dspace-tech] two Instances of dspace

2011-01-31 Thread stuart yeates
We do exactly this, and Tim's advice is good.

We run our DSpace instances in different Tomcat engines listening on different 
ports, http://localhost:8080/ and http://localhost:8081/. Each engine has 
a separate solr too, and you'll need to update the URLs in the config.


On 01/02/11 04:04, Tim Donohue wrote:
> Sisay,
> You should not rename the 'dspace.cfg' to be 'dspace2.cfg', as this may
> cause issues.
> You can install two instances of DSpace on one server, just don't rename
> any of the primary DSpace configuration files.  So, if you want two
> instances, you need to have two separate directories like:
> /dspace   ->  first install directory, with all configs, webapps,
> assetstore, etc
> /dspace2  ->  second install directory, with all configs, webapps,
> assetstore, etc. (Do not rename any of the configs or sub directories)
> Essentially, to install two instances, you'll want to follow the
> installation instructions twice (obviously).
> The keys when performing two installs on one server are the following:
> * The two installs should use a separate Database
> * The two installs should use a separate Installation Directory
> * You'll also want to link Tomcat to both installations, and provide
> different URL paths for each in Tomcat (e.g.
> http://localhost:8080/xmlui/ and http://localhost:8080/xmlui2/).
> I hope that helps.
> - Tim
> On 1/31/2011 6:47 AM, Webshet, Sisay (ILRI) wrote:
>> Hi All,
>> I tried to have 2 instances of dspace 1.6.2 on Debian, how ever when I
>> run mvn package
>> Build fails unable to get dspace.cfg.
>> So far I did
>>  1. Create a new database (dspace2)
>>2. Create a new installation directory (dspace2)
>>3. A new sources directory (dspace-1.6.2-src-release-2)
>>  old (dspace-1.6.2-src-release)
>>4. copy dspace.cfg (cp dspace.cfg dspace2.cfg)
>>5. Edit dspace2.cfg as follow
>>dspace.dir = /dspace2
>> =http://[YourURL]:8080/dspace2
>>db.url = jdbc:postgresql://localhost:5432/dspace2
>> What will be my next step?
>> Should I link tomcat to dspace2 folder?
>> Should I run mvn clean?

Stuart Yeates
Library Technology Services


Re: [Dspace-tech] DSpace and shibboleth authentication problem

2011-01-31 Thread stuart yeates
This is likely to be a cookies issue, because cookie handling and 
default behaviour are different between IE and firefox.

Likely test cases are where you already have cookies from the shibboleth 
service in your browser's cache.

Debugging the issue is likely to be easier / more consistent if you 
start every time with a new profile.


On 31/01/11 20:55, Lehtilä Tapani wrote:
> We've had a couple shibboleth authentication problems since we updated to 
> 1.6.2. At the same time the service was moved to another computer. That means 
> I can not be sure whether this is a problem of DSpace or only our shibboleth 
> authentication installation.
> Problem is, that sometimes I cannot log into DSpace with an account at some 
> specific computer/browser. For example at this moment my account work right 
> with Mozilla Firefox, but at the same computer login with Internet Explorer 
> has an answer: Authentication Failed.
> Part of dspace.log of that Internet Explorer trials:
> 2011-01-31 08:44:03,035 INFO  org.dspace.authenticate.ShibAuthentication @ 
> Shibboleth login started...
> 2011-01-31 08:44:03,035 INFO  org.dspace.authenticate.ShibAuthentication @ 
> RemoteUser identified as: null
> 2011-01-31 08:44:03,035 ERROR org.dspace.authenticate.ShibAuthentication @ No 
> email is given, you're denied access by Shib, please release email address
> 2011-01-31 08:44:03,035 INFO  org.dspace.authenticate.ShibAuthentication @ 
> Shibboleth login started...
> 2011-01-31 08:44:03,035 INFO  org.dspace.authenticate.ShibAuthentication @ 
> RemoteUser identified as: null
> 2011-01-31 08:44:03,035 ERROR org.dspace.authenticate.ShibAuthentication @ No 
> email is given, you're denied access by Shib, please release email address
> 201
> So the Shibboleth don't give the email address to DSpace.
> At earlier case couple of weeks ago I couldn't log in with this computer with 
> either browser, but when I tried with computer/browser combination which is 
> not in same Windows domain and which I haven't used earlier to log into 
> DSpace it worked well (with the same DSpace account).
> Restarting the shibboleth service helps but the problem has appeared again 
> after a few weeks.
> While this could be a problem of our organization's shibboleth service, I'm 
> asking if somebody has seen something like this and could this be a problem 
> of DSpace? To me it seems some kind on session problem.
> yours
> Tapani Lehtilä

Stuart Yeates
Library Technology Services


Re: [Dspace-tech] Urgent: question about search behavior

2011-01-31 Thread stuart yeates
On 01/02/11 10:09, Jizba, Richard wrote:
> I have faculty and staff in our medical education department who want an
> explanation of why their search for “wellness” seems to be picking up
> records that do not contain the word anywhere in the full-text or
> metadata. All they are seeing is the possibility that the search is also
> retrieving just the word “well” or getting the phrase “well as”. These
> folks are very unhappy because they have to do a major curriculum review.
> Is DSpace doing something by default to a search on the word “wellness”
> or is my index messed up?

You don't provide a link to your repository, so I can't check, but 
almost certainly the culprit is stemming. See:

Stuart Yeates
Library Technology Services


[Dspace-tech] Bitstream / bundle permissions

2011-02-01 Thread stuart yeates
We are seeing two separate issues related to bitstream / bundle 
permissions which may or may not be linked to

The first issue is that Collection Admins don't appear to be able to 
upload additional files / bitstreams to bundles.

The second issue is that Collection Admins don't appear to be able to 
change the permissions on individual bitstreams on a case by case basis.

In each case we have to do this logged on as Administrator.

Has anyone else managed to get this working? Am I just misunderstanding 
the authorisation section in the dspace.cfg?

We're running 1.6.2.

Stuart Yeates
Library Technology Services


Re: [Dspace-tech] More re Porter Stem Filter

2011-02-02 Thread stuart yeates
On 03/02/11 06:35, Jizba, Richard wrote:
> Tim,
> Thanks for the response.
> The first option isn't an option for because we need to search for
> numbers.
> I did see something like the second option, which said to basically
> comment out the PorterStemFilter.
> So my question is, can I eliminate that level of stemming all together.
> This is what I want to do:
>   ===
>  public final TokenStream tokenStream(String fieldName, final Reader
> reader)
>  {
>  TokenStream result = new DSTokenizer(reader);
>  result = new StandardFilter(result);
>  result = new LowerCaseFilter(result);
>  result = new StopFilter(result, stopSet);
>  /*result = new PorterStemFilter(result); */
>  return result;
>  }
> Will this 'break' anything?
> As I understand it, DSpace will then use the DSAnalyser, parse the
> character data into words, convert them to lower case and index the
> terms excluding the stop list.

There is a nice exemplar patch for how to do these kinds of things right at:

If you made such a patch and contributed it, not only would it fix the 
problem for you, but instead of re-applying the fix to every future 
release you would only need a simple config change.

A more comprehensive fix might be to build parallel indexes, one stemmed 
and stop-worded and one unstemmed and unstop-worded. The indexes would 
take up twice the disk space, but I think not too many people are 
worried about index disk space these days.
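
A minimal sketch of the parallel-index idea, assuming plain in-memory maps rather than Lucene. The class name, the stop list and the toy suffix stripper are all hypothetical stand-ins (the stripper is not the Porter algorithm); the point is only that the same documents are indexed twice, once normalised and once exactly as written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ParallelIndexSketch {
    // Hypothetical stop list; a real one would come from the analyser config.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "for", "of", "the"));

    // Toy suffix stripper standing in for a real stemmer (NOT the Porter algorithm).
    static String toyStem(String term) {
        if (term.length() > 5 && term.endsWith("ness")) {
            return term.substring(0, term.length() - 4);
        }
        if (term.length() > 3 && term.endsWith("s") && !term.endsWith("ss")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Terms for one document: either normalised or kept exactly as written.
    static List<String> terms(String text, boolean normalise) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("\\W+")) {
            if (raw.isEmpty()) continue;
            if (!normalise) { out.add(raw); continue; }
            String lower = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) out.add(toyStem(lower));
        }
        return out;
    }

    // Inverted index: term -> ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> docs, boolean normalise) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : terms(docs.get(id), normalise)) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    static Set<Integer> lookup(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Vitamin A deficiency", "Wellness for students");
        Map<String, Set<Integer>> stemmed = buildIndex(docs, true);
        Map<String, Set<Integer>> exact = buildIndex(docs, false);
        System.out.println("stemmed 'well': " + lookup(stemmed, "well"));
        System.out.println("stemmed 'a': " + lookup(stemmed, "a"));
        System.out.println("exact 'A': " + lookup(exact, "A"));
        System.out.println("exact 'Wellness': " + lookup(exact, "Wellness"));
    }
}
```

The search front end would then choose which index to query, for example the exact one when the user quotes a term.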

> If anybody is still with me, I would be curious if there is a
> LowerCaseFilter that would permit the retention of capital 'A's.
> Eliminating 'A's in medical research databases is a problem. Vitamin A
> is the obvious example, but there are many other occurrences of 'A' as
> an important, non-trivial term in a name.

A simple question, but there are complexities here you appear not to 
have thought of.

Stuart Yeates
Library Technology Services


Re: [Dspace-tech] re Porter Stem Filter

2011-02-02 Thread stuart yeates
Almost all searches by almost all users, globally, are done on half a 
dozen search platforms (google, bing, etc). These all use extensive 
normalisation (stemming, case folding, unicode normalisation, etc) and 
do a fabulous job of teaching their users (and by extension our 
users) that this is the way "search" is done.

While I fully support using controlled vocabulary and authority control 
for terms we care about, I believe the battle to define how full-text 
search should work was lost some time ago (probably last millennium).

I suspect that the real solution to problems such as Richard Jizba's is 
an automated extraction of controlled vocabularies (Medical Subject 
Headings in this case) from the full text and then a browsing / search 
interface to them.


On 03/02/11 07:36, Schumacher, John wrote:
> Hello.
> Opinions on this were requested.
> I agree completely with Richard Jizba.
> John
> John Schumacher
> Office of Library and Information Services
> SUNY System Administration
> SUNY Plaza
> Albany, NY 12246
> 518-320-1477 (Note, new number!)
> 518-320-1554 (fax)
> SUNY Digital Repository
>  Philosophical Discussion 
> I am a little surprised that the DSpace community thinks stemming like
> that done by the Porter Stemming Algorithm is so important. I have been
> searching bibliographic databases since the early 1980s and teach
> courses to our health sciences students on search techniques. We have
> always appreciated the systems that give us the power to find exactly
> the terms and the combinations we want. Language is just too rich and
> varied for any other approach in my experience. There have been many
> times when I have needed to search for a singular form of a noun vs a
> plural form or vice versa. Using truncation and wildcard operators is
> not rocket science. Lucene has some really powerful search operators,
> but their power is basically nullified by the Stemming operation.
> Our DSpace instance isn't aimed primarily at a broad worldwide user
> base, but select groups of students, staff and faculty with rather
> sophisticated information needs. Besides, most of our collection can
> also be discovered through Google. Why duplicate that, when I have the
> option of also creating an alternative search environment that provides
> for sophisticated, analytical searches of scholarly, curricular and
> administrative documents?
> You might be surprised at how quickly the people in our Office of
> Medical Education have picked up on the nuances of how and where they
> put metadata, the need for standardized vocabulary in defining lecture
> objectives, and how quickly they figured out what was happening to their
> attempts to search for "wellness" (stemmed to "well"). (It did not
> surprise me!)
> I think the distributed community administration available with DSpace
> will really help our faculty and staff  take seriously the data (text)
> they put into their collections. Our expertise as "consultants" and
> trainers to the staff in the Office of Medical Education has really made
> them appreciate the expertise of librarians, particularly my reference
> librarians who have very good analytical search skills. Don't sell
> people short -- they can be very sophisticated which means we need to
> provide them with powerful tools, not heavy-handed interventions (the
> Porter Algorithm)
> I'm planning on being at OR11 and would be happy to discuss this over a
> beer.
> If anybody is still with me, I would be curious if there is a
> LowerCaseFilter that would permit the retention of capital 'A's.
> Eliminating 'A's in medical research databases is a problem. Vitamin A
> is the obvious example, but there are many other occurrences of 'A' as
> an important, non-trivial term in a name.
> Richard Jizba
> Creighton University

Stuart Yeates
Library Technology Services


Re: [Dspace-tech] LDAP Authentication

2011-02-03 Thread stuart yeates
Whoever maintains the LDAP server at your site almost certainly controls who 
can access it. If your new server has a different machine name or IP 
address, their records may need to be updated.

There are also all manner of firewall issues that can occur, depending 
on the configuration at your site.


On 04/02/11 04:46, Savage, Karen R. wrote:
> I'm upgrading our instance of Dspace from 1.3 to 1.7 and at the same time, 
> migrating to a new server. I have the LDAP config information from the old 
> config file (and have confirmed with our IT guy that it is still correct), 
> but when I enter it on the new config file, I can't log in. Is there 
> something I need to do to the server itself before it'll work? (I wasn't 
> around when our live instance first went up). This is the last bit I need to 
> get working.
> Running:
> RHEL 4.1
> Tomcat 6.0
> Java JDK 1.6
> PostgreSQL 9.0
> Dspace 1.7
> --
> Karen Savage
> Baylor University Libraries
> Electronic Library
> Library Systems
> (254) 710-3275

Stuart Yeates
Library Technology Services


Re: [Dspace-tech] Error SWORD, servicedocument

2011-02-06 Thread stuart yeates
Sounds like an institutional firewall problem to me. Check the firewall 
/ proxy / internet connection options in your browser and ensure that 
you're giving equivalent options to the SWORD client and/or curl.


On 05/02/11 01:10, Julio Pemau wrote:
> SWORD is working: I have tried to access our Service Document
> through a web browser. When I access it directly
> this way it works, and I get the XML that corresponds to our
> service document, but when I try to access the service document via the
> SWORD servlet client or SWORD client I get the following error:
> "Server returned: org.purl.sword.client.SWORDClientException:
> Read timed out"
> We haven't tested SWORD via the curl command, but we did with the SWORD
> Java client, and in this case it doesn't work either.
> We will try to make deposits via the curl command to check if it works
> through this other option.

Stuart Yeates
Library Technology Services


Re: [Dspace-tech] $PATH in Unix for DSpace

2011-02-06 Thread stuart yeates
$PATH and $CLASSPATH are two completely separate variables.

On POSIX, $PATH is the list of directories the shell searches for 
executables. Often these are directories ending in .../bin

In Java, $CLASSPATH is the list of directories and jar files in which to 
look for compiled classes. Often these are directories ending in .../lib
The default Tomcat startup script adds all the necessary lib 
directories to the $CLASSPATH, so you shouldn't need to add them 
yourself unless you've changed anything.
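
To see that the two really are separate lists, a small sketch along these lines can print both. The class name is hypothetical, and it reports the class path of the JVM it runs in, not of your Tomcat.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class PathVersusClasspath {
    // Split a PATH-style value on the platform separator (":" on POSIX).
    static List<String> entries(String value) {
        List<String> out = new ArrayList<>();
        if (value == null) return out;
        for (String entry : value.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) out.add(entry);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("PATH (where the shell looks for executables):");
        for (String entry : entries(System.getenv("PATH"))) {
            System.out.println("  " + entry);
        }
        System.out.println("Class path of this JVM (where it looks for classes):");
        for (String entry : entries(System.getProperty("java.class.path"))) {
            System.out.println("  " + entry);
        }
    }
}
```

Directories such as /dspace/lib only make sense in the second list, and only if nothing else is already adding them.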

We use:




On 06/02/11 02:57, Thornton, Susan M. (LARC-B702)[LITES] wrote:
> Hmmm….can anyone else comment on this?  What does everyone put in their
> $CLASSPATH for DSpace?
> Thanks,
> Sue
> */Sue Walker-Thornton/*
> */Software Developer/Database Administrator/*
> */NASA Langley Research Center|LITES Contract/*
> */(757) 224-4074/*
> *From:*Alvaro Sandoval []
> *Sent:* Thursday, February 03, 2011 3:31 PM
> *To:* Thornton, Susan M. (LARC-B702)[LITES]
> *Subject:* Re: [Dspace-tech] $PATH in Unix for DSpace
> Hi Susan:
> We only included dspace jar libraries, like this:
> CLASSPATH="/opt/instaladores/apache-log4j-1.2.15/log4j-1.2.15.jar:/opt/instaladores/commons-cli-1.1/commons-cli-1.1.jar:/opt/dspace-1.6.0-src-release/dspace-api/target/classes"
> CLASSPATH="$CLASSPATH:/opt/instaladores/commons-dbcp-1.2.2/commons-dbcp-1.2.2.jar:/opt/instaladores/commons-pool-1.3/commons-pool-1.3.jar:/opt/instaladores/postgresql-8.1-412.jdbc3.jar"
> CLASSPATH="$CLASSPATH:/opt/instaladores/javamail-1.4.1/mail.jar"
> export CLASSPATH
> Where /opt/instaladores is an arbitrary directory where we downloaded
> some classes.
> By the way, we have DSpace 1.6.0
> Regards,
> Alvaro
> El 03/02/11 17:15, Thornton, Susan M. (LARC-B702)[LITES] escribió:
> Ok great. Now what about $CLASSPATH? Is there any need to have
> /dspace/lib in our $CLASSPATH?
> */Sue Walker-Thornton/*
> */Software Developer/Database Administrator/*
> */NASA Langley Research Center|LITES Contract/*
> */(757) 224-4074/*
> *From:*Alvaro Sandoval []
> *Sent:* Thursday, February 03, 2011 12:48 PM
> *To:* Thornton, Susan M. (LARC-B702)[LITES]
> *Subject:* Re: [Dspace-tech] $PATH in Unix for DSpace
> Hi:
> You just need to include java, maven and ant paths. Here's a copy of our
> PATH, running Dspace on Debian 5.
> PATH=/usr/local/java/jdk1.6.0_05/bin:/opt/apache-maven-2.0.8/bin:/usr/local/ant/bin
> export PATH
> Regards,
> Alvaro
> El 03/02/11 14:01, Thornton, Susan M. (LARC-B702)[LITES] escribió:
> Hi,
> Can someone tell me what needs to be in $PATH in order to run DSpace on
> a Unix server? I was looking at how we’ve defined $PATH and I think we
> may have some things in there we don’t need, including:
> /dspace/lib
> /dspace/bin
> /dspace-{version}-src-release
> We also have these directories in there also:
> /usr/postgres/{version}/bin
> /usr/local/pgsql/bin
> /usr/local/pgsql/lib
> I’m thinking we don’t need any of these in $PATH.
> We do, of course, have the locations of Maven, Ant, and Java in there.
> I would appreciate any input.
> Thanks a bunch,
> Best regards,
> Sue
> */Sue Walker-Thornton/*
> */Software Developer/Database Administrator/*
> */NASA Langley Research Center|LITES Contract/*
> */SGT, Inc.|130 Research Drive/*
> */Hampton, Va. 23666/*
> */Office: (757) 224-4074/*
> */Mobile: (757) 506-9903/*
> */Fax: (757) 224-4001/*

Re: [Dspace-tech] allowCDATA

2011-02-08 Thread stuart yeates
On 09/02/11 08:36, helix84 wrote:
> On Tue, Feb 8, 2011 at 16:43, Wendy J Bossons  wrote:
>> The 10,000 dollar question - Is it possible to allow CDATA output in the
> head of a DSpace document? I want to add some javascript that contains an &
>> and cannot get the source output to include CDATA tags. I am writing it in
>> the structural.xsl this way . . . Everytime I view the html source, all I
>> get are the //, no cdata tags. They have been stripped.

Re: [Dspace-tech] Encrypted Assetstores

2011-02-09 Thread stuart yeates
On 10/02/11 07:57, Joseph Rhoads wrote:
> Dear Dspace-tech Community,
> Does anyone have any experience storing their files in an encrypted
> assetstore?
> I know I can make DSpace use HTTPS and encrypt the data while in transit
> but I’m also concerned about the data while at rest.

It probably deserves to be noted that there is an explicit tension 
between preservation and encryption here. The chances of being able to 
recover from various catastrophic failures are substantially reduced by 
the move to encryption.

Having said that, if I needed to do this, I'd use file-system level 
encryption for the entire directory. Depending on your platform this 
might be eCryptfs (Linux), BitLocker (Windows) or FileVault (macOS).

Stuart Yeates
Library Technology Services


[Dspace-tech] SWORD URL issue

2011-02-14 Thread stuart yeates
I'm having difficulties with SWORD URL handling.

I'm running 1.6 behind apache httpd; the obviously relevant section of 
the config is:

 Order deny,allow
 Allow from all
 ProxyPass /down/ !
 ProxyPass / http://localhost:8080/
 ProxyPassReverse / http://localhost:8080/

#CustomLog logs/researcharchive_access_log combined

in my dspace.cfg I have:

sword.deposit.url =
sword.servicedocument.url = http://localhost:8080/sword/servicedocument =

[I've tried a number of permutations of these]

When I try to deposit an item the error message I get is:

java.lang.StringIndexOutOfBoundsException: String index out of range: -2

The full error is at:

As near as I can see, SWORDUrlManager.getDSpaceObject() should be using 
URL-parsing methods to extract and operate on only the file part of the 
URL. At the very least URL canonicalisation needs to happen here.
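
A minimal sketch of what I mean, assuming nothing about the real SWORDUrlManager code: the class, the method and the "/deposit/" marker are hypothetical. It canonicalises the URL with java.net.URI, works only on the path component, and returns null instead of calling substring() with a negative index.

```java
import java.net.URI;

final class DepositUrlSketch {
    // Returns the part of the URL path that follows the marker, or null if
    // the marker is absent or nothing follows it.
    static String pathAfter(String url, String marker) {
        // normalize() removes "." and ".." segments; getPath() drops the
        // scheme, host, port, query string and fragment.
        String path = URI.create(url).normalize().getPath();
        if (path == null) return null;          // opaque URI, e.g. mailto:
        int start = path.indexOf(marker);
        if (start < 0) return null;             // guard before substring()
        String rest = path.substring(start + marker.length());
        return rest.isEmpty() ? null : rest;
    }

    public static void main(String[] args) {
        System.out.println(pathAfter(
                "http://localhost:8080/sword/deposit/123456789/2", "/deposit/"));
        System.out.println(pathAfter(
                "http://localhost:8080/sword/servicedocument", "/deposit/"));
    }
}
```

With parsing like this, the host and port in the deposit URL no longer matter, which is what you want behind a proxy.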

Or have I misunderstood something?


Stuart Yeates
Library Technology Services


Re: [Dspace-tech] Two Instances dspace 1.6.2

2011-02-17 Thread stuart yeates
We have a very similar setup, but our two servers listen on different 
ports. We have httpd sitting in front, using virtual hosts to proxy each 
request to the correct port based on the host name in the incoming 
request.

On POSIX machines, only one process can listen on a given address and port.
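
A small sketch of the failure mode, assuming the loopback interface is available. The class name is hypothetical, and both sockets live in one JVM here rather than in two Tomcats, but the operating system refuses the second listener in the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

final class PortClashSketch {
    // Opens a listener on a free loopback port, then tries to open a second
    // listener on the same address and port. Returns true if the second
    // attempt is refused.
    static boolean secondListenerRejected() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();   // port the OS picked for us
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false;                  // both listeners were accepted
            } catch (BindException alreadyInUse) {
                return true;                   // "Address already in use"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second listener rejected: " + secondListenerRejected());
    }
}
```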


On 17/02/11 19:00, Webshet, Sisay (ILRI) wrote:
> Hi All,
> I already have a dspace 1.6.2 on a debian server. So far I have done
> as follows
> I created the second instance of dspace from scratch by downloading the
> release install, packaging, installing, and configuring. I believe I
> have everything set up correctly.
> I created a directory dspace2
> A separate database  dspace2
> I run maven and ant successfully
> Link Tomcat to both installations,
> Both the first and the second instance use the same port 8180
> Below the settings in dspace.cfg.
> dspace.dir =/home/dspace2
> dspace.baseUrl =
> db.url = jdbc:postgresql://localhost:5432/dspace2
> Nothing is changed inside tomcat server.xml
> the error as follows
>   *HTTP Status 404 - /dspace2/xmlui*
> *type* Status report
> *message* _/dspace2/xmlui_
> *description* _The requested resource (/dspace2/xmlui) is not available._
>   *Apache Tomcat/5.5*
> Thanks
> Sisay

Stuart Yeates
Library Technology Services


Re: [Dspace-tech] Turning SSL on with self-signed certificate breaks solr functionality

2015-04-09 Thread Stuart Yeates
We do HTTPS by putting apache HTTPD in front of tomcat. Tomcat works in 
pure-HTTP (but is not accessible from the network) and HTTPD proxies tomcat on 
HTTP and HTTPS as necessary.


I have a new phone number: 04 463 5692 /

From: Chris Gray 
Sent: Friday, 10 April 2015 3:10 a.m.
Subject: [Dspace-tech] Turning SSL on with self-signed certificate breaks   
solr functionality

We're using DSpace 5.1 and when we turn on SSL as per the instructions
in the installation documentation then browsing and RSS feeds break.

Looking at the localhost access logs it looks like requests to solr on return a 302 status rather than 200.

Using wget from the command line I'm told I need to add the
--no-check-certificate parameter.

Is there a way to have tomcat7 force 8080 traffic to 8443 only for the
hostname and public IP address and not for localhost and



Re: [Dspace-tech] Initial Community Screen really slow to load on 5.1

2015-04-13 Thread Stuart Yeates

(a) is it this particular page which is slow, or is it any first page in a user 
session (which just happens to usually be this page) ?

(b) are subsequent reloads of the same page shortly after also slow ?

Slowness is likely to be caused by the Java app (or the entire VM) being 
swapped out from memory; database connections needing to be reestablished; 
per-session customisations meaning nothing can be cached; or HTTPD DNS lookups.


I have a new phone number: 04 463 5692 /

From: Wally Grotophorst 
Sent: Tuesday, 14 April 2015 3:02 p.m.
Subject: Re: [Dspace-tech] Initial Community Screen really slow to load on 5.1

Did a complete rebuild of DSpace 5.1 today and it seems to be working much 
better now...back to the 1-2 second response time.
So I don't know what the problem might have been but I post this followup just 
in case someone hits a similar issue.

- Wally

Just upgraded our DSpace instance from 4.3 to 5.1.   A few hiccups along the 
way but it is running now.   One issue is the very long (10-15 seconds) wait 
time until the initial page (the community list) loads.   Once it loads things 
go swiftly on any subsequent page.

I ran a full vacuum on the postgres database but that didn't improve things.
And I have set this to "false" in dspace.cfg = false

and this: = 18 hours

Neither made much difference.

Is there anything else that I might try to speed this up...or perhaps a hint on 
which log to study to see what it's doing during that long delay.  Any 
suggestions will be welcomed.   I should say that this same platform ran 4.3 
quite briskly.

Theme:  Mirage 1 XMLUI
but I see the same slowness on the JSPUI side as well

JAVA_OPTS="-d64 -server -XX:PermSize=512m -XX:-UseParallelGC -verbose:gc 
-Xms2048m -Xmx4G -Djava.awt.headless=true -Dfile.encoding=UTF-8"

Server:  Mac OS 10.10.3
Tomcat: 7.0.59
Postgres 9.3

- Wally

Wally Grotophorst
George Mason University

Re: [Dspace-tech] SFX links - suppress in collection?

2015-05-07 Thread Stuart Yeates
I would be tempted to see whether the link can be hidden in CSS. If possible 
this would be the minimally intrusive method.



I have a new phone number: 04 463 5692 /

From: Searle, Shannon 
Sent: Friday, 8 May 2015 3:15 a.m.
Subject: [Dspace-tech] SFX links - suppress in collection?


I am currently running a DSpace instance on 4.3 with Mirage, and I have two 
different types of collections. One is harvested from another agency and is 
exclusively open access pdf links, and the other collection is links to 
journals and articles that are available through shibboleth+our SFX server. The 
link works fine for the content of one collection, but is completely unrequired 
and confusing to users in the other. I see that altering the render in older 
versions at the file will fix it at item level - -  my 
java/xml/theming skills are pretty non-existent; is there any way to remove the 
links by collection? I couldn't find anyone who has tried to do this at this 
level previously?

Any help much appreciated...


Shannon Searle
Library Systems Integration and Development Officer
Barrington Library
Cranfield University
Shrivenham, Swindon

Re: [Dspace-tech] Repec export

2015-06-15 Thread Stuart Yeates
We have a highly-customised version of the same script which I believe has this 
defect fixed.

This version of the script also does some quality control checking and squawks 
if the metadata doesn't meet the RePEc minimums (which usually indicates a user 
error somewhere at our end). We run it with crontab lines that look like: 

32 12 * * * cd /var/www/html/local/RePEc/ ; 
/var/www/html/local/RePEc/  /var/www/html/local/RePEc/vuwecf.conf


I have a new phone number: 04 463 5692 /

From: TAYLOR Robin 
Sent: Monday, 15 June 2015 9:17 p.m.
To: dspace-tech; DSpace Developers
Subject: [Dspace-tech] Repec export

(Apologies for cross-posting)

Hi all,

We run a Perl script ( periodically that uses OAI-PMH to 
generate a file which we use to update I've forgotten the 
origins of the script but it was downloaded and I believe is used at other 
sites. Anyway, we recently upgraded DSpace to 4.2 and the script stopped 
working as it has hardcoded assumptions about the OAI resumption tokens which 
are no longer valid at 4.2. I'm emailing on the off chance that others have 
experienced the same problems and have already fixed the script?
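
One way to avoid this class of breakage is to treat the resumption token as an opaque string: read it out of each response and send it back unchanged. A minimal sketch, in Java rather than the Perl of the script in question; the class name, base URL and token value are hypothetical, and only the OAI-PMH namespace and the resumptionToken element come from the protocol.

```java
import java.io.StringReader;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ResumptionTokenSketch {
    private static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Returns the token text, or null when there is no token or it is empty
    // (an empty resumptionToken marks the end of the list).
    static String extractToken(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(responseXml)));
            NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
            if (nodes.getLength() == 0) return null;
            String token = nodes.item(0).getTextContent().trim();
            return token.isEmpty() ? null : token;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OAI-PMH response", e);
        }
    }

    // Builds the follow-up request; the token is URL-encoded, never parsed.
    static String nextRequestUrl(String baseUrl, String token) {
        try {
            return baseUrl + "?verb=ListRecords&resumptionToken="
                    + URLEncoder.encode(token, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<OAI-PMH xmlns=\"" + OAI_NS + "\"><ListRecords>"
                + "<resumptionToken>oai_dc////100</resumptionToken>"
                + "</ListRecords></OAI-PMH>";
        System.out.println(nextRequestUrl("http://example.org/oai/request", extractToken(xml)));
    }
}
```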

Cheers, Robin.

Robin Taylor
Main Library
University of Edinburgh
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: [Dspace-tech] Restoring Backed up Dspace files

2014-01-23 Thread Stuart Yeates
Do you also have a backup of the SQL database used to store the item metadata? 
Exactly what that looks like and where it's stored will depend on the database 
you use and how you've been backing it up.


From: Eric Martyns []
Sent: Friday, 24 January 2014 10:31 a.m.
Subject: [Dspace-tech] Restoring Backed up Dspace files

i had a fire incident that destroyed our dspace server,
we were able to extract some files from the server.
right now we installed the dspace 4.0 but we have a challenge of moving these 
to the new dspace 4.0 so we can see what's in them.
we initially had the 1.8 version

How do we migrate these data to the new dspace installed
here is one of the files pulled out and its content,
dspace folder containing

Best Regards
Martyns Eric

Re: [Dspace-tech] CJK Support on DSpace 4.0

2014-01-26 Thread Stuart Yeates
If browsing to Latin characters works and browsing to CJK characters doesn't, 
it's almost certainly the case that either httpd, your connector or tomcat is 
confused about whether it should be using UTF-8.

In particular, check that your connector has the URIEncoding="UTF-8" attribute. 
Search the archives of this list for "UTF-8" for other related issues.
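
To illustrate what goes wrong, here is a small sketch (class name hypothetical) that decodes the same percent-encoded browse value twice. Decoded as UTF-8 it gives the two Chinese characters that were typed; decoded as ISO-8859-1 it gives six unrelated characters, which will never match anything in the index.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UriDecodingSketch {
    // Decode a percent-encoded value using the named charset.
    static String decode(String percentEncoded, String charset) {
        try {
            return URLDecoder.decode(percentEncoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of U+4E2D U+6587, as a browser would send them.
        String encoded = "%E4%B8%AD%E6%96%87";
        String right = decode(encoded, "UTF-8");
        String wrong = decode(encoded, "ISO-8859-1");
        System.out.println("UTF-8 decode:      " + right.length() + " characters");
        System.out.println("ISO-8859-1 decode: " + wrong.length() + " characters");
    }
}
```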


From: Alexander Wong []
Sent: Monday, 27 January 2014 3:12 p.m.
Subject: [Dspace-tech] CJK Support on DSpace 4.0

Dear All,

Recently I have been able to upgrade my DSpace installation from 1.5 to 4.0.
However the only issue is that I have some of my items carrying CJK characters.

On the author browsing page:

Clicking on the item with Chinese will return:
No Entries in Index
There are no entries in the index for "All of DSpace".

Any advice will be grateful. Thanks again.

Best Regards,
Alexander Wong

Re: [Dspace-tech] Submission taking too much time after license agreement

2014-03-06 Thread Stuart Yeates
[Don't worry about your English; the description of the problem is fine.]

(a) The first thing I do in terms of Java slowness is to repeat the 
operation. Tools such as tomcat can be very slow to start up, but once 
they're started they can be much quicker.

(b) The second thing is to make sure that you're giving your java enough 
memory (not enough memory results in excessive garbage collection) but not too 
much (which can result in excessive swapping). Try doubling or halving the 
memory you give it. Use system tools to see how much it's actually using of 
what you give it.

(c) Use a tool like 'top' to see what is running during the 10 minutes. If 
java / tomcat is running, it's a dspace issue. If postgres is running it's a 
database issue. If nothing is running it's a network issue (such as trying to 
connect to a remote site and waiting for a timeout) or a file I/O issue (such 
as writing a very large file to a slow disk, etc).
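
For (b), the numbers worth comparing are heap in use, heap committed, and the configured maximum. A minimal sketch of reading them from inside a JVM; the class name is hypothetical, and it reports on the JVM it runs in, so it only shows which figures to look for rather than measuring your Tomcat.

```java
final class HeapSketch {
    // Heap currently occupied by objects (committed minus free).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // One-line summary in megabytes; "max" corresponds to the -Xmx setting.
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return "used=" + usedBytes() / mb + "MB"
                + " committed=" + rt.totalMemory() / mb + "MB"
                + " max=" + rt.maxMemory() / mb + "MB";
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

If "used" sits close to "max" most of the time, the JVM is probably short of memory; if "max" is larger than the machine's free RAM, swapping is the more likely culprit.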

From: Elifelet Lopez Ramos []
Sent: Friday, 7 March 2014 2:52 p.m.
Subject: [Dspace-tech] Submission taking too much time after license agreement

Hello All

I'm new in the lists world and English is not my first language so thank you 
for your understanding.

I'm using DSpace 3.2 and, as the title says, when an item is submitted it takes 
too much time on the final step (the license agreement). I checked the logs and 
found a line that says that Apache Cocoon processed the request in 10.092 

I don't know where to start searching for a fix since there is no error at all.
Basically I haven't changed anything since the installation (except for the 
header file of the jspui) so everything should be default.

Could someone give me some guidance please?
I'm attaching part the cocoon log where i found the line I'm talking about 
(it's almost at the end).

Elifelet López


Re: [Dspace-tech] news-xmlui.xml issue

2014-03-17 Thread Stuart Yeates
Usually the issue is that news-xmlui.xml is not well formed XML, or perhaps has 
the wrong namespace declaration.

Check the first couple of lines in the file look the same as the original 
version and validate the file using xmllint or similar.
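
If xmllint isn't to hand (on Windows, say), the JDK's own parser can do the same well-formedness check. A minimal sketch, with a hypothetical class name; pass it the path to your news-xmlui.xml. It checks well-formedness and namespace prefixes only, not whether the namespace is the one DSpace expects.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class WellFormedSketch {
    // Returns null if the input parses, otherwise a description of the
    // first problem the parser found.
    static String firstProblem(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow parser errors instead of letting them print to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(source);
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    static boolean isWellFormed(String xml) {
        return firstProblem(new InputSource(new StringReader(xml))) == null;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java WellFormedSketch <path-to-xml-file>");
            return;
        }
        String systemId = new File(args[0]).toURI().toString();
        String problem = firstProblem(new InputSource(systemId));
        System.out.println(problem == null ? "well formed" : problem);
    }
}
```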


-Original Message-
From: Gord Ripley [] 
Sent: Tuesday, 18 March 2014 9:33 a.m.
Subject: [Dspace-tech] news-xmlui.xml issue


I have 4.1 running on Windows 7. So far, so good. 

Except that the text from [dspace]/config/news-xmlui.xml is not displaying on 
the front page. Neither the downloaded version nor a custom version that worked 
in 3.2 are functional. The text is simply omitted. Consequently, the first 
words on the page are ... 'Communities in DSpace'. Has anyone else encountered 
this, and is there a fix?


Gord Ripley,
Research and Development Librarian,
Bata Library, Trent University
Voice : 705- 748-1011 ext. 7517
Web : 


Re: [Dspace-tech] Java Heap Space Problems

2014-03-25 Thread Stuart Yeates
We restart our application stack (httpd, tomcat, postgres and handle server) 
immediately prior to the start of each business day (Mon-Fri).

The timing is so that there are staff around to fix things if the restart 
doesn't work cleanly.

We're on linux, so we're using cron for this.


-Original Message-
From: Matthew Sherman [] 
Sent: Wednesday, 26 March 2014 8:06 a.m.
To: dspace-tech
Subject: [Dspace-tech] Java Heap Space Problems

Hi all,

I was hoping some one can give some insight to a problem we are having.  We are 
running DSpace 1.8 and every couple of weeks we end up having to reboot the 
Tomcat because it crashes due to a Java heap space error.  We have been trying 
to correct it since the beginning of the year by incrementally increasing the 
memory assigned to Java.  By now we are at half the server memory, roughly 2 
GB, assigned to Java and while it is taking longer we still are having to 
reboot due to heap space errors.  We are not sure what the issue is and we are 
at loss as to how to fix it because throwing more memory at it is not fixing 
the problem.  I would welcome any thoughts, suggestions, or advice on how to 
fix this.  Thanks for your time and insights.

Matt Sherman


Re: [Dspace-tech] using two google analytics accounts in DSpace

2014-04-30 Thread Stuart Yeates
We do something similar. We rolled it out a while ago, before the new 
"Universal" thing. If you look at the page source there are two google code 
sections. The first is our hand-rolled cross-site google code:


Which was cut-and-pasted directly into the skin that we're using. Below that is 
the default code:

   var _gaq = _gaq || [];
   _gaq.push(['_setAccount', 'UA-6716162-1']);

   (function() {
     var ga = document.createElement('script');
     ga.type = 'text/javascript';
     ga.async = true;
     ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '';
     var s = document.getElementsByTagName('script')[0];
     s.parentNode.insertBefore(ga, s);
   })();


From: Jennifer Cwiok []
Sent: Thursday, 1 May 2014 7:38 a.m.
Subject: [Dspace-tech] using two google analytics accounts in DSpace

Hi everyone,

Our centralized IT group wants to implement Universal Google Analytics in our 
organization and our department is one of the test cases. We already have 
Dspace stats and a Google Analytics account for Dspace. We want to add the 
Universal Google Analytics code from our central IT in addition to the 
statistics we already have in place. Does anyone know if this is possible? And 
where would I add the code for the Universal Google Analytics account?

Thanks in advance,


Jennifer Cwiok
Digital Projects Manager
American Museum of Natural History
Research Library
79th Street and Central Park West
New York, NY 10024-5192

Re: [Dspace-tech] Fwd: postgreSQL Backup: -o (preserve oids) needed?

2014-05-19 Thread Stuart Yeates
I can see that option being useful if you're debugging the database and you 
want the database restored as close as possible to the original, to make 
diff'ing query plans more useful.


-Original Message-
From: Christian Völker [] 
Sent: Tuesday, 20 May 2014 11:08 a.m.
To: dspace-tech Tech
Subject: [Dspace-tech] Fwd: postgreSQL Backup: -o (preserve oids) needed?


my backup command for postgres is

/usr/bin/pg_dumpall -o -c -v > /srv/dspace/dumpall.sql

I have carried that with me since DSpace 1.3 or DSpace 1.4. Currently, I use 
DSpace 1.8.

Now, while moving to a new machine, I revisited the section under Architecture 
/ Storage Layer in the documentation and could not find any hint as to why I 
had used the -o option at all. The oldest version of the docs that I still have 
stored locally is DSpace 1.4.2 and there is no mention of -o there either.

Was there ever a requirement to preserve oids while backing up which has 
since been removed? Did this option make it into my installation by an 
arbitrary yet stupid inspiration of mine? Should I preserve it since it has 
worked so well all this time, or should I drop it to keep things clean and 
simple? Any opinion and advice appreciated.

Thanks, Christian

Re: [Dspace-tech] Google Scholar - not indexed correctly

2014-05-21 Thread Stuart Yeates
That could very well be useful for administrators. 


-Original Message-
From: Mark H. Wood [] 
Sent: Thursday, 22 May 2014 12:49 a.m.
Subject: Re: [Dspace-tech] Google Scholar - not indexed correctly

On Wed, May 21, 2014 at 02:00:00PM +0200, Bram Luyten wrote:
> Interesting, thank you for reporting back Stuart.
> It's good to know that downtime in the past can trigger exclusion.

This gave me to think.  We have done quite some work to exclude spiders from 
the usage statistics, but I wonder if we could use those same judgments to 
generate periodic reports showing *only* who is spidering a site, and perhaps 
how often or how aggressively.  It might be useful for e.g. noticing that a 
desired spider is *not* listed.
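
As a rough illustration of the kind of report described above, here is a minimal Python sketch. It assumes access-log lines in Combined Log Format and a small hypothetical list of user-agent substrings (`SPIDER_HINTS`); it does not use DSpace's own spider-detection rules, which is what the idea above would actually reuse. The log lines in `SAMPLE` are made up for the example.

```python
import re
from collections import Counter

# Hypothetical user-agent substrings, for illustration only. A real report
# would reuse the spider judgments DSpace already applies to its statistics.
SPIDER_HINTS = ("googlebot", "bingbot", "baiduspider")

# In Combined Log Format the user agent is the last quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def spider_counts(log_lines):
    """Count requests per spider hint, ignoring non-spider traffic."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for hint in SPIDER_HINTS:
            if hint in user_agent:
                counts[hint] += 1
                break
    return counts


def missing_spiders(counts):
    """List the expected spiders that never showed up in the log."""
    return [hint for hint in SPIDER_HINTS if counts[hint] == 0]


# Made-up sample lines (addresses from documentation ranges).
SAMPLE = [
    '192.0.2.1 - - [21/May/2014:10:00:00 +0000] "GET /handle/1/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.1 - - [21/May/2014:10:00:05 +0000] "GET /handle/1/2 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.2 - - [21/May/2014:10:01:00 +0000] "GET /handle/1/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; bingbot/2.0)"',
    '192.0.2.7 - - [21/May/2014:10:02:00 +0000] "GET /handle/1/3 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/29.0"',
]

counts = spider_counts(SAMPLE)
print(dict(counts))             # requests seen per spider
print(missing_spiders(counts))  # spiders we hoped to see but did not
```

The second function covers the "desired spider is not listed" case: anything it returns is a crawler that made no requests in the period examined.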

Mark H. Wood, Lead System Programmer
Machines should not be friendly.  Machines should be obedient.

Re: [Dspace-tech] [Dspace-general] Regarding Ranking of Repositories

2014-09-02 Thread Stuart Yeates
I'm not sure that a knee-jerk reaction to an arbitrary list of bad practice is 
a good place to start, and it seems like a really bad driver for software 
development.

Maybe we should be talking to our fellow implementers and building on the work 
of,,, etc. to build a 
compilation of _best_ practice.


-Original Message-
From: Tim Donohue [] 
Sent: Wednesday, 3 September 2014 8:49 a.m.
To: Isidro F. Aguillo;
Cc: Jonathan Markow;
Subject: Re: [Dspace-general] [Dspace-tech] Regarding Ranking of Repositories

Hello Isidro,

DuraSpace (the stewarding organization behind DSpace and Fedora repository 
software) was planning to send you a compiled list of the concerns with your 
proposal. As you can tell from the previous email thread, many of the users of 
DSpace have similar concerns. Rather than bombard you with all of them 
individually (which you could see from browsing the thread), we hoped to draft 
up a response summarizing the concerns of the DSpace community.

Below you'll find an initial draft of the summarized concerns. The rule 
numbering below is based on the numbering at:

--- Concerns with the Proposal from Ranking Web of Repositories

* Rule #2 (IRs that don't use the institutional domain will be excluded) would 
cause the exclusion of some IRs which are hosted by DSpace service providers. 
As an example, some users have URLs 
https://[something] which would cause their exclusion as it is 
a non-institutional domain. Many other DSpace hosting providers have similar 
non-institutional domain URLs by default.

* Rule #4 (Repositories using ports other than 80 or 8080) would wrongly 
exclude all DSpace sites which use HTTPS (port 443). Many institutions choose 
to run DSpace via HTTPS instead of HTTP.

* Rule #5 (IRs that use the name of the software in the hostname would be 
excluded) may also affect IRs which are hosted by service providers (like 
DSpaceDirect). Again, some DSpaceDirect customers have URLs which use 
* (includes "dspace"). This rule would also exclude MIT's IR 
which is the original "DSpace" (and has used the same URL for the last 10+ 
years).

* Rule #6 (IRs that use more than 4 directory levels for the URL address of the 
full texts will be excluded.) may accidentally exclude a large number of DSpace 
sites. The common download URLs for full text in DSpace are both at least 4 
directory levels deep:

- XMLUI: [dspace-url]/bitstream/handle/[prefix]/[id]/[filename]
- JSPUI: [dspace-url]/bitstream/[prefix]/[id]/[sequence]/[filename]

NOTE: "prefix" and "id" are parts of an Item's Handle (, 
which is the persistent identifier assigned to the item via the Handle System. 
So, this is how a persistent URL like redirects to an Item in MIT's DSpace.

* Rule #7 (IRs that use more than 3 different numeric (or useless) codes in 
their URLs will be excluded.). It is unclear how they would determine this, and 
what the effect may be on DSpace sites worldwide. Again, looking at the common 
DSpace URL paths above, if a file had a "numeric" 
name, it may be excluded as DSpace URLs already include 2-3 numeric codes by 
default ([prefix],[id], and [sequence] are all numeric).

* Rule #8 (IRs with more than 50% of the records not linking to OA full text 
versions..). Again, unclear how they would determine this, and whether the way 
they are doing so would accidentally exclude some major DSpace sites. For 
example, there are major DSpace sites which include a larger number of 
Theses/Dissertations. These Theses/Dissertations may not be 100% Open Access to 
the world, but may be fully accessible to everyone "on campus".
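
The points made under Rule #6 and Rule #7 can be checked with a few lines of Python. This is only a sketch: the host name, Handle prefix, item id and file name below are placeholders, and the way "directory levels" and "numeric codes" are counted here is an assumption, since the proposal does not say how the ranking would measure them.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Return the non-empty path segments of a URL."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]


def directory_levels(url):
    """Count directory levels, i.e. every path segment except the filename."""
    return max(len(path_segments(url)) - 1, 0)


def numeric_codes(url):
    """Count path segments that consist only of digits."""
    return sum(1 for segment in path_segments(url) if segment.isdigit())


# Placeholder URLs following the two patterns quoted above.
XMLUI_URL = "https://repository.example.edu/bitstream/handle/12345/678/thesis.pdf"
JSPUI_URL = "https://repository.example.edu/bitstream/12345/678/1/thesis.pdf"

for label, url in (("XMLUI", XMLUI_URL), ("JSPUI", JSPUI_URL)):
    print(label, directory_levels(url), numeric_codes(url))
```

With these placeholder values both patterns come out at 4 directory levels, and the numeric count is 2 for the XMLUI pattern and 3 for the JSPUI pattern. A DSpace installed under a context path (rather than at the root of the host) would add a further level.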


Another, perhaps more serious concern, is on the timeline you propose. 
You suggest a timeline of January 2015 when these newly proposed rules would be 
in place. Yet, if these rules were to go in place, some rules may require 
changes to the DSpace software itself (as I laid out above, some rules may not 
mesh well with DSpace software as it is, unless I'm misunderstanding the rule 

Unfortunately, based on our DSpace open source release timelines, we have ONE 
new release (DSpace 5.0) planned between now and January 2015. 
Even if we were able to implement some of these recommended changes at a 
software level, the vast majority (likely >80-90%) of DSpace instances would 
likely NOT be able to upgrade to the latest DSpace version before your January 
deadline (as the 5.0 release is scheduled for Nov/Dec). 
Therefore, as is, your January 2015 ranking may accidentally exclude a large 
number of DSpace sites from your rankings, and DSpace 

Re: [Dspace-tech] Syncrhonization and central repository

2014-09-08 Thread Stuart Yeates
Hello Germán

It might be more helpful if you told us the problem you're trying to solve.

For example, we have HTTP and HTTPS views of our repository, which are simply 
done using apache proxying.


From: Germán Biozzoli [] 
Sent: Tuesday, 9 September 2014 10:19 a.m.
To: DSpace techlist techlist;
Subject: [Dspace-tech] Syncrhonization and central repository

Hi everybody

I'm asking whether there is some way to keep a remote repository completely 
synchronized with a central one (both DSpace), including changes in the 
community/collection structure, in addition to consuming the items added. I've 
seen a presentation that combines OAI-ORE and SWORD; is anybody doing 
something like this?


[Dspace-tech] testable properties of repositories that could be used to rate them

2014-09-11 Thread Stuart Yeates
A couple of us have drawn up a bit of a list of script-testable properties of 
repositories that could be used to rate them. We've tried to avoid both 
arbitrary judgements and the implication that every repository should meet 
every item:


Re: [Dspace-tech] Recommended TLS cipher suite for sites using HTTPS

2014-09-14 Thread Stuart Yeates
I use a verifier to check my config:

Note that my settings are less secure than I might like, because increasing 
them causes some platforms (especially mobile platforms) to fail to access the 
content, while leaving nothing useful in the logs.

Personally I find the Mozilla advice a little strong on the "force users with 
outdated browsers to update" approach.

It's also possible to force users who log in to use more secure credentials 
than those who just access content, if you can assume that only admin staff 
log in from their desktops with recent browsers. There's an example on


From: Alan Orth []
Sent: Sunday, 14 September 2014 7:39 p.m.
To: Ivan Masár
Subject: Re: [Dspace-tech] Recommended TLS cipher suite for sites using HTTPS

Hi, Hilton.

Thanks for your reply.  First, I'd like to point out that I reverse proxy 
DSpace via nginx (and Apache httpd a few years ago).  The decision to put nginx 
/ httpd in front of Tomcat was made partially on the fact that it's easier to 
configure HTTPS in those servers than Tomcat, and nginx supports more modern 
crypto than Apache httpd or Apache Tomcat.  Also mod_rewrite and vhosts etc were 

Your HTTPS configuration could use several improvements.  Attached is a 
screenshot of the negotiated cipher suite as seen in Chrome in GNU/Linux.  Of 
- The connection is encrypted using AES CBC.  AES is government-grade security, 
but implemented in CBC mode it is vulnerable to padding oracle attacks (see 
BEAST and Lucky13)[0].  It is recommended to use GCM (Galois/Counter Mode).
- Message authentication (MAC, basically a hash or fingerprint) is using SHA1, 
which is of course very old and started showing weaknesses in academic circles 
and was first shown to be broken in 2005[1].
- Your connection is using Diffie-Hellman Ephemeral, which is good! Ephemeral 
means that there is a temporary secret used in the HTTPS negotiation that is 
thrown away after the session. In the scenario that an adversary (NSA?) gets 
your HTTPS key and records secure traffic, they won't be able to decode those 
sessions.  This is called 'forward secrecy' (sometimes "perfect" forward 
secrecy).

Other than that, your HTTPS certs are signed using SHA1, which has been 
deprecated by all major browsers in favor of SHA2[2].

It's kinda overwhelming, but using the Mozilla cipher list will get you 
started.  They are a list of safe defaults which take into account most of the 
latest information we have on cryptography.

Hope that helps,


On Sat, Sep 13, 2014 at 10:35 PM, helix84 wrote:
On Sat, Sep 13, 2014 at 9:05 PM, Hilton Gibson wrote:
> Who is the arbiter "safe ciphers"?
> I am not a cipher expert.

There's no arbiter. The set changes over time as new vulnerabilities
are found in existing ciphers and new ciphers are developed to
mitigate those attack vectors. A cipher might look good on paper, but
only widespread use reveals its weaknesses. Then there is the natural
deprecation of shorter key sizes, which is required as computers
get faster. Furthermore, errors exist in PRNGs, which encryption
vitally depends on. The only way is to keep up to date on this
information. That's why the Mozilla list Alan mentioned helps - they
watch it for you and give you their recommendations.


Compulsory reading: DSpace Mailing List Etiquette

Alan Orth
"In heaven all the interesting people are missing." -Friedrich Nietzsche
GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0

Re: [Dspace-tech] Recommended TLS cipher suite for sites using HTTPS

2014-09-15 Thread Stuart Yeates
Both of the guidelines make complete sense if you're a bank (or the payroll 
system of a university). They make less sense when you are a service whose 
reason for existence is to promulgate information. For repositories to enforce 
the latest and greatest security settings for users to access documents makes 
no sense and is insane if (like my repositories) we also offer the same 
documents over HTTP.

Note, for example, that your site can't be accessed from IE 6 or by bots 
running certain varieties of Java. That's probably not a bad choice unless you 
need it to be accessible to the third world, which has a much older 
technological profile than the west.

It may make sense to lock down submission / admin interfaces, particularly if 
these are accessed from off campus.


From: Alan Orth []
Sent: Monday, 15 September 2014 8:36 p.m.
To: Stuart Yeates; Ivan Masár
Subject: Re: [Dspace-tech] Recommended TLS cipher suite for sites using HTTPS


Interesting that you consider Mozilla's guidelines too strict.'s are even more so. :)

For reference, I use a "stricter" config than Mozilla's in that I disallow 
SSLv3 (as even XP supports TLS 1.0), and I get an A+ on the Qualys SSL test:

TLS is fun, isn't it?!


Re: [Dspace-tech] Recommended TLS cipher suite for sites using HTTPS

2014-09-15 Thread Stuart Yeates
There is also an argument that ‘freedom to read’-type statements suggest HTTPS 
to prevent casual snooping on people’s reading habits; however, this is 
undermined by our use of DOI and Handle, which are reliably HTTP, so we’re 
already leaking that info.


From: Hilton Gibson []
Sent: Tuesday, 16 September 2014 8:34 a.m.
To: Stuart Yeates
Cc: Alan Orth; Ivan Masár;
Subject: Re: [Dspace-tech] Recommended TLS cipher suite for sites using HTTPS

+1 to Stuart, my only intention with https is to secure user credentials, 
beyond that it does not matter.

Hilton Gibson
Ubuntu Linux Systems Administrator
JS Gericke Library
Room 1025C
Stellenbosch University
Private Bag X5036
South Africa

Tel: +27 21 808 4100 | Cell: +27 84 646 4758


Re: [Dspace-tech] Recommended TLS cipher suite for sites using HTTPS

2014-09-15 Thread Stuart Yeates
>I think you're missing the point. Protecting the content is as you say 
>unimportant if it's open content. But the big threat here is to the privacy of 
>the patrons. Your viewing history, if it gets into the wrong hands, could 
>easily put you or someone you care about at risk.

The big threat for me is that someone can upload a bogus thesis into my 
repository and on that basis claim to have a degree ...

When a TLS connection gets established, the two parties negotiate the most 
secure option they both support. That negotiation is driven by the client, 
meaning that modern sanely configured clients will normally be very secure from 
passive listening attacks.  Active attacks are more challenging to prevent, and 
raising the minimum security of the certs supported is one approach to do that.
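
To see what a particular client and server actually negotiate, something like the following Python sketch can help. It uses only the standard `ssl` and `socket` modules; the function names are invented for this example, the choice of TLS 1.2 as the floor is an arbitrary assumption, and the host name in the usage comment is a placeholder.

```python
import socket
import ssl


def make_client_context(min_version=ssl.TLSVersion.TLSv1_2):
    """Build a client-side context that refuses anything below min_version."""
    context = ssl.create_default_context()
    context.minimum_version = min_version
    return context


def negotiated_parameters(host, port=443, context=None):
    """Connect to host and return (protocol version, cipher suite name)."""
    context = context or make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            cipher_name, _protocol, _secret_bits = tls_sock.cipher()
            return tls_sock.version(), cipher_name


# Example (needs network access; the host name is a placeholder):
#   print(negotiated_parameters("repository.example.edu"))
```

If the server cannot meet the floor set in the context, the handshake fails with an `ssl.SSLError` instead of silently falling back, which is the client-side mirror image of a server raising its own minimum.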



[Dspace-tech] dspace <->fedora interop

2014-10-12 Thread Stuart Yeates
Does anyone know about the state of fedora interoperability? The particular 
task I'm looking at is a trial migration of some records from dspace to fedora. 
I'm aware of the script at but it 
hasn't shown much recent activity. Is there anything else I should be looking 
at?

I have a new phone number: 04 463 5692


Re: [Dspace-tech] DS-2220: Always load Google Analytics over SSL

2014-10-27 Thread Stuart Yeates
Isn't the fix for this to use protocol-independent URIs? i.e. the ones that 
start with // rather than https:// or http:// ?

Or is there an important secondary issue I'm missing?
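
For anyone unfamiliar with the term, a protocol-independent (protocol-relative) URI simply drops the scheme so that the browser reuses whichever one the page was loaded with. A minimal Python sketch of the transformation, using a hypothetical helper name and placeholder URLs:

```python
from urllib.parse import urlsplit, urlunsplit


def protocol_relative(url):
    """Strip an http/https scheme, leaving a URI that starts with //."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return url  # leave mailto:, relative paths, etc. untouched
    return urlunsplit(("", parts.netloc, parts.path, parts.query, parts.fragment))


print(protocol_relative("http://cdn.example.org/js/lib.js"))
print(protocol_relative("https://cdn.example.org/js/lib.js?v=2"))
```

This only works when the same host name serves the resource over both protocols. The Google Analytics snippet quoted elsewhere on this list chooses between an 'https://ssl' and an 'http://www' prefix, so in that case the host differs by protocol and a scheme-less URI alone may not be enough.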




From: Alan Orth 
Sent: Monday, 27 October 2014 11:51 p.m.
Subject: [Dspace-tech] DS-2220: Always load Google Analytics over SSL

I was just poking around and noticed we conditionally load Google Analytics 
over SSL.  We should *always* load ga.js over SSL.  Bug here:

Patch and pull request is linked in bug report.


Alan Orth
"In heaven all the interesting people are missing." -Friedrich Nietzsche
GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0

Re: [Dspace-tech] OAI metadata formats

2014-10-27 Thread Stuart Yeates
I'd be very interested in whether anyone knows of any real-world harvesters 
of OAI in anything except oai_dc.




From: Scott Carlson 
Sent: Tuesday, 28 October 2014 9:53 a.m.
Subject: [Dspace-tech] OAI metadata formats

Greetings all,

We are in the midst of updating our IR's metadata registry and the 
corresponding crosswalks, to make sure all of our metadata is exposed through 
OAI. While working on these updates, it came to our attention that some of the 
metadata formats in the DSpace OAI module are out of date. For example, links 
to both the namespace ( and schema 
( of the DSpace Intermediate Metadata 
(DIM) format are apparently dead; meanwhile, the schemas for MODS and ETDMS 
have (relatively) recently been updated to 3.5 and 1.1, respectively, but the 
OAI module links to previous versions of the two.

It would be great to have the latest schema versions available for updating OAI 
crosswalking; however, we are hesitant to make local changes, as the OAI webapp 
is currently being upgraded for the next DSpace software version and its xoai 
file still references the previous schemas and dead links.

Does anyone know if there has been any discussion about updating the metadata 
versions/links in the OAI webapp before the next update?


Scott Carlson
Metadata Coordinator
Rice University
Fondren Library

Re: [Dspace-tech] DS-2220: Always load Google Analytics over SSL

2014-10-28 Thread Stuart Yeates

> I was shooting for always loading over HTTPS, as surely loading ANYTHING we can
> over HTTPS should increase our users' security, ie jQuery, images, CSS, etc...

Yes, but only if you're assuming that only humans connect and all of them use 
modern browsers with good https support.

Many users in the developing world access our content on an array of hardware 
and software that we would consider obsolete. Requiring the latest and greatest web 
technologies to access our research isn't going to decrease that development 

Many tools, from plain server monitoring systems to reference checking systems 
to fancy website thumbnail services just work better and more reliably over 
http than https.


Re: [Dspace-tech] DSpace error "ClientAbortException"

2014-10-29 Thread Stuart Yeates
A monitoring system sending a HTTP request every ten seconds and closing the 
connection before any data is transferred will give you symptoms similar to 




From: Hilton Gibson 
Sent: Wednesday, 29 October 2014 11:36 p.m.
To: dspace-tech
Subject: [Dspace-tech] DSpace error "ClientAbortException"

Hi All

Using Ubuntu 12.04 LTS and DSpace 3.2 with the XMLUI.

I am getting the following error happening almost every 10 seconds.
ClientAbortException: Broken pipe
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(
at org.apache.catalina.connector.OutputBuffer.doFlush(
at org.apache.catalina.connector.OutputBuffer.flush(
at org.apache.catalina.connector.Response.flushBuffer(
at $Proxy18.service(Unknown Source)
at org.dspace.springmvc.CocoonView.render(
at javax.servlet.http.HttpServlet.service(
at javax.servlet.http.HttpServlet.service(
at org.apache.catalina.core.StandardHostValve.invoke(
at com.googlecode.psiprobe.Tomcat60AgentValve.invoke(
at org.apache.catalina.valves.ErrorReportValve.invoke(
at org.apache.catalina.connector.CoyoteAdapter.service(
at org.apache.coyote.http11.Http11Processor.process(
Caused by: Broken pipe
at Method)

Re: [Dspace-tech] Link Checker in Dspace

2014-10-30 Thread Stuart Yeates
That looks like your DNS host is redirecting unknown domains to a commercial 
sales site. If you change your DNS settings you may get a sane answer.
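
One way to confirm this is to ask the resolver for a name that is guaranteed not to exist. The sketch below is only an illustration: the function name is invented, and it relies on the reserved `.invalid` top-level domain, which a well-behaved resolver should never answer for.

```python
import socket
import uuid


def resolver_rewrites_nxdomain(resolve=socket.gethostbyname):
    """Return True if a name that cannot exist still resolves to an address."""
    # .invalid is a reserved TLD, so a correct resolver must fail this lookup.
    probe_name = f"{uuid.uuid4().hex}.invalid"
    try:
        resolve(probe_name)
    except socket.gaierror:
        return False  # lookup failed, as it should
    return True       # the resolver invented an answer


# Example (uses the system resolver, so it needs a working DNS setup):
#   print(resolver_rewrites_nxdomain())
```

If this returns True on the DSpace server, every non-existent host will appear to resolve there, and a link checker will see the resolver's landing page instead of a DNS failure.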




From: Edna Hagan 
Sent: Friday, 31 October 2014 6:48 a.m.
To: ''
Subject: [Dspace-tech] Link Checker in Dspace

Hello community,

We are working on Dspace 3.1 JSPUI/ UNIX. We have configured the Link Checker 
to check bad links in the dc.identifier.uri field. As a test we put 2 bad links 
in the metadata record and this was not reported as bad links.

A test with the curl command indicates it recognises / resolves the host of one 
of the links but not the other, even though they are both bad links; hence one 
of them is not returned as a bad link. The links we used were



curl returns

Search the web: Click here to enter.

curl returns
curl: (6) Couldn't resolve host ''

Is this a known bug, and are there any plans for a fix?

Thank you very much for your help.
Edna Hagan
Information Management Analyst | Analyste de gestion de l'information
International Development Research Centre | Centre de recherches pour le 
développement international
613-696-2297 | | |


Re: [Dspace-tech] Error running custom java scripts in 1.8.2

2014-11-05 Thread Stuart Yeates
The error message you are seeing is 100% consistent with dspace not being able 
to find the correct jars.

There are two main options.

1 They haven't been compiled and copied over to the new build (look for them in 
the directories of the new build)

2 The change to add them to the CLASSPATH is not present ('ps auxwww' should 
tell you the full classpath that tomcat is using, look for the path you found 
in (1))
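
Long classpaths are hard to read in raw 'ps' output, so a small helper can do the checking for option 2. This Python sketch is illustrative only: the function names and the sample command line are made up, it only understands the `-cp` / `-classpath` form, and it assumes the Unix `:` separator.

```python
def classpath_entries(command_line):
    """Extract the entries of the -cp / -classpath argument from a java command line."""
    args = command_line.split()
    for index, arg in enumerate(args[:-1]):
        if arg in ("-cp", "-classpath"):
            return args[index + 1].split(":")  # ':' is the Unix path separator
    return []


def on_classpath(command_line, needle):
    """True if any classpath entry contains the given path fragment."""
    return any(needle in entry for entry in classpath_entries(command_line))


# Hypothetical command line as it might appear in 'ps auxwww' output.
SAMPLE_COMMAND = (
    "java -Xmx512m -classpath /dspace/lib/dspace-api.jar:/dspace/config "
    "org.example.SomeTool"
)

print(classpath_entries(SAMPLE_COMMAND))
print(on_classpath(SAMPLE_COMMAND, "/dspace/lib/local"))
```

A classpath can also come from the CLASSPATH environment variable or from a launcher script, neither of which this helper looks at, so an empty result is not proof that the jars are missing.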




From: Avino, Thomas W. (LARC-B702)[LITES] 
Sent: Thursday, 6 November 2014 9:22 a.m.
Subject: [Dspace-tech] Error running custom java scripts in 1.8.2

Ever since we upgraded Dspace 1.7.1 to 1.8.2, none of our custom java scripts 
run (these are not overlays).  I get the error below.
On our old Dspace, we put custom java code in the 
directory (local was created by us).  We never had any issues until the upgrade 
to 1.8.2.  Is it possible this directory is being ignored during the build?  
Any help would be appreciated.

Exception in thread "main" java.lang.NoClassDefFoundError: 
Caused by: java.lang.ClassNotFoundException:
at Method)
at java.lang.ClassLoader.loadClass(
at sun.misc.Launcher$AppClassLoader.loadClass(
at java.lang.ClassLoader.loadClass(
Could not find the main class:  Program will 
Error 1 in script tpsas2ldr in - aborting at Wed 05 Nov 2014 03:1

Thomas W. Avino
NASA Langley Research Center
Bldg 1194 Room 302B
Mail Stop 185
Hampton, VA 23681-2199
Phone: (757) 864-8495
Fax: (757) 864-6649

DSpace-tech mailing list
List Etiquette:

Re: [Dspace-tech] Use case for exporting a DSpace collection to CD/DVD

2014-11-19 Thread Stuart Yeates
If I were doing this, I'd be looking to customise a LiveCD to hide the 
installation stuff and autostart the tomcat stack.

One issue you're likely to have is that you don't know in advance how much RAM 
you'll have available, but the Live CD community may have a fix for that.

I have a new phone number: 04 463 5692

From: euler 
Sent: Wednesday, 19 November 2014 4:05 p.m.
Subject: [Dspace-tech] Use case for exporting a DSpace collection to CD/DVD

Dear All,

We are planning to distribute our DSpace collection via CD/DVD to other
libraries in areas where there is no internet connection. Searching the
mailing list, I found this thread [1]. Has anyone implemented this, and could
you share your experiences? If you have used Greenstone to export
collections [2], you'll pretty much get the idea. My supervisor suggested
trying LibraryBox [3], but AFAIK it will only work for static pages and
can't include searching. My other concern is that, just as with an exported
Greenstone collection, I need a way to "automatically" start the services
(e.g. Tomcat [4] and PostgreSQL [5]) and launch the default landing page for
the collection. I would be grateful if someone could guide me on how to
achieve this or point me to resources on the web. Any advice and comments
would be great too.

Thanks in advance.

[1]  Dspace in CD
[2]  Creating CD/DVD-ROM Collections
[3]  What is LibraryBox?
[4]  Tomcat as add-ons in XAMPP
[5]  PostgreSQL Portable


Re: [Dspace-tech] [SPAM] Re: Internal System Error

2014-11-19 Thread Stuart Yeates
That's almost certainly a permissions error. Create the directories manually 
and make sure that the user tomcat is running as (the user column in 'ps' or 
'top') owns them and can read and execute every directory between the ones you 
create and /.
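
A minimal sketch of that check, under some assumptions: ancestors is a made-up helper name, the path must be absolute, and the "tomcat" user and group names are guesses that should be replaced with whatever 'ps' actually reports.

```shell
# Print an absolute directory path and every ancestor up to /, one per line.
# (ancestors is a hypothetical helper written for this example.)
ancestors() {
    dir=$1
    while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        printf '%s\n' "$dir"
        dir=$(dirname "$dir")
    done
    printf '/\n'
}

# Usage (user and group "tomcat" are assumptions):
#   sudo mkdir -p /dspace/solr/oai/data/index
#   sudo chown -R tomcat:tomcat /dspace/solr
#   ancestors /dspace/solr/oai/data/index | xargs ls -ld
```

The final 'ls -ld' lists owner and mode for each directory on the way down, so a missing read or execute bit on any level stands out.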



I have a new phone number: 04 463 5692

From: Hamad Dafallah 
Sent: Thursday, 20 November 2014 8:57 a.m.
Cc: dspace-tech
Subject: Re: [Dspace-tech] [SPAM] Re: Internal System Error

I think the problem is with Solr:

SolrCore Initialization Failures

Cannot create directory: /dspace/solr/oai/data/index
Cannot create directory: /dspace/solr/statistics/data/index
Cannot create directory: /dspace/solr/search/data/index
How can I fix this?

On Wed, Nov 19, 2014 at 4:43 PM,>> wrote:

Check /dspace/log/ what is a problem. Many times can be a problem with



Hamad Dafallah


Re: [Dspace-tech] OAI harvest "Date" matching

2014-11-25 Thread Stuart Yeates
Bear in mind, too, that it's often best to rely on things that DSpace 
guarantees will always be available, such as and 

I have a new phone number: 04 463 5692

From: helix84 
Sent: Tuesday, 25 November 2014 11:22 p.m.
To: Jeffrey Sheldon
Subject: Re: [Dspace-tech] OAI harvest "Date" matching

On Mon, Nov 17, 2014 at 4:53 PM, Jeffrey Sheldon  wrote:
> After migrating from DSpace 1.8.2 to 3.3, I've been told by one of our 
> digital librarians that the OAI 2.0 returns records based on 
>  In the past, we were apparently providing items based 
> on (their preference).  Interestingly, when I view records 
> at /dspace-oai, it's listing "date modified" in the overview.
> I've looked through settings, made changes optimistically, and read quite a 
> bit of documentation, but can't seem to affect the date setting.

It seems like the class populating the oai index from the database
doesn't really care about the order and simply takes items in the
order served by the DB:

Full import:

Incremental import:

Try changing these queries first and see if it helps. If not, perhaps
there is some ordering during output from the index (I doubt it, but I
didn't check).


Compulsory reading: DSpace Mailing List Etiquette


Re: [Dspace-tech] Fwd: *** GMX Spamverdacht *** Re: database layer spontaneously closing connections, crashes packager

2014-12-01 Thread Stuart Yeates
We're not seeing this issue, but if we were I'd be looking at increasing 
database timeouts and keepalives; reducing any possible database disk pauses 
and networking issues between the two.


I have a new phone number: 04 463 5692

From: Christian Völker 
Sent: Tuesday, 2 December 2014 2:23 a.m.
To: dspace-tech
Subject: [Dspace-tech] Fwd: *** GMX Spamverdacht *** Re: database layer 
spontaneously closing connections, crashes packager


has this issue ever been solved?

I encounter it here with DSpace 1.8.3 and Postgres 9.1

Thanks, Christian

Am 31.01.2012 um 15:59 schrieb Tim Donohue :

> Rui,
> Sorry, I haven't gotten any further with this issue.
> Does anyone else on this list have any ideas here?  Anyone else
> encountering this issue on either 1.7.x or 1.8.x, or find a way to
> resolve it?
> - Tim
> On 1/31/2012 6:28 AM, Rui Ramos wrote:
>> Hi Tim,
>> Have you managed to solve this issue?
>> I have the same problem, and from the logs it seems the database resets
>> the connection.
>> DSpace - 1.7.2
>> PostgreSQL - 8.4.9
>> Any ideas?
>> On Fri, 2011-12-09 at 10:58 -0600, Tim Donohue wrote:
>>> Actually, looking at these error messages a bit more closely, this may
>>> be entirely unrelated. It could be an XMLUI specific issue, rather than
>>> a database layer issue that you are seeing.
>>> - Tim
>>> On 12/9/2011 10:54 AM, Tim Donohue wrote:
>>>> Hi Mark,
>>>>
>>>> Unfortunately, I don't have an answer for you. But, digging around I
>>>> noticed a similar oddity on On we're
>>>> getting some similar sorts of SocketException errors (though "broken
>>>> pipe" rather than "socket closed") in the dspace.log files, but these
>>>> are from the XMLUI. is currently running:
>>>> * DSpace 1.8.0
>>>> * PostgreSQL 8.4.2
>>>> * Tomcat 6.0.29
>>>> * Java 1.6.0_24
>>>>
>>>> The error on is a "ClientAbortException: Broken pipe" (in XMLUI), which is caused by an
>>>> underlying " Broken pipe". Not the same error,
>>>> but eerily similar. The XMLUI seems to handle it in stride, but it does
>>>> cause frequent lines like this to appear in the logs:
>>>>
>>>> 2011-12-09 10:51:48,663 ERROR @ Serious Error
>>>> Occurred Processing Request!
>>>>
>>>> In the PostgreSQL logs I see some occasional "unexpected EOF on client
>>>> connection" errors, but the times don't seem to match up with the
>>>> SocketExceptions above.
>>>>
>>>> I'm also seeing these same logged issues on my local development box,
>>>> now that I look closer. Again, I cannot verify this is 100% related to
>>>> your same issue, but it does look a bit similar.
>>>>
>>>> An example of a full error stack trace from the dspace logs is attached.
>>>>
>>>> Sorry I don't have more help to add. Just wanted to send along what I've
>>>> noticed that seems a bit similar. So far, I haven't noticed this same
>>>> issue with the packager -- but, admittedly I'm using 1.8.0 (which did
>>>> include some changes/bug fixes to the packager, though I'm not sure that
>>>> any were related to this issue).
>>>>
>>>> - Tim
>>>>
>>>> On 12/9/2011 10:18 AM, Mark H. Wood wrote:
>>>>> I'm trying to script a daily incremental dump using the packager.
>>>>> Sometimes it completes, but most often it throws "PSQLException: An I/O
>>>>> error occurred while sending to the backend" caused by
>>>>> " Socket closed". About the same time
>>>>> PostgreSQL logs "unexpected EOF on client connection" and tears down
>>>>> its session. I turned logging way up on both ends but saw nothing
>>>>> unusual on the DSpace side and only the unexpected EOF on the Pg side.
>>>>> Digging through the Pg log, I see lots of these unexpected EOFs.
>>>>> Apparently the webapp is getting this too but takes it in stride.
>>>>> Commandline app.s seem to be less fortunate.
>>>>> I captured a packet trace while running my script and can see that the
>>>>> client is chugging along, making requests and getting responses, then
>>>>> suddenly the client sends a TCP FIN, ACK packet and that's the end of
>>>>> that. No Pg shutdown message was sent, so far as I can tell, and
>>>>> that's why Pg says "unexpected EOF". After fairly rapid exchanges
>>>>> there's a pause of about 0.8 seconds between the previous packet
>>>>> (client ACKing the server's last response) and the FIN.
>>>>> The Pg backend is v9.0.5. This DSpace instance is v1.7.2. OS is
>>>>> Gentoo Linux x86 with all current userspace updates and kernel
>>>>> 2.6.39. JRE is Oracle (Sun) 1.6.0_29-b11. Ideas for debugging this?

Re: [Dspace-tech] Urgent: database layer spontaneously closing connections(…)

2014-12-02 Thread Stuart Yeates
> Although my postgresql is in standard configuration running autovacuum 
> regularly, I did a manual vacuum full and a reindex database and reindex 
> system for the dspace database yesterday. These jobs finished fast without 
> problems. The snippet of the logfile above was recorded after this 
> maintenance.

(a) Have you checked the operating system logs for evidence of disk corruption? 
('dmesg' in the case of Linux)

(b) Is postgres configured to be UTF-8?
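
Both checks can be sketched as follows. The pattern list in disk_errors (a made-up helper name) is illustrative rather than exhaustive, and the database user and database name "dspace" in the psql line are assumptions.

```shell
# Filter kernel log lines on stdin for common signs of disk trouble.
# (disk_errors is a hypothetical helper; the patterns are examples only.)
disk_errors() {
    grep -i -E 'i/o error|medium error|bad sector|remounting filesystem read-only'
}

# Usage:
#   dmesg | disk_errors
#   psql -U dspace -d dspace -c 'SHOW server_encoding;'   # user and database names assumed
```

No output from the first command is a good sign but not proof of a healthy disk; the second should report UTF8 for a database set up as DSpace expects.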


Re: [Dspace-tech] Inconsistent text_lang on metadatavalue fields

2014-12-03 Thread Stuart Yeates
I can tell you right off that only some fields have linguistic content.

dc.language.iso, dc.type and should not have text_lang fields in normal 
use. If you're using date formats starting with "the Year of our Lord one 
thousand seven hundred and Eighty seven..." they may do, but if you're doing 
that, you probably have bigger issues.

I'm not sure where this gets set in dspace.



I have a new phone number: 04 463 5692

From: Terry Brady 
Sent: Thursday, 4 December 2014 6:12 a.m.
Subject: Re: [Dspace-tech] Inconsistent text_lang on metadatavalue fields

I am still looking for insight on this question.  I have posted this question 
to Stack Overflow.

I appreciate any insight you can offer.


On Fri, Nov 21, 2014 at 7:31 PM, Terry Brady wrote:
Over time, we have noticed that the text_lang column of the metadatavalue 
table has become inconsistent.

We have entries with each of the following values for data entered in English.

  *   en_US
  *   en
  *   blank
  *   null

These inconsistencies become problematic when we perform batch metadata edits.

I am working on an effort to normalize these values.

As a test, I created a new submission and populated a value into every field on 
the item submission screen.  When the submission was completed, I ended up with 
the following results.

  *   Most entries had a lang of "en"
  ** entries had a null value
      *   This applied to user-generated and system-generated dates
  *   most dc.identifier.uri entries had a null value
      *   dc.identifier.bibliographicCitation was set to "en"
  *   dc.relation.ispartofseries had a null value
      *   dc.relation.uri was set to "en"

Is there a property in one of the item submission workflows that controls how 
the text_lang column is set?

Terry Brady
Applications Programmer Analyst
Georgetown University Library Information Technology


Re: [Dspace-tech] 404 on redirect - dspace 3.0, linux

2014-12-08 Thread Stuart Yeates
All 404s are logged. To find them (on Linux) use a command like:

grep ' 404 ' /var/log/httpd/access_log

You may need to restrict it further to a specific IP address if you have a lot 
of accesses.
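
One way to narrow it down, as a sketch only: find_404s is a made-up helper name, and it assumes the Apache common or combined log format, where the client IP is the first field and the status code is the ninth.

```shell
# Print access-log lines with HTTP status 404, optionally for a single client IP.
# Assumes Apache common/combined log format: field 1 = client IP, field 9 = status.
# (find_404s is a hypothetical helper written for this example.)
find_404s() {
    log=$1
    ip=${2:-}
    awk -v ip="$ip" '$9 == "404" && (ip == "" || $1 == ip)' "$log"
}

# Usage (192.0.2.10 is a placeholder address):
#   find_404s /var/log/httpd/access_log
#   find_404s /var/log/httpd/access_log 192.0.2.10
```

Matching on the status field rather than on the string ' 404 ' also avoids picking up successful requests whose response size happens to be 404 bytes.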



I have a new phone number: 04 463 5692

From: Sara Amato 
Sent: Tuesday, 9 December 2014 8:37 a.m.
To: dspace-tech
Subject: Re: [Dspace-tech] 404 on redirect - dspace 3.0, linux

Checking in on this issue, as we've seen it crop up again with a user who had 
some malware taking over 404s. Again, we can solve the individual patron's 
issues when they come to us and tell us our DSpace site is broken, but it would 
be nice if this were fixed in the code. Thanks for considering.

On Tue, Apr 23, 2013 at 10:45 AM, Sara Amato wrote:
We had a complaint from a patron that they couldn't get to the dspace login 
page.   Upon investigation it appeared that their browser was set to redirect 
all 404s to a search engine.   While we could solve that problem, we were 
baffled that the dspace redirect threw a 404.   It looks like perhaps that was 
discussed here:

Is there a way to fix this?  I was not able to discern exactly what file was 
being changed in the discussion thread above.

Any advice appreciated.


Re: [Dspace-tech] DSpace 4.2 on Centos 7

2014-12-08 Thread Stuart Yeates
The underlying issue is " unknown error". Check that DNS is working for this 
host, that resolves, and that it resolves to either or



I have a new phone number: 04 463 5692

From: Goran Ivaz 
Sent: Tuesday, 9 December 2014 12:18 p.m.
Subject: [Dspace-tech] DSpace 4.2 on Centos 7

I have the site running fine. I can log in and work with it. The database 
connection seems fine too. Email messaging works fine too….

Here is the output from the final installation :

 [echo]  The DSpace code has been installed, and the database initialized.
 [echo]  To complete installation, you should do the following:
 [echo]  * Setup your Web servlet container (e.g. Tomcat) to look for your
 [echo]    DSpace web applications in: /DSpace/webapps/
 [echo]    OR, copy any web applications from /DSpace/webapps/ to
 [echo]    the appropriate place for your servlet container.
 [echo]    (e.g. '$CATALINA_HOME/webapps' for Tomcat)
 [echo]  * Make an initial administrator account (an e-person) in DSpace:
 [echo]    /DSpace/bin/dspace create-administrator
 [echo]  * Start up your servlet container (Tomcat etc.)
 [echo]  You should then be able to access your DSpace's 'home page':

I have chosen to copy the webapps from /DSpace/webapps/ to 

It throws an internal server error message as below….

Have tried accessing the xmlui and the jspui so far. They both work but create 
the error.

Beginning of error 

An internal server error occurred on

Date:   12/8/14 3:15 PM

Session ID: B3AEDB7914E671C9590D7D2779DD112E

User:   Anonymous

IP address:

-- URL Was:

-- Method: GET

-- Parameters were:


org.apache.jasper.JasperException: javax.servlet.ServletException: 
org.dspace.browse.BrowseException: org.dspace.discovery.SearchServiceException: 
IOException occured when talking to server at:

at javax.servlet.http.HttpServlet.service(

Re: [Dspace-tech] "Malformed stream" error

2014-12-11 Thread Stuart Yeates
If you fill up the disk, you need to stop all of the tomcat stack and restart 
the operating system in such a way that it checks all disks when it restarts.

You then need to rebuild your search indexes and check that in-process files 
haven't been damaged.
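
Before restarting anything, it may be worth confirming which filesystem actually filled up. The sketch below is illustrative: full_filesystems is a made-up helper name, and the 90% default threshold is an arbitrary choice.

```shell
# Read 'df -P' output on stdin and print mount points at or above a usage threshold.
# (full_filesystems is a hypothetical helper; the default of 90 percent is arbitrary.)
full_filesystems() {
    threshold=${1:-90}
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= t) print $6, $5
    }'
}

# Usage:
#   df -P | full_filesystems 95
```

The -P flag asks df for the portable one-line-per-filesystem layout, which keeps the column positions stable for awk.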



I have a new phone number: 04 463 5692

From: Paul Go 
Sent: Friday, 12 December 2014 6:57 a.m.
To: Dspace Tech list
Subject: [Dspace-tech] "Malformed stream" error

I'm not sure why we are getting this error as we've freed space on the device.  
Does anyone have any insight?

Problem in creating the Request

Message: null

Description: No details available.


Source: Cocoon Servlet


Malformed stream:
No space left on device



Paul Go

Systems Librarian /
Library Technology Manager /
CS and ITM Liaison
Paul V. Galvin Library
Illinois Institute of Technology
35 West 33rd Street
Chicago, IL  60616

Driving Innovation through Knowledge and Scholarship

Re: [Dspace-tech] How to change the XMLUI

2014-12-14 Thread Stuart Yeates
If you're working on ubuntu / linux I recommend the 'locate' command which will 
find all files on a system of a given name.

It's very useful for finding files. It's even more useful when you're editing 
the wrong file, to find other files of the same name and thus locate the file 
you're meant to be editing.
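
As a rough sketch of that workflow: find_by_name is a made-up helper that does the same job with 'find' when the locate database is missing or stale, and /dspace is an assumed install directory.

```shell
# List every regular file named exactly $1 under directory $2 (default: /).
# Uses find, so it does not depend on the locate database being up to date.
# (find_by_name is a hypothetical helper written for this example.)
find_by_name() {
    find "${2:-/}" -type f -name "$1" 2>/dev/null
}

# Usage:
#   locate -b '\style.css'            # fast, relies on the updatedb index
#   find_by_name style.css /dspace    # /dspace is an assumed install directory
```

If the same file name turns up both in the source tree and in the deployed webapps directory, the deployed copy is usually the one the running site reads.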

There should be a file '.../config/dspace.cfg' which is the main dspace 
configuration file. Many other files in the same directory also configure 
different parts of the system.

The XMLUI branding can be modified in .../themes/Mirage/lib/css/style.css; 
structural changes can be made in files such as




I have a new phone number: 04 463 5692

From: Erick AR 
Sent: Saturday, 13 December 2014 8:34 a.m.
Subject: [Dspace-tech] How to change the XMLUI

Hello everyone,
I am a developer from Mexico, and I'm trying to modify the XMLUI files in
DSpace 4.2. I'm using Ubuntu 14.04 Server LTS and Postgres 9.3. I want to
adapt the XMLUI to the organization where I'm implementing it, but I have
failed to find the directory where these settings are. I hope you can help
me find it and give me recommendations on how to do this.
Thank you
I.T.I. Erick Emir Alonso Román
San Luis Potosi, Mexico


Re: [Dspace-tech] Submissions are stuck "being reviewed"

2014-12-17 Thread Stuart Yeates
Sounds like things are being cached incorrectly.

What happens when you restart the server and try and access from a browser 
install that's never visited the site?



I have a new phone number: 04 463 5692

From: Erica Koestner 
Sent: Thursday, 18 December 2014 11:59 a.m.
Subject: [Dspace-tech] Submissions are stuck "being reviewed"

Hello there,

I'm using DSpace version 1.6.x.  I recently added several new submissions, and I 
believe that I did approve them all, although now I'm questioning myself.  The 
problem is that when I am logged in I see them all under "Submissions being 
reviewed", as though someone else is reviewing them, but no one is.  When other 
users are logged in, they do not see these in the submissions pool at all.  
Both myself and the other users have searched for the submissions, to be sure 
they haven't actually been approved and are available, but they are nowhere to 
be found.  I have looked for log errors, but I haven't found any.  Does anyone 
have suggestions or helpful hints?

Thank you,



Re: [Dspace-tech] Thumbnails on browse in Mirage2

2015-01-11 Thread Stuart Yeates
As it says on the home page, that's Mirage 2. Mirage 2 will be released as part 
of the upcoming DSpace 5 release (see ), which is 
expected shortly.

So in short the answer is to wait a couple of weeks for DSpace 5, then upgrade 
to it and Mirage 2.


I have a new phone number: 04 463 5692

From: Gabriel Martins 
Sent: Monday, 12 January 2015 9:01 a.m.
Subject: [Dspace-tech] Thumbnails on browse in Mirage2

Hello all,

In the Atmire preview of the Mirage2 theme(, the
thumbnails of the items appear in recent submissions, search results and
other pages, but I don't know how to make the repository I am working on
behave in the same way...

Any suggestions on how to make this happen?

Thanks for the help and sorry if my English is bad.

Gabriel Martins


Re: [Dspace-tech] Broken pipe

2015-03-12 Thread Stuart Yeates
I would start by checking any maximum file size options in tomcat and/or httpd 
that might be sitting in front of it.



I have a new phone number: 04 463 5692

From: Herbert Nguruwe 
Sent: Thursday, 12 March 2015 6:56 p.m.
Subject: [Dspace-tech] Broken pipe


I am a library systems developer at Cape Peninsula University of Technology. 
We recently upgraded our DSpace from 4.2 to 5.0. Since then we cannot 
view/download our documents. The stack trace from Tomcat is below, in case 
anyone can help.

Mar 11, 2015 11:46:38 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [bitstream] in context with path [/jspui] 
threw exception [org.apache.catalina.connector.ClientAbortException: Broken pipe] with root cause Broken pipe
at Method)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(
at org.apache.tomcat.util.buf.ByteChunk.append(
at org.apache.coyote.Response.doWrite(
at org.apache.tomcat.util.buf.ByteChunk.append(
at org.dspace.core.Utils.copy(
at org.dspace.core.Utils.bufferedCopy(
at javax.servlet.http.HttpServlet.service(
at javax.servlet.http.HttpServlet.service(

Re: [Dspace-tech] OAI server could not be reached in DSpace 5.2

2015-03-17 Thread Stuart Yeates
Is dspace running on a corporate or institutional network that has a firewall? 
Almost certainly the issue is that there is a firewall / proxy between dspace 
and the OAI server.



I have a new phone number: 04 463 5692

From: Ruben 
Sent: Wednesday, 18 March 2015 4:27 a.m.
Subject: [Dspace-tech] OAI server could not be reached in DSpace 5.2

Hi all,

I installed a new DSpace 5.2 with Mirage2, and it is now running with a 
database upgraded from DSpace 1.7. Everything works well except OAI harvesting: 
when I enter the OAI Harvest URL and OAI Set Id and save, it shows the message 
"OAI server could not be reached". I tried running a ping test from the dspace 
command line on this set_id/harvest_url pair and the message was the same. I'm 
sure this URL is reachable because I tried to harvest it at and this message is 
not shown. The OAI interface is enabled, because I can access 
url_base/oai/request?verb=Identify, and the proxy configs were set up 
correctly, so I don't know what's happening...

Can someone help me?

Thanks in advance


Rubén Boada
Tècnic de Càlcul i Aplicacions
Consorci de Serveis Universitaris de Catalunya (CSUC)

Gran Capità, 2 (Edifici Nexus) * 08034 Barcelona
T. 93 551 62 13 * Twitter @CSUC_info


Re: [Dspace-tech] [dspace-tech]:RSS feeds

2015-03-23 Thread Stuart Yeates
You can do fancy RSS slicing and dicing using third party tools such as

You probably want to start with the Atom variant of RSS.

I have a new phone number: 04 463 5692

From: Nada Abo-Eita 
Sent: Tuesday, 24 March 2015 12:07 a.m.
To: dspace-tech
Subject: [Dspace-tech] [dspace-tech]:RSS feeds

Hi All,

I have two questions related to RSS feeds in DSpace.

As I understand it so far, RSS feeds at each level (site, collection, or 
community) show the latest items that have been added to that area.

I am wondering if there is a way to get the top 10 items per month. For 
example, get the most recent 10 items submitted in January under collection (x).

My second question is about styling the RSS feed web interface, for example 
changing the way the list of items is displayed. Where can I find the CSS and 
XSL templates responsible for that?


Re: [Dspace-tech] NEWS: 7 reasons why frameworks are the new programming languages | CIO

2015-03-30 Thread Stuart Yeates
Are you suggesting a change of direction for DSpace?



I have a new phone number: 04 463 5692

From: Hilton Gibson 
Sent: Tuesday, 31 March 2015 6:51 a.m.
To: dspace-tech
Subject: [Dspace-tech] NEWS: 7 reasons why frameworks are the new programming 
languages | CIO

In the 1980s, the easiest way to start a nerd fight was to proclaim that your 
favorite programming language was best. C, Pascal, Lisp, Fortran? Programmers 
spent hours explaining exactly why their particular way of crafting an 
if-then-else clause was superior to your way.

That was then. Today, battles involving syntax and structure are largely over 
because the world has converged on a few simple standards.

[Dspace-tech] dspace feed into SSRN?

2012-10-10 Thread Stuart Yeates
We have a pair of feeds from into RePEc ( and ).

I have been wondering whether it's possible to get similar feeds set up for 
SSRN. Has anyone tried this successfully? There's a case 
study at which 
seems to imply that it's possible.



Re: [Dspace-tech] dspace feed into SSRN?

2012-10-15 Thread stuart yeates
On 11/10/12 20:13, helix84 wrote:
> On Thu, Oct 11, 2012 at 4:03 AM, Stuart Yeates  
> wrote:
>> We have a pair of feeds from into RePEc ( 
>> and 
>> ).
>> I have been wondering whether it's possible to get similar feeds set up for 
>> SSRN Has anyone tried this successfully? There's a case 
>> study at which 
>> seems to imply that it's possible.
> Hi Stuart,
> you're mentioning names of particular websites. It would be far more
> useful if you mentioned the software they're running and possibly its
> version.

You are, of course, correct. However, I've not been able to determine 
what software or ingest protocols SSRN supports.

Stuart Yeates
Library Technology Services


[Dspace-tech] whitespace bug with global change (commandline)

2012-12-04 Thread stuart yeates
I've encountered an issue with batch updating dates using the command 
line version.

In my exported CSV there are two columns which differ only by whitespace 
differences in the language code: "[ ]" and 

When I try to merge these (to the second), the confirmation step sees 
these as different and asks for confirmation to make the change, but the 
following change step doesn't actually make the change. Other changes in 
the same CSV work as expected.

I'm thinking that either the confirmation and update steps have 
different ideas about whitespace normalisation or this is a bug in the 
SQL whitespace handling (which google suggests is a challenging topic).

I'm using DSpace 1.8.2 / postgres.

Anyone else seeing this? Am I doing something silly?

Stuart Yeates
Library Technology Services


Re: [Dspace-tech] Spring vulnerabilities in DSpace 1.5.2?

2013-09-05 Thread stuart yeates
The vulnerability appears to be JSP-specific, so those running only the 
XMLUI interface should be fine, right?


On 06/09/13 04:50, Halliday, James Leonard wrote:
> Hello,
> I am trying to follow up on some vulnerabilities in the Spring
> framework, which are documented here:
> A recent survey of all our running DSpace instances showed a DSpace
> 1.5.2 instance with Spring 2.5.1 jars included. These are the jars that
> might be vulnerable. Can someone tell me if the jars are being used in a
> way that makes them vulnerable? There is a later Spring 2.5.x release
> that fixed the problem; should we simply replace the existing jars
> without needing to make any other changes?
> Thanks so much.
> -Jim Halliday
> -Indiana University

Stuart Yeates
Library Technology Services


[Dspace-tech] ODBC External Reporting

2013-09-05 Thread stuart yeates
As one of the outcomes of the IR day held in Wellington, NZ last week 
I've started a wiki page on connecting your DSpace instance to your 
organisational reporting tools.

The overall aim is to enable organisations to report across their whole 
collections rather than reporting separately on DSpace and their 
physical collections.

I would encourage anyone who has any reports they could share to upload 

Note that this relates to connecting to the SQL database, which contains 
item metadata, collection and community data, workflow data, etc. Web 
access stats are stored in Solr, not the SQL database; for information on 
accessing that see
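
To give a rough idea of the kind of report meant here, the following is 
a minimal Python sketch that counts items per collection. It runs 
against an in-memory SQLite stand-in rather than a real DSpace database; 
the simplified tables and column names (collection, item, 
owning_collection) and the function names are assumptions for 
illustration and should be checked against your own schema.

```python
import sqlite3


def demo_connection():
    """Build an in-memory stand-in database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE collection (collection_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE item (item_id INTEGER PRIMARY KEY, owning_collection INTEGER);
        INSERT INTO collection VALUES (1, 'Theses');
        INSERT INTO collection VALUES (2, 'Working Papers');
        INSERT INTO item VALUES (10, 1);
        INSERT INTO item VALUES (11, 1);
        INSERT INTO item VALUES (12, 2);
        """
    )
    return conn


def items_per_collection(conn):
    """Return (collection name, item count) rows, ordered by name."""
    cursor = conn.execute(
        "SELECT c.name, COUNT(i.item_id) "
        "FROM collection c "
        "LEFT JOIN item i ON i.owning_collection = c.collection_id "
        "GROUP BY c.collection_id, c.name "
        "ORDER BY c.name"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for name, count in items_per_collection(demo_connection()):
        print(name, count)
```

A reporting tool connected over ODBC would run the same sort of SELECT 
directly; a read-only database account is a sensible precaution.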

A big thanks to LCONZ for funding this meeting 
and work.

Stuart Yeates
Library Technology Services
