Re: [Dspace-tech] FW: 1.5.2 browse and UTF-8 and diacritics

2009-05-28 Thread KlausDK

We had the same problem. It turned out that we had specified
URIEncoding="UTF-8" in the wrong place in the server.xml config.

We had set it on the HTTP connector (Connector port="8080"
maxHttpHeaderSize="8192" ...), but because we use proxy_ajp to port 8009,
it needed to go on the AJP connector (Connector port="8009") instead.

So it looks like this in server.xml:

<Connector port="8009"
   URIEncoding="UTF-8"
   enableLookups="false" redirectPort="8443" protocol="AJP/1.3"
/>
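
To illustrate why the connector setting matters, here is a minimal sketch in plain Java (not Tomcat or DSpace code; the class and method names are made up for this example). It decodes the same percent-encoded author value once as UTF-8 and once as ISO-8859-1, which is the default that Tomcat releases of that era fall back to when URIEncoding is not set on the connector that actually receives the request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: shows how one percent-encoded value decodes
// differently depending on the charset the server assumes.
class UriEncodingDemo {
    // Decode a percent-encoded query value with the given charset name.
    public static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "Müller, D." as a browser sends it: UTF-8 bytes, percent-encoded.
        String raw = "M%C3%BCller%2C+D.";

        String asUtf8 = decode(raw, "UTF-8");        // the intended name
        String asLatin1 = decode(raw, "ISO-8859-1"); // two characters where "ü" should be

        // The UTF-8 result matches the stored author name, the other does not.
        System.out.println(asUtf8.equals("M\u00fcller, D."));
        System.out.println(asLatin1.equals("M\u00fcller, D."));
        System.out.println(asUtf8.length() + " vs " + asLatin1.length());
    }
}
```

With the wrong charset the browse value no longer equals the author name in the index, which would be consistent with a browse page that shows no items.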


-- 
View this message in context: 
http://www.nabble.com/Re%3A-FW%3A-1.5.2-browse-and-UTF-8-and-diacritics-tp23400478p23757858.html
Sent from the DSpace - Tech mailing list archive at Nabble.com.


___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] FW: 1.5.2 browse and UTF-8 and diacritics

2009-05-06 Thread Andrea Bollini
Hi Jennifer,
the sort order class is defined in dspace.cfg, so you can create your
own class that ignores diacritics and use it.
Try something like this:

package my.edu.sort;

import org.dspace.sort.AbstractTextFilterOFD;
import org.dspace.text.filter.LowerCaseAndTrim;
import org.dspace.text.filter.StripDiacritics;
import org.dspace.text.filter.TextFilter;

// Author sort delegate that strips diacritics instead of decomposing them.
public class OrderFormatAuthorIgnoreDiacritics extends AbstractTextFilterOFD
{
    // Instance initializer: sets the filter chain used to build the sort key.
    {
        filters = new TextFilter[] { new StripDiacritics(),
                                     new LowerCaseAndTrim() };
    }
}

and add it in dspace.cfg (the class name must match the package above):
author = my.edu.sort.OrderFormatAuthorIgnoreDiacritics
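
As a rough illustration of what such a filter chain does to the sort order, here is a sketch in plain Java. It does not use the DSpace classes. It approximates "strip diacritics, then lower-case and trim" with java.text.Normalizer, and the class and method names are invented for this example, so the real filters may well be implemented differently.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical stand-in for a StripDiacritics + LowerCaseAndTrim chain.
class StripSortKeyDemo {
    // Build a sort key: decompose, drop combining marks, lower-case, trim.
    public static String sortKey(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT).trim();
    }

    // Return a copy of the names ordered by their stripped sort key.
    public static String[] sorted(String... names) {
        String[] copy = names.clone();
        Arrays.sort(copy, Comparator.comparing(StripSortKeyDemo::sortKey));
        return copy;
    }

    public static void main(String[] args) {
        String[] result = sorted(
            "Muller, W.J.", "Mutch, Verdun Joseph.", "M\u00fcller, D.",
            "Mull, A. E. E.", "Muston, C.");
        // Print each name with its key so the ordering is easy to follow.
        for (String name : result) {
            System.out.println(sortKey(name) + "  <-  " + name);
        }
    }
}
```

With these keys "Müller, D." is compared as "muller, d.", so it lands between "Mull, A. E. E." and "Muller, W.J.", which is the order the cataloguers asked for.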

Hope this helps,
Andrea

Jennifer Whalan wrote:

 While I was at it: has anyone solved the sort-order issue for authors
 (or subjects, I suppose) that contain diacritics? At this link:
 http://www.territorystories.nt.gov.au/browse?order=ASC&rpp=20&sort_by=-1&etal=-1&offset=5823&type=author

 we have the authors:

 Muston, C.
 Mutch, Verdun Joseph.
 Müller, D.
 Myeni, Annie D.
 Myerscough, Mark.

 But our cataloguers expect the author “Müller, D.” to sort between
 the authors:

 Mull, A. E. E.

 and

 Muller, W.J.

 I’ve had a look at the source and, if I’m reading it correctly,
 OrderFormatAuthor is the class that controls this. When it calls
 DecomposeDiacitics, it changes this author to “mu(diacritic)ller,
 d.” I’m assuming that because it places the diacritic after the u,
 this author sorts after all the authors that begin with Mu.

 To make a long story short: is there a way to make the sort ignore
 diacritics completely and just order by the base character?

 Thanks

 Jennifer

  

 Jennifer Whalan
 Territory Stories Administrator
 Innovation & Access, Northern Territory Library
 Department of Natural Resources, Environment, The Arts and Sport
 Northern Territory Government

 Phone:  (08) 8922 0757
 Fax:    (08) 8922 0722
 Email:  jennifer.wha...@nt.gov.au
 Web:    www.ntl.nt.gov.au

 The information contained in this message and any attachments may be
 confidential information and may be subject to legal privilege, public
 interest or legal profession privilege. If you are not the intended
 recipient, any use, disclosure or copying of this message or any
 attachments is unauthorised. If you have received this document in
 error, please advise the sender. No representation or warranty is
 given that attached files are free from viruses or other defects. The
 recipient assumes all responsibility for any loss or damage resulting
 directly or indirectly from the use of any attached files.

 


Re: [Dspace-tech] FW: 1.5.2 browse and UTF-8

2009-05-06 Thread Andrea Bollini
Hi Jennifer,
URIEncoding needs to be set to UTF-8, so your current setting is OK.
I'm not able to reproduce the issue on the
http://dspace-testhaton.cilea.it/xmlui instance; see
http://dspace-testhaton.cilea.it/xmlui/browse?value=name3%2C+M%C3%BCller&type=author
http://dspace-testhaton.cilea.it/xmlui/browse?value=An%C3%B6ther%2C+Te%C5%A1t&type=author

Have you made any customizations? Are you sure you have deployed the new WAR?
Also try cleaning the Tomcat work directory. I'm not sure whether this helps
in this case, but just in case...
Hope this helps,
A.

Jennifer Whalan wrote:

 Just resending, as I did not get any replies.

 A question about browsing in 1.5.2 XMLUI

 On our test instance, we have upgraded from 1.5.1 to 1.5.2 (using
 Manakin), and we have an author with the name Müller, D. However,
 when you go to view the browse list of the items of this author, the
 URL is browse?value=Müller%2C+D.&type=author, but the page says

 Browsing by Author Müller, D.

 and shows no items.

 Reading through the changelist for 1.5.2, my understanding was that
 http://jira.dspace.org/jira/browse/DS-132 fixed this problem(?).
 Another issue (http://jira.dspace.org/jira/browse/DS-130) says that
 you need to remove URIEncoding=UTF-8 from the Tomcat settings.
 This setting is currently on for us.

 In 1.5.2, do you still need to remove this from the Tomcat settings
 (although the manual states that when installing, you need to make
 sure this is added to the Tomcat settings), or am I missing something?
 Also, the web.xml file does have the Cocoon filter in it.

 If it makes any difference, the request header for the browse page of
 this author is

 Accept-Charset   ISO-8859-1,utf-8;q=0.7,*;q=0.7

 and the response header is

 Content-Type   text/html;charset=utf-8

 Thanks

 Jennifer Whalan



-- 
Dott. Andrea Bollini
Project Manager, IT Architect & Systems Integrator
Sezione Servizi per le Biblioteche e l'Editoria Elettronica
CILEA, http://www.cilea.it
tel. +39 06-59292853
cel. +39 348-8277525

---

Disclaimer: the content of this email is confidential and may be privileged, 
and it must not be disclosed or copied without the sender's consent. If you 
have received this message in error, please notify the sender and remove it 
from your system. The content of this email does not constitute legal advice, 
nor any responsibility is accepted for loss or damage incurred as a result of 
acting upon its contents or attachments. 
The statements and opinions expressed in this email are those of the author and 
do not necessarily reflect those of the employer.



[Dspace-tech] FW: 1.5.2 browse and UTF-8

2009-05-05 Thread Jennifer Whalan
Just resending, as I did not get any replies.

A question about browsing in 1.5.2 XMLUI

On our test instance, we have upgraded from 1.5.1 to 1.5.2 (using Manakin), 
and we have an author with the name Müller, D. However, when you go to view 
the browse list of the items of this author, the URL is 
browse?value=Müller%2C+D.&type=author, but the page says 

Browsing by Author Müller, D.

and shows no items.

Reading through the changelist for 1.5.2, my understanding was that 
http://jira.dspace.org/jira/browse/DS-132 fixed this problem(?). Another issue 
(http://jira.dspace.org/jira/browse/DS-130) says that you need to remove 
URIEncoding=UTF-8 from the Tomcat settings. This setting is currently on for 
us.

In 1.5.2, do you still need to remove this from the Tomcat settings (although 
the manual states that when installing, you need to make sure this is added to 
the Tomcat settings), or am I missing something? Also, the web.xml file does 
have the Cocoon filter in it.

If it makes any difference, the request header for the browse page of this 
author is

Accept-Charset   ISO-8859-1,utf-8;q=0.7,*;q=0.7

and the response header is

Content-Type   text/html;charset=utf-8

Thanks

Jennifer Whalan



Re: [Dspace-tech] FW: 1.5.2 browse and UTF-8 and diacritics

2009-05-05 Thread Jennifer Whalan
While I was at it: has anyone solved the sort-order issue for authors (or 
subjects, I suppose) that contain diacritics? At this link: 
http://www.territorystories.nt.gov.au/browse?order=ASC&rpp=20&sort_by=-1&etal=-1&offset=5823&type=author

we have the authors:

Muston, C.
Mutch, Verdun Joseph.
Müller, D.
Myeni, Annie D.
Myerscough, Mark.

But our cataloguers expect the author Müller, D. to sort between the 
authors:

Mull, A. E. E.

and

Muller, W.J.

I've had a look at the source and, if I'm reading it correctly, 
OrderFormatAuthor is the class that controls this. When it calls 
DecomposeDiacitics, it changes this author to "mu(diacritic)ller, d.". I'm 
assuming that because it places the diacritic after the u, this author 
sorts after all the authors that begin with Mu.
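
The reasoning above can be checked with a small sketch in plain Java. It assumes that the decomposition in question is Unicode canonical decomposition (NFD) followed by lower-casing, which is a guess about the filter rather than something confirmed from the DSpace source. The class and method names are made up for this example.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical demo: how a decomposed sort key compares with plain ASCII names.
class DecomposeOrderDemo {
    // Assumed behaviour: NFD decomposition, then lower-case and trim.
    public static String decomposedKey(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFD)
                         .toLowerCase(Locale.ROOT)
                         .trim();
    }

    public static void main(String[] args) {
        String muller = decomposedKey("M\u00fcller, D.");
        String mutch = decomposedKey("Mutch, Verdun Joseph.");
        String myeni = decomposedKey("Myeni, Annie D.");

        // After NFD the third character is the combining diaeresis U+0308,
        // whose code point is far above any ASCII letter such as 't'.
        System.out.println(Integer.toHexString(muller.charAt(2)));

        // So the decomposed key sorts after "mutch..." but before "myeni...".
        System.out.println(muller.compareTo(mutch) > 0);
        System.out.println(muller.compareTo(myeni) < 0);
    }
}
```

Under that assumption the combining mark is compared against the third letter of every other "Mu..." name and always wins, which matches the position of Müller, D. in the browse list quoted above.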

 

To make a long story short: is there a way to make the sort ignore 
diacritics completely and just order by the base character?

Thanks

Jennifer

 



