Re: [Dspace-tech] Problem with ordering in browsing

2011-05-26 Thread Ladislav Kulhanek
I have solved the problem. It is a hack, but it works.
The problem was that item browsing did not work correctly when values in the
bi_* tables were normalised (characters with diacritics split into two
characters: the base character without the diacritic, followed by the
diacritic itself), while sorting of search results did not work correctly
when values in the bi_* tables were not normalised :)
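
To make "normalised" concrete, here is a minimal sketch of that kind of decomposition. It uses the JDK's java.text.Normalizer and is not DSpace's own filter code; the class name DecomposeSketch and the method decompose are made up for illustration.

```java
import java.text.Normalizer;

class DecomposeSketch {
    // Canonical decomposition (NFD): a precomposed letter becomes its
    // base letter followed by a separate combining mark.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // "Cabla" written with a precomposed C-caron (U+010C) as first letter
        String composed = "\u010Cabla";
        String decomposed = decompose(composed);

        System.out.println(composed.length());    // 5
        System.out.println(decomposed.length());  // 6
        System.out.println(decomposed.charAt(0)); // C
        // Second character is the combining caron, U+030C
        System.out.println(Integer.toHexString(decomposed.charAt(1))); // 30c
    }
}
```

After decomposition the string starts with a plain C, so a comparison that gives little or no weight to the combining mark will place it among the other C entries.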

I use my own class OrderLowerTrim.java for creating sort strings:

public class OrderLowerTrim extends AbstractTextFilterOFD {
    // Instance initializer: only lower-case and trim, no decomposition
    {
        filters = new TextFilter[] { new LowerCaseAndTrim() };
    }
}

So strings are not normalized in the db. I also modified
org.dspace.search.DSIndexer.java to ignore the configuration and always
normalize sort strings. The change is on line 1166:

// String value = OrderFormat.makeSortString(dcv[0].value, dcv[0].language, so.getType());
String value = (new OrderFormatTitle()).makeSortString(dcv[0].value, dcv[0].language);

Sort strings for search are now always made by OrderFormatTitle, which
produces normalized values.

Graham, thanks again for your response, it was very helpful. Can you
explain why simply writing UTF-8 to the database tables results in very
random sorting for diacritics? I do that now in DSpace and it works.

 --
  What Every C/C++ and Fortran developer Should Know!
  Read this article and learn how Intel has extended the reach of its
  next-generation tools to help Windows* and Linux* C/C++ and Fortran
  developers boost performance applications - including clusters.
  http://p.sf.net/sfu/intel-dev2devmay
  ___
  DSpace-tech mailing list
  DSpace-tech@lists.sourceforge.net
  https://lists.sourceforge.net/lists/listinfo/dspace-tech
 
 
 
 

Re: [Dspace-tech] Problem with ordering in browsing

2011-05-23 Thread Ladislav Kulhanek
Thanks for the responses.

I created the class OrderFormatLocale:

public class OrderFormatLocale extends AbstractTextFilterOFD {
    {
        filters = new TextFilter[] { new LowerCaseAndTrim(),
                                     new LocaleOrderingFilter() };
    }
}

but the sorting was then very strange. For example, the alphabet started
with B, A came after D, and there were other oddities like this. So I
modified the class by removing LocaleOrderingFilter, giving this form:

public class OrderFormatLocale extends AbstractTextFilterOFD {
    {
        filters = new TextFilter[] { new LowerCaseAndTrim() };
    }
}

Then sorting was correct in browsing (by title, author and subject),
but became incorrect in search results. When search results are sorted
by title or author, strings with diacritics are sorted to the end,
after all letters without diacritics.







[Dspace-tech] Problem with ordering in browsing

2011-05-19 Thread Ladislav Kulhanek
Hello everybody.

We have data in our DSpace in the Czech language (code cs in ISO 639-1)
and we have a problem with ordering when browsing by author, title and
subject (ordering in search results is correct). The Czech alphabet has
letters with diacritics, for example Č (Unicode code point 0x010C). This
letter should be ordered between C and D, but DSpace orders it in the
same place as C. For example, we get this ordered list:

Cabanová, Zuzana
Čabla, Michael
Cablová, Barbora

and the list should be:

Cabanová, Zuzana
Cablová, Barbora
Čabla, Michael

The Czech alphabet also contains the letter Ch (it consists of two
characters). This letter should be ordered between h and i, and DSpace
orders it correctly. So it looks like DSpace orders in accordance with
the Czech alphabet but ignores diacritics.
We have DSpace 1.7.1, Manakin, and PostgreSQL 8.4 (the database has
Collation and Ctype set to cs_CZ.UTF-8), and the Tomcat connector has
URIEncoding=UTF-8. Any idea how to solve this? Thanks.

Ladislav Kulhanek
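
The expected order described above can be reproduced outside DSpace with a small sketch. It assumes the JDK's java.text.RuleBasedCollator and a hand-written tailoring that makes Č a separate letter after C and Ch a separate letter after H. The rule string and the names CzechOrderSketch, CZECH_RULES and sortNames are illustrative assumptions, not what DSpace or PostgreSQL use internally.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CzechOrderSketch {
    // Illustrative tailoring only: c-caron (U+010D / U+010C) becomes its
    // own letter after c/C, and the digraph ch its own letter after h/H.
    static final String CZECH_RULES =
            "& C < \u010D , \u010C & H < ch , cH , Ch , CH";

    // Returns a sorted copy of the names using the tailored collator.
    static List<String> sortNames(List<String> names) {
        try {
            RuleBasedCollator base =
                    (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);
            Collator czechLike =
                    new RuleBasedCollator(base.getRules() + CZECH_RULES);
            String[] copy = names.toArray(new String[0]);
            Arrays.sort(copy, czechLike);
            return Arrays.asList(copy);
        } catch (ParseException e) {
            throw new IllegalStateException("bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        // The three authors from the message, in the unwanted order
        List<String> names = Arrays.asList(
                "Cabanov\u00E1, Zuzana",
                "\u010Cabla, Michael",
                "Cablov\u00E1, Barbora");
        for (String name : sortNames(names)) {
            System.out.println(name);
        }
    }
}
```

With these rules the first letter of "Čabla" compares greater than any C at the primary level, so it lands after both "Cabanová" and "Cablová".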



Re: [Dspace-tech] Problem with ordering in browsing

2011-05-19 Thread Peter Dietz
Hi Ladislav,

I've noticed that our librarians here are happier with sorting when we use
a collation of C rather than utf8/en_US.


postgres=# create database dspace with owner = dspace encoding = 'utf8'
           tablespace = pg_default lc_collate = 'C' lc_ctype = 'en_US.UTF-8'
           template template0;


I've added these three authors to a test collection that already had some
sample data in it, and it gives the results you were expecting:
== Author Name ==
Cabanová, Zuzana
Cablová, Barbora
creatorlast, creatorfirst
Čabla, Michael




Peter Dietz
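
A rough way to see why the C collation gives the listing above: with lc_collate = 'C', strings are compared by their raw byte values rather than by language rules. The sketch below mimics that by comparing UTF-8 bytes as unsigned values. ByteOrderSketch, compareUtf8 and sortByBytes are hypothetical names, and this only imitates the database's comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class ByteOrderSketch {
    // Compare two strings byte by byte over their UTF-8 encoding,
    // treating every byte as an unsigned value (0..255).
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length; // the shorter string sorts first
    }

    // Returns a sorted copy of the names in byte order.
    static List<String> sortByBytes(List<String> names) {
        String[] copy = names.toArray(new String[0]);
        Arrays.sort(copy, ByteOrderSketch::compareUtf8);
        return Arrays.asList(copy);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "\u010Cabla, Michael",        // starts with C-caron, U+010C
                "creatorlast, creatorfirst",
                "Cablov\u00E1, Barbora",
                "Cabanov\u00E1, Zuzana");
        // Prints the two "Cab..." names, then "creatorlast...", then the
        // C-caron name, matching the listing in the message above.
        for (String name : sortByBytes(names)) {
            System.out.println(name);
        }
    }
}
```

Note the side effects of byte order: lower-case "creatorlast" sorts after every upper-case entry, and Č sorts after all ASCII letters (so also after Z), not between C and D.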






Re: [Dspace-tech] Problem with ordering in browsing

2011-05-19 Thread Graham Triggs
Please take a look at a previous post of mine on this subject:

http://dspace.2283337.n4.nabble.com/Browse-UTF-8-and-sorting-in-1-5-tp3281449p3281450.html

Regards,
G


