[Dspace-tech] Character encoding issues in Discovery search results

2014-12-09 Thread Alan Orth
Hi,

Our DSpace 4.2's Discovery search results display snippets from the item's
full-text PDF extract, but we get mojibake (strange characters) in the
summaries (see attached photo).  Browsing to the item's PDF-extracted text
bitstream indeed shows the strange characters, and Firefox's developer
tools show the encoding as ISO-8859-1.  What's strange is that if I download
the file, the resulting encoding is UTF-8 and the characters display
properly.
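
The symptom described above is consistent with UTF-8 bytes being decoded as
ISO-8859-1. A minimal Python sketch of that effect, assuming a made-up sample
string (it is not text from the actual bitstream, and the function names are
hypothetical):

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as ISO-8859-1.

    This mimics a client that is handed UTF-8 bytes but assumes ISO-8859-1.
    """
    return text.encode("utf-8").decode("iso-8859-1")


def repair(mojibake: str) -> str:
    """Reverse the misreading: recover the bytes, then decode them as UTF-8."""
    return mojibake.encode("iso-8859-1").decode("utf-8")


# Hypothetical sample; any non-ASCII text behaves the same way.
sample = "café"
garbled = misread_as_latin1(sample)

print(garbled)          # each non-ASCII character becomes two wrong ones
print(repair(garbled))  # the original text comes back
```

The bytes on disk are the same in both cases; only the charset the reader
assumes differs, which would explain why the downloaded file looks fine.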

I have tried the following:
- Confirmed our Tomcat connectors are using URIEncoding="UTF-8"
- Forced "-Dfile.encoding=UTF-8" in JAVA_OPTS and manually re-ran
`filter-media' as well as `index-discovery -b'

What could I be missing?

Thanks!

-- 
Alan Orth
alan.o...@gmail.com
https://alaninkenya.org
https://mjanja.ch
"In heaven all the interesting people are missing." -Friedrich Nietzsche
GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Re: [Dspace-tech] Character encoding issues in Discovery search results

2014-12-09 Thread Antoine Snyers

Hi Alan Orth

-Dfile.encoding=UTF-8 should be added to the "bin/dspace" command.
Here is the line:
https://github.com/DSpace/DSpace/blob/dspace-4.2/dspace/bin/dspace#L75

Then rerun 'index-discovery -b'.
I believe this will resolve your problem.

Antoine Snyers

--
*Antoine Snyers*
/2888 Loker Avenue East, Suite 315, Carlsbad, CA. 92010/
/Esperantolaan 4, Heverlee 3001, Belgium/
www.atmire.com 
 




Re: [Dspace-tech] Character encoding issues in Discovery search results

2014-12-09 Thread Alan Orth
Antoine,

In this case the dspace script respects the environment's JAVA_OPTS if it
is set; the one in the script is only used if JAVA_OPTS is empty.
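
The fallback described here can be sketched in Python. This is not the shell
code from bin/dspace; the default value and the function name are assumptions
made for illustration only:

```python
import os

# Hypothetical stand-in for whatever default the script hard-codes.
SCRIPT_DEFAULT_OPTS = "-Xmx256m"


def effective_java_opts(env=None, default=SCRIPT_DEFAULT_OPTS):
    """Return the environment's JAVA_OPTS if it is non-empty, else the default."""
    env = os.environ if env is None else env
    # An unset or empty JAVA_OPTS falls through to the script's own default.
    return env.get("JAVA_OPTS") or default


print(effective_java_opts({"JAVA_OPTS": "-Dfile.encoding=UTF-8"}))
print(effective_java_opts({}))
```

Under this logic, a -Dfile.encoding=UTF-8 that is already present in the
environment's JAVA_OPTS reaches the JVM without editing the script.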

Alan


Re: [Dspace-tech] Character encoding issues in Discovery search results

2015-01-02 Thread bender
Hi Alan:

Did you solve this issue?
If so, how?

Bender


Re: [Dspace-tech] Character encoding issues in Discovery search results

2015-02-12 Thread Alan Orth
Hey, bender. No, we didn't figure this out. In fact, it's still an open
issue on our institution's GitHub issue tracker!

https://github.com/ilri/DSpace/issues/43

I've posted a few notes there but haven't come to any conclusion. :(

Alan


Re: [Dspace-tech] Character encoding issues in Discovery search results

2015-02-13 Thread Aleksandar Stojanov
Hi,

I visited the repository link
(https://cgspace.cgiar.org/handle/10568/51393) from the GitHub discussion page
and did some searching there. I noticed that it happens on a lot of PDFs
there, always in the same place: right after the page number. At that point
the extracted text contains a form feed character (Unicode U+000C, \u000c),
which marks a new page. Although this is valid HTML, it's invalid XHTML, and
the recommended practice is to treat it as a zero-width character because
it has no semantic meaning.
http://www.w3.org/TR/unicode-xml/#White
We had a similar problem with weird characters in search results, and this
helped:
http://sourceforge.net/p/dspace/mailman/message/31212700/

Can you try that solution and post back the results? Also, don't forget to
make a backup first.
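
To illustrate the form feed point above (this is not the fix from the linked
message, only a sketch with a made-up sample string and a hypothetical helper
name), treating U+000C as zero-width amounts to dropping it from the text:

```python
FORM_FEED = "\u000c"  # U+000C, emitted at page breaks in extracted text


def strip_form_feeds(text: str) -> str:
    """Remove form feed characters, i.e. treat them as zero-width."""
    return text.replace(FORM_FEED, "")


# Hypothetical snippet: a page number followed by a page break.
sample = "end of page 12\u000cStart of the next page"
print(repr(strip_form_feeds(sample)))
```

Replacing the character with a space instead of removing it would keep the
page number from running into the next word; which is preferable depends on
how the snippet is displayed.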

Cheers,
Aleksandar Stojanov


Re: [Dspace-tech] Character encoding issues in Discovery search results

2015-02-13 Thread Alan Orth
Hey, Aleksandar.

Actually, I just fixed this... somehow by accident.

Today we did a batch metadata cleanup via SQL and modified the Discovery
sidebar facet configs, so I had to rebuild the indexes with
`index-discovery -b`. Also, we wanted to re-generate all of our PDF
thumbnails for DSpace 4's higher-quality versions (we upgraded a few months
ago but hadn't yet re-generated thumbnails for existing items).

I'm not sure why this didn't work before when I was doing my research in
December[0], but I'm glad that it's fixed!

Thanks for following up with me! I hope this helps someone else...

Alan

[0] https://github.com/ilri/DSpace/issues/43
