As I suspected, it's the SOLR index that's messed up.
Executing this SOLR query:
http://localhost:8080/solr/search/select?indent=true&q=andr%C3%A9%20luiz%20lopes%20de%20alcantara&fq=-type:(not%C3%ADcia%20de%20jornal)
<http://www2.senado.leg.br/solr/search/select?indent=true&q=andr%C3%A9%20luiz%20lopes%20de%20alcantara&fq=-type:(not%C3%ADcia%20de%20jornal)>

resulted in this XML:
https://dl.dropboxusercontent.com/u/4193365/solr-responde-encoding.xml
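
For anyone who wants to check field values from a Solr response like this one, here is a minimal Python sketch of the double-encoding that helix84 describes in the quoted message further down. It assumes the stray conversion step treated the UTF-8 bytes as Latin-1; in practice it could just as well be Windows-1252 or something else. The function names are made up for illustration and are not part of DSpace or Solr.

```python
def double_encode(text: str) -> str:
    """Simulate the fault: UTF-8 bytes are misread as Latin-1 (assumed charset)."""
    return text.encode("utf-8").decode("latin-1")


def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if reversing the assumed Latin-1 misreading yields different, valid UTF-8."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1, or the bytes are
        # not valid UTF-8: in both cases it is not this kind of damage.
        return False
    return repaired != text


def repair(text: str) -> str:
    """Undo one round of the assumed Latin-1 misreading, if it is detected."""
    if looks_double_encoded(text):
        return text.encode("latin-1").decode("utf-8")
    return text


if __name__ == "__main__":
    broken = double_encode("André")
    print(repr(broken), looks_double_encoded(broken), repair(broken))
```

This is only a heuristic: plain ASCII and correctly encoded text are reported as fine, but a string that is damaged in some other way would be missed.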

I'll try an update-discovery-index -b, but if anyone has any suggestions I'd
appreciate it =)
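
To rule out the request itself, here is a small Python sketch that rebuilds the Solr URL above from plain strings and decodes it again. It uses only the standard library; the helper name build_solr_url is hypothetical, and the choice to leave ":" and "(" ")" unescaped is only there to match the URL as I typed it.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE = "http://localhost:8080/solr/search/select"


def build_solr_url(base: str, query: str, filter_query: str) -> str:
    """Percent-encode the parameters as UTF-8, using %20 for spaces."""
    params = [("indent", "true"), ("q", query), ("fq", filter_query)]
    # quote_via=quote gives %20 instead of "+"; safe keeps ":" and "()" literal.
    return base + "?" + urlencode(params, safe=":()", quote_via=quote)


url = build_solr_url(
    BASE,
    "andré luiz lopes de alcantara",
    "-type:(notícia de jornal)",
)

# Decoding the query string again (UTF-8 is the default) should give back
# the original text if the URL is well-formed.
decoded = parse_qs(urlsplit(url).query)

if __name__ == "__main__":
    print(url)
    print(decoded["q"][0])
    print(decoded["fq"][0])
```

If the decoded values come back intact, the request is sent as valid UTF-8, which would suggest the damage is in what is stored in the index rather than in the query.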

Regards,

Alcides Carlos de Moraes Neto
"Sometimes I think we're alone. Sometimes I think we're not. In either
case, the thought is staggering."
- R. Buckminster Fuller


2013/9/10 Alcides Carlos de Moraes Neto <alcides.n...@gmail.com>

> I ran update-discovery-index -f, but the results still show encoding
> issues.
>
>
> http://www2.senado.leg.br/bdsf/discover?filtertype_0=type&filter_relational_operator_0=notequals&filter_0=not%C3%ADcia+de+jornal&submit_apply_filter=Aplicar&query=andr%C3%A9+luiz+lopes+de+alcantara
>
> I'm stumped right now, what else can I do?
>
> Regards,
>
> Alcides Carlos de Moraes Neto
>
>
> 2013/9/6 Alcides Carlos de Moraes Neto <alcides.n...@gmail.com>
>
>> Just a follow up.
>> filter-media -f seems to have fixed the issue with the OCR txt.
>> But some search results still show encoding issues.
>>
>> I believe I need to regenerate the solr index.
>>
>> Regards,
>>
>> Alcides Carlos de Moraes Neto
>>
>>
>> 2013/9/3 Alcides Carlos de Moraes Neto <alcides.n...@gmail.com>
>>
>>> Hello helix, thank you for your input.
>>>
>>> Indeed, it is a problem with the filter-media generated txt.
>>> A filter-media -f resolved the issue for this specific item. I have
>>> scheduled a full filter-media -f of the repository for tonight.
>>>
>>>
>>> Regards,
>>>
>>>
>>> Alcides Carlos de Moraes Neto
>>>
>>>
>>> 2013/9/3 helix84 <heli...@centrum.sk>
>>>
>>>> On Tue, Sep 3, 2013 at 1:24 AM, Alcides Carlos de Moraes Neto
>>>> <alcides.n...@gmail.com> wrote:
>>>> > I have checked the .txt files that media-filter generates; they are all UTF-8.
>>>>
>>>> What I see (see attachment) looks like double-encoded UTF-8 (it
>>>> happens when a charset converter is told that a file is to be encoded
>>>> from one character set to UTF-8, but it actually already was UTF-8) -
>>>> which would seem like valid UTF-8 to a machine, but has nonsensical
>>>> characters. What do you see?
>>>>
>>>> I don't have a solution yet, though.
>>>>
>>>> On Tue, Sep 3, 2013 at 2:28 AM, Keir Vaughan-Taylor <k...@usyd.edu.au>
>>>> wrote:
>>>> > Is it a problem that the extension is pdf.txt?
>>>>
>>>> No, that's normal.
>>>>
>>>>
>>>> Regards,
>>>> ~~helix84
>>>>
>>>> Compulsory reading: DSpace Mailing List Etiquette
>>>> https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
>>>>
>>>
>>>
>>
>
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
