[dspace-tech] Bad characters for OAI
Hello,

We are working with DSpace 5.5 and the Mirage theme. We are having problems with the characters in OAI, specifically in words with accents. Please look:

http://repositori.uvic.cat/oai/request?verb=ListSets

Where can I look to try to fix this problem?

Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.
Re: [dspace-tech] Bad Characters making OAI returning mal formed XML
Java version is 1-7_85; the character giving me problems is 0xdbc0.

On Thursday, 9 June 2016 at 21:36:13 UTC+1, Stuart Yeates wrote:
> The issue may be that the code blocks in question are not supported by the
> version of Java in use. Different versions of Java support different
> versions of Unicode, which include different sets of code blocks. Please
> include the exact version of Java and the code point / code block in any
> bug reports.
>
> For example, for a long time 'Linear B' was a problem for us, but now it's
> not.
>
> cheers
> stuart
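For reference, the Char production at https://www.w3.org/TR/REC-xml/#NT-Char allows #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF]. The value 0xdbc0 falls in the gap between #xD7FF and #xE000, which is the UTF-16 surrogate range, so the parser's complaint looks consistent with the spec rather than contradicting it. A minimal Java sketch of that check follows; the names `XmlCharCheck` and `isXmlChar` are hypothetical and are not part of DSpace or Woodstox.

```java
final class XmlCharCheck {

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        int cp = 0xDBC0; // the code reported in the WstxParsingException
        System.out.println("valid XML char: " + isXmlChar(cp));
        System.out.println("high surrogate: " + Character.isHighSurrogate((char) cp));
    }
}
```

If this reading is right, the stored text probably contains an unpaired surrogate (half of a character outside the Basic Multilingual Plane), which would fit the copy-and-paste-from-PDF explanation.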
Re: [dspace-tech] Bad Characters making OAI returning mal formed XML
The issue may be that the code blocks in question are not supported by the version of Java in use. Different versions of Java support different versions of Unicode, which include different sets of code blocks. Please include the exact version of Java and the code point / code block in any bug reports.

For example, for a long time 'Linear B' was a problem for us, but now it's not.

cheers
stuart

--
...let us be heard from red core to black sky

On Thu, Jun 9, 2016 at 11:09 PM, Tiago Guimarães <tiagommguimarae...@gmail.com> wrote:
> The DB is in UTF.
> Note that there is no error when using the JSPUI; the error appears only
> when trying to harvest the OAI.
>
> The log has the line that I posted:
> com.ctc.wstx.exc.WstxParsingException: Illegal character entity:
> expansion character (code 0xdbc0) not a valid XML character
>
> The really weird thing is that this char code is not in the interval of
> invalid XML chars in the W3C documentation.
>
> From my understanding, these errors appear when somebody copies and pastes
> the abstract from a PDF and carries over some weird chars.
>
> On Thursday, 9 June 2016 at 12:02:17 UTC+1, Luiz dos Santos wrote:
>> Hi,
>>
>> To me it seems a charset problem. Are you sure that the database is
>> UTF-8? Do you see any error in the log?
>>
>> Best
>> Luiz
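The details requested above for a bug report (Java version, code point, code block) can be gathered with standard JDK calls. This is only a sketch: `CodePointReport` and `describe` are hypothetical names, and which block the JDK reports depends on the Unicode version that JDK supports.

```java
final class CodePointReport {

    /** Formats a code point together with the Unicode block the running JDK assigns to it. */
    static String describe(int cp) {
        // UnicodeBlock.of returns null when the JDK knows no block for this code point.
        return String.format("U+%04X block=%s", cp, Character.UnicodeBlock.of(cp));
    }

    public static void main(String[] args) {
        System.out.println("java.version=" + System.getProperty("java.version"));
        System.out.println(describe(0xDBC0));  // the code point from the log line
        System.out.println(describe(0x10000)); // first code point of the Linear B Syllabary block
    }
}
```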
[dspace-tech] Bad Characters making OAI returning mal formed XML
Hi all,

I'm having problems with bad characters in OAI. It's the same as this JIRA ticket: https://jira.duraspace.org/projects/DS/issues/DS-2806

This problem is appearing here: basically, OAI returns malformed XML because of weird chars.

Example: https://i.gyazo.com/22b7f355b0e71b830ec08378a9076c34.png

Shouldn't DSpace take care of that? At least it could warn the user when they paste invalid chars while depositing an item.

DSpace should probably have a feature that detects characters that break the OAI XML. I'm up for creating a PR that does that, but I need guidance.

Also, according to https://www.w3.org/TR/REC-xml/#NT-Char the char 0xdbc0 should be valid in XML, but OAI is giving me this:

com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0xdbc0) not a valid XML character
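As a starting point for the detection feature proposed in this message, here is a minimal sketch that scans a Java string by code point and flags anything outside the XML 1.0 Char production, including unpaired surrogates such as 0xdbc0. The names `XmlTextSanitizer`, `firstInvalidIndex` and `sanitize` are hypothetical and this is not existing DSpace code; where such a check would hook into submission or OAI indexing is left open. Replacing bad characters with U+FFFD is an assumption made for illustration; dropping them or rejecting the input are equally plausible policies.

```java
final class XmlTextSanitizer {

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the index of the first invalid character, or -1 if the text is clean. */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            // codePointAt joins a valid surrogate pair into one code point;
            // an unpaired surrogate is returned as-is and fails isXmlChar.
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Replaces every invalid character with U+FFFD (REPLACEMENT CHARACTER). */
    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(isXmlChar(cp) ? cp : 0xFFFD);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pasted = "abstract " + (char) 0xDBC0 + " text"; // unpaired high surrogate
        System.out.println(firstInvalidIndex(pasted));
        System.out.println(sanitize(pasted));
    }
}
```

`firstInvalidIndex` could back a warning at deposit time, while `sanitize` could be applied defensively before metadata is serialized to XML.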