I'm going to chime in here, as I've also recently been working on an issue related to (read: learning about) HTML content within dc tags and how it is rendered in Manakin (Cocoon).

Antonio, you and I are in the same situation. We both have "HTML" data (actually stored as entity references) in our databases that our end users' browsers need to render as HTML. The trouble is that there is only one "parsing" step between our databases and what the user sees in his/her browser (that step being the browser engine's own parsing of the content, if I haven't missed my mark).

Parsing: transforms the entity reference "&lt;" into a literal "<" character, and transforms markup "<" into an actual node (<node/>).
Serializing: transforms nodes back into markup "<", and a literal "<" character into "&lt;".
Two steps each way.
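
To make the two directions concrete, here is a small sketch using only the JDK's built-in JAXP parser and identity transformer. It is not DSpace or Cocoon code, and the class and method names are made up for illustration. It parses a field whose markup is stored as entity references, shows that a single parse produces literal "<" characters in a text node rather than an h3 node, and shows that serializing escapes those characters back to "&lt;".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class: plain JAXP, not a DSpace or Cocoon API.
final class ParseSerializeDemo {

    // Parsing: entity references become literal characters, tags become nodes.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // Serializing: nodes become tags, a literal '<' in text becomes "&lt;".
    static String serialize(Node node) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts child elements, i.e. real nodes as opposed to text.
    static int childElementCount(Element e) {
        int n = 0;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // What the database holds: markup stored as entity references.
        Document doc = parse("<dc>&lt;h3&gt;Ciao&lt;/h3&gt;</dc>");
        Element dc = doc.getDocumentElement();
        System.out.println(dc.getTextContent());   // <h3>Ciao</h3>  (just characters)
        System.out.println(childElementCount(dc)); // 0  (no <h3> node was created)
        System.out.println(serialize(doc));        // the "<" characters are escaped again
    }
}
```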

Since our source consists of entity references and there is only one parsing step, getting the browser to understand that we intend to output "nodes" means adding another parsing step. Since DSpace uses Cocoon, and Cocoon's XSLT processor is Xalan (Xerces is its XML parser), that means EXSLT (at least, that's the only extension package I'm aware of for it).
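
A rough sketch of what "another parsing step" could look like, again in plain JAXP rather than any actual DSpace, Cocoon or EXSLT facility (ReparseStep and reparseTextContent are hypothetical names): take the text of a field after the first parse, parse it a second time, and replace the text with the resulting nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: plain JAXP, not an existing DSpace/Cocoon API.
final class ReparseStep {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // The extra parsing step: treat the field's text as markup and swap the
    // text for the resulting nodes. Throws if the markup is not well-formed.
    static void reparseTextContent(Element field) {
        // Wrap so that content with several top-level nodes still parses.
        Document fragment = parse("<wrapper>" + field.getTextContent() + "</wrapper>");
        Document owner = field.getOwnerDocument();
        while (field.getFirstChild() != null) {
            field.removeChild(field.getFirstChild());
        }
        for (Node c = fragment.getDocumentElement().getFirstChild(); c != null; c = c.getNextSibling()) {
            // importNode copies the node into the target document; fragment is untouched.
            field.appendChild(owner.importNode(c, true));
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<dc>&lt;h3&gt;Ciao&lt;/h3&gt;&lt;p&gt;forever&lt;/p&gt;</dc>");
        Element dc = doc.getDocumentElement();
        reparseTextContent(dc);
        System.out.println(dc.getFirstChild().getNodeName()); // h3
        System.out.println(dc.getChildNodes().getLength());   // 2
    }
}
```

The second parse only succeeds on well-formed markup: Antonio's example <h3> Ciao <h3> <br/> <p> forever</p> would be rejected, which fits Mark's advice in the thread to use well-formed XML. It also does no HTML sanitizing, so a stored <script> element would come through as a real node; that is the security trade-off Antonio mentions.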

I think there are two potential ways to go about this.

1)  dyn:evaluate()
This would function like Saxon's saxon:parse() (I think). It is probably not the way to go unless it's the only option, as it can quickly become a very expensive operation.

2) str:replace()
This is probably the way to go, but it might require adding a transformer (step 2.5) to the theme's sitemap.xmap to replace the entity references with actual characters before the content goes any further down the pipeline. It's possible that a transformer wouldn't be necessary and you could do the replacement in the XSLT stylesheet instead, but I think a transformer might keep things simpler.
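
For the str:replace() route, the string work amounts to something like the following Java sketch (EntityUnescape is a hypothetical name, not a DSpace class, and it only handles the five predefined XML entities, not numeric references such as &#60;). Order matters: "&amp;" has to be replaced last. Note that the output is still a string; inside an XSLT or SAX pipeline the resulting "<" is just a character, and it only turns into markup if something downstream parses it again or writes it out unescaped.

```java
// Hypothetical helper mirroring what a chain of str:replace() calls would do.
final class EntityUnescape {

    // Turns the five predefined XML entity references back into characters.
    // "&amp;" must go last: replacing it first would turn "&amp;lt;" into
    // "&lt;" and then wrongly into "<".
    static String unescapeXmlEntities(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescapeXmlEntities("&lt;h3&gt;Ciao&lt;/h3&gt;")); // <h3>Ciao</h3>
        System.out.println(unescapeXmlEntities("&amp;lt;"));                 // &lt;
    }
}
```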

Of course, the third option would be to not store HTML data as entity references in the database, but for you that presents a security risk, and for me it's content I have very little control over.

Other than that, if anyone has further comments on this issue, or on parsing/serializing as it relates to Cocoon/DSpace, the feedback would be appreciated!

 - Patrick

P.S. Antonio, there's a good chance I'll be exploring the options listed above over the next couple of weeks, though I'll be out most of this week. If you like, I can keep you informed.


On Jul 13, 2009, at 10:07 AM, Antonio Cuomo wrote:

Dear Mark, this is the usual behavior in every DSpace installation I have seen (MIT included).

The problem is that all the data in the dc.description field is saved as plain text for security reasons,

so the data must be converted back to HTML before being pushed to the UI.

So, do you know which Java class retrieves the information from the database and passes it to the UI?


Thank you very much
Antonio



On Mon, Jul 13, 2009 at 9:16 AM, Mark Diggory <mdigg...@atmire.com> wrote:
Antonio,

It is unclear why your case is not working.  I can assure you that a
default installation of DSpace Manakin XMLUI will allow you to place
HTML in the form fields for any Community/Collection, and that it will
render as HTML in the Community/Collection views without being
escaped; this is expected behavior.  It shouldn't require altering the
XSLT templates to correct your problem; something else is apparently
going wrong with your installation.  HTML escaping is not used when
the content is stored in the database; it is stored as plain old
unescaped HTML text.  I suspect that something about your environment
differs from a typical default installation running on Tomcat/Linux
and is giving rise to this problem.

> The "problem" is that the data inside the database is saved as text, and
> special characters are added to avoid SQL injection

Are you running some sort of SQL-injection filtering in front of DSpace?

Mark

--
Mark R. Diggory
@mire - http://www.atmire.com

On Sat, Jul 11, 2009 at 3:33 AM, Antonio Cuomo<anto...@parliaments.info > wrote:
> Dear Mark, thank you for your reply,
>
> Unfortunately, it didn't work.
>
> The problem is:
> Cocoon, through a Java class, takes the data from the database and passes it
> directly to Manakin without any change.
>
> Manakin builds the layout, then sends it to the browser with the data Cocoon
> passed.
>
>
> The "problem" is that the data inside the database is saved as text, and
> special characters are added to avoid SQL injection,
> so if I write:
> <h3> Ciao <h3> <br/> <p> forever</p>
>
> the data appears in the database like this:
>
> &lt;h3&gt;Ciao&lt;/h3&gt; &lt;br/&gt; &lt;p&gt;for ever &lt;p&gt;
>
> and so what the browser receives is
>
> &lt;h3&gt;Ciao&lt;/h3&gt; &lt;br/&gt; &lt;p&gt;for ever &lt;p&gt;
>
> which is displayed as
> <h3> Ciao <h3> <br/> <p> forever</p>
>
> What I need is to decode these special characters, &lt; and &gt;, into < and >.
>
> To do this I could try to modify the XSL DIM-Handler (do you know how?)
>
> or the Java Cocoon class (do you know which one and how?)
>
> Thank you very much
> Antonio
>
>
> On Sat, Jul 11, 2009 at 3:02 AM, Mark Diggory <mdigg...@atmire.com> wrote:
>>
>> Use well-formed XML here and try wrapping the content in a <div> or <p>
>> tag; it should work better for you. You shouldn't need to
>> alter the XSLT for this.
>>
>> <div>
>> <h3> hello </h3>
>> <p>it is a description </p>
>> </div>
>>
>> Mark
>>
>> --
>> Mark R. Diggory
>> @mire - http://www.atmire.com
>>
>> 2009/7/10 Antonio Cuomo <anto...@parliaments.info>:
>> > Dear DSpace developers/users,
>> >
>> > I have a question:
>> >
>> > I have some HTML code in the description field of my database; of course,
>> > the HTML has been transformed into plain text,
>> > so the database entry is:
>> > <h3> hello </h3> </br><p>it is a description <p>
>> >
>> > When DSpace shows the database content, it actually shows the text:
>> > <h3> hello </h3> </br><p>it is a description <p>
>> >
>> > while I would like to see the rendered HTML instead:
>> >
>> > hello
>> > it is a description
>> >
>> >
>> >
>> > How can I do it?
>> >
>> > I see two possibilities:
>> >
>> > - Override the Java class that takes the data from the database and sends
>> > it to Manakin, so that it decodes the HTML
>> >
>> >
>> > - Work at the Manakin level (but it seems much more complicated to me), in
>> > the file DIM-Handler.xsl:
>> >
>> > <xsl:if test="dim:fie...@element='description' and not(@qualifier)]">
>> >     ...
>> >     <xsl:copy-of select="./node()"/> <!-- call some HTML decoder here -->
>> >     ...
>> > </xsl:if>
>> >
>> >
>> > I'm sure I'm not the first one who has had this need, and I can see some
>> > security issues involved in the solution.
>> > Can somebody give me some pointers or "a solution"?
>> >
>> > Thank you very much
>> > Antonio
>> >
>> >
>> > _______________________________________________
>> > Dspace-general mailing list
>> > dspace-gene...@mit.edu
>> > http://mailman.mit.edu/mailman/listinfo/dspace-general
>> >
>> >
>>
>>
>> ------------------------------------------------------------------------------
>> Enter the BlackBerry Developer Challenge
>> This is your chance to win up to $100,000 in prizes! For a limited time,
>> vendors submitting new applications to BlackBerry App World(TM) will have
>> the opportunity to enter the BlackBerry Developer Challenge. See full
>> prize details at: http://p.sf.net/sfu/Challenge
>> _______________________________________________
>> Dspace-devel mailing list
>> dspace-de...@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspace-devel
>
>

---
Patrick K. Étienne
Systems Analyst
Digital Library Development
Library and Information Center
Georgia Institute of Technology
email: patrick.etie...@library.gatech.edu
phone: 404.385.8121

"Mediocre Writers Borrow; Great Writers Steal" - T.S. Eliot

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
