[
https://issues.apache.org/jira/browse/XMLBEANS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wing Yew Poon reassigned XMLBEANS-295:
--------------------------------------
Assignee: Cezar Andrei
> setLoadStripWhitespace() api errors when trimming white space characters
> ------------------------------------------------------------------------
>
> Key: XMLBEANS-295
> URL: https://issues.apache.org/jira/browse/XMLBEANS-295
> Project: XMLBeans
> Issue Type: Bug
> Components: Validator
> Affects Versions: Version 2.2.1
> Environment: SunOS 5.9 and Microsoft Windows XP SP2, Java 1.4.2
> Reporter: David RR Webber
> Assignee: Cezar Andrei
> Fix For: TBD
>
>
> Situation Summary
> We deployed to production using the setLoadStripWhitespace() API in
> XMLBeans. After some days we started getting intermittent failures from
> occasional XML transactions.
> After a week of investigation, having eliminated all other factors, we
> concluded that the flushText() method itself was the cause. Specifically, we
> have determined that in character strings containing the & character, spaces
> immediately after the & are stripped - e.g. <company>B & H Photo</company>
> becomes <company>B &H Photo</company>.
> We realize that there is a patch available for & processing, and we are
> currently testing it to see if it cures the problem relating to &
> (http://issues.apache.org/jira/browse/XMLBEANS-274).
> However, we are also seeing an intermittent problem in our UNIX environment
> associated with the colon character (other characters may be affected as
> well - we do not have a definitive list). Spaces are intermittently trimmed
> in various fields that do not contain "&" (the character in the original
> XMLBEANS-274 bug report). We cannot reproduce this on our Windows
> development systems, but it is happening intermittently on SunOS.
> Again, a space either immediately following the colon or later in the string
> is stripped - for tokenized elements - e.g. <urgent>Yes: Y</urgent> becomes
> <urgent>Yes:Y</urgent> and the object then returns a NULL value because this
> is not an allowed value for the tokenized list. Similarly,
> <location>USA: United States</location> became <location>USA:
> UnitedStates</location>. We suspect that a character preceding the colon
> might be triggering this behaviour, but we have not yet determined when or
> how. This illustrates how complex the issue is given the current XMLBeans
> implementation approach.
> Analysis
> We have looked at how and where XMLBeans does the white space trim during
> the unmarshalling of the XML content. When it detects white space, it
> invokes a stripRight() method loop. We are not convinced that this is
> architecturally sound at the point where it is employed - it leads to
> complexity, to many edge conditions, and to some combinations of
> characters not being handled consistently and correctly.
> Our preferred approach would be to defer the white space trim until
> post-unmarshalling, so that the initial process can treat the XML content
> "as is" between the angle brackets and apply the trim only once the content
> has been extracted. At that point a simple Java String trim() can be
> employed. This could be provided as an alternative method call to the
> current setLoadStripWhitespace() API, one that would iterate through the
> entire structure of objects instead of the original XML stream. The only
> check needed is whether the XML markup itself sets the xml:space="preserve"
> attribute for an element - in which case the trim() would be skipped
> automatically for that content item. Right now the existing flushText()
> method mixes up XML markup and content - instead there needs to be a clear
> separation between the element angle brackets and attribute quotes on the
> one hand and the content itself on the other.
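
The deferred trim described above can be sketched roughly as follows. This is
only an illustration using the JDK's DOM parser, not XMLBeans code: the class
name DeferredTrim and its methods are hypothetical, and it assumes a current
JDK (setTextContent() is not available on the Java 1.4.2 named in the report).
It trims only elements without child elements and leaves mixed content alone.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of XMLBeans.
class DeferredTrim {

    /** Step 1: parse the XML "as is"; no whitespace handling happens here. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Step 2: walk the parsed tree and trim the text of leaf elements,
     * skipping any subtree where xml:space="preserve" is in effect.
     */
    static void trimSimpleContent(Element element, boolean inheritedPreserve) {
        // xml:space is inherited until an element overrides it.
        boolean preserve = inheritedPreserve;
        String space = element.getAttribute("xml:space");
        if ("preserve".equals(space)) {
            preserve = true;
        } else if ("default".equals(space)) {
            preserve = false;
        }

        boolean hasChildElements = false;
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                trimSimpleContent((Element) child, preserve);
            }
        }

        // Only leading and trailing whitespace is removed; interior spaces
        // such as the one after "&" or ":" are never touched.
        if (!hasChildElements && !preserve) {
            element.setTextContent(element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<order><company> B &amp; H Photo </company>"
                + "<note xml:space=\"preserve\">  keep  </note></order>");
        trimSimpleContent(doc.getDocumentElement(), false);
        System.out.println("[" + doc.getElementsByTagName("company")
                .item(0).getTextContent() + "]");
        System.out.println("[" + doc.getElementsByTagName("note")
                .item(0).getTextContent() + "]");
    }
}
```

Because the trim runs on already-decoded strings, it never sees entity
references or markup, which is the separation the analysis argues for.
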
> One caveat: the current approach may be intended to run before error
> checking on tokenized lists, to prevent failures there due to extra spaces.
> Even so, it is not cleanly enough separated, and it would again be simpler
> to apply the Java String trim() method within the tokenized evaluation
> itself, on just the string.
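
As a rough illustration of trimming inside the tokenized evaluation itself,
the sketch below checks a raw string against a list of allowed values. The
class UrgencyCode and the value "No: N" are hypothetical; only "Yes: Y" comes
from the report. Returning null for a non-matching value mirrors the NULL
result described above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical validator; not generated by or taken from XMLBeans.
class UrgencyCode {

    // Assumed enumeration: "Yes: Y" is from the report, "No: N" is invented.
    private static final Set<String> ALLOWED =
            new LinkedHashSet<>(Arrays.asList("Yes: Y", "No: N"));

    /** Returns the matching allowed value, or null if there is none. */
    static String parse(String rawContent) {
        if (rawContent == null) {
            return null;
        }
        // The trim is applied to the extracted string only, at the point
        // of comparison; interior spaces are left untouched.
        String candidate = rawContent.trim();
        return ALLOWED.contains(candidate) ? candidate : null;
    }

    public static void main(String[] args) {
        System.out.println(parse("  Yes: Y \n")); // prints "Yes: Y"
        System.out.println(parse("Yes:Y"));       // prints "null"
    }
}
```

Note that String.trim() removes only leading and trailing whitespace, whereas
the XML Schema "collapse" whitespace rule for xs:token also folds interior
runs of whitespace into a single space, so a full implementation would need
more than trim() alone.
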
> Suggested Solution
> Refactor the current setLoadStripWhitespace() API to delay string
> manipulation on content until after the content and XML markup have been
> unpacked, instead of before as currently happens. This makes for much
> simpler white space trim logic (it can simply use the Java String trim()
> method) that does not need to look for markup artifacts as well.
> We are not clear on who owns this particular feature in XMLBeans, or whether
> they are currently available to assist on this, but we would be prepared to
> work with the team to develop a better solution here.