Re: SDO - dealing with CDATA

2006-11-10 Thread Geoffrey Winn

On 08/11/06, Simon Laws [EMAIL PROTECTED] wrote:



OK, so I've taken my option 1 approach (see previous mail) and implemented a
solution in C++ SDO which allows CDATA sections and their strange markers to
appear in SDO properties. In this way the API for reading and creating CDATA
sections is the normal SDO string API. In a way this is just a preliminary
stab to allow us to play with CDATA and see whether this simple approach is
satisfactory.

From the previous mails there was discussion of alternative approaches where
special markers are introduced to indicate where CDATA appears, hence
removing the need to maintain the CDATA markers in text. However there are
some tricky cases, particularly where one or more CDATA sections appear
within a primitive text string. As schema gives us no help in locating CDATA
sections this leaves the model at a bit of a loss in terms of representing
them. We would potentially end up adding scaffolding around primitive string
types, or preferably creating a new type, that is able to represent
accurately the combination of text and CDATA sections.

Anyhow something more complex may be appropriate in the future but this
simple solution allows us to offer something to our PHP SDO users quickly
that I don't think causes us big problems for the future. If in the future
we have special flags we can always reproduce the CDATA markers if required.
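
As a rough illustration of that last point, here is a minimal C++ sketch of how flagged text could be turned back into the marker-in-string form used by option 1. The TextSegment type and the toMarkedString function are hypothetical names invented for this note; they are not part of C++ SDO. The sketch assumes no CDATA payload contains "]]>", which XML does not allow inside a single CDATA section.

```cpp
#include <string>
#include <vector>

// Hypothetical model type (not part of C++ SDO): one run of character
// data, flagged when it came from a CDATA section.
struct TextSegment {
    std::string text;
    bool isCData;
};

// Rebuild the option 1 string form, with the markers kept in the text,
// from a list of flagged segments.
// Assumption: no CDATA payload contains "]]>".
std::string toMarkedString(const std::vector<TextSegment>& segments) {
    std::string out;
    for (const TextSegment& seg : segments) {
        if (seg.isCData) {
            out += "<![CDATA[";
            out += seg.text;
            out += "]]>";
        } else {
            out += seg.text;
        }
    }
    return out;
}
```

A value made of plain text, one CDATA section, and more plain text would come back as a single string with exactly one pair of markers around the CDATA part.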


I created a JIRA to record progress on this issue
(http://issues.apache.org/jira/browse/TUSCANY-908)

Simon




Simon,

I've looked through your patch and I think that's a sensible approach. It
will obviously be a while (at least) before the specification has anything
definitive to say on this subject and what you are doing at least gives the
SDO user sight of the CDATA text. (Unlike the current situation where it is
silently discarded.)

Have you tried building your patch on Linux?

Regards,

Geoff.


Re: SDO - dealing with CDATA

2006-11-08 Thread Simon Laws

On 11/3/06, Simon Laws [EMAIL PROTECTED] wrote:




On 11/3/06, Frank Budinsky [EMAIL PROTECTED] wrote:

 Simon,

 The Sequence entry returns a special property:

 if (sequence.getProperty(i) == specialCDataProperty) {
   // Sequence.getValue(int) returns Object, so a cast is needed
   String cDataValue = (String) sequence.getValue(i);
 }

 The problem is that this special property is currently inherited from the
 underlying EMF, so it's really not something we want clients to use. Right
 now, this works fine in terms of reading in CDATA and not losing it when
 you reserialize, but there's really no proper way for an SDO client to
 actually access it. Without using EMF APIs, the only way a client can
 decide that an entry is CDATA is by looking at the property name (and hope
 there's no real property with that name). Longer term, I'm not sure that
 even handling it this way is right. Maybe CDATA and other special XML
 things should look like mixed text in the sequence (property == null), and
 some XMLHelper method (or some Tuscany specific API for now) could be used
 to check if it's actually something special like CDATA. Maybe you should
 try to do something like that in the C++ impl, and if it looks promising,
 we'll switch the Java impl to do the same.

 Frank


Re: SDO - dealing with CDATA

2006-11-03 Thread Frank Budinsky
In the Tuscany Java implementation we expose CDATA as sequence entries 
(like mixed text) with a special CDATA property (we handle comments in a 
similar way). SDO doesn't define a special property for CDATA, so this is 
an implementation-specific feature. I'm not sure, long term, what should 
be the best (proper) way to do this.

Frank.

Simon Laws [EMAIL PROTECTED] wrote on 11/03/2006 09:41:25 AM:

 On 10/26/06, Simon Laws [EMAIL PROTECTED] wrote:
 
  This is primarily a C++ question but I guess could apply to Java also. I'm
  trying to read a document into C++ SDO that contains a CDATA section. The
  corresponding CDATA doesn't make its way into the resulting SDO. I put the
  C++ SDO implementation in the debugger and found the reason why:
 
  sax2parser.cpp
 
  void sdo_cdataBlock(void *ctx, const xmlChar *value, int len)
  {
  }
 
  So the callback exists, gets called with the correct data during the
  parse, i.e. LibXML2 is doing the right thing, but the callback is ignored.
  Is there a good reason for this? I did a quick search of the C++ and Java
  specs and they don't appear to discuss CDATA specifically. Can someone
  comment on whether the Java implementation handles CDATA successfully?
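
For comparison, a non-empty handler with the same signature might look roughly like the sketch below. This is only an illustration under stated assumptions: ParseContext is a hypothetical stand-in for whatever state the real sax2parser.cpp passes as ctx, and xmlChar is re-declared locally (libxml2 defines it as unsigned char) so that the snippet compiles without the libxml2 headers. Because libxml2 hands over the section content without its markers, the sketch puts them back, in the spirit of the "keep the markers in the string" option discussed in this thread.

```cpp
#include <string>

// Stand-in so the sketch compiles without libxml2; libxml2 itself
// defines xmlChar as unsigned char.
typedef unsigned char xmlChar;

// Hypothetical parser state; the real context in sax2parser.cpp differs.
struct ParseContext {
    std::string currentText;  // character data collected for the current property
};

// Same signature as the empty callback, but the payload is kept and
// wrapped in the CDATA markers that the parser stripped.
void sdo_cdataBlock(void *ctx, const xmlChar *value, int len)
{
    if (ctx == 0 || value == 0 || len < 0) {
        return;  // nothing usable to record
    }
    ParseContext *state = static_cast<ParseContext *>(ctx);
    state->currentText += "<![CDATA[";
    state->currentText.append(reinterpret_cast<const char *>(value),
                              static_cast<std::string::size_type>(len));
    state->currentText += "]]>";
}
```

The explicit length is used rather than relying on a terminating NUL, since the callback receives the payload as a pointer plus len.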
 
  Logically, from an SDO point of view, there is probably no need to treat
  CDATA specially as the SDO model dictates precisely the difference between
  data and structure. We may find that to make the XML DAS function work we
  have to know that a property potentially contains markup but I'd have to
  look closely at how the C++ SDO implementation of the XML DAS function
  streams out SDOs to XML when requested to do so.
 
  If CDATA hasn't been omitted for a good reason I'll come up with a
  proposal for C++ SDO.
 
  Regards
 
  Simon
 
 
 
 I didn't get any response to this. Here are my further thoughts..
 
 There are a number of options for representing CDATA in SDO, for example:
 
 1) Duplicate the CDATA string as is, including the <![CDATA[ and ]]>
 markers, to the appropriate property in the data object hierarchy
 2) Duplicate the CDATA string excluding the <![CDATA[ and ]]> markers
 and introduce a special flag to indicate that CDATA is present.
 
 CDATA is specifically a concern of XML, i.e. the character entities that
 CDATA protects an XML parser from are of no concern to SDO because SDO is
 not intended to be tied directly to XML. So given the example options above
 we either expose the specifics of XML to the SDO core (option 2) or to the
 SDO user (option 1).
 
 Neither is particularly attractive.
 
 1) appears to be the simplest approach to implement because it provides a
 mechanism for the user to read and create CDATA without having to provide
 much special support in SDO. 2) is more involved, particularly because
 CDATA can appear mixed in with other text strings, and so a sequence may
 need to be used to represent properties that have a mixture of text and
 CDATA, marking those sequence entries that are CDATA.
 
 1) does require changes (at least in C++ SDO) because XML parsers tend to be
 too helpful in this case for processing CDATA. XML parsers, libxml2 in
 particular, recognize the <![CDATA[ and ]]> sequence as a special indicator
 and throw it away, returning just the text it includes. We would have to
 reintroduce it and store it in the parameter value in question. The C++ SDO
 implementation uses a lot of XML string handling before the parameter value
 is actually stored, which URL encodes parts of the CDATA markers, so this
 would have to be fixed. When writing out the CDATA strings any string typed
 properties would have to be scanned for the markers so that the appropriate
 libxml2 functions can be called to get the CDATA sections in the right
 place.
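
The write-out scan described above could be sketched along these lines. Piece and splitOnCData are hypothetical names used only for this illustration, and the sketch stops at producing the pieces; which libxml2 call then emits each one is left to the real implementation. An unterminated marker is treated as ordinary text, which is an assumption rather than anything decided in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical result type: one run of a string property's value.
struct Piece {
    std::string text;  // payload, with markers removed for CDATA pieces
    bool isCData;
};

// Scan a string value for <![CDATA[ ... ]]> markers and split it into
// ordinary text and CDATA payloads, in document order.
std::vector<Piece> splitOnCData(const std::string& value) {
    static const std::string open = "<![CDATA[";
    static const std::string close = "]]>";
    std::vector<Piece> pieces;
    std::string::size_type pos = 0;
    while (pos < value.size()) {
        std::string::size_type start = value.find(open, pos);
        std::string::size_type end = (start == std::string::npos)
            ? std::string::npos
            : value.find(close, start + open.size());
        if (start == std::string::npos || end == std::string::npos) {
            // No further complete section: the remainder is ordinary text.
            pieces.push_back(Piece{value.substr(pos), false});
            break;
        }
        if (start > pos) {
            pieces.push_back(Piece{value.substr(pos, start - pos), false});
        }
        std::string::size_type bodyStart = start + open.size();
        pieces.push_back(Piece{value.substr(bodyStart, end - bodyStart), true});
        pos = end + close.size();
    }
    return pieces;
}
```

A serializer would then walk the pieces, writing the text ones as escaped character data and the flagged ones as CDATA sections.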
 
 I have a test implementation of 1). If this is the way we want to go I would
 have to do more work to thread CDATA handling through the XML strings that
 are used to set parameters. Happy to do this but would like to discuss
 first.
 
 Thoughts (particularly on what Java SDO does with CDATA)?
 
 Simon





Re: SDO - dealing with CDATA

2006-11-03 Thread Simon Laws

On 11/3/06, Frank Budinsky [EMAIL PROTECTED] wrote:


Simon,

The Sequence entry returns a special property:

if (sequence.getProperty(i) == specialCDataProperty) {
  // Sequence.getValue(int) returns Object, so a cast is needed
  String cDataValue = (String) sequence.getValue(i);
}

The problem is that this special property is currently inherited from the
underlying EMF, so it's really not something we want clients to use. Right
now, this works fine in terms of reading in CDATA and not losing it when
you reserialize, but there's really no proper way for an SDO client to
actually access it. Without using EMF APIs, the only way a client can
decide that an entry is CDATA is by looking at the property name (and hope
there's no real property with that name). Longer term, I'm not sure that
even handling it this way is right. Maybe CDATA and other special XML
things should look like mixed text in the sequence (property == null), and
some XMLHelper method (or some Tuscany specific API for now) could be used
to check if it's actually something special like CDATA. Maybe you should
try to do something like that in the C++ impl, and if it looks promising,
we'll switch the Java impl to do the same.
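
A minimal C++ sketch of that suggestion, purely as an illustration: text-like entries carry a null property, and a helper answers whether a given entry is CDATA. Every name here (Property, SequenceEntry, TextKind, isCData, cdataValues) is hypothetical and is not the C++ SDO Sequence API or an existing XMLHelper method.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// All names below are hypothetical; this is not the C++ SDO Sequence API.
struct Property { std::string name; };

enum class TextKind { Plain, CData, Comment };

struct SequenceEntry {
    const Property* property;  // nullptr for text entries, as with mixed text
    std::string value;
    TextKind kind;             // implementation detail, hidden from clients
};

typedef std::vector<SequenceEntry> Sequence;

// Stand-in for "some XMLHelper method": reports whether entry i is CDATA.
bool isCData(const Sequence& seq, std::size_t i) {
    return i < seq.size()
        && seq[i].property == nullptr
        && seq[i].kind == TextKind::CData;
}

// Collect CDATA payloads without comparing against a special property.
std::vector<std::string> cdataValues(const Sequence& seq) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < seq.size(); ++i) {
        if (isCData(seq, i)) {
            out.push_back(seq[i].value);
        }
    }
    return out;
}
```

With this shape a client never needs to know a reserved property name, so there is no clash with a real property that happens to share it.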

Frank
