Re: Vote for XMLBeans proposal

2003-07-06 Thread Aleksander Slominski
Berin Lautenbach wrote:

Ted Leung wrote:

Nicola Ken Barozzi wrote:

If XML.Apache is willing, as it seems, to cater for this project, 
I'll wait for a vote from them, an ACK from the Bea guys, and start 
preparing the hatcher :-)

I'm happy to invest some time in helping XMLBeans get through the 
incubator -- speaking with my XML PMC and ASF member hat on.


The idea of moving XMLBeans to incubation under the XML project and 
with the assistance of Ted gets a +1 from me with some caveats :

1. Current XMLBeans committers need to be comfortable with this 
resting with the XML project in the first instance. Note that I would 
hope that the umbrella project could be changed prior to exit from 
incubation if the feeling from the committers was that it should be.

If the initial preference is Jakarta then please indicate! I'm 
definitely not trying to push a line here, and it's easy to switch the 
vote over to the Jakarta PMC :>.

2. Committer issues that have previously been discussed will need to 
be worked through during incubation (although that's really what 
incubation is about :>).

+/- from other XML PMC members welcome.

Further discussion also welcome!
hi,

based on what I have seen when looking at the XMLBeans source code and 
(limited) design documentation, I believe that this project is very 
interesting and useful for Web Services, so I have CCed the [EMAIL PROTECTED] 
mailing list too.

It seems that the BEA folks are willing to solve all remaining problems, 
and I think it would be good if this project quickly gained more 
momentum. I am willing to help with it (even though I am not an Apache 
XML committer).

thanks,

alek

--
If everything seems under control, you're just not going fast enough. —Mario Andretti


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: XMLBeans performance and source code status [Re: Proposal: XMLBeans]

2003-07-06 Thread Aleksander Slominski
David Bau wrote:

>Adding a few links and other info -
>
>Eric Vasilik writes:
>
>>The synchronization described refers to the fact
>>that one may manipulate the XML via the XmlCursor
>>or the strongly typed XMLBean classes generated from
>>the schema
>
>
>As Eric says, we don't want to confuse the two uses of
>the word "synchronize".  But since Aleksander brought it
>up - here's some information on thread-synchronization
>too.
>
>We examined both with- and without-thread-synchronized
>access, and found that without-thread-sync, programmers
>fall into traps like working with XML config files on
>multiple threads in thread-unsafe ways without
>being aware of it.  We found that it costs between 1%
>(strongly-typed access) and 10% (XmlCursor access) to
>synchronize. So we're currently synchronizing access
>to the data now, paying for more [app] stability with a
>little bit of perf. We'd like to provide the option to
>single-threaded (or savvy) users of not synchronizing
>to get the 1-10% back. That's future work.
>
hi,
did you consider the "fail quickly" (fail-fast) approach that is used in 
Java collections (so, for example, an Iterator can detect that its 
collection was modified while it was iterating, for instance from another 
thread, and fails when that happens)? the other possibility would be 
to allow making some objects (such as configuration) immutable so they 
can be safely shared between multiple threads.
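
A minimal sketch of the two ideas in the paragraph above, using only java.util. It is not XMLBeans code; the class and method names are made up for illustration. Note that the java.util iterators detect structural modification of the collection behind the iterator's back, on a best-effort basis, rather than checking which thread is calling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class FailFastDemo {

    /** Returns true if the iterator noticed a change made behind its back. */
    public static boolean iteratorFailsFast() {
        List<String> settings = new ArrayList<>(Arrays.asList("host", "port", "timeout"));
        Iterator<String> it = settings.iterator();
        it.next();
        settings.add("retries");      // structural change made outside the iterator
        try {
            it.next();                // the iterator checks the modification count here
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    /** Returns true if a read-only view of a configuration list rejects writes. */
    public static boolean readOnlyViewRejectsWrites() {
        List<String> config =
                Collections.unmodifiableList(new ArrayList<>(Arrays.asList("host", "port")));
        try {
            config.add("timeout");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fail-fast iterator tripped: " + iteratorFailsFast());
        System.out.println("read-only view rejected write: " + readOnlyViewRejectsWrites());
    }
}
```

The first method shows a failure that surfaces at the point of misuse instead of as silent corruption; the second shows the cheaper alternative of simply not allowing writes to shared data.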

>As Eric pointed out, the key I think is not in what our
>current numbers are, but the fact that we've isolated
>our implementation from our interface so that we have
>the flexibility of reducing allocations, deferring work,
>and otherwise improving performance further in the future.
>Abstracting the primary store behind a cursor rather
>than a tree of objects with identity gives us some leeway
>in shuffling our implementation strategy in the future
>without restructuring the APIs.
>
that sounds like a very good strategy! however, i wonder what the 
current state really is. when i looked at the source code i could not see 
how the layering could work (or is it working already?): which parts are 
API interfaces, how is the implementation separated, and can it be 
switched? is it possible in the current version to choose a different 
implementation (by using, for example, the factory pattern)? i can see it 
working for com.bea.xml (XSD types) and com.bea.xbean.values 
(implementation) - this is a very valuable set of Java classes providing 
XSD validation (even more so if they were more abstract, so that they 
could be used with any XML databinding).

thanks,

alek

--
If everything seems under control, you're just not going fast enough. 
—Mario Andretti





RE: XMLBeans performance and source code status [Re: Proposal: XMLBeans]

2003-07-06 Thread David Bau
Adding a few links and other info -

Aleksander Slominski wrote:
> http://dev2dev.bea.com/articles/hitesh_seth.jsp that is
> good overview but has not enough technical details and
> other docs): as far as i can understand actual objects

Above you've linked to an XML Journal review reprint.

Here is a page that points to other information:

http://dev2dev.bea.com/technologies/xmlbeans/index.jsp

One of the links is a very brief summary of some
brutally transparent and upfront performance and
test compliance numbers:

http://workshop.bea.com/xmlbeans/schemaandperf.jsp

BTW, despite the fact that we posted the numbers on
pretty marketing pages on bea.com, the numbers above
are not marketing-varnished numbers - they are the
actual measurements that we developers track day-to-day.
Those are numbers we measure to help us focus on
use-cases that we're working on making faster.

The XML cursor access _without_ strong-type conversion is
between 10% and 58% faster than Xerces2 DOM access, going
to about 35% for large (1 MB) XML documents.  Xerces2, btw,
is extremely speedy, so we're proud to be on par with it
in any scenario!

Adding strong-type conversion (for example parsing xs:int
to java int and dates to Calendars) adds enough cost that
reading the data out of a document is between 0% and 48%
slower than reading out using (untyped) Xerces2 DOM.

Apples-to-apples, we measure ourselves significantly
faster than JAXB RI and Castor (140% to 282% and 66% to
800%, respectively). Please don't sue me - those are our
real numbers, but if performance is important to your
application, you should measure it for yourself.

We do fault-in object allocations when demanded, and
you can see in our memory test that when we fault-in
all the objects for a whole document, we take up more
memory than Xerces2 DOM.  One current project is to take
steps to reduce that number.  When we use XmlCursor and
don't fault-in all the objects, you will find the memory
number to be much slimmer. (I don't have a measurement
because our measurements focus on problem areas we're
actually working on.)

Eric Vasilik writes:
> The synchronization described refers to the fact
> that one may manipulate the XML via the XmlCursor
> or the strongly typed XMLBean classes generated from
> the schema

As Eric says, we don't want to confuse the two uses of
the word "synchronize".  But since Aleksander brought it
up - here's some information on thread-synchronization
too.

We examined both with- and without-thread-synchronized
access, and found that without-thread-sync, programmers
fall into traps like working with XML config files on
multiple threads in thread-unsafe ways without
being aware of it.  We found that it costs between 1%
(strongly-typed access) and 10% (XmlCursor access) to
synchronize. So we're currently synchronizing access
to the data now, paying for more [app] stability with a
little bit of perf. We'd like to provide the option to
single-threaded (or savvy) users of not synchronizing
to get the 1-10% back. That's future work.

As Eric pointed out, the key I think is not in what our
current numbers are, but the fact that we've isolated
our implementation from our interface so that we have
the flexibility of reducing allocations, deferring work,
and otherwise improving performance further in the future.
Abstracting the primary store behind a cursor rather
than a tree of objects with identity gives us some leeway
in shuffling our implementation strategy in the future
without restructuring the APIs.

David Bau




Vote for XMLBeans proposal

2003-07-06 Thread Berin Lautenbach
Ted Leung wrote:
Nicola Ken Barozzi wrote:

If XML.Apache is willing, as it seems, to cater for this project, I'll 
wait for a vote from them, an ACK from the Bea guys, and start 
preparing the hatcher :-)

I'm happy to invest some time in helping XMLBeans get through the 
incubator -- speaking with my XML PMC and ASF member hat on.


The idea of moving XMLBeans to incubation under the XML project and with 
the assistance of Ted gets a +1 from me with some caveats :

1.  Current XMLBeans committers need to be comfortable with this resting 
with the XML project in the first instance.  Note that I would hope that 
the umbrella project could be changed prior to exit from incubation if 
the feeling from the committers was that it should be.

If the initial preference is Jakarta then please indicate!  I'm 
definitely not trying to push a line here, and it's easy to switch the 
vote over to the Jakarta PMC :>.

2.  Committer issues that have previously been discussed will need to be 
worked through during incubation (although that's really what incubation 
is about :>).

+/- from other XML PMC members welcome.

Further discussion also welcome!

Cheers,
Berin




Re: XMLBeans performance and source code status [Re: Proposal: XMLBeans]

2003-07-06 Thread Ted Leung
Eric,

What's the relationship between XmlCursor and the JSR-173 Streaming API 
for XML?

Ted

Eric Vasilik wrote:

When working with XMLBeans in a strongly typed way (with a Schema), individual objects are created for each piece of information, usually instances of simple and complex Schema types.  However, you can also access and manipulate the XML in a typeless manner.  What we've done with XMLBeans is provide access to the full XML Infoset via the XmlCursor interface.

XmlCursor provides functionality very similar to the DOM, but takes a very different tack.  Instead of creating a DOM Node for each element, attribute, text, etc., one may create a single XmlCursor and navigate that cursor about the XML instance, interrogating the XML: element/attr names, child/parent elements, text, comments, etc.  Also, one may modify the XML, for example by removing elements and attrs or inserting text.  All of this can be done by either not creating objects or reusing objects, so that the number of objects needed to operate on the XML is constant, not on the order of the size of the XML as a DOM would require.

This kind of interface allows an implementer of an in-memory XML store more freedom in choosing the internal structure that represents the XML in memory.  One could, for example, simply store the XML as it was read in from disk and implement a cursor as an index into that string, parsing or modifying the parts of the string as necessary to satisfy the requests.  We don't go to quite this extreme.  In principle, we create one object for every leaf element or attribute and two objects for every interior element.  All text for attribute values, comments, procinsts and text between element markup is stored in a single character array.
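
To make the cursor-over-a-flat-store idea concrete, here is a rough sketch of the "extreme" variant described above: tokens kept in parallel arrays, all text batched into one character array, and a single cursor object that is just an index into them. This is not how XMLBeans is implemented and it is not the XmlCursor API; FlatStore, Cursor and their methods are hypothetical names, and the store is built by hand rather than parsed.

```java
public class FlatCursorDemo {
    static final int START = 0, END = 1, TEXT = 2;

    /** Hypothetical flat store: parallel token arrays plus one shared text buffer. */
    static final class FlatStore {
        final int[] kind;      // START, END or TEXT for each token
        final String[] name;   // element name for START tokens, otherwise null
        final int[] off, len;  // slice of 'chars' for TEXT tokens
        final char[] chars;    // all text content, batched together

        FlatStore(int[] kind, String[] name, int[] off, int[] len, char[] chars) {
            this.kind = kind; this.name = name; this.off = off; this.len = len; this.chars = chars;
        }
    }

    /** One reusable object that moves over the store; no per-node objects are created. */
    static final class Cursor {
        private final FlatStore s;
        private int pos;       // index of the current START token

        Cursor(FlatStore s) { this.s = s; }

        String getName() { return s.name[pos]; }

        /** Index of the END token that closes the START token at 'start'. */
        private int matchingEnd(int start) {
            int depth = 0;
            for (int i = start; i < s.kind.length; i++) {
                if (s.kind[i] == START) depth++;
                else if (s.kind[i] == END && --depth == 0) return i;
            }
            throw new IllegalStateException("unbalanced store");
        }

        boolean toFirstChild() {
            int end = matchingEnd(pos);
            for (int i = pos + 1; i < end; i++) {
                if (s.kind[i] == START) { pos = i; return true; }
            }
            return false;
        }

        boolean toNextSibling() {
            int i = matchingEnd(pos) + 1;
            while (i < s.kind.length && s.kind[i] == TEXT) i++;   // skip text between elements
            if (i < s.kind.length && s.kind[i] == START) { pos = i; return true; }
            return false;
        }

        /** All text inside the current element, read straight from the shared buffer. */
        String getText() {
            StringBuilder sb = new StringBuilder();
            int end = matchingEnd(pos);
            for (int i = pos + 1; i < end; i++) {
                if (s.kind[i] == TEXT) sb.append(s.chars, s.off[i], s.len[i]);
            }
            return sb.toString();
        }
    }

    /** Hand-built store for <order><id>42</id><item>pen</item></order>. */
    static FlatStore sample() {
        int[] kind = {START, START, TEXT, END, START, TEXT, END, END};
        String[] name = {"order", "id", null, null, "item", null, null, null};
        int[] off = {0, 0, 0, 0, 0, 2, 0, 0};
        int[] len = {0, 0, 2, 0, 0, 3, 0, 0};
        return new FlatStore(kind, name, off, len, "42pen".toCharArray());
    }

    /** Visits the root and its children with a single cursor object. */
    public static String walk() {
        Cursor c = new Cursor(sample());
        StringBuilder out = new StringBuilder(c.getName());
        if (c.toFirstChild()) {
            do {
                out.append(' ').append(c.getName()).append('=').append(c.getText());
            } while (c.toNextSibling());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(walk());
    }
}
```

However large the document, the walk uses one Cursor; only the strings handed back to the caller are allocated. That is the property contrasted with the DOM above.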

We have found that creating fewer objects and batching text leads to loading the XML into memory faster as well as having a similar, if not slightly smaller, memory footprint when compared to the DOM.  Also, working with cursors seems to be an easier programming model than the DOM as it does not have text nodes and is more intuitive.

With respect to the synchronized access, the strongly typed schema XMLBeans objects cache values so that conversion to text does not occur until it is needed.  Likewise, when modifications are made to the XML Infoset, the strongly typed data (ints, for example) are not parsed from the text until requested.  In general the impact of synchronization is quite low because of the lazy approach we have taken along with the caching.  As I read your question again, I realize that you may have interpreted synchronized to mean "managing data among several threads".  The synchronization described refers to the fact that one may manipulate the XML via the XmlCursor or the strongly typed XMLBean classes generated from the schema, each mechanism capable of seeing the changes from the other in a tightly integrated way.
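
The lazy conversion and caching described above can be pictured roughly as follows. This is only a sketch under my own assumptions, not the XMLBeans implementation: LazyInt is a hypothetical class that keeps a text form and a typed form of the same value, marks one stale when the other is written, and converts only when the stale form is actually requested. The counters exist purely to make the laziness observable.

```java
import java.util.Objects;

public class LazyTypedValueDemo {

    /** Hypothetical value with a text view and a typed view that stay in sync lazily. */
    public static final class LazyInt {
        private String text;      // lexical form; null while stale
        private Integer typed;    // cached typed form; null while stale
        public int parseCount;    // demo instrumentation only
        public int formatCount;   // demo instrumentation only

        public LazyInt(String text) {
            this.text = Objects.requireNonNull(text);
        }

        /** Write through the untyped view: the typed cache becomes stale, nothing is parsed. */
        public void setText(String newText) {
            text = Objects.requireNonNull(newText);
            typed = null;
        }

        /** Write through the typed view: the text becomes stale, nothing is formatted. */
        public void setInt(int value) {
            typed = value;
            text = null;
        }

        /** Parses the text only if no valid typed value is cached. */
        public int getInt() {
            if (typed == null) {
                typed = Integer.parseInt(text.trim());
                parseCount++;
            }
            return typed;
        }

        /** Formats the typed value only if no valid text is cached. */
        public String getText() {
            if (text == null) {
                text = Integer.toString(typed);
                formatCount++;
            }
            return text;
        }
    }

    public static void main(String[] args) {
        LazyInt v = new LazyInt(" 42 ");
        System.out.println("parses before any typed read: " + v.parseCount);
        System.out.println("typed reads: " + v.getInt() + ", " + v.getInt()
                + " (parses: " + v.parseCount + ")");
        v.setInt(7);
        System.out.println("text after typed write: " + v.getText());
        v.setText("100");
        System.out.println("typed read after text write: " + v.getInt());
    }
}
```

Each write is cheap because it only invalidates the other view; the conversion cost is paid once, and only if someone reads through that view.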

With respect to building XMLBeans, we plan to remove any dependency upon the jars you mentioned.  Indeed, there exists very little dependence on these.  Mostly just interfaces, not any classes needed for the implementation.

- Eric Vasilik

 





Re: Issues with XMLBeans proposal

2003-07-06 Thread Ted Leung
Nicola Ken Barozzi wrote:

Greg Stein wrote, On 04/07/2003 1.24:

On Thu, Jul 03, 2003 at 04:22:10PM -0400, Andrew C. Oliver wrote:

To that extent, I'd say it is an XML project.


There is another more simple rule. Who has shown that they want the 
project most? Apache.XML. Then let them have it.

However, I think it is mostly up to the XMLBeans community to ask for 
one or the other. If that PMC says "okay", then everything is fine. 
(and no... PMCs are not allowed to meet at sundown to duel for an 
arriving project :-)


No? ;-P

If XML.Apache is willing, as it seems, to cater for this project, I'll 
wait for a vote from them, an ACK from the Bea guys, and start 
preparing the hatcher :-)

I'm happy to invest some time in helping XMLBeans get through the 
incubator -- speaking with my XML PMC and ASF member hat on.

Ted
