On Wed, 1 Jun 2016, Murphy, Mark wrote:
At work I have been using the SS side of POI, and have become fairly comfortable with it. I realize that there are still some things that need to be done, and some issues with XML Beans that have been discussed, but it seems fairly well organized. Recently I have also been working with the WP side, and it is obviously still a work in progress.

There isn't much of a link between HWPF and XWPF. I tried to put one in, but the formats have a surprising number of differences in concepts and approaches, more so than HSSF/XSSF. Couple that with fewer XWPF contributions, and HWPF needing lots of love after the loss of the main developer, and that's how we end up in the situation today...

I have found that XWPF does not yet have a clear separation between the model and the usermodel.

For anything done by POI committers, it should do. However, we've taken a lot of community contributions, and many of those steer more towards "get it done" than "build a full solution perfectly". That's why you see a lot of "leakages" of the low-level XML stuff. It'd be great to wrap all of that stuff up! It's also required for dropping xmlbeans: we need to get everyone off the CT classes if we want to be able to replace them.
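
As a rough illustration of what "wrapping up" the low-level XML stuff could look like, here is a minimal sketch. All of the names in it (RawRunBean, Run, XmlBackedRun) are hypothetical stand-ins invented for this example, not real POI or xmlbeans classes; RawRunBean plays the role of a generated CT class. The point is only that callers see the usermodel interface and never touch the bean.

```java
// Hypothetical stand-in for a generated low-level XML bean (a "CT" class).
// This is NOT a real POI or xmlbeans type.
final class RawRunBean {
    private String boldAttr;   // raw attribute value as it would sit in the XML, or null if absent
    private String text = "";

    String getBoldAttr() { return boldAttr; }
    void setBoldAttr(String value) { boldAttr = value; }
    String getText() { return text; }
    void setText(String value) { text = value; }
}

// Hypothetical usermodel interface: the only type callers are meant to use.
interface Run {
    boolean isBold();
    void setBold(boolean bold);
    String getText();
    void setText(String text);
}

// Wrapper that keeps the bean private, so the XML layer underneath
// could be replaced later without changing any calling code.
final class XmlBackedRun implements Run {
    private final RawRunBean bean;

    XmlBackedRun(RawRunBean bean) { this.bean = bean; }

    @Override
    public boolean isBold() {
        // Translate the raw XML attribute value into a plain Java boolean.
        String value = bean.getBoldAttr();
        return "true".equals(value) || "1".equals(value);
    }

    @Override
    public void setBold(boolean bold) {
        // Absent attribute means "not bold" in this sketch.
        bean.setBoldAttr(bold ? "true" : null);
    }

    @Override
    public String getText() { return bean.getText(); }

    @Override
    public void setText(String text) { bean.setText(text); }
}

// Small demo of calling code that only knows about Run.
final class WrapperDemo {
    public static void main(String[] args) {
        Run run = new XmlBackedRun(new RawRunBean());
        run.setText("Hello");
        run.setBold(true);
        System.out.println(run.getText() + " bold=" + run.isBold());
    }
}
```

With this shape, nothing outside XmlBackedRun needs to import the bean type, which is the precondition mentioned above for being able to replace the CT classes.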

I would like to propose a change to the POI architecture with respect to SS, as it already has a well-defined architecture. This change would allow us to more easily move away from XML Beans, and potentially reduce memory consumption in the XML format space. It seems to me that one of the reasons we use XML Beans is that it allows us to update XML documents in place.

On the whole, you can buy/beg/rent more memory, or faster machines. The resource we really lack in POI is contributors writing code or documentation or tests. xmlbeans makes development of the X??F stuff quicker, and that's what we tend to optimise for!

Unfortunately, XML is a highly inefficient format, and maybe it would be better, with respect to memory use, to model documents internally in a more efficient format, and at save time convert the document to its binary or XML format as necessary.

The binary and XML formats have more differences than you'd ideally expect or like, which in part is why we don't have more shared stuff between them. Not saying that this plan wouldn't work, just that it might not be as clean as you'd like, especially for more fiddly stuff like formatting, colours or the like.
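
For the sake of discussion, the proposed shape (a compact in-memory model, converted to a concrete format only at save time) might be sketched roughly as below. Every name here (PlainDocument, PlainParagraph, DocumentWriter, ToyXmlWriter, ToyLengthPrefixedWriter) is hypothetical and invented for this example, and both output formats are toys rather than anything resembling the real .doc or .docx formats. The single bold flag is deliberately trivial; the fiddly formatting cases mentioned above are exactly where a shared model like this would get harder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical format-neutral paragraph: plain fields, no XML objects held in memory.
final class PlainParagraph {
    final String text;
    final boolean bold;

    PlainParagraph(String text, boolean bold) {
        this.text = text;
        this.bold = bold;
    }
}

// Hypothetical format-neutral document model.
final class PlainDocument {
    private final List<PlainParagraph> paragraphs = new ArrayList<>();

    void add(String text, boolean bold) {
        paragraphs.add(new PlainParagraph(text, bold));
    }

    List<PlainParagraph> paragraphs() {
        return paragraphs;
    }
}

// Conversion to a concrete format happens only when saving.
interface DocumentWriter {
    String write(PlainDocument doc);
}

// Toy XML serialisation, standing in for the XML-based format.
final class ToyXmlWriter implements DocumentWriter {
    @Override
    public String write(PlainDocument doc) {
        StringBuilder sb = new StringBuilder("<doc>");
        for (PlainParagraph p : doc.paragraphs()) {
            sb.append("<p");
            if (p.bold) {
                sb.append(" bold=\"true\"");
            }
            sb.append('>').append(escape(p.text)).append("</p>");
        }
        return sb.append("</doc>").toString();
    }

    // Escape the characters that are unsafe in XML element content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}

// Toy length-prefixed serialisation, standing in for the binary format:
// one flag character (B or N), the text length, a colon, then the text.
final class ToyLengthPrefixedWriter implements DocumentWriter {
    @Override
    public String write(PlainDocument doc) {
        StringBuilder sb = new StringBuilder();
        for (PlainParagraph p : doc.paragraphs()) {
            sb.append(p.bold ? 'B' : 'N').append(p.text.length()).append(':').append(p.text);
        }
        return sb.toString();
    }
}

// Small demo: one model, two save-time conversions.
final class SaveTimeDemo {
    public static void main(String[] args) {
        PlainDocument doc = new PlainDocument();
        doc.add("Hi", true);
        doc.add("a<b", false);
        System.out.println(new ToyXmlWriter().write(doc));
        System.out.println(new ToyLengthPrefixedWriter().write(doc));
    }
}
```

The trade-off is that anything one format can express and the other cannot has to be either squeezed into the shared model or carried alongside it, which is where the "not as clean as you'd like" concern comes in.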

The WP side is a perfect place to try this out since it does not really have a well-defined separation between model and usermodel. If I go on any more, this thought will totally fall apart, so I will leave this open for discussion, and I hope that no one feels that I am stepping on toes. That is not my intention.

As long as it doesn't make new contributions to POI harder or slower (we need more contributions!), and as long as you want to do the work, create a branch and start experimenting! :)

Nick
