I want to apologize in advance for this stream-of-consciousness post. I hope it makes sense to someone.
At work I have been using the SS side of POI, and have become fairly comfortable with it. I realize that there are still some things that need to be done, and some issues with XML Beans that have been discussed, but it seems fairly well organized. Recently I have also been working with the WP side, and it is obviously still a work in progress. Likely there are fewer developers contributing there. But as I sat here considering the best way to get done the things that I need, I thought about the need for a common POI architecture across the pieces of the project. This may exist; I just haven't found it yet.

I have found that XWPF does not yet have a clear separation between the model and the usermodel. For example, to build headers and footers, the user must dip into the model to get a key object that has not yet been exposed in the usermodel. And significant parts still require use of the CT and ST classes. This is likely due to the early level of development of the WP portion of POI, but I feel that this is a great place to start if we intend to replace XML Beans.

I would like to propose a change to the POI architecture with respect to SS, as it already has a well-defined architecture. This change would allow us to move away from XML Beans more easily, and potentially reduce memory consumption in the XML format space. It seems to me that one of the reasons we use XML Beans is that it allows us to update XML documents in place. Unfortunately, XML is a highly inefficient format, and maybe it would be better, with respect to memory use, to model documents internally in a more efficient format, and at save time convert the document to its binary or XML format as necessary. In this case, the model would be the internal representation of the document, and the usermodel would be the API we expose to users of the library. In this manner we could have a single model and usermodel for each document type: spreadsheet, word processor, diagram, etc.
Then on write we would convert to the binary or XML format as requested. In addition to the potential memory savings, this would enable a few things:

- We could more easily support additional formats (such as .ods and .csv) because we would not have to manipulate those formats internally.
- We could move XML Beans or its replacement to the periphery, making it easier to swap out that piece.
- We would not run into issues such as the one we currently have with the swapRows() method in XSSF, where the file data is hard to sort because of the tight coupling with XML Beans.

The WP side is a perfect place to try this out since it does not really have a well-defined separation between model and usermodel.

If I go on any more, this thought will totally fall apart, so I will leave this open for discussion, and I hope that no one feels that I am stepping on toes. That is not my intention.

Mark Murphy
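
P.S. To make the idea a little more concrete, here is a rough sketch in Java of what a format-neutral model with writers at the periphery might look like. None of these names exist in POI: SheetModel, SheetWriter, CsvWriter and SimpleXmlWriter are made up for illustration, and the XML produced is a simplified stand-in, not real SpreadsheetML. It only shows the shape of the idea, not a proposed implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical format-neutral model: plain Java collections, no XML Beans types.
final class SheetModel {
    private final List<List<String>> rows = new ArrayList<>();

    void addRow(String... cells) {
        rows.add(new ArrayList<>(Arrays.asList(cells)));
    }

    // Swapping rows is just a list operation; no XML nodes have to be moved.
    void swapRows(int a, int b) {
        Collections.swap(rows, a, b);
    }

    List<List<String>> rows() {
        return Collections.unmodifiableList(rows);
    }
}

// Hypothetical periphery: one writer per output format.
interface SheetWriter {
    String write(SheetModel model);
}

// Naive CSV output; quoting and escaping are deliberately left out.
final class CsvWriter implements SheetWriter {
    @Override
    public String write(SheetModel model) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : model.rows()) {
            out.append(String.join(",", row)).append('\n');
        }
        return out.toString();
    }
}

// Simplified XML output (NOT real SpreadsheetML); escaping is left out.
final class SimpleXmlWriter implements SheetWriter {
    @Override
    public String write(SheetModel model) {
        StringBuilder out = new StringBuilder("<sheet>");
        for (List<String> row : model.rows()) {
            out.append("<row>");
            for (String cell : row) {
                out.append("<c>").append(cell).append("</c>");
            }
            out.append("</row>");
        }
        return out.append("</sheet>").toString();
    }
}

class ModelSketch {
    public static void main(String[] args) {
        SheetModel model = new SheetModel();
        model.addRow("a", "1");
        model.addRow("b", "2");
        model.swapRows(0, 1); // done entirely in the model

        // The same model is handed to whichever writer the caller asks for.
        System.out.print(new CsvWriter().write(model));
        System.out.println(new SimpleXmlWriter().write(model));
    }
}
```

The point of the sketch is that the model never sees an XML type, so the XML layer (XML Beans or its replacement) would only be touched inside a writer, and adding another format would mean adding another SheetWriter.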