[ 
https://issues.apache.org/jira/browse/COR-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis E. Hamilton updated COR-18:
----------------------------------
    Description: 
MiniZip is a bit thin and, because of some changes needed, it might be better 
to replace it in the DocFormats/3rdparty/external/ folder, as @peterkelly 
observes at COR-26 (comment).

EASY STEPS

For now, it might be desirable to simply replace the current code with MiniZip 
1.1 from http://www.winimage.com/zLibDll/minizip.html
Since it is a simple dependency, this should work fine so long as there are no 
breaking API changes between 1.0h and 1.1.

EVENTUALLY?

It would be good to have something behind a stable API that permits random 
access for reading file streams, as Peter suggests. Ideally, that API would be 
aligned around the Document Container File (DCF) profile of the official PKWare 
specification that is commonly used by ePub, ODF, and the Open Packaging 
Conventions (OPC) used in OOXML and elsewhere. I don't know what the latest 
status of that profile is at ISO/IEC JTC1 SC34, but it will become a common 
international specification for this specialized usage of Zip as a compound 
document-format container file.
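
As a rough illustration of what "random access for reading file streams" means 
in practice, here is a minimal sketch using Python's standard zipfile module. 
It is not the project's API; the helper names and the sample stream names are 
made up for the example.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a tiny in-memory Zip with two streams (names are made up)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("mimetype", "application/example")
        package.writestr("content.xml", "<doc>hello</doc>")
    return buffer.getvalue()


def read_stream(package_bytes: bytes, name: str) -> bytes:
    """Read one named stream without unpacking the other entries."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        # The entry is located through the central directory, so the
        # other entries in the package are never decompressed.
        with package.open(name) as stream:
            return stream.read()


sample = build_sample_package()
content = read_stream(sample, "content.xml")
print(content)
```

The point of the sketch is only that a caller asks for a stream by name and 
gets its bytes back, without caring where in the container it lives.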

There are other places to look for ideas and possible sources of reusable code 
and API considerations, including Apache OpenOffice, the Apache ODF Toolkit 
(using Java), and the Microsoft open-sourcing of its OOXML-access layer (in 
.NET, I think). The Microsoft platform also has some native support that it 
might be useful to rely on in Windows-targeted builds.

There is also LibOPC, a CodePlex project (https://libopc.codeplex.com/) that is 
C code under a BSD-form license. One feature of LibOPC that may interest Apache 
OpenOffice folk (i.e., @janiversen) is a Python script for generating Visual 
Studio projects that can be used for manipulating and building on Windows.

One caveat. For ingesting Zip-based document files, there needs to be a fair 
amount of code to ensure resiliency and defense against DoS-ing of applications 
with malformed document files. That may have to be grown, with attention to the 
code footprint on limited-capacity devices (where presumably some of the 
heavy lifting is off-loaded to the cloud). An interesting feature of the OPC 
specification is that it is also designed to support remoting of the document 
streams in a way where there is no requirement that a Zip file be transferred 
to the client. That may be very much an eventual concern, but it is useful to 
think about having an API that would allow for that underneath.  [Ed. Note: 
COR-31 is related to this.]
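
To make the caveat about malformed files a little more concrete, the sketch 
below checks a package using only what its central directory declares, before 
any stream is decompressed. It uses Python's standard zipfile module; the 
limits and the helper names are hypothetical and chosen only for illustration.

```python
import io
import zipfile
from typing import Dict, List

# Hypothetical limits, chosen only for illustration.
MAX_ENTRIES = 1000
MAX_TOTAL_UNCOMPRESSED = 50 * 1024 * 1024
MAX_RATIO = 100


def make_package(streams: Dict[str, bytes]) -> bytes:
    """Build an in-memory Zip from a name -> bytes mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, data in streams.items():
            package.writestr(name, data)
    return buffer.getvalue()


def check_package(package_bytes: bytes) -> List[str]:
    """Return the problems found from the central directory alone."""
    try:
        package = zipfile.ZipFile(io.BytesIO(package_bytes))
    except zipfile.BadZipFile:
        return ["not a readable Zip file"]
    problems = []
    with package:
        entries = package.infolist()
        if len(entries) > MAX_ENTRIES:
            problems.append("too many entries")
        if sum(info.file_size for info in entries) > MAX_TOTAL_UNCOMPRESSED:
            problems.append("declared uncompressed size too large")
        for info in entries:
            # A very high declared ratio is typical of "zip bomb" inputs.
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                problems.append("suspicious compression ratio: " + info.filename)
    return problems


good = make_package({"content.xml": b"<doc>hello</doc>"})
padded = make_package({"big.bin": b"\0" * 1_000_000})
print(check_package(good))
print(check_package(padded))
```

Declared sizes can themselves be forged, so a check like this would only be a 
first line of defense; reads would still need to be bounded while streams are 
actually being decompressed.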

LEST WE FORGET?

Although this is all .NET-fu, there may be useful ideas in this project,
https://github.com/OfficeDev/Open-Xml-Sdk
(and some of the system-level dependencies may have native Windows counterparts 
as well). It might also be worth mining for ideas higher up in the API 
modeling.

---


I didn't think to mention POI and whatever they use as a model close to the Zip 
packages.

I didn't realize until looking at the proposal to become an Apache incubator 
project that the sources for minizip and tidy-html5 are not pristine. It would 
be good to reconstruct the modification process and leave more footprints if 
the changes are not in the repository here. (Actually, it would be good to 
reconstruct the modification anyhow, but diffs from git would be helpful.)

I'm thinking that there is no hurry to replace these in early stages. If a 
better API is desired, the first step of getting that in place would be to 
build a shim that goes from that API to anything at hand at first, such as 
minizip or some other library, and worry about fit and performance later.
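
The shim idea might look something like the following sketch, again in Python 
with the standard zipfile module standing in for "anything at hand". The class 
and function names are hypothetical; the second backend is only a stand-in for 
some other library or a remote stream source.

```python
import io
import zipfile
from abc import ABC, abstractmethod
from typing import Dict, List


class ContainerReader(ABC):
    """Hypothetical stable API that the rest of the code would depend on."""

    @abstractmethod
    def names(self) -> List[str]:
        """List the stream names in the container."""

    @abstractmethod
    def read(self, name: str) -> bytes:
        """Return the bytes of one named stream."""


class ZipBackend(ContainerReader):
    """Shim over the library at hand (here, the standard zipfile module)."""

    def __init__(self, package_bytes: bytes) -> None:
        self._zip = zipfile.ZipFile(io.BytesIO(package_bytes))

    def names(self) -> List[str]:
        return self._zip.namelist()

    def read(self, name: str) -> bytes:
        return self._zip.read(name)


class MemoryBackend(ContainerReader):
    """Stand-in for a different library or a non-Zip stream source."""

    def __init__(self, streams: Dict[str, bytes]) -> None:
        self._streams = dict(streams)

    def names(self) -> List[str]:
        return list(self._streams)

    def read(self, name: str) -> bytes:
        return self._streams[name]


def total_size(reader: ContainerReader) -> int:
    """Example caller: it sees only the API, never the backend."""
    return sum(len(reader.read(name)) for name in reader.names())


streams = {"mimetype": b"application/example", "content.xml": b"<doc/>"}
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    for stream_name, data in streams.items():
        package.writestr(stream_name, data)

print(total_size(ZipBackend(buffer.getvalue())))
print(total_size(MemoryBackend(streams)))
```

Because the caller depends only on the abstract interface, the backend can be 
swapped later without touching calling code, which is where fit and 
performance can then be worked on.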

jan: 

POI is in Java, so they have other packages available.

I am currently working on expanding the platform part to also include zip and 
html, so that we can change the libraries at a later stage. I think your idea 
of using libOPC is valid and interesting... you, Peter and Svante know better 
whether it fits the project.




> Replacing MiniZip
> -----------------
>
>                 Key: COR-18
>                 URL: https://issues.apache.org/jira/browse/COR-18
>             Project: Corinthia
>          Issue Type: Bug
>          Components: DocFormats - platform
>         Environment: source
>            Reporter: jan iversen
>            Assignee: jan iversen
>            Priority: Blocker
>             Fix For: 0.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
