Hi Jay,
The first thing you need to do is to check out the POI source tree and
to get used to building POI.
The Visio API is in the HDGF module. Study the code and try running the
Visio text extractors and other utilities.
Unfortunately, there is not much information on the Visio file format.
Microsoft did not include it in the Open Specification Promise,
http://www.microsoft.com/interop/osp/default.mspx. Be prepared for most
of your work to involve reverse engineering.
A good summary of all publicly known information is
http://www.redferni.uklinux.net/visio/
There is a tool for reverse engineering the VSD/VSS files:
http://freshmeat.net/projects/vsdump/
See if OpenOffice can read Visio files. If it can, study how it does so.
Everything said above concerns the binary VSD/VSS formats. I suspect that
the latest versions of MS Visio use XML-based formats, just like MS Office
2007.
Check whether that is so. If it is, unzip a Visio file and study the XML in
your favorite editor. It should be self-explanatory.
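As a starting point for that inspection, here is a minimal sketch that lists the parts inside a ZIP container using only the JDK. It assumes the file really is a ZIP package, which is only a suspicion at this point. The class name ListZipParts and the default file name diagram.zip are placeholders, not anything from POI.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Lists the parts inside a ZIP-based document so the XML can be inspected.
// Hypothetical helper for illustration; it is not part of POI.
class ListZipParts {

    // Returns the entry names of the given ZIP file, sorted alphabetically.
    static List<String> listParts(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // "diagram.zip" is a placeholder; pass the real file as the first argument.
        String path = args.length > 0 ? args[0] : "diagram.zip";
        for (String name : listParts(path)) {
            System.out.println(name);
        }
    }
}
```

If the file is not a ZIP container, ZipFile throws an exception, which in itself answers the question of whether the format is ZIP-based.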
When you figure out how Visio stores shapes and connectors, try to wrap the
low-level code in a usermodel API. I guess there will be objects like
Shape and Connector, but the design decision is up to you. You are always
welcome to ask questions on the mailing lists.
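To make the idea concrete, here is one possible shape such a usermodel could take. This is purely a hypothetical sketch: none of these types or methods exist in HDGF, and it says nothing about how the low-level atoms would be mapped onto them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical usermodel sketch; these types do not exist in HDGF.
class UsermodelSketch {

    // A drawn object that may carry text and outgoing connectors.
    static final class Shape {
        private final String text;
        private final List<Connector> outgoing = new ArrayList<>();

        Shape(String text) {
            this.text = text;
        }

        String getText() {
            return text;
        }

        // Read-only view of the connectors that start at this shape.
        List<Connector> getOutgoing() {
            return Collections.unmodifiableList(outgoing);
        }

        // Creates a connector from this shape to the target and records it.
        Connector connectTo(Shape target) {
            Connector connector = new Connector(this, target);
            outgoing.add(connector);
            return connector;
        }
    }

    // A directed link between two shapes.
    static final class Connector {
        private final Shape from;
        private final Shape to;

        private Connector(Shape from, Shape to) {
            this.from = from;
            this.to = to;
        }

        Shape getFrom() {
            return from;
        }

        Shape getTo() {
            return to;
        }
    }
}
```

Keeping the connector list on the source shape makes "which arrows leave this shape" a direct lookup, but that is only one of several reasonable designs.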
General instructions on how to prepare and submit patches can be found at
http://poi.apache.org/getinvolved/index.html
Regards,
Yegor
Hi All,
Jay Macarty is writing software to assist blind persons and hoping to
use POI HDGF to translate Visio diagrams into a GUI program.
We exchanged some emails in private and he showed interest in
participating in the POI project, in particular, in the Visio part of it.
Below is a part of our communication. In the next email I will give
initial instructions on where to start.
Yegor
-----Original Message-----
From: Yegor Kozlov [mailto:[email protected]]
Sent: Sunday, February 22, 2009 12:57 PM
To: Macarty, Jay {PBSG}
Subject: Re: Using POI to assist the blind
Hi Jay,
Yegor,
I am writing software to assist blind persons in using their computers
more effectively. One of the major challenges a blind person faces in
working in a professional environment is being able to read Visio
diagrams. I am hoping to use the POI project to create software which
could read a Visio file and translate it into something a blind person
could read more easily. What I have in mind is a screen with 2 panes.
On the left is a JTree structure which represents objects and their
connections. For example, at level 0 might be an entry for a circle
with the level 1 entries under that being the arrows coming from that
circle object going to other shapes. The right hand pane would be a
text area which would contain the text of a selected object if text
were available. For example, if the level 0 circle object from the
prior example were selected, the right pane would contain any text
inside that circle.
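A rough Swing sketch of the two-pane idea described above might look like this. The class name, the Item type, and the sample labels are made up for illustration, and the tree is built by hand rather than read from a Visio file.

```java
import javax.swing.JScrollPane;
import javax.swing.JSplitPane;
import javax.swing.JTextArea;
import javax.swing.JTree;
import javax.swing.tree.DefaultMutableTreeNode;

// Hypothetical sketch of the two-pane layout; the data is hard-coded.
class DiagramTreeSketch {

    // A node's user object: a label for the tree plus the text for the right pane.
    static final class Item {
        final String label;
        final String text;

        Item(String label, String text) {
            this.label = label;
            this.text = text;
        }

        @Override
        public String toString() {
            return label; // JTree renders a node using its user object's toString()
        }
    }

    // Builds the example: a circle with its outgoing arrows listed beneath it.
    static DefaultMutableTreeNode buildExampleTree() {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode(new Item("Diagram", ""));
        DefaultMutableTreeNode circle = new DefaultMutableTreeNode(new Item("Circle", "Start"));
        circle.add(new DefaultMutableTreeNode(new Item("Arrow to Rectangle", "")));
        circle.add(new DefaultMutableTreeNode(new Item("Arrow to Diamond", "")));
        root.add(circle);
        return root;
    }

    // Left pane: the tree. Right pane: the text of the selected node.
    static JSplitPane buildPanes(DefaultMutableTreeNode root) {
        JTree tree = new JTree(root);
        tree.setRootVisible(false); // hide the root so shapes appear as top-level entries
        JTextArea textArea = new JTextArea();
        textArea.setEditable(false);
        tree.addTreeSelectionListener(e -> {
            Object selected = tree.getLastSelectedPathComponent();
            if (selected instanceof DefaultMutableTreeNode) {
                Object item = ((DefaultMutableTreeNode) selected).getUserObject();
                textArea.setText(item instanceof Item ? ((Item) item).text : "");
            }
        });
        return new JSplitPane(JSplitPane.HORIZONTAL_SPLIT,
                new JScrollPane(tree), new JScrollPane(textArea));
    }
}
```

The split pane returned by buildPanes can be added to a JFrame; a screen reader would then work against the standard JTree and JTextArea components.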
I have a couple of questions regarding this project:
1. Do you think such an interpreter could be written using the POI API?
2. Is any such work already in progress that you are aware of that I
could become a part of? If someone has already started writing Visio
translation software for blind users, I'd like to become part of that.
3. Does the approach I have outlined seem to make sense to you?
Thanks for your feedback on this topic. I look forward to your
responses.
Thanks for the interest in the POI project. We have a prototype of an
API to access Visio format files, see
http://poi.apache.org/hdgf/index.html.
However, this module is very young and its capabilities are limited.
Currently, it can do the following:
- parse the pointers and streams and create a Java representation of
the main building blocks of Visio files
- provide a way to extract the textual content.
What is not supported:
- creation of new Visio files
- modification of existing files
- usermodel API. There are no objects like "Shape" and "Connector"; we
still have to figure out how to get them out of the low-level atoms.
Although the Visio format is based on OLE2, that is the only thing it
has in common with XLS and other MS Office formats. The format structure
and the main building blocks of Visio files are completely different
from anything developed by Microsoft. As you probably know, Microsoft
acquired Visio Corporation in 2000. Before that, Visio Corporation
developed its own proprietary format, which in no way intersects with MS
Office. This makes it very difficult for us to re-use existing POI
modules to work with Visio files.
What is worse, the Visio format is still closed (XLS, PPT and DOC
formats were opened in July 2008). So, all the work is based on
reverse engineering.
So, there are many things to do in order to use HDGF in applications
like yours. In its current state you are unlikely to derive any benefit
from it.
Still wanting to participate? Welcome aboard! :)
P.S. If you do not mind, I would like to continue further discussion
in poi-dev. This way other people can help.
Regards,
Yegor
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]