Hi Jay,

The first thing you need to do is check out the POI source tree and get used to building POI. The Visio API is in the HDGF module; study the code and try running the Visio text extractors and other utilities.

Unfortunately, there is not much information on the Visio file format. Microsoft did not include it in the Open Specification Promise (http://www.microsoft.com/interop/osp/default.mspx). Be prepared for most of your work to involve reverse engineering.

A good summary of all publicly known information is at http://www.redferni.uklinux.net/visio/

There is a tool for reverse engineering the VSD/VSS files:
http://freshmeat.net/projects/vsdump/

See if OpenOffice can read Visio files. If so, study how it does that.

Everything said above applies to the binary VSD/VSS formats. I suspect that the latest versions of MS Visio use XML-based formats, just like MS Office 2007. Check whether that is so. If it is, unzip a Visio file and study the XML in your favorite editor; it should be self-explanatory.
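If the newer files do turn out to be ZIP containers, the "unzip and look" step is easy to script. Below is a minimal sketch using only java.util.zip, assuming the file is a plain ZIP archive (which is what the suggestion to unzip it implies). The class and method names are hypothetical and not part of POI.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (not POI API): inspects the parts of a ZIP-based file. */
public class VisioZipInspector {

    /** Returns the names of all entries in the ZIP stream, in archive order. */
    public static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Returns the named entry decoded as UTF-8, or null if the archive has no such entry. */
    public static String readEntry(InputStream in, String name) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }
}
```

Calling listEntries on a FileInputStream for the drawing prints the package layout; readEntry then dumps any single XML part for study in an editor.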

When you figure out how Visio stores shapes and connectors, try to wrap the low-level code in a usermodel API. I guess there will be objects like Shape and Connector, but the design decision is up to you. You are always welcome to ask questions on the mailing lists.
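To make the Shape/Connector idea a little more concrete, here is a minimal sketch of what such a usermodel could look like. Everything in it (the class names, integer shape ids, the outline() method) is hypothetical and not part of POI; filling it from HDGF's low-level records is exactly the part that still has to be reverse engineered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical usermodel sketch; none of these classes exist in POI. */
public class VisioUserModelSketch {

    /** A drawing element such as a circle or a box, with optional text. */
    public static final class Shape {
        private final int id;
        private final String type;
        private final String text;
        private final List<Connector> outgoing = new ArrayList<>();

        Shape(int id, String type, String text) {
            this.id = id;
            this.type = type;
            this.text = text;
        }

        public int getId() { return id; }
        public String getType() { return type; }
        public String getText() { return text; }
        public List<Connector> getOutgoing() { return Collections.unmodifiableList(outgoing); }
    }

    /** A directed connection (e.g. an arrow) from one shape to another. */
    public static final class Connector {
        private final Shape from;
        private final Shape to;

        Connector(Shape from, Shape to) {
            this.from = from;
            this.to = to;
        }

        public Shape getFrom() { return from; }
        public Shape getTo() { return to; }
    }

    /** A diagram holding shapes keyed by id, in insertion order. */
    public static final class Diagram {
        private final Map<Integer, Shape> shapes = new LinkedHashMap<>();

        public Shape addShape(int id, String type, String text) {
            Shape s = new Shape(id, type, text);
            shapes.put(id, s);
            return s;
        }

        public Shape getShape(int id) { return shapes.get(id); }

        /** Links two existing shapes; throws if either id is unknown. */
        public Connector connect(int fromId, int toId) {
            Shape from = require(fromId);
            Shape to = require(toId);
            Connector c = new Connector(from, to);
            from.outgoing.add(c);
            return c;
        }

        private Shape require(int id) {
            Shape s = shapes.get(id);
            if (s == null) {
                throw new IllegalArgumentException("Unknown shape id: " + id);
            }
            return s;
        }

        /** Level 0: one line per shape; level 1: one indented line per outgoing connector. */
        public String outline() {
            StringBuilder sb = new StringBuilder();
            for (Shape s : shapes.values()) {
                sb.append(s.getType()).append(' ').append(s.getId()).append('\n');
                for (Connector c : s.getOutgoing()) {
                    Shape t = c.getTo();
                    sb.append("  -> ").append(t.getType()).append(' ').append(t.getId()).append('\n');
                }
            }
            return sb.toString();
        }
    }
}
```

Keeping connectors as objects owned by their source shape makes "which arrows leave this shape?" a direct lookup, which is the question a tree-style view of the diagram asks most often.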

General instructions on how to prepare and submit patches can be found at http://poi.apache.org/getinvolved/index.html

Regards,
Yegor

Hi All,

Jay Macarty is writing software to assist blind persons and hopes to use POI HDGF to translate Visio diagrams for presentation in a GUI program. We exchanged some emails in private, and he showed interest in participating in the POI project, in particular in its Visio part. Below is part of our communication. In the next email I will give initial instructions on where to start.

Yegor


-----Original Message-----
From: Yegor Kozlov [mailto:[email protected]]
Sent: Sunday, February 22, 2009 12:57 PM
To: Macarty, Jay {PBSG}
Subject: Re: Using POI to assist the blind

Hi Jay,

Yegor,

I am writing software to assist blind persons in using their computers
more effectively. One of the major challenges a blind person faces in
working in a professional environment is being able to read Visio
diagrams. I am hoping to use the POI project to create software which
could read a Visio file and translate it into something a blind person
could read more easily. What I have in mind is a screen with 2 panes.
On the left is a JTree structure which represents objects and their
connections. For example, at level 0 might be an entry for a circle
with the level 1 entries under that being the arrows coming from that
circle object going to other shapes. The right hand pane would be a
text area which would contain the text of a selected object if text
were available. For example, if the level 0 circle object from the
prior example were selected, the right pane would contain any text
inside that circle.

I have a couple of questions regarding this project:

1. Do you think such an interpreter could be written using the POI API?

2. Is any such work already in progress that you are aware of that I
could become a part of? If someone has already started writing Visio
translation software for blind users, I'd like to become part of that.

3. Does the approach I have outlined seem to make sense to you?

Thanks for your feedback on this topic. I look forward to your responses.

Thanks for the interest in the POI project. We have a prototype of an API to access Visio format files, see http://poi.apache.org/hdgf/index.html.
However, this module is very young and its capabilities are limited.
Currently, it can do the following:
- parse the pointers and streams and create a Java representation of the main building blocks of Visio files
- provide a way to extract the textual content.

What is not supported:
- creation of new Visio files
- modification of existing files
- usermodel API. There are no objects like "Shape" and "Connector"; we still have to figure out how to get them out of the low-level atoms.

Although the Visio format is based on OLE2, that is the only thing it has in common with XLS and the other MS Office formats. The format structure and the main building blocks of Visio files are completely different from Microsoft's own formats. As you probably know, Microsoft acquired Visio Corporation in 2000. Before that, Visio Corporation developed its own proprietary format, which in no way intersects with MS Office. This makes it very difficult for us to reuse existing POI modules to work with Visio files.
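Since the OLE2 container is the one thing binary Visio files share with XLS, a practical first step when collecting sample files is telling an OLE2 file apart from a ZIP-based one by its header. A minimal sketch, relying only on the well-known OLE2 signature (D0 CF 11 E0 A1 B1 1A E1) and the ZIP local-header signature ("PK" 03 04); the class name and enum are hypothetical and not POI API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper (not POI API): guesses the container type from a file's first bytes. */
public class ContainerSniffer {

    public enum Container { OLE2, ZIP, UNKNOWN }

    /** OLE2 (Compound File Binary) header signature. */
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** ZIP local file header signature: "PK" followed by 0x03 0x04. */
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Reads up to 8 bytes from the stream and classifies them. */
    public static Container sniff(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int read = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n == -1) {
                break;
            }
            read += n;
        }
        if (read >= OLE2_MAGIC.length && startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (read >= ZIP_MAGIC.length && startsWith(header, ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Binary VSD/VSS files would be expected to report OLE2; anything reporting ZIP is a candidate for the XML-based route.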

Worse, the Visio format is still closed (the XLS, PPT and DOC formats were opened in July 2008). So all the work is based on reverse engineering.

So there are many things to do before HDGF can be used in applications like yours. In its current state, you are unlikely to derive any benefit from it.

Still want to participate? Welcome aboard! :)

P.S. If you do not mind, I would like to continue further discussion in poi-dev. This way other people can help.

Regards,
Yegor



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]