Thanks Nick!

On Jul 26, 2013, at 11:46 AM, Nick Burch <apa...@gagravarr.org> wrote:

> On Fri, 26 Jul 2013, Mike Hugo wrote:
>> I'm looking into basic support (text extraction) for MS OneNote.  I found
>> this bug https://issues.apache.org/bugzilla/show_bug.cgi?id=50750 that has
>> some sample files attached.  Does anyone have any pointers as to where I
>> should get started?
>
> Use POIFSLister to work out whether they have a single POIFS/OLE2 stream or
> multiple. If there are lots, assume it's like Outlook (HSMF) and use POIFSDump
> to look at the parts. If there's only one, use POIFSViewer and the docs to
> work out whether it's streams of records (e.g. HSSF), nested records (HSLF,
> DDF), or streams (HWPF).
>
> Once you know that, try some basic processing of the file structure. Then
> add some .dev. tools to print the structure (look at Visio, Outlook etc. for
> an idea of how we've done that). Use your own dev tool to explore the
> structure further. Finally, flesh out the implementation to cover all the key
> bits, and write lots of unit tests!
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@poi.apache.org
> For additional commands, e-mail: dev-h...@poi.apache.org
>
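Nick's first step assumes the sample files are OLE2 (POIFS) containers at all. One possible pre-check, before pointing POIFSLister at the attachments, is to look for the 8-byte OLE2 / Compound File signature D0 CF 11 E0 A1 B1 1A E1 at the start of each file. The sketch below is written in plain Python rather than against POI, and the helper names is_ole2 and check_files are hypothetical, for illustration only. If a file fails the check, the POIFS tools won't be able to open it and the format needs a different starting point.

```python
import sys

# OLE2 / Compound File Binary signature: the first 8 bytes of any file
# that POIFS (and so POIFSLister, POIFSDump, POIFSViewer) can open.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole2(header: bytes) -> bool:
    """Hypothetical helper: True if the leading bytes carry the OLE2 signature."""
    return header[:len(OLE2_SIGNATURE)] == OLE2_SIGNATURE


def check_files(paths):
    """Hypothetical helper: map each path to whether it looks like an OLE2 file."""
    results = {}
    for path in paths:
        with open(path, "rb") as f:
            # Only the first 8 bytes are needed for the signature check.
            results[path] = is_ole2(f.read(len(OLE2_SIGNATURE)))
    return results


if __name__ == "__main__":
    # Usage: python check_ole2.py sample1.one sample2.one ...
    for path, ole2 in check_files(sys.argv[1:]).items():
        verdict = "OLE2, try POIFSLister" if ole2 else "not OLE2, POIFS tools won't apply"
        print(f"{path}: {verdict}")
```

Only files reported as OLE2 are worth walking through the POIFSLister / POIFSDump / POIFSViewer steps above.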
