As a last resort,
if you don't care much about preserving the document structure, you can always unpack the DOCX file (it's a ZIP archive)
and retrieve the particular item you need using lxml, BeautifulSoup, plain html2text, or even regular expressions.
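
To make the unzip-and-parse idea concrete, here is a minimal sketch. It assumes the main text lives in word/document.xml inside the archive (the usual location in a DOCX) and uses the standard library's xml.etree.ElementTree in place of lxml, so nothing extra needs installing. The function name extract_paragraphs is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a DOCX archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_paragraphs(docx_file):
    """Return the text of each paragraph in a DOCX file (path or file object).

    Hypothetical helper: it ignores styles, bookmarks and other structure.
    """
    with zipfile.ZipFile(docx_file) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join every w:t node.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

Calling extract_paragraphs on the path of your document should give you a list of Unicode strings, so Hebrew text comes through as-is. Paragraphs nested inside other paragraphs (text boxes, for example) may show up twice with this naive walk.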



On 05/05/2013 13:26, Amit Aronovitch wrote:

On Sun, May 5, 2013 at 12:24 PM, גל וין gal vine <[email protected]> wrote:

Hey all,

I'm trying to use the docx package to parse DOCX files containing Hebrew text, without much success.

I also need to use templates and/or bookmarks in those documents.

But the package is poorly documented, and I need some assistance.

Nope - never done that, just general comments:
 The project seems active on GitHub. If something breaks when using Hebrew but works otherwise, you can try opening an issue:
 https://github.com/mikemaccana/python-docx/issues
   (Also let us know: there's a local forum for promoting Hebrew/bidi-related issues in free software.)
 

Have any of you had the pleasure of using it,

or worked (successfully) with win32com.client to parse Word documents?

Haven't done that either (I did use win32com to communicate with Outlook, but that was a long time ago).
Another idea you might try, in case the Python ODF tools are more mature:
 Convert to ODF (using the LibreOffice/OpenOffice DOCX filter, which is quite good), then try ezodf/odfpy...
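
A rough sketch of that convert-then-parse route, in case it helps. It assumes the soffice binary from LibreOffice is on your PATH and that odfpy is installed. Both function names are made up for illustration, and I haven't checked how well this copes with Hebrew/bidi content.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir):
    """Build the headless LibreOffice command that converts a DOCX to ODT."""
    # Assumes "soffice" is on PATH; adjust to the full path if it is not.
    return ["soffice", "--headless", "--convert-to", "odt",
            "--outdir", str(out_dir), str(docx_path)]


def docx_to_odt_paragraphs(docx_path, out_dir):
    """Convert a DOCX to ODT, then return its paragraph texts via odfpy."""
    # Imported lazily so the helper above works without odfpy installed.
    from odf import teletype, text
    from odf.opendocument import load

    subprocess.run(build_convert_command(docx_path, out_dir), check=True)
    # LibreOffice writes <original stem>.odt into the output directory.
    odt_path = Path(out_dir) / (Path(docx_path).stem + ".odt")
    document = load(str(odt_path))
    return [teletype.extractText(p) for p in document.getElementsByType(text.P)]
```

The conversion step is kept separate so you can run the same command by hand first and inspect the resulting ODT before involving Python at all.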

Just my 2c,

    AA

_______________________________________________
Python-il mailing list
[email protected]
http://hamakor.org.il/cgi-bin/mailman/listinfo/python-il
