As a last resort, if you don't mind losing most of the structure, you
can always unpack the DOCX file (it's just a ZIP archive) and retrieve
your particular item using lxml, BeautifulSoup, plain html2text or even
regular expressions.
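Here is a minimal sketch of that idea. It uses only the standard library (zipfile plus xml.etree.ElementTree, swapped in for lxml so there is nothing to install). The function name read_docx and the sample document are made up for illustration. The sketch assumes the main text lives in word/document.xml, which is the usual layout, and it ignores headers, footers and footnotes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_docx(source):
    """Return (paragraph_texts, bookmark_names) for a .docx path or file object.

    Hypothetical helper: it reads only the main document part.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)

    paragraphs = []
    for p in root.iter("{%s}p" % W_NS):
        # A paragraph's text is split across runs; join every <w:t> inside it.
        text = "".join(t.text or "" for t in p.iter("{%s}t" % W_NS))
        paragraphs.append(text)

    # Bookmarks are marked by <w:bookmarkStart w:name="...">.
    bookmarks = [
        b.get("{%s}name" % W_NS)
        for b in root.iter("{%s}bookmarkStart" % W_NS)
    ]
    return paragraphs, bookmarks


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file (only the part we read).
    sample_xml = (
        '<w:document xmlns:w="%s"><w:body>'
        '<w:p><w:bookmarkStart w:id="0" w:name="greeting"/>'
        "<w:r><w:t>שלום עולם</w:t></w:r>"
        '<w:bookmarkEnd w:id="0"/></w:p>'
        "</w:body></w:document>" % W_NS
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", sample_xml)
    print(read_docx(buf))
```

The text comes back as ordinary unicode strings in logical order, so Hebrew should survive intact. Any right-to-left display problem would then be down to whatever prints it, not the extraction.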
On 05/05/2013 13:26, Amit Aronovitch wrote:
On Sun, May 5, 2013 at 12:24 PM, גל וין gal vine <[email protected]> wrote:
Hey all,
I'm trying to use the docx package to parse docx files, and to use
Hebrew in them, without much success.
I also need to use templates and/or bookmarks in those documents,
but the package is poorly documented, and I need some assistance.
Nope, never done that; just some general comments:
The project seems active on GitHub. If something breaks when using
Hebrew but works otherwise, you can try opening an issue:
https://github.com/mikemaccana/python-docx/issues
(Also let us know: there's a local forum for promoting
Hebrew/bidi-related issues in free software.)
Has any of you had the pleasure of using it,
or worked (successfully) with win32com.client to parse Word documents?
Haven't done that either (I did use win32com to communicate with
Outlook, but that was a long time ago).
Another idea you might try, in case the Python ODF tools are more
mature: convert to ODF (using the LibreOffice/OpenOffice docx filter,
which is quite good), then try ezodf/odfpy.
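A rough sketch of the conversion step, assuming LibreOffice is installed and its soffice binary is on the PATH (on some systems it is called libreoffice instead). The helper names build_convert_command and convert_to_odt are made up for illustration, and the returned path assumes the converter names its output after the input file. Reading the resulting .odt with ezodf or odfpy is left out here.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, binary="soffice"):
    """Build the headless LibreOffice command line for docx -> odt."""
    return [
        binary,
        "--headless",            # run without opening a GUI window
        "--convert-to", "odt",   # target format
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_to_odt(docx_path, out_dir, binary="soffice"):
    """Run the conversion and return the expected output path.

    Assumption: the output is written as <input stem>.odt inside out_dir.
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError("%s not found on PATH" % binary)
    subprocess.run(build_convert_command(docx_path, out_dir, binary), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".odt")
```

Usage would be something like convert_to_odt("report.docx", "converted"), where both names are placeholders.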
Just my 2c,
AA
_______________________________________________
Python-il mailing list
[email protected]
http://hamakor.org.il/cgi-bin/mailman/listinfo/python-il