Re: extracting text from docx files
There are several docx converters online (search for "docx2pdf"), though I haven't tried any of them. LibreOffice handles docx quite well.

On Thu, Aug 11, 2011 at 12:22:22PM +0100, Anton Shterenlikht typed:
> On Thu, Aug 11, 2011 at 12:14:51PM +0200, Polytropon wrote:
> > On Tue, 9 Aug 2011 21:16:11 +0200, Christian Barthel wrote:
> > > On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
> > > > I often receive information in *.docx format
> > > > from my MS-using colleagues. Sometimes I can
> > > > ask for a pdf (or similar) instead, but not always.
> > >
> > > You have a lot of nice options:
> > > - Force them to use BSD/Linux ;)
> > > - explain to them why docx is shit!
> > > - don't read it
> >
> > I also suggest combining this with reading the following
> > article:
> >
> > http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents
> >
> > It's very polite and precise about why using "DOC" files
> > is generally a bad idea. It can be easily concluded that
> > it also applies to "DOCX" files.
> >
> > The document also discusses alternatives.
>
> That's not my war. I'm not going to achieve
> much by telling all our admin and academic
> staff that what they were taught throughout
> their career might not be the ideal, or even
> the only, tool in the universe.
> Sometimes I can request pdf, sometimes I fail.
>
> I also sometimes try to get pdf from various
> UK govt departments. Sometimes they only
> make documents available in MS formats.
> Again, sometimes they respond well, but
> mostly they ignore my requests.
>
> By the way, I tried abiword, and it couldn't
> open my docx.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
Re: extracting text from docx files
On Thu, Aug 11, 2011 at 12:14:51PM +0200, Polytropon wrote:
> On Tue, 9 Aug 2011 21:16:11 +0200, Christian Barthel wrote:
> > On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
> > > I often receive information in *.docx format
> > > from my MS-using colleagues. Sometimes I can
> > > ask for a pdf (or similar) instead, but not always.
> >
> > You have a lot of nice options:
> > - Force them to use BSD/Linux ;)
> > - explain to them why docx is shit!
> > - don't read it
>
> I also suggest combining this with reading the following
> article:
>
> http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents
>
> It's very polite and precise about why using "DOC" files
> is generally a bad idea. It can be easily concluded that
> it also applies to "DOCX" files.
>
> The document also discusses alternatives.

That's not my war. I'm not going to achieve much by telling all our admin and academic staff that what they were taught throughout their career might not be the ideal, or even the only, tool in the universe. Sometimes I can request pdf, sometimes I fail.

I also sometimes try to get pdf from various UK govt departments. Sometimes they only make documents available in MS formats. Again, sometimes they respond well, but mostly they ignore my requests.

By the way, I tried abiword, and it couldn't open my docx.

--
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423
Re: extracting text from docx files
On Tue, 9 Aug 2011 21:16:11 +0200, Christian Barthel wrote:
> On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
> > I often receive information in *.docx format
> > from my MS-using colleagues. Sometimes I can
> > ask for a pdf (or similar) instead, but not always.
>
> You have a lot of nice options:
> - Force them to use BSD/Linux ;)
> - explain to them why docx is shit!
> - don't read it

I also suggest combining this with reading the following article:

http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents

It's very polite and precise about why using "DOC" files is generally a bad idea. It can be easily concluded that it also applies to "DOCX" files.

The document also discusses alternatives.

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
Re: extracting text from docx files
On Tue, 9 Aug 2011, Anton Shterenlikht wrote:
> On Tue, Aug 09, 2011 at 02:57:51PM -0500, Antonio Olivares wrote:
> > > But if you really, really need to read docx, you can try the web
> > > application from Microsoft. A few months ago I also received a lot of
> > > docx files, and I opened them with the Microsoft web app; this worked
> > > for me to extract the information...
> > >
> > > More information:
> > > http://office.microsoft.com/en-us/web-apps/
> > >
> > > The downside: you have to sign up on a microsoft service :(
> > >
> > > Can also use libreoffice. It is in the ports system :)
> >
> > Without installing anything, Google Docs also opens *.docx files, if
> > needed. There are other options too, but it depends on what Anton
> > wants to install* or just view* & extract?
>
> Well.. I don't really want to install anything
> just to read docx. So probably something as
> small as possible. libreoffice (even if it's in ports,
> which I dearly love) looks like a monster of
> a package, so I'm not sure.

Maybe an online service? If you don't have too many to convert at one time, and there's nothing secret in them, you could try http://www.doc2pdf.net/ - I've never used it, so caveat clicktor.

--
Chris Hill ch...@monochrome.org
** [ Busy Expunging ]
Re: extracting text from docx files
On Tue, 9 Aug 2011, Anton Shterenlikht wrote:
> Well.. I don't really want to install anything
> just to read docx. So probably something as
> small as possible. libreoffice (even if it's in ports,
> which I dearly love) looks like a monster of
> a package, so I'm not sure.

Although still relatively large, OpenOffice has fewer dependencies than LibreOffice. My system has OO.o 3.3 installed, and 'make missing' shows seventeen new dependencies needed by LibreOffice.
Re: extracting text from docx files
> Well.. I don't really want to install anything
> just to read docx. So probably something as
> small as possible. libreoffice (even if it's in ports,
> which I dearly love) looks like a monster of
> a package, so I'm not sure.
>
> Thanks anyway

abiword is a word processor that opens docx files, and is in the ports :) You are welcome to check it out :)

I mentioned libreoffice because it is a full suite, but it is BIG :( It is not a MONSTER :)

Regards,

Antonio
Re: extracting text from docx files
On Tue, Aug 09, 2011 at 02:57:51PM -0500, Antonio Olivares wrote:
> > But if you really, really need to read docx, you can try the web
> > application from Microsoft. A few months ago I also received a lot of
> > docx files, and I opened them with the Microsoft web app; this worked
> > for me to extract the information...
> >
> > More information:
> > http://office.microsoft.com/en-us/web-apps/
> >
> > The downside: you have to sign up on a microsoft service :(
> >
> > Can also use libreoffice. It is in the ports system :)
>
> Without installing anything, Google Docs also opens *.docx files, if
> needed. There are other options too, but it depends on what Anton
> wants to install* or just view* & extract?

Well.. I don't really want to install anything just to read docx. So probably something as small as possible. libreoffice (even if it's in ports, which I dearly love) looks like a monster of a package, so I'm not sure.

Thanks anyway

--
Anton Shterenlikht
Re: extracting text from docx files
On Tue, Aug 9, 2011 at 3:57 PM, Antonio Olivares wrote:
>> But if you really, really need to read docx, you can try the web
>> application from Microsoft. A few months ago I also received a lot of
>> docx files, and I opened them with the Microsoft web app; this worked
>> for me to extract the information...

Just a thought, but if docx is XML, why not find or build some XSLT that extracts what you need into another format? You probably already have libxml2 and libxslt on your system, along with the command-line utility xsltproc.

There are probably existing XSLT stylesheets that transform it to RTF and plain text.

--
Alejandro Imass
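For anyone who wants to try the "docx is XML" idea without writing a stylesheet, here is a minimal sketch in Python (standard library only). Note that it uses ElementTree rather than XSLT/xsltproc, so it is a stand-in for the suggestion above, not the same technique. The helper name docx_to_text is made up for illustration. It assumes the body text lives in word/document.xml inside the archive and ignores headers, footers, footnotes and comments.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the body text of a .docx, one output line per paragraph.

    `source` is a file path or a binary file-like object.
    Hypothetical helper for illustration, not part of any existing tool.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    # Each <w:p> is a paragraph; its visible text sits in <w:t> elements.
    # Caveat: paragraphs nested inside other paragraphs (e.g. text boxes)
    # would be emitted twice by this simple walk.
    for para in root.iter(f"{{{W_NS}}}p"):
        pieces = []
        for node in para.iter():
            if node.tag == f"{{{W_NS}}}t":
                pieces.append(node.text or "")
            elif node.tag == f"{{{W_NS}}}tab":
                pieces.append("\t")
            elif node.tag == f"{{{W_NS}}}br":
                pieces.append("\n")
        lines.append("".join(pieces))
    return "\n".join(lines)
```

Usage would be something like print(docx_to_text("report.docx")), where report.docx is a placeholder file name. Formatting, tables and images are simply dropped, which is probably acceptable if all you need is the text.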
Re: extracting text from docx files
On Tue, Aug 09, 2011 at 02:57:51PM -0500, Antonio Olivares wrote:
> > But if you really, really need to read docx, you can try the web
> > application from Microsoft. A few months ago I also received a lot of
> > docx files, and I opened them with the Microsoft web app; this worked
> > for me to extract the information...
> >
> > More information:
> > http://office.microsoft.com/en-us/web-apps/
> >
> > The downside: you have to sign up on a microsoft service :(
> >
> > Can also use libreoffice. It is in the ports system :)

Sure. But libreoffice is a matter of opinion. *I* would never ever install this bloated, buggy software product @_@ But I must admit that I am very spoiled: vim + LaTeX _rocks_

> Without installing anything, Google Docs also opens *.docx files, if
> needed. There are other options too, but it depends on what Anton
> wants to install* or just view* & extract?

I have a google account but I have never used Google Docs. Nice to know...

> Regards,
>
> Antonio

--
Christian Barthel
Public-Key: http://bc.user-mode.org/bc.asc
Mail: b...@nyx.user-mode.org
Web: http://bc.user-mode.org
Re: extracting text from docx files
> But if you really, really need to read docx, you can try the web
> application from Microsoft. A few months ago I also received a lot of
> docx files, and I opened them with the Microsoft web app; this worked
> for me to extract the information...
>
> More information:
> http://office.microsoft.com/en-us/web-apps/
>
> The downside: you have to sign up on a microsoft service :(

Can also use libreoffice. It is in the ports system :)

Without installing anything, Google Docs also opens *.docx files, if needed. There are other options too, but it depends on what Anton wants to install* or just view* & extract?

Regards,

Antonio
Re: extracting text from docx files
On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
> I often receive information in *.docx format
> from my MS-using colleagues. Sometimes I can
> ask for a pdf (or similar) instead, but not always.

You have a lot of nice options:
- Force them to use BSD/Linux ;)
- explain to them why docx is shit!
- don't read it

> Usually I unzip a docx and then search
> through all *xml files to find the
> useful data. However, I can't find any
> xml styles to use, so I have to convert
> the relevant xml file(s) to plain text
> by hand. I wonder if anybody can suggest
> a better way. Perhaps there's something
> in ports that can help.

But if you really, really need to read docx, you can try the web application from Microsoft. A few months ago I also received a lot of docx files, and I opened them with the Microsoft web app; this worked for me to extract the information...

More information:
http://office.microsoft.com/en-us/web-apps/

The downside: you have to sign up on a microsoft service :(

cheers

--
Christian Barthel
Re: extracting text from docx files
On Tuesday, August 09, 2011 at 10:25:30AM -0700, Kurt Buff wrote:
> My installation of OpenOffice 3.3 on my Win7 machine will open a
> Winword 2010 .docx file.
>
> I'm guessing it will do the same on FreeBSD, but I don't have an
> install with a GUI running at the moment.

It does, using OpenOffice 3.4.0 in 9-CURRENT.

matthias

--
Matthias Apitz
t +49-89-61308 351 - f +49-89-61308 399 - m +49-170-4527211
e - w http://www.unixarea.de/
Re: extracting text from docx files
On Tue, Aug 9, 2011 at 06:36, Anton Shterenlikht wrote:
> I often receive information in *.docx format
> from my MS-using colleagues. Sometimes I can
> ask for a pdf (or similar) instead, but not always.
>
> Usually I unzip a docx and then search
> through all *xml files to find the
> useful data. However, I can't find any
> xml styles to use, so I have to convert
> the relevant xml file(s) to plain text
> by hand. I wonder if anybody can suggest
> a better way. Perhaps there's something
> in ports that can help.

My installation of OpenOffice 3.3 on my Win7 machine will open a Winword 2010 .docx file.

I'm guessing it will do the same on FreeBSD, but I don't have an install with a GUI running at the moment.

Kurt
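The unzip-and-search step described in the quoted question can be shortened a little. Here is a rough Python sketch (standard library only) that lists the XML parts of a docx with their uncompressed sizes, so you can see which files are worth opening. The function name xml_parts is invented for this example. In the files I know of, the body text ends up in word/document.xml, but treat that as an assumption and check the listing.

```python
import zipfile


def xml_parts(source):
    """Return (name, uncompressed size) for every .xml member of a .docx.

    `source` is a file path or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(source) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if info.filename.endswith(".xml")
        ]
```

Something like `for name, size in xml_parts("report.docx"): print(size, name)` would print the listing; report.docx is a placeholder name.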
Re: extracting text from docx files
On Tue, Aug 09, 2011 at 09:40:26AM -0400, Rod Person wrote:
> On Tue, 9 Aug 2011 14:36:32 +0100
> Anton Shterenlikht wrote:
>
> > Usually I unzip a docx and then search
> > through all *xml files to find the
> > useful data. However, I can't find any
> > xml styles to use, so I have to convert
> > the relevant xml file(s) to plain text
> > by hand. I wonder if anybody can suggest
> > a better way. Perhaps there's something
> > in ports that can help.
>
> You could try this for just plain text conversion
> http://docx2txt.sourceforge.net/

Thank you

Anton

--
Anton Shterenlikht
Re: extracting text from docx files
On Tue, 9 Aug 2011 14:36:32 +0100 Anton Shterenlikht wrote:
> Usually I unzip a docx and then search
> through all *xml files to find the
> useful data. However, I can't find any
> xml styles to use, so I have to convert
> the relevant xml file(s) to plain text
> by hand. I wonder if anybody can suggest
> a better way. Perhaps there's something
> in ports that can help.

You could try this for just plain text conversion:
http://docx2txt.sourceforge.net/

--
Rod