Re: [SLUG] Weird Word docs

2009-12-01 Thread Piers Rowan


Peter Chubb wrote:

> Rodolfo> How do you read .docx documents without OOo? Or you install
> Rodolfo> OOo libraries and convert the document to other format?
>
> Alternatively if you already have a gnome desktop, abiword works most
> of the time, but its user interface is modelled on that of Word(TM) (and is
> therefore horrible to me).

If you just want the text of the document then unzip the file and 
convert the document.xml to text.


Below is some (old) code which does this.


In PHP:

   function importDOCX($file_info)
   {
       // Build the source path. escapeshellarg() replaces the hand-rolled
       // escaping of spaces and parentheses used in the original code.
       $src = DOC_ROOT . 'documents/'
            . str_replace(' ', '_', $file_info['file_group']) . '/'
            . $file_info['sub_folder_path'] . '/' . $file_info['filename'];

       shell_exec('mkdir -p /tmp/docx');
       shell_exec('cp ' . escapeshellarg($src) . ' /tmp/docx/');
       shell_exec('unzip -o '
                . escapeshellarg('/tmp/docx/' . $file_info['filename'])
                . ' -d /tmp/docx/');

       $file = file_get_contents('/tmp/docx/word/document.xml');
       shell_exec('rm -fR /tmp/docx');

       // The original str_replace() call that removed the markup was cut
       // off by the list archive; strip_tags() is a stand-in, not the
       // original code.
       return strip_tags($file);
   }
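
For comparison, here is a minimal sketch of the same "unzip and read document.xml" idea without shelling out. It is not the legacy code above; it assumes Python 3 and only the standard library. The names docx_to_text and W_NS are made up for this illustration. It collects the text runs (w:t elements) of each paragraph (w:p) and ignores everything else, so tables come out as one line per cell paragraph and all formatting is lost.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a .docx, one line per paragraph.

    `source` may be a file path or a binary file-like object.
    """
    # A .docx is a zip archive; the body text lives in word/document.xml.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs of this paragraph in document order.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Usage would be something like print(docx_to_text("report.docx")), where report.docx is a placeholder file name.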



Also you can use antiword from the command line:


   function importDOC($file_info)
   {
       $src = DOC_ROOT . 'documents/'
            . str_replace(' ', '_', $file_info['file_group']) . '/'
            . $file_info['sub_folder_path'] . '/' . $file_info['filename'];

       // escapeshellarg() handles spaces, parentheses and other shell
       // metacharacters in the path.
       return shell_exec('/usr/bin/antiword ' . escapeshellarg($src));
   }
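
The same antiword call can be sketched without any shell escaping by passing the arguments as a list. This is only an illustration, assuming Python 3.7 or later and antiword installed at /usr/bin/antiword as in the PHP above. The function names antiword_command and doc_to_text are hypothetical.

```python
import subprocess


def antiword_command(path, antiword="/usr/bin/antiword"):
    """Build the argument list for antiword.

    No shell is involved, so spaces and parentheses in `path`
    need no escaping.
    """
    return [antiword, path]


def doc_to_text(path):
    """Run antiword on a .doc file and return its text output."""
    result = subprocess.run(
        antiword_command(path),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on failure
    )
    return result.stdout
```

Because the path is passed as a single list element, a file called "My Report (final).doc" reaches antiword unchanged.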


Both functions convert documents to text for searching purposes and come
from some legacy code. They do meet the criterion of a minimal install.


Cheers

P


--
Piers Rowan
Managing Director
Recruit Online

piers.ro...@recruitonline.com.au
Mob: 0418 770 331
www.recruitonline.com.au

Recruitment, Advertising, Document Management, CRM, Online Storage
and Search hosted services.

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] Weird Word docs

2009-12-01 Thread Peter Chubb
> "Rodolfo" == Rodolfo Martínez writes:


Rodolfo> How do you read .docx documents without OOo? Or you install
Rodolfo> OOo libraries and convert the document to other format?

wvMime does the `right thing' most of the time.  It converts to LaTeX
then to postscript or html.

Alternatively if you already have a gnome desktop, abiword works most
of the time, but its user interface is modelled on that of Word(TM) (and is
therefore horrible to me).

--
Dr Peter Chubb        www.nicta.com.au      peter DOT chubb AT nicta.com.au
http://www.ertos.nicta.com.au   ERTOS within National ICT Australia


Re: [SLUG] Weird Word docs

2009-12-01 Thread Rodolfo Martínez
Hi Peter,

You can try catdoc instead of strings; you will get nicer output.
catdoc will try to recognise tables and give the output some formatting.

unoconv seems to be a good option for converting doc[x] documents to some
other format, but I think you will need more than 1 GB of disk space after
installation, since it pulls in a lot of packages.


How do you read .docx documents without OOo? Or do you install the OOo
libraries and convert the document to another format?



Rodolfo Martínez



On Mon, Nov 30, 2009 at 4:09 PM, Peter Chubb wrote:
>
> Hi,
>        Every now and then I get sent a Word document that I can't
> just ignore, or ask the sender to resend as plain text or PDF or
> something.
>
> Most of the time wvMime can translate the doc so I can read it and
> convert to LaTeX or something sensible.  /usr/bin/strings does a reasonable
> job where formatting isn't important.
>
> Sometimes however, I get sent documents (usually full of tables) that
> purport to be in portrait orientation, but are actually Landscape ---
> wvMime (and abiWord on my one box with Gnome on it) attempt to display
> the doc on a portrait oriented a4 page, with all the text truncated at
> the right margin.
>
> How do other people deal with these weird documents?  Is there an easy
> way to view or convert them (NOT using Gnome or OpenOffice or other
> heavyweight GUI tools)?
> --
> Dr Peter Chubb        www.nicta.com.au      peter DOT chubb AT nicta.com.au
> http://www.ertos.nicta.com.au           ERTOS within National ICT Australia
> From Imagination to Impact                       Imagining the (ICT) Future


Re: [SLUG] Weird Word docs

2009-12-01 Thread peter
> "jam" == jam writes:

jam> On Tuesday 01 December 2009 09:00:04 slug-requ...@slug.org.au
jam> wrote:
>> Every now and then I get sent a Word document that I can't just
>> ignore, or ask the sender to resend as plain text or PDF or
>> something.
>> 
>> Most of the time wvMime can translate the doc so I can read it and
>> convert to LaTeX or something sensible.  /usr/bin/strings does a
>> reasonable job where formatting isn't important.
>> 
>> Sometimes however, I get sent documents (usually full of tables)
>> that purport to be in portrait orientation, but are actually
>> Landscape --- wvMime (and abiWord on my one box with Gnome on it)
>> attempt to display the doc on a portrait oriented a4 page, with all
>> the text truncated at the right margin.
>> 
>> How do other people deal with these weird documents?  Is there an
>> easy way to view or convert them (NOT using Gnome or OpenOffice or
>> other heavyweight GUI tools)?

jam> Although the borg is utterly evil and to be shunned, he does
jam> produce some good software. crossover office and office 2007 work
jam> so nicely that she-who-must-be-obeyed uses it for her committee
jam> work every day with nary a glitch of any sort.

Well... I find the user interface completely arcane, both in
OpenOffice and in Word on a real windows box.

jam> And, as emotionally challenged as I am to say it, (thinks of
jam> Cleese, the window and a Fish Called Wanda) as she was an utter
jam> neophyte (with office tools) who has never used winders in any
jam> shape or form (so much hand holding was needed) it has been very
jam> much easier to teach her word than OO. (I too have never been a
jam> winders user and have used OO since it was li'l) H.

Hmmm.  I used to teach vi and troff+mm to neophytes.  They got the
hang fairly quickly in general.

jam> The problem with OO is that it often screwed doc or docx
jam> documents so they look nothing like the original. Not good when
jam> you're doing the minutes or agenda from templates.

The problem with OOo is that it OOMs (runs out of memory) on my laptop
(128M RAM), and it has a really weird structure to its menus and
incomprehensible icons on my desktop (plus it pulls in half a gig of
libraries, etc.)

# sudo apt-get install openoffice.org-writer
...
Need to get 576.1MB of archives.
After this operation, 1.2GB of additional disk space will be used.

Is there a lighter-weight way to view word documents?

Peter C


Re: [SLUG] Weird Word docs

2009-11-30 Thread jam
On Tuesday 01 December 2009 09:00:04 slug-requ...@slug.org.au wrote:
> Every now and then I get sent a Word document that I can't
> just ignore, or ask the sender to resend as plain text or PDF or
> something.
> 
> Most of the time wvMime can translate the doc so I can read it and
> convert to LaTeX or something sensible.  /usr/bin/strings does a
>  reasonable job where formatting isn't important.
> 
> Sometimes however, I get sent documents (usually full of tables) that
> purport to be in portrait orientation, but are actually Landscape ---
> wvMime (and abiWord on my one box with Gnome on it) attempt to display
> the doc on a portrait oriented a4 page, with all the text truncated at
> the right margin.
> 
> How do other people deal with these weird documents?  Is there an easy
> way to view or convert them (NOT using Gnome or OpenOffice or other
> heavyweight GUI tools)?

Although the borg is utterly evil and to be shunned, he does produce some good 
software. crossover office and office 2007 work so nicely that she-who-must-
be-obeyed uses it for her committee work every day with nary a glitch of any 
sort.

And, as emotionally challenged as I am to say it, (thinks of Cleese, the
window and a Fish Called Wanda) as she was an utter neophyte (with office
tools) who has never used winders in any shape or form (so much hand holding
was needed) it has been very much easier to teach her word than OO. (I too
have never been a winders user and have used OO since it was li'l) H.

The problem with OO is that it often screwed doc or docx documents so they
look nothing like the original. Not good when you're doing the minutes or
agenda from templates.

James


Re: [SLUG] Weird Word docs

2009-11-30 Thread Daniel Pittman
Ken Foskey writes:
> On Tue, 2009-12-01 at 09:09 +1100, Peter Chubb wrote:
>
>> How do other people deal with these weird documents?  Is there an easy
>> way to view or convert them (NOT using Gnome or OpenOffice or other
>> heavyweight GUI tools)?
>
> Unfortunately Word is not a specification just a jungle of data that has
> grown from the first version till today.  This means that it is really hard
> to import these documents.  OOo hired the expert who reverse engineered the
> format for other FOSS tools and therefore it is a really robust strong
> product.  Tables are a major problem, even for OOo, I cannot import the
> Scouts forms because of their tables cleanly and they keep changing them
> every 3 months lately, total pain.
>
> OOo is your only real choice.

*nod*  I have the same issue, and the same preference to avoid OOo for just
reading the documents.  I never did find a solution that didn't require it,
though, which was sad.

OTOH, the 'unoconv' tool makes it a bit nicer: it wraps up the automation
required to use OOo to convert documents /without/ needing to show a GUI or
whatever, driven from the command-line.

http://dag.wieers.com/home-made/unoconv/

From the original README:

Since OpenOffice 2.3 you do not need an X display for starting ooffice.
However you may need the openoffice.org-headless package from your
distribution. Since OpenOffice 2.4 nothing special is needed, running in
headless mode does not require X.
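
A rough sketch of driving unoconv from a script, assuming Python 3 and that unoconv is on the PATH. The -f option selects the output format. The function names are made up for this example, and the returned path relies on two assumptions: that unoconv writes the result next to the input file, and that the format name doubles as the file extension.

```python
import subprocess
from pathlib import Path


def unoconv_command(path, fmt="pdf"):
    """Build the unoconv argument list; -f selects the output format."""
    return ["unoconv", "-f", fmt, str(path)]


def convert(path, fmt="pdf"):
    """Convert `path` with unoconv and return the expected output path."""
    subprocess.run(unoconv_command(path, fmt), check=True)
    # Assumption: the output lands beside the input, with the format
    # name as its extension (e.g. minutes.doc -> minutes.pdf).
    return Path(path).with_suffix("." + fmt)
```

For example, convert("minutes.doc") would run "unoconv -f pdf minutes.doc" with no GUI involved; minutes.doc is a placeholder name.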


Regards,
Daniel
-- 
✣ Daniel Pittman  ✉ dan...@rimspace.net  ☎ +61 401 155 707
   ♽ made with 100 percent post-consumer electrons


Re: [SLUG] Weird Word docs

2009-11-30 Thread Ken Foskey
On Tue, 2009-12-01 at 09:09 +1100, Peter Chubb wrote:

> How do other people deal with these weird documents?  Is there an easy
> way to view or convert them (NOT using Gnome or OpenOffice or other
> heavyweight GUI tools)?

Unfortunately the Word format is not a specification, just a jungle of data
that has grown from the first version until today.  This means that it is
really hard to import these documents.  OOo hired the expert who reverse
engineered the format for other FOSS tools, and therefore it is a really
robust product.  Tables are a major problem, even for OOo: I cannot cleanly
import the Scouts forms because of their tables, and lately they keep
changing them every 3 months.  Total pain.

OOo is your only real choice.

Ken  (Ex OOo developer)
 



[SLUG] Weird Word docs

2009-11-30 Thread Peter Chubb

Hi,
Every now and then I get sent a Word document that I can't
just ignore, or ask the sender to resend as plain text or PDF or
something.

Most of the time wvMime can translate the doc so I can read it and
convert to LaTeX or something sensible.  /usr/bin/strings does a reasonable
job where formatting isn't important.
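
To show roughly what the strings approach amounts to, here is a small sketch that pulls runs of printable ASCII out of raw bytes. It is an approximation of the idea and not a reimplementation of /usr/bin/strings. The name printable_runs and the minimum length of 4 are choices made for this example. Text that Word stores as 16-bit characters will not be found this way (GNU strings has "-e l" for that case).

```python
import re


def printable_runs(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII bytes in `data`."""
    # Printable ASCII (space through tilde) plus tab, repeated min_len+ times.
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]
```

Applied to the bytes of a .doc file, this yields the readable fragments and drops the binary structure in between, which is why formatting and table layout are lost.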

Sometimes however, I get sent documents (usually full of tables) that
purport to be in portrait orientation, but are actually Landscape ---
wvMime (and abiWord on my one box with Gnome on it) attempt to display
the doc on a portrait oriented a4 page, with all the text truncated at
the right margin.

How do other people deal with these weird documents?  Is there an easy
way to view or convert them (NOT using Gnome or OpenOffice or other
heavyweight GUI tools)?
--
Dr Peter Chubb        www.nicta.com.au      peter DOT chubb AT nicta.com.au
http://www.ertos.nicta.com.au   ERTOS within National ICT Australia
From Imagination to Impact   Imagining the (ICT) Future