Re: How to read a word 7 file?

1998-06-24 Thread sjc
On Wed, Jun 24, 1998 at 10:57:03AM +1000, Hamish Moffatt wrote:
> On Tue, Jun 23, 1998 at 09:34:00AM -0700, Luiz Otavio L. Zorzella wrote:
> > Why would strings show only the first, and not all of the texts?
> 
> It will show both. However, it could show you text that, as far as Word
> is concerned, is no longer in the document. Also the changes might
> be stored such that they can't be read by themselves but they change
> the meaning of the original significantly.

Actually that's not exactly true... it's half true :)
It will show you stuff that "as far as Word is concerned has been deleted."
It WILL NOT, however, show you the added text. The added text is
not stored as plain text (as far as I could see)... it looked like it was
maybe even compressed (probably not very well :) )

> Of course, as Ted points out, strings might show you all sorts of things
> the author didn't want you to see :-) I think this has been on comp.risks
> within the last few months.

I think I also pointed this out, but:
as I read on BUGTRAQ, that ONLY goes for Mac versions... and it goes for ANY
program which uses OLE extensions (i.e. only M$ products). This is actually
a bug in the OLE system... and it was fixed for Windows a long time ago
(guess the Windows and Mac OLE source trees are completely split)
 
-Steve

-- 
** Stephen Carpenter ** ** ** ** ** ** ** ** ** ** ** ** [EMAIL PROTECTED] **
"I am an agnostic; I do not pretend to know what many ignorant men are 
sure of."
-- Clarence Darrow




Re: How to read a word 7 file?

1998-06-24 Thread Hamish Moffatt
On Tue, Jun 23, 1998 at 09:34:00AM -0700, Luiz Otavio L. Zorzella wrote:
> Why would strings show only the first, and not all of the texts?

It will show both. However, it could show you text that, as far as Word
is concerned, is no longer in the document. Also the changes might
be stored such that they can't be read by themselves but they change
the meaning of the original significantly.

But Ted is correct when he says it's (a) useful and (b) rough and ready
and I jumped on him a bit too much.

Of course, as Ted points out, strings might show you all sorts of things
the author didn't want you to see :-) I think this has been on comp.risks
within the last few months.

hamish
-- 
Hamish Moffatt, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Latest Debian packages at ftp://ftp.rising.com.au/pub/hamish. PGP#EFA6B9D5
CCs of replies from mailing lists are welcome.   http://hamish.home.ml.org


--  
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]


RE: How to read a word 7 file?

1998-06-24 Thread Ted Harding
On 23-Jun-98 Luiz Otavio L. Zorzella wrote:
> 
> "strings" would do a good job for me, but...
> 
>  > and then edit wordfile.txt to clean it up. Raw "strings" will skip
>  > sequences of fewer than 4 ASCII characters but these are unlikely to
>  > occur in a Word document. This method will suppress all formatting info
>  > except end-of-line, so you are likely to get long lines (= Word
>  > paragraphs). It will also fail to recognise any non-US-ASCII character
>  > codes (above 127) so accented characters and special symbols, etc, will
>  > be missed. But if you simply need to read the text content of a Word
>  > document containing plain English text, then this method works fine.
> 
> ... my text is in portuguese, and does have non-US chars. Is there a
> way to tell "strings" to accept some non-US chars?

Unfortunately not, or not well ... "strings" works by extracting sequences of
US-ASCII characters (codes 32-126) of length (by default) at least 4. If you
KNEW that codes outside this range really did represent characters (such as
the accented characters in Portuguese) then you wouldn't need to use "strings"!
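
For anyone who wants to experiment, here is a minimal sketch in Python of the rule just described. It is not the real "strings" and knows nothing about Word's file structure. The function name, the extra_chars parameter, and the choice of Windows-1252 as the default encoding are my own assumptions for illustration; extra_chars only shows what it would look like to accept a few known accented codes on top of 32-126.

```python
import re

def text_runs(data, extra_chars="", encoding="cp1252", min_len=4):
    """Return runs of at least min_len accepted bytes found in data, decoded.

    Accepted bytes are printable US-ASCII (codes 32-126) plus the
    single-byte encodings of extra_chars. Illustrative only.
    """
    extra = extra_chars.encode(encoding)
    # Byte character class: ASCII 32-126 plus each extra byte value.
    cls = b"\x20-\x7e" + b"".join(re.escape(bytes([b])) for b in set(extra))
    quantifier = b"{%d,}" % min_len
    pattern = re.compile(b"[" + cls + b"]" + quantifier)
    return [m.group().decode(encoding) for m in pattern.finditer(data)]

# Usage, assuming wordfile.doc exists in the current directory:
#     with open("wordfile.doc", "rb") as f:
#         for run in text_runs(f.read(), extra_chars="áàâãçéêíóôõú"):
#             print(run)
```

Every extra code you accept also lets through binary bytes that happen to have the same value, so expect more garbage in the output as the list grows.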

However, in word-processor files (such as Word's or WordPerfect's) the
codes outside that range have various "binary" significances as well (in Word's
case) as representing "special" characters. The only way you could get at these
would be to interpret the binary codes so as to locate stretches of text.
Otherwise, an approach as simple as the one used by "strings" would simply
have to output every byte in the file. Useless. Sorry. And apologies for blindly
assuming you were after plain ASCII!

Without going for programs (such as those suggested by others) which really can
at least partially interpret a Word file, the best you could do with "strings"
would be to edit-in the missing accented characters afterwards. Possible, but
tedious, and maybe error-prone.

However, the handiness of "strings" as a quick utility is such that it might be
worth re-coding it so that, as well as the ASCII codes, it also included the
specific codes for the accented characters in a given language. This would
increase the amount of garbage in the output but, so long as one was selective,
perhaps by not too much. (If you're going to do this for Word, remember that
there are two different encodings for accented characters: "Win"-encoding and
"Mac"-encoding).

The best of luck with the other options!
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Date: 23-Jun-98   Time: 21:45:54





Re: How to read a word 7 file?

1998-06-23 Thread Ted Harding
On 23-Jun-98 Hamish Moffatt wrote:
> On Tue, Jun 23, 1998 at 11:26:18AM +0100, Ted Harding wrote:
>> A rough-and-ready way to do just what you're asking is to use the "strings"
>> command:
>> 
>>strings wordfile.doc > wordfile.txt
> 
> It's not quite so simple; Word's "fast save" mechanism actually appends
> changes to the document since the last save to the end of the file.
> Once some number of fast saves has been exceeded it will rewrite the whole
> lot.
> "strings" is only going to show the original at the last non-fast-save time.
> This could be completely different!

Point taken!  (Likewise the similar point, and the other most interesting
points, from [EMAIL PROTECTED] ("Steve")). I agree; when "Fast-Save" has piled up
a whole stack of changes you should look for a way to access the
Word-specific features of the document; but when it's Word-7 and you can only
handle say Word-(<=6) then it's a bit tricky. I did say "rough and ready"; and
so long as there are not too many changes you may be able to suss out what the
final version should be, and it can work for Word-7 even if you can't read it
directly.

When you look at the text content of a Word file using strings, you
are quite likely to find relics of deleted stuff, or even stuff from other
documents, that the person sending you the file did not wish you to see.
Anything from an order for embarrassing merchandise, to the part of the letter
to your boss that he cut out before sending you the rest.

Extract these, mail them back (anonymously if need be), and add a warning that
"Word Can Let You Down".

Cheers,
Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Date: 23-Jun-98   Time: 15:44:55





Re: How to read a word 7 file?

1998-06-23 Thread servis
*-Luiz Otavio L. Zorzella (23 Jun)
| Hamish Moffatt writes:
|  > On Tue, Jun 23, 1998 at 11:26:18AM +0100, Ted Harding wrote:
|  > > A rough-and-ready way to do just what you're asking is to use the
|  > > "strings" command:
|  > > 
|  > >strings wordfile.doc > wordfile.txt
|  > 
|  > It's not quite so simple; Word's "fast save" mechanism actually appends
|  > changes to the document since the last save to the end of the file.
|  > Once some number of fast saves has been exceeded it will rewrite the
|  > whole lot.
|  > "strings" is only going to show the original at the last non-fast-save
|  > time.
|  > This could be completely different!
|  >
| 
| Why would strings show only the first, and not all of the texts?
| 

The 'new' text will be all out of order at the end of the file. The
Word file keeps the original text at the beginning of the file and then
keeps all the changes at the end of the document, along with some
indexing mechanism to tell Word where to put the new text (or remove
some text) in the original text.  Thus you could have a Word file that
is huge on disk but is completely blank, because Word has all the original
info at the beginning and all the info that the original has been deleted
at the end!  I personally turn off 'fast save' whenever I use Word.
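
A toy model of the above, in Python. This is NOT Word's actual file layout: the byte layout, the separator bytes and the piece list are all invented for illustration. It only shows why a strings-style scan of such a file returns deleted text and additions in file order, while following the index gives the current document.

```python
import re

# Toy "file": the original text first, then text appended by a later fast save.
# The \x00\x01 bytes stand in for binary data; nothing here is Word's real format.
original = b"Dear boss, you are wrong. Regards, Sam"
appended = b"I have a suggestion."
blob = b"\x00\x01" + original + b"\x00\x01" + appended + b"\x00"

# Hypothetical index: (offset, length) pieces that make up the current document.
base = blob.index(original)
tail = blob.index(appended)
pieces = [
    (base, len(b"Dear boss, ")),   # kept from the original
    (tail, len(appended)),         # text added by the later save
    (base + original.index(b" Regards"), len(b" Regards, Sam")),  # kept
]

def current_text(blob, pieces):
    """Follow the index to rebuild the document as the editor would show it."""
    return b"".join(blob[o:o + n] for o, n in pieces).decode("ascii")

def raw_scan(blob, min_len=4):
    """What a strings-like scan sees: every printable run, in file order."""
    return [m.group().decode("ascii")
            for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, blob)]

print(current_text(blob, pieces))
print(raw_scan(blob))
```

The first print gives the edited letter; the second still contains the sentence that was "deleted", followed by the added text on its own.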

-- 
Brian 
-- 
Mechanical Engineering  [EMAIL PROTECTED]
Purdue University   http://www.ecn.purdue.edu/~servis




Re: How to read a word 7 file?

1998-06-23 Thread servis
*-Luiz Otavio L. Zorzella (23 Jun)
| 
| Well, maybe it is an 8.0 format, then. All I know is that SO complains
| "This is not a word 6.0 file", so I guessed it was a 7.0.
| 
| BTW, how can I find out which version it is?
| 

strings file.doc | grep 'Word\.Document'

examples:
% strings chap1.doc | grep 'Word\.Document'
Word.Document.8
% strings iac95abs.doc  | grep 'Word\.Document'
Word.Document.6
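
If you don't have "strings" handy, the same check can be sketched in Python. The function name is hypothetical, and it only looks for the ASCII marker "Word.Document.N" seen in the examples above; that the marker is present is an assumption about the file, not a real parse of the format.

```python
import re

def word_document_markers(data):
    """Return the sorted, distinct 'Word.Document.N' markers found in data (bytes)."""
    found = re.findall(rb"Word\.Document\.\d+", data)
    return sorted({m.decode("ascii") for m in found})

# Usage, assuming chap1.doc exists in the current directory:
#     with open("chap1.doc", "rb") as f:
#         print(word_document_markers(f.read()))
```

An empty list just means the marker was not found, not that the file isn't a Word document.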


Brian 
-- 
Mechanical Engineering  [EMAIL PROTECTED]
Purdue University   http://www.ecn.purdue.edu/~servis




Re: How to read a word 7 file?

1998-06-23 Thread Luiz Otavio L. Zorzella
Hamish Moffatt writes:
 > On Tue, Jun 23, 1998 at 11:26:18AM +0100, Ted Harding wrote:
 > > A rough-and-ready way to do just what you're asking is to use the "strings"
 > > command:
 > > 
 > >strings wordfile.doc > wordfile.txt
 > 
 > It's not quite so simple; Word's "fast save" mechanism actually appends
 > changes to the document since the last save to the end of the file.
 > Once some number of fast saves has been exceeded it will rewrite the whole 
 > lot.
 > "strings" is only going to show the original at the last non-fast-save time.
 > This could be completely different!
 >

Why would strings show only the first, and not all of the texts?

-- 
Luiz Otavio L. Zorzella Product Engineer
[EMAIL PROTECTED]  http://www.conexware.com




RE: How to read a word 7 file?

1998-06-23 Thread Luiz Otavio L. Zorzella
(Ted Harding) writes:
 > On 22-Jun-98 Luiz Otavio L. Zorzella wrote:
 > > 
 > > Hi, folks.
 > > 
 > > Is there any way to read in my linux box a word 7 .doc file?
 > > Maintaining the "indents" and "bolds" would be a plus, but mainly I
 > > just need to read the text in it.
 > > 
 > > I use StarOffice to read docs, but it only reads up to word 6 files
 > >:^<
 > 
 > A rough-and-ready way to do just what you're asking is to use the "strings"
 > command:
 > 
 >strings wordfile.doc > wordfile.txt
 >

"strings" would do a good job for me, but...

 > and then edit wordfile.txt to clean it up. Raw "strings" will skip sequences 
 > of
 > fewer than 4 ASCII characters but these are unlikely to occur in a Word
 > document. This method will suppress all formatting info except end-of-line, 
 > so
 > you are likely to get long lines (= Word paragraphs). It will also fail to
 > recognise any non-US-ASCII character codes (above 127) so accented characters
 > and special symbols, etc, will be missed. But if you simply need to read the
 > text content of a Word document containing plain English text, then this 
 > method
 > works fine.

... my text is in portuguese, and does have non-US chars. Is there a
way to tell "strings" to accept some non-US chars?

Thanks.

-- 
Luiz Otavio L. Zorzella Product Engineer
[EMAIL PROTECTED]  http://www.conexware.com




Re: How to read a word 7 file?

1998-06-23 Thread Luiz Otavio L. Zorzella
Thomas Apel writes:
 > Luiz Otavio L. Zorzella wrote:
 > > 
 > > Hi, folks.
 > > 
 > > Is there any way to read in my linux box a word 7 .doc file?
 > > Maintaining the "indents" and "bolds" would be a plus, but mainly I
 > > just need to read the text in it.
 > > 
 > > I use StarOffice to read docs, but it only reads up to word 6 files
 > > :^<
 > 
 > Was SO definitely not able to read the file? I'm not sure but as I
 > remember Word 6.0 and 7.0 use the same file format. The version with the
 > new format is 8.0.
 >

Well, maybe it is an 8.0 format, then. All I know is that SO complains
"This is not a word 6.0 file", so I guessed it was a 7.0.

BTW, how can I find out which version it is?

-- 
Luiz Otavio L. Zorzella Product Engineer
[EMAIL PROTECTED]  http://www.conexware.com




Re: How to read a word 7 file?

1998-06-23 Thread Paul Reavis
catdoc and wordview both work for me - isn't word 7 the one in orifice
95? They both work on a doc I got from Those Who Manage in such a format.

-- 

Paul Reavis  [EMAIL PROTECTED]
Design Lead
Partner Software, Inc.http://www.partnersoft.com




RE: How to read a word 7 file?

1998-06-23 Thread Dennis Dai
As far as I know, Word 7 is part of Office 95 and has the same file
format as Word 6 (which is part of Office 4.2). The latest version of
Word is 8 (which is part of Office 97) and they changed the file format
again.

So if you can read Word 6 format (as StarOffice 4.03 can), you should
be able to read Word 7 format, though I haven't tried it yet. BTW,
StarOffice 4.03 is such a resource hog.

Dennis




Re: How to read a word 7 file?

1998-06-23 Thread Thomas Apel
Luiz Otavio L. Zorzella wrote:
> 
> Hi, folks.
> 
> Is there any way to read in my linux box a word 7 .doc file?
> Maintaining the "indents" and "bolds" would be a plus, but mainly I
> just need to read the text in it.
> 
> I use StarOffice to read docs, but it only reads up to word 6 files
> :^<

Was SO definitely not able to read the file? I'm not sure but as I
remember Word 6.0 and 7.0 use the same file format. The version with the
new format is 8.0.

-- 
Thomas Apel <[EMAIL PROTECTED]>
PGP Key IDs: 0x90B40401 (RSA) and 0x5B980B91 (DH/DSS)




Re: How to read a word 7 file?

1998-06-23 Thread sjc
On Tue, Jun 23, 1998 at 11:26:18AM +0100, Ted Harding wrote:
> On 22-Jun-98 Luiz Otavio L. Zorzella wrote:
> > 
> > Hi, folks.
> > 
> > Is there any way to read in my linux box a word 7 .doc file?
> > Maintaining the "indents" and "bolds" would be a plus, but mainly I
> > just need to read the text in it.
> > 
> > I use StarOffice to read docs, but it only reads up to word 6 files
> >:^<
> 
> A rough-and-ready way to do just what you're asking is to use the "strings"
> command:
> 
>strings wordfile.doc > wordfile.txt

This has one major flaw: it may not give you a RECENT copy of the
data (unless the "QuickSave" option was OFF... but how often is that the case?)

A few months back I was called to check out a problem someone had with
a file, a Word document. Whenever they tried to open it they got
an "Out of Memory" error even though it was at best 200-300 KB in size.
(They also admitted to having just recently cleaned a virus from it.)

This was of course the ONLY copy of the file, on someone's floppy
disk, and of course there was a grant pending. I figured there
was nothing left to lose, so I opened it up in SimpleText (this
was a Mac). I found a rather large text document, with long lines,
but it was complete with bibliography et al. I told him
"This is the best I can do for you"

The next thing the man did was practically cry. He said "This is the
original document I got 2 weeks ago, it doesn't have ANY of my changes"
He had worked 10 hours a day for 2 weeks on this 1 document, and now
he was back to square one with the grant deadline breathing down
his neck.

Always use full save, and better yet, don't use MS Word;
otherwise strings may not give you the output you want.

The "Don't use MS Word" warning goes doubly
for current Macintosh versions: just the other day there was a warning
on BUGTRAQ (that's where I saw it, anyway) that MS Word
uses rather random bits of machine memory to fill in buffers...
so on one of those files, strings may even bring up some
passwords etc.

-Steve




Re: How to read a word 7 file?

1998-06-23 Thread Hamish Moffatt
On Tue, Jun 23, 1998 at 11:26:18AM +0100, Ted Harding wrote:
> A rough-and-ready way to do just what you're asking is to use the "strings"
> command:
> 
>strings wordfile.doc > wordfile.txt

It's not quite so simple; Word's "fast save" mechanism actually appends
changes to the document since the last save to the end of the file.
Once some number of fast saves has been exceeded it will rewrite the whole lot.
"strings" is only going to show the original at the last non-fast-save time.
This could be completely different!

Hamish
-- 
Hamish Moffatt, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED]
Latest Debian packages at ftp://ftp.rising.com.au/pub/hamish. PGP#EFA6B9D5
CCs of replies from mailing lists are welcome.   http://hamish.home.ml.org




RE: How to read a word 7 file?

1998-06-23 Thread Ted Harding
On 22-Jun-98 Luiz Otavio L. Zorzella wrote:
> 
> Hi, folks.
> 
> Is there any way to read in my linux box a word 7 .doc file?
> Maintaining the "indents" and "bolds" would be a plus, but mainly I
> just need to read the text in it.
> 
> I use StarOffice to read docs, but it only reads up to word 6 files
>:^<

A rough-and-ready way to do just what you're asking is to use the "strings"
command:

   strings wordfile.doc > wordfile.txt

and then edit wordfile.txt to clean it up. Raw "strings" will skip sequences of
fewer than 4 ASCII characters but these are unlikely to occur in a Word
document. This method will suppress all formatting info except end-of-line, so
you are likely to get long lines (= Word paragraphs). It will also fail to
recognise any non-US-ASCII character codes (above 127) so accented characters
and special symbols, etc, will be missed. But if you simply need to read the
text content of a Word document containing plain English text, then this method
works fine.

Hope this helps,

Ted.


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Date: 23-Jun-98   Time: 11:26:18





Re: How to read a word 7 file?

1998-06-23 Thread Guido Bozzetto
> Is there any way to read in my linux box a word 7 .doc file?
See: 
  http://www.csn.ul.ie/~caolan/docs/MSWordView.html
which converts Word 7 docs into HTML.
 Ciao.
-- 
+++
| Guido Bozzetto | Office phone & Fax: +39 432 548314 | Il sunadôr 
|   \/ I | E-mail:   :-)   mailto:[EMAIL PROTECTED] |   pajât prime
|   OO like  | Web:http://www.nauta.it/~bozzetto/ | nol à mai fat
|   -- Linux | Talk:[EMAIL PROTECTED] |   buine mùsiche!
+++




Re: How to read a word 7 file?

1998-06-23 Thread Shaleh
None that I am aware of.  To my knowledge nothing reads Word 7 except
Word 7, even in Windows land.  However WordPerfect is coming, and it
should be able to.  I use the current wp7 and it works nicely.

Luiz Otavio L. Zorzella wrote:
> 
> Hi, folks.
> 
> Is there any way to read in my linux box a word 7 .doc file?
> Maintaining the "indents" and "bolds" would be a plus, but mainly I
> just need to read the text in it.
> 
> I use StarOffice to read docs, but it only reads up to word 6 files
> :^<
> 
> Thanks
> 
> --
> Luiz Otavio L. Zorzella Product Engineer
> [EMAIL PROTECTED]  http://www.conexware.com
> 
> --
> To UNSUBSCRIBE, email to [EMAIL PROTECTED]
> with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]

