Re: ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Ben Scott
On Wed, Feb 20, 2008 at 6:53 PM, Alex Hewitt <[EMAIL PROTECTED]> wrote:
> I just tried to read these files again with Word and it can read them.
> I'll see if there's a way to read/convert these files from a batch job.

  Not a traditional batch file, I don't think, but it should be very
possible with either (1) VBA or (2) AutoIt.  VBA is "Visual Basic for
Applications", is built-in to MS Office, and it can do just about
anything, provided you can stomach the syntax.  Still, for something
like this, it should do well.  AutoIt is a third-party, freeware
automation scripting tool for 'doze.  It's kind of like expect(1) for
the Windows GUI.  It's useful for scripting things that were not designed
to be scripted.

>  As usual Ben, you're right.
>  P.S. Kind of takes the fun out of heckling Ben ;^)

  If I didn't come up with good info now and again, nobody would put
up with me at all.  ;-)

-- Ben
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/


Re: Microsoft flooding sites with fake traffic

2008-02-20 Thread Coleman Kane
Arc Riley wrote:
> Do you happen to be running google analytics on your site?
No, I'm just parsing the logs. I use awstats
(http://awstats.sourceforge.net) for collecting stats from my logs. I'm
not really familiar with many of google.com's services.

--
Coleman Kane

>
> On Wed, Feb 20, 2008 at 6:08 PM, Coleman Kane <[EMAIL PROTECTED]> wrote:
>
> Coleman Kane wrote:
> > Arc Riley wrote:
> >
> >> Hey guys
> >>
> >> Do yourselves a favor and search your logs for connections from
> >> 131.107.* 65.52.* 65.53.* 65.54.* and 65.55.*
> >>
> >> I found a good % of traffic we got, not reported to Google Analytics
> >> so I didn't see it sooner, was referred from http://search.live.com/
> >> for search queries involving pornography, cars, drugs, and random
> >> gibberish.  The landing pages from these searches were subversion
> >> changesets, source code in the Trac browser, and other places those
> >> search queries certainly don't exist in.
> >>
> >> All of it, well 97.2%, from the above two subnets, belonging to
> >> Microsoft.  It'd be humorous if I didn't just purchase a new colo
> >> server to handle the large volume of traffic pysoy.org
> >> gets.  I can't tell if MS is trying to skew the
> >> statistics in favor of MSIE/Live/etc or if it's conducting a denial of
> >> service attack against free software project sites, perhaps both (two
> >> birds with one stone?).
> >>
> >> If you see the similar childish behavior in your logs, please join me
> >> in blocking them and being very vocal as to why.
> >>
> >>
> > An interesting find. I just checked my sites and I see the same thing,
> > however most of the search queries seem to be pretty pertinent to the
> > content of the pages that they reference. It is almost like there's some
> > script running on a farm of windows computers that just performs
> > single-word searches on their Windows LiveSearch database, and visits
> > the results (posting, of course, the LiveSearch referral in the request).
> >
> > Here's my distribution:
> >
> > cat apachelogs/* | grep live.com | cut -d' ' -f1 | cut -d. -f1,2 | sort |
> >   uniq -c | sort -rn
> >
> > 308 65.55
> >  10 131.107
> >   4 85.159
> >   3 142.161
> >   2 71.164
> >   2 68.95
> >   2 4.246
> >   2 207.224
> >   1 86.144
> >   1 84.202
> >
> > There are many, many more with single visits, but I left them off the
> > list because they probably represent normal livesearch users.
> >
> > --
> > Coleman Kane
> >
> Went a little further and found that all my 65.55 traffic comes from the
> 65.55.165 class C. I decided to pass all the visitors to the host
> program and found that all of the visitors have PTR records like this:
> livebot-65-55-165-87.search.live.com. The 131.107 traffic was all from
> two machines: tide525.microsoft.com and tide526.microsoft.com
>
> Maybe some others could look at their logs and pull information on the
> other subnets?
>
> --
> Coleman Kane
>
>



Re: Microsoft flooding sites with fake traffic

2008-02-20 Thread Ben Scott
On Wed, Feb 20, 2008 at 4:46 PM, Arc Riley <[EMAIL PROTECTED]> wrote:
> Do yourselves a favor and search your logs for connections from 131.107.*
> 65.52.* 65.53.* 65.54.* and 65.55.*

  On the GNHLUG web server in /var/log/httpd/ ...

liberty$ find -name access_log\* | xargs egrep '^(131\.107|65\.5[2-5])\.' | wc -l
14293
liberty$ find -name access_log\* | xargs cat | wc -l
185492

  We keep logs going back a month, rotated weekly.

  We don't log referrals or user agents on the GNHLUG server.  Maybe we should.

> All of it, well 97.2%, from the above two subnets, belonging to Microsoft.

  Interesting.  And a relatively small number of unique hosts (152),
given that there are five /16's in question (327680, give or take).

liberty$ find -name access_log\* | xargs egrep -h \
    '^(131\.107|65\.5[2-5])\.' | awk '{ print $1 }' | sort -u > /tmp/hostlist
liberty$ wc -l /tmp/hostlist
152 /tmp/hostlist
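
  For anyone who would rather check the arithmetic than trust it, here is
a rough Python sketch of the same two numbers: the size of the five /16
ranges, and the set of distinct client addresses inside them.  The prefixes
come from this thread; the function names and the assumption that the
client address is the first whitespace-separated field are mine.

```python
import ipaddress

# The five /16 prefixes named in the thread.
PREFIXES = ["131.107.0.0/16", "65.52.0.0/16", "65.53.0.0/16",
            "65.54.0.0/16", "65.55.0.0/16"]
NETWORKS = [ipaddress.ip_network(p) for p in PREFIXES]


def total_addresses(networks):
    """Sum of the address counts of all the given networks."""
    return sum(net.num_addresses for net in networks)


def unique_hosts(log_lines, networks):
    """Distinct client addresses (first field) that fall in any network."""
    hosts = set()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is a hostname or junk, skip it
        if any(addr in net for net in networks):
            hosts.add(str(addr))
    return hosts
```

  Feeding it the lines of the access logs should give the same count as
the pipeline above, provided the logs record addresses and not hostnames.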

  The IP addresses reverse-resolve in DNS to a few different things.
Matching regexp patterns would be:

NXDOMAIN
tide[0-9]+.microsoft.com
b[ly]1sch[0-9]+.phx.gbl
livebot-65-55-[0-9]+-[0-9]+.search.live.com
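
  If you want to bucket your own reverse lookups the same way, something
like the following Python sketch might do.  The three patterns are the ones
listed above, with the dots escaped and anchors added; the labels, the
function name, and the use of None to stand for a failed lookup are
assumptions made for the example.

```python
import re

# Patterns from the list above, with dots escaped and anchors added.
PATTERNS = {
    "tide": re.compile(r"^tide[0-9]+\.microsoft\.com$"),
    "phx": re.compile(r"^b[ly]1sch[0-9]+\.phx\.gbl$"),
    "livebot": re.compile(r"^livebot-65-55-[0-9]+-[0-9]+\.search\.live\.com$"),
}


def classify(hostname):
    """Label a PTR name; None (assumed to mean a failed lookup) is NXDOMAIN."""
    if hostname is None:
        return "NXDOMAIN"
    name = hostname.rstrip(".").lower()
    for label, pattern in PATTERNS.items():
        if pattern.match(name):
            return label
    return "other"
```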

  I also took a look at unique URLs:

liberty$ find -name access_log\* | xargs egrep -h \
    '^(131\.107|65\.5[2-5])\.' | awk -F\" '{ print $2 }' | awk '{ print $2 }' \
    | sort -u > /tmp/urls
liberty$ wc -l /tmp/urls
7237 /tmp/urls
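
  The two awk stages just pull the second word out of the quoted request
field.  A Python sketch of the same extraction, assuming the usual
"METHOD path PROTOCOL" request line (the function names are made up):

```python
def request_path(log_line):
    """Return the URL from the quoted request field, or None if absent."""
    parts = log_line.split('"')
    if len(parts) < 3:
        return None          # no quoted request on this line
    request = parts[1].split()   # e.g. ['GET', '/path', 'HTTP/1.1']
    if len(request) < 2:
        return None
    return request[1]


def unique_paths(log_lines):
    """Sorted list of distinct request URLs, like `sort -u`."""
    paths = (request_path(line) for line in log_lines)
    return sorted({p for p in paths if p is not None})
```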

  The URLs themselves... hmmm, hard to know for sure with our site,
but it looks to me like something is walking the entire site,
following every link, including TWiki's search, history, and edit
links.  Think "wget -r".

> I can't tell if MS is trying to skew the statistics in favor of MSIE/Live/etc
> or if it's conducting a denial of service attack against free software
> project sites ...

  Those two both seem rather unlikely.  In particular, remember
Hanlon's razor.   My guess is some kind of crawler robot.  Possibly a
malfunctioning and/or poorly-designed one.  (That was my guess before
I started digging into logs, by the way.  I also guessed maybe a
botnet, but the small number of requesting hosts makes that less
likely.)

  Our numbers show that as roughly 8% of our traffic, by object
request count.  In the distant past, when we did some traffic
analysis, the bulk of the traffic hitting the GNHLUG site was crawler
robots.  So that doesn't seem out of line.
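
  The "roughly 8%" is just the two wc -l counts from earlier divided out.
As a quick check in Python (the variable names are mine, the numbers are
the ones reported above):

```python
matching = 14293   # log lines from the Microsoft ranges, as counted above
total = 185492     # all access_log lines, as counted above

share = matching / total
print(f"{share:.1%}")   # a little under 8%
```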

  What's your robots.txt look like?  Does it forbid this kind of behavior?
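
  One way to answer that question for a given URL is Python's standard
urllib.robotparser.  This is only a sketch: the robots.txt content, the
"msnbot" agent name and the example.org URLs are placeholders, not
anything taken from pysoy.org or the GNHLUG server.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules that keep crawlers out of wiki edit and search links.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /bin/edit/
Disallow: /bin/search/
"""
```

  With rules like these, allowed(EXAMPLE_ROBOTS, "msnbot",
"http://example.org/bin/edit/Main/WebHome") comes back False, so a
well-behaved crawler should leave those links alone.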

  What's the rate like, in requests/time and bytes/time?  Are they
flooding your site, or slowly crawling it over time?
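
  To put numbers on the rate, a per-hour tally of requests and bytes is
probably enough.  Here is a minimal Python sketch, assuming Apache's
Common Log Format with the standard timestamp layout; the regular
expression and function name are my own, and lines that don't match are
silently skipped.

```python
from collections import Counter
from datetime import datetime
import re

# Common Log Format: host ident user [time] "request" status bytes
LINE_RE = re.compile(r'^\S+ \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} (\d+|-)')


def hourly_rates(log_lines):
    """Map each hour to a (request count, bytes sent) pair."""
    requests = Counter()
    sent = Counter()
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        hour = when.strftime("%Y-%m-%d %H:00")
        requests[hour] += 1
        # A "-" in the size field means no body was sent.
        sent[hour] += 0 if m.group(2) == "-" else int(m.group(2))
    return {h: (requests[h], sent[h]) for h in sorted(requests)}
```

  A steady trickle spread over weeks looks like a crawler; a few hours
with huge counts looks more like a flood.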

> ... please join me in blocking them ...

  You block Google from indexing your site, too, then, right?

-- Ben


Re: Microsoft flooding sites with fake traffic

2008-02-20 Thread Arc Riley
Do you happen to be running google analytics on your site?

On Wed, Feb 20, 2008 at 6:08 PM, Coleman Kane <[EMAIL PROTECTED]> wrote:

> Coleman Kane wrote:
> > Arc Riley wrote:
> >
> >> Hey guys
> >>
> >> Do yourselves a favor and search your logs for connections from
> >> 131.107.* 65.52.* 65.53.* 65.54.* and 65.55.*
> >>
> >> I found a good % of traffic we got, not reported to Google Analytics
> >> so I didn't see it sooner, was referred from http://search.live.com/
> >> for search queries involving pornography, cars, drugs, and random
> >> gibberish.  The landing pages from these searches were subversion
> >> changesets, source code in the Trac browser, and other places those
> >> search queries certainly don't exist in.
> >>
> >> All of it, well 97.2%, from the above two subnets, belonging to
> >> Microsoft.  It'd be humorous if I didn't just purchase a new colo
> >> server to handle the large volume of traffic pysoy.org
> >>  gets.  I can't tell if MS is trying to skew the
> >> statistics in favor of MSIE/Live/etc or if it's conducting a denial of
> >> service attack against free software project sites, perhaps both (two
> >> birds with one stone?).
> >>
> >> If you see the similar childish behavior in your logs, please join me
> >> in blocking them and being very vocal as to why.
> >>
> >>
> > An interesting find. I just checked my sites and I see the same thing,
> > however most of the search queries seem to be pretty pertinent to the
> > content of the pages that they reference. It is almost like there's some
> > script running on a farm of windows computers that just performs
> > single-word searches on their Windows LiveSearch database, and visits
> > the results (posting, of course, the LiveSearch referral in the
> > request).
> >
> > Here's my distribution:
> >
> > cat apachelogs/* | grep live.com | cut -d' ' -f1 | cut -d. -f1,2 | sort |
> >   uniq -c | sort -rn
> >
> > 308 65.55
> >  10 131.107
> >   4 85.159
> >   3 142.161
> >   2 71.164
> >   2 68.95
> >   2 4.246
> >   2 207.224
> >   1 86.144
> >   1 84.202
> >
> > There are many, many more with single visits, but I left them off the
> > list because they probably represent normal livesearch users.
> >
> > --
> > Coleman Kane
> >
> Went a little further and found that all my 65.55 traffic comes from the
> 65.55.165 class C. I decided to pass all the visitors to the host
> program and found that all of the visitors have PTR records like this:
> livebot-65-55-165-87.search.live.com. The 131.107 traffic was all from
> two machines: tide525.microsoft.com and tide526.microsoft.com
>
> Maybe some others could look at their logs and pull information on the
> other subnets?
>
> --
> Coleman Kane
>
>


Re: ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Alex Hewitt

On Wed, 2008-02-20 at 17:23 -0500, Ben Scott wrote:
> On Wed, Feb 20, 2008 at 3:58 PM, Alex Hewitt <[EMAIL PROTECTED]> wrote:
> >  A friend of ours wrote a bunch of recipe files using something called
> >  Microsoft Write.
> 
>   Yah, "Windows Write" is/was one of the "accessories" that came with
> Windows 3.x.  It morphed into "WordPad" in Windows 95 and later.
> WordPad still exists.  It won't write the Write (hah) format anymore,
> but it can read it, and save in some variant of the RTF format.
> 
> >  Theoretically Microsoft Word is supposed to be able to read such files
> >  but I found that the version I was using (Word 2003) wouldn't.
> 
>   Curious.  My install of Word 2003 can.  Are you sure you installed
> all the import/export filters?  If you did a "Minimal" or "Custom"
> install (instead of the mondo-huge "Full"), I don't think those are
> all included by default.

As usual Ben, you're right. I just tried to read these files again with
Word and it can read them. I'll see if there's a way to read/convert
these files from a batch job. There are 83 files and I'd hate to need to
process them one at a time.

-Alex

P.S. Kind of takes the fun out of heckling Ben ;^)

> 
> > Writing a filter in Python was trivial and I was  able to convert the
> > files to plain text.
> 
>   For future reference, the strings(1) command can be used to much the
> same effect.
> 
> > ... the file itself in ASCII, a series of bytes  again in non-ASCII,
> > followed by a repeat of some of the original ASCII.
> 
>   That sounds very similar to the MS Word .DOC format, and I bet
> they're related.  DOC files do not interleave the formatting with the
> text, as (for example) HTML or Word Perfect did.  Instead, all the
> plain text is stored in one blob, and then the formatting information
> is stored in a different blob.  The formatting directives have
> "pointers" to the position of the text they affect.
> 
>   The "repeat" you describe is not actually a repeat, but a follow-on
> save.  Word and friends work in an interesting fashion.  You open the
> file, and it loads the base text blob described above.  You start
> making your changes.  Those changes go into an undo buffer.  That undo
> buffer is actually backstored on the disk in temporary files.
> (That's why a directory containing Word files people are busy editing
> accumulates lots of odd temp files until they close the original.)
> 
>   When you invoke "Save", the undo buffer -- essentially like a "diff"
> -- gets tacked on to the end of the main file.  This made saves fast
> on slow computers already overburdened by Microsoft bloatware.  Loads
> were slower, of course, but the reasoning was that people care about
> save speed more than load speed.  As you can imagine, if there are
> lots of saves, rebuilding the text is not so easy as running
> strings(1) on it.
> 
>   In Word, if you turn off "Fast Saves", it writes out a full, unified
> version of the text instead.  This became the default at some point --
> I have no idea when.
> 
> > But the interesting thing was that I  couldn't easily find a Microsoft tool 
> > that
> > understood the format which originated with Windows 95 or an earlier version
> > of Windows.
> 
>   Start -> Programs -> Accessories -> WordPad
> 
>   My copy of Win XP Pro opens .WRI files automatically in WordPad.  I
> just double-click the file.
> 
>   WordPad is an optional component for Windows.  Perhaps the computer
> was installed with a "minimalist" attitude, so various optional tools
> were not there when you needed them?
> 
> -- Ben


Re: ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Alex Hewitt

On Wed, 2008-02-20 at 17:23 -0500, Ben Scott wrote:
> On Wed, Feb 20, 2008 at 3:58 PM, Alex Hewitt <[EMAIL PROTECTED]> wrote:
> >  A friend of ours wrote a bunch of recipe files using something called
> >  Microsoft Write.
> 
>   Yah, "Windows Write" is/was one of the "accessories" that came with
> Windows 3.x.  It morphed into "WordPad" in Windows 95 and later.
> WordPad still exists.  It won't write the Write (hah) format anymore,
> but it can read it, and save in some variant of the RTF format.
> 
> >  Theoretically Microsoft Word is supposed to be able to read such files
> >  but I found that the version I was using (Word 2003) wouldn't.
> 
>   Curious.  My install of Word 2003 can.  Are you sure you installed
> all the import/export filters?  If you did a "Minimal" or "Custom"
> install (instead of the mondo-huge "Full"), I don't think those are
> all included by default.

I sit corrected! ;^) Word 2003 complains about the file saying in effect
"this might be a virus but I have a converter that will convert it" and
it does. I think the original reason I wrote the filter was because our
friend didn't have Word and I didn't want to manually edit her 83 files.
I'll see if Word can be called from the command line to do the
converting.

-Alex

> 
> > Writing a filter in Python was trivial and I was  able to convert the
> > files to plain text.
> 
>   For future reference, the strings(1) command can be used to much the
> same effect.
> 
> > ... the file itself in ASCII, a series of bytes  again in non-ASCII,
> > followed by a repeat of some of the original ASCII.
> 
>   That sounds very similar to the MS Word .DOC format, and I bet
> they're related.  DOC files do not interleave the formatting with the
> text, as (for example) HTML or Word Perfect did.  Instead, all the
> plain text is stored in one blob, and then the formatting information
> is stored in a different blob.  The formatting directives have
> "pointers" to the position of the text they affect.
> 
>   The "repeat" you describe is not actually a repeat, but a follow-on
> save.  Word and friends work in an interesting fashion.  You open the
> file, and it loads the base text blob described above.  You start
> making your changes.  Those changes go into an undo buffer.  That undo
> buffer is actually backstored on the disk in temporary files.
> (That's why a directory containing Word files people are busy editing
> accumulates lots of odd temp files until they close the original.)
> 
>   When you invoke "Save", the undo buffer -- essentially like a "diff"
> -- gets tacked on to the end of the main file.  This made saves fast
> on slow computers already overburdened by Microsoft bloatware.  Loads
> were slower, of course, but the reasoning was that people care about
> save speed more than load speed.  As you can imagine, if there are
> lots of saves, rebuilding the text is not so easy as running
> strings(1) on it.
> 
>   In Word, if you turn off "Fast Saves", it writes out a full, unified
> version of the text instead.  This became the default at some point --
> I have no idea when.
> 
> > But the interesting thing was that I  couldn't easily find a Microsoft tool 
> > that
> > understood the format which originated with Windows 95 or an earlier version
> > of Windows.
> 
>   Start -> Programs -> Accessories -> WordPad
> 
>   My copy of Win XP Pro opens .WRI files automatically in WordPad.  I
> just double-click the file.
> 
>   WordPad is an optional component for Windows.  Perhaps the computer
> was installed with a "minimalist" attitude, so various optional tools
> were not there when you needed them?
> 
> -- Ben


Re: Microsoft flooding sites with fake traffic

2008-02-20 Thread Coleman Kane
Coleman Kane wrote:
> Arc Riley wrote:
>   
>> Hey guys
>>
>> Do yourselves a favor and search your logs for connections from
>> 131.107.* 65.52.* 65.53.* 65.54.* and 65.55.*
>>
>> I found a good % of traffic we got, not reported to Google Analytics
>> so I didn't see it sooner, was referred from http://search.live.com/
>> for search queries involving pornography, cars, drugs, and random
>> gibberish.  The landing pages from these searches were subversion
>> changesets, source code in the Trac browser, and other places those
>> search queries certainly don't exist in.
>>
>> All of it, well 97.2%, from the above two subnets, belonging to
>> Microsoft.  It'd be humorous if I didn't just purchase a new colo
>> server to handle the large volume of traffic pysoy.org
>>  gets.  I can't tell if MS is trying to skew the
>> statistics in favor of MSIE/Live/etc or if it's conducting a denial of
>> service attack against free software project sites, perhaps both (two
>> birds with one stone?).
>>
>> If you see the similar childish behavior in your logs, please join me
>> in blocking them and being very vocal as to why.
>>
>> 
> An interesting find. I just checked my sites and I see the same thing,
> however most of the search queries seem to be pretty pertinent to the
> content of the pages that they reference. It is almost like there's some
> script running on a farm of windows computers that just performs
> single-word searches on their Windows LiveSearch database, and visits
> the results (posting, of course, the LiveSearch referral in the request).
>
> Here's my distribution:
>
> cat apachelogs/* | grep live.com | cut -d' ' -f1 | cut -d. -f1,2 | sort |
>   uniq -c | sort -rn
>
> 308 65.55
>  10 131.107
>   4 85.159
>   3 142.161
>   2 71.164
>   2 68.95
>   2 4.246
>   2 207.224
>   1 86.144
>   1 84.202
>
> There are many, many more with single visits, but I left them off the
> list because they probably represent normal livesearch users.
>
> --
> Coleman Kane
>   
Went a little further and found that all my 65.55 traffic comes from the
65.55.165 class C. I decided to pass all the visitors to the host
program and found that all of the visitors have PTR records like this:
livebot-65-55-165-87.search.live.com. The 131.107 traffic was all from
two machines: tide525.microsoft.com and tide526.microsoft.com
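
For anyone who wants to script the same lookups instead of running host by
hand, here is a minimal Python sketch using socket.gethostbyaddr from the
standard library. The resolver parameter is an addition of mine so the
function can be tried without touching the network; a lookup that fails is
reported as None.

```python
import socket


def ptr_name(address, resolver=socket.gethostbyaddr):
    """Return the PTR name for address, or None if the lookup fails."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        return resolver(address)[0]
    except (socket.herror, socket.gaierror):
        return None
```

Calling ptr_name on each address from the logs and tallying the results
should show quickly whether other subnets follow the same naming.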

Maybe some others could look at their logs and pull information on the
other subnets?

--
Coleman Kane



Re: Microsoft flooding sites with fake traffic

2008-02-20 Thread Coleman Kane
Arc Riley wrote:
> Hey guys
>
> Do yourselves a favor and search your logs for connections from
> 131.107.* 65.52.* 65.53.* 65.54.* and 65.55.*
>
> I found a good % of traffic we got, not reported to Google Analytics
> so I didn't see it sooner, was referred from http://search.live.com/
> for search queries involving pornography, cars, drugs, and random
> gibberish.  The landing pages from these searches were subversion
> changesets, source code in the Trac browser, and other places those
> search queries certainly don't exist in.
>
> All of it, well 97.2%, from the above two subnets, belonging to
> Microsoft.  It'd be humorous if I didn't just purchase a new colo
> server to handle the large volume of traffic pysoy.org
>  gets.  I can't tell if MS is trying to skew the
> statistics in favor of MSIE/Live/etc or if it's conducting a denial of
> service attack against free software project sites, perhaps both (two
> birds with one stone?).
>
> If you see the similar childish behavior in your logs, please join me
> in blocking them and being very vocal as to why.
>
An interesting find. I just checked my sites and I see the same thing,
however most of the search queries seem to be pretty pertinent to the
content of the pages that they reference. It is almost like there's some
script running on a farm of windows computers that just performs
single-word searches on their Windows LiveSearch database, and visits
the results (posting, of course, the LiveSearch referral in the request).

Here's my distribution:

cat apachelogs/* | grep live.com | cut -d' ' -f1 | cut -d. -f1,2 | sort |
  uniq -c | sort -rn

308 65.55
 10 131.107
  4 85.159
  3 142.161
  2 71.164
  2 68.95
  2 4.246
  2 207.224
  1 86.144
  1 84.202

There are many, many more with single visits, but I left them off the
list because they probably represent normal livesearch users.
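
If a shell pipeline is awkward on your setup, roughly the same tally can be
done in Python. This is a sketch under the same assumptions as the command
above: the client address is the first space-separated field, and any line
containing "live.com" counts. The function name is made up.

```python
from collections import Counter


def prefix_counts(log_lines, needle="live.com"):
    """Count matching lines by the first two octets of the client address."""
    counts = Counter()
    for line in log_lines:
        if needle not in line:
            continue
        host = line.split(" ", 1)[0]
        prefix = ".".join(host.split(".")[:2])
        counts[prefix] += 1
    # Highest count first, like `sort -rn`.
    return counts.most_common()
```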

--
Coleman Kane



Re: ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Ben Scott
On Wed, Feb 20, 2008 at 3:58 PM, Alex Hewitt <[EMAIL PROTECTED]> wrote:
>  A friend of ours wrote a bunch of recipe files using something called
>  Microsoft Write.

  Yah, "Windows Write" is/was one of the "accessories" that came with
Windows 3.x.  It morphed into "WordPad" in Windows 95 and later.
WordPad still exists.  It won't write the Write (hah) format anymore,
but it can read it, and save in some variant of the RTF format.

>  Theoretically Microsoft Word is supposed to be able to read such files
>  but I found that the version I was using (Word 2003) wouldn't.

  Curious.  My install of Word 2003 can.  Are you sure you installed
all the import/export filters?  If you did a "Minimal" or "Custom"
install (instead of the mondo-huge "Full"), I don't think those are
all included by default.

> Writing a filter in Python was trivial and I was  able to convert the
> files to plain text.

  For future reference, the strings(1) command can be used to much the
same effect.
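
  If strings(1) isn't handy, the idea is easy to approximate.  The Python
sketch below pulls out runs of printable ASCII (plus tab) of at least four
bytes, which is roughly what strings does by default; it is not Alex's
filter, and the function name and the exact character class are my own
choices.

```python
import re


def ascii_strings(data, min_len=4):
    """Yield runs of at least min_len printable ASCII bytes from data."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")
```

  Reading a .wri file in binary mode and joining the results with newlines
gives plain text that still needs some cleanup by hand.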

> ... the file itself in ASCII, a series of bytes  again in non-ASCII,
> followed by a repeat of some of the original ASCII.

  That sounds very similar to the MS Word .DOC format, and I bet
they're related.  DOC files do not interleave the formatting with the
text, as (for example) HTML or Word Perfect did.  Instead, all the
plain text is stored in one blob, and then the formatting information
is stored in a different blob.  The formatting directives have
"pointers" to the position of the text they affect.

  The "repeat" you describe is not actually a repeat, but a follow-on
save.  Word and friends work in an interesting fashion.  You open the
file, and it loads the base text blob described above.  You start
making your changes.  Those changes go into an undo buffer.  That undo
buffer is actually backstored on the disk in temporary files.
(That's why a directory containing Word files people are busy editing
accumulates lots of odd temp files until they close the original.)

  When you invoke "Save", the undo buffer -- essentially like a "diff"
-- gets tacked on to the end of the main file.  This made saves fast
on slow computers already overburdened by Microsoft bloatware.  Loads
were slower, of course, but the reasoning was that people care about
save speed more than load speed.  As you can imagine, if there are
lots of saves, rebuilding the text is not so easy as running
strings(1) on it.
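
  To make the last few paragraphs concrete, here is a toy model in Python.
It is only an illustration of the idea as described above (text in one
blob, formatting kept separately as offset ranges, saves appended as edit
records that get replayed on load).  It is not the real .DOC or .WRI
layout; every name and the record shape are invented for the example, and
the formatting runs are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass
class FormatRun:
    start: int   # offset of the first character the run covers
    end: int     # offset one past the last character covered
    style: str   # hypothetical style name such as "bold"


def apply_edits(base_text, edits):
    """Replay appended (offset, delete_count, inserted_text) records."""
    text = base_text
    for offset, delete_count, inserted in edits:
        text = text[:offset] + inserted + text[offset + delete_count:]
    return text


def render(text, runs):
    """Interleave style markers into the text (non-overlapping runs only)."""
    out = []
    pos = 0
    for run in sorted(runs, key=lambda r: r.start):
        out.append(text[pos:run.start])
        out.append(f"<{run.style}>{text[run.start:run.end]}</{run.style}>")
        pos = run.end
    out.append(text[pos:])
    return "".join(out)
```

  In this model a file saved twice holds the base text plus two edit
records.  A strings-style scan sees the old text and then fragments of the
new text, which would look a lot like the "repeat" Alex describes.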

  In Word, if you turn off "Fast Saves", it writes out a full, unified
version of the text instead.  This became the default at some point --
I have no idea when.

> But the interesting thing was that I  couldn't easily find a Microsoft tool 
> that
> understood the format which originated with Windows 95 or an earlier version
> of Windows.

  Start -> Programs -> Accessories -> WordPad

  My copy of Win XP Pro opens .WRI files automatically in WordPad.  I
just double-click the file.

  WordPad is an optional component for Windows.  Perhaps the computer
was installed with a "minimalist" attitude, so various optional tools
were not there when you needed them?

-- Ben


Re: ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Jerry Feldman
On Wed, 20 Feb 2008 15:58:53 -0500
Alex Hewitt <[EMAIL PROTECTED]> wrote:

> A friend of ours wrote a bunch of recipe files using something called
> Microsoft Write. Files created with that tool have a .wri extension.
> Theoretically Microsoft Word is supposed to be able to read such files
> but I found that the version I was using (Word 2003) wouldn't. So I
> opened a few of the files with a binary editor and found that every file
> had an 84 hex byte prefix, the file itself in ASCII, a series of bytes
> again in non-ASCII, followed by a repeat of some of the original ASCII.
> Writing a filter in Python was trivial and I was  able to convert the
> files to plain text. Of course some of the lines were now run-on but
> overall the cleanup was simple. But the interesting thing was that I
> couldn't easily find a Microsoft tool that understood the format which
> originated with Windows 95 or an earlier version of Windows. Along the
> way Microsoft had basically given up on the format. I'm sure somewhere
> there is a tool that can read those files short of the original platform
> but we're only talking about perhaps a ten year span since the files
> were created and now are not readily readable.

This is one problem with proprietary formats.  Microsoft dropped
Microsoft Write a while back and as you found out, did not provide
filters in newer versions of Word. This is the type of thing that ODF
is designed to solve. Consider governments with a myriad of
departments, and a lack of intelligible communication between them. One
department might have had low end PCs with Microsoft Write because the
department did not have a budget for Word. Another department might use
another tool, such as Word Perfect.  Unfortunately, Microsoft's OOXML
is not really an open standard. BTW: There is a SourceForge project
for a Microsoft Write filter.

-- 
--
Jerry Feldman <[EMAIL PROTECTED]>
Boston Linux and Unix
PGP key id: 537C5846
PGP Key fingerprint: 3D1B 8377 A3C0 A5F2 ECBB  CA3B 4607 4319 537C 5846




Microsoft flooding sites with fake traffic

2008-02-20 Thread Arc Riley
Hey guys

Do yourselves a favor and search your logs for connections from 131.107.*
65.52.* 65.53.* 65.54.* and 65.55.*

I found a good % of traffic we got, not reported to Google Analytics so I
didn't see it sooner, was referred from http://search.live.com/ for search
queries involving pornography, cars, drugs, and random gibberish.  The
landing pages from these searches were subversion changesets, source code in
the Trac browser, and other places those search queries certainly don't
exist in.

All of it, well 97.2%, from the above two subnets, belonging to Microsoft.
It'd be humorous if I didn't just purchase a new colo server to handle the
large volume of traffic pysoy.org gets.  I can't tell if MS is trying to
skew the statistics in favor of MSIE/Live/etc or if it's conducting a denial
of service attack against free software project sites, perhaps both (two
birds with one stone?).

If you see the similar childish behavior in your logs, please join me in
blocking them and being very vocal as to why.


Re: Re: Free Software Replacement for Maple

2008-02-20 Thread paul.cour1
Statistics Open Source ...

My vote is for " R "
An amazing tool for statistics at:
http://www.r-project.org/

paulc


>From: Coleman Kane <[EMAIL PROTECTED]>
>Date: 2008/02/20 Wed PM 12:26:26 CST
>To: Gurhan <[EMAIL PROTECTED]>
>Cc: gnhlug-discuss@mail.gnhlug.org
>Subject: Re: Free Software Replacement for Maple

>Gurhan wrote:
>> On Wed, Feb 20, 2008 at 12:40 PM, Lori Nagel <[EMAIL PROTECTED]> wrote:
>>   
>>> Does anyone know of a Free-Software replacement for Maple?  My husband is
>>> taking an electrical engineering graduate level statistics  class and says
>>> he needs  it to do  some of his  homework.  Having never gotten far enough
>>> in the maths myself,  I'm not  really sure what features it needs.  All I
>>> know is proprietary license keys are a real pain.
>>>
>>> 
>>
>>   Does it have to be maple-compatible? I mean will he need to turn in
>> maple code for his assignments? If you are just looking for a
>> mathematical software GNU Octave is an excellent one. It's intended to be
>> a free software clone of Matlab, and does the job pretty well.
>>
>> http://www.octave.org
>>
>> Thanks,
>> gurhan
>>   
>I've used Maxima in the past, and it has proven to be pretty good for me
>when I needed it.
>http://maxima.sourceforge.net/
>
>For a smaller CAS (a glorified solver), I like Mathomatic:
>http://mathomatic.orgserve.de/math/
>
>--
>Coleman Kane


Re: ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Alex Hewitt

On Wed, 2008-02-20 at 13:18 -0500, Ben Scott wrote:
> On Wed, Feb 20, 2008 at 11:52 AM, Michael ODonnell
> <[EMAIL PROTECTED]> wrote:
> >  Quite the tangled mess and very hard to write compliant FOSS
> >  apps against, but (at least on the surface) apparently not
> >  the result of an actively evil intent.
> 
>   A-yup.  Lots of people (me included) have been saying that for
> years.  It really comes down to Hanlon's Razor: "Never attribute to
> malice that which can be adequately explained by stupidity".  And
> let's face it, Microsoft has plenty enough stupidity to go around.
> 
>   In many ways, Microsoft suffers from the result as much as others.
> Can you imagine what having to work with the Windows or Office source
> code must be like?  Code going back decades, much of it poorly
> documented, coding practices evolving with time and marketing fads,
> early stuff written by people who clearly had no clue about how to
> design proper systems... it's a wonder it works at all.  (One could
> argue it doesn't.)  One of the original goals in Vista was to replace
> the legacy code still doing important stuff.  After struggling for two
> years, they *gave up*.  Microsoft can afford more resources than
> just about any software development effort, and they still couldn't
> figure it out.

A friend of ours wrote a bunch of recipe files using something called
Microsoft Write. Files created with that tool have a .wri extension.
Theoretically Microsoft Word is supposed to be able to read such files
but I found that the version I was using (Word 2003) wouldn't. So I
opened a few of the files with a binary editor and found that every file
had an 84 hex byte prefix, the file itself in ASCII, a series of bytes
again in non-ASCII, followed by a repeat of some of the original ASCII.
Writing a filter in Python was trivial and I was able to convert the
files to plain text. Of course some of the lines were now run-on but
overall the cleanup was simple. But the interesting thing was that I
couldn't easily find a Microsoft tool that understood the format which
originated with Windows 95 or an earlier version of Windows. Along the
way Microsoft had basically given up on the format. I'm sure somewhere
there is a tool that can read those files short of the original platform
but we're only talking about perhaps a ten-year span since the files
were created, and already they are not readily readable.
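
For anyone curious, a filter along those lines might look roughly like the sketch below. It is not the original script, just a minimal illustration under stated assumptions: "84 hex byte prefix" is read as 0x84 (132) bytes, the text is taken to be the first unbroken run of ASCII after that prefix, and everything from the first non-text byte onward (the binary section and the repeated ASCII) is dropped. The names `PREFIX_LEN`, `extract_text` and `convert_file` are made up for the example.

```python
from pathlib import Path

# Assumption: "84 hex byte prefix" is read here as 0x84 (132) bytes.
PREFIX_LEN = 0x84


def is_text_byte(b: int) -> bool:
    """Return True for printable ASCII plus tab, LF and CR."""
    return 0x20 <= b <= 0x7E or b in (0x09, 0x0A, 0x0D)


def extract_text(data: bytes, prefix_len: int = PREFIX_LEN) -> str:
    """Skip the binary prefix and return the first run of ASCII text."""
    body = data[prefix_len:]
    end = 0
    # Stop at the first byte that is not plain text; what follows is
    # assumed to be formatting data plus the repeated ASCII.
    while end < len(body) and is_text_byte(body[end]):
        end += 1
    text = body[:end].decode("ascii")
    # Normalize CRLF and bare CR line endings to LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def convert_file(src: Path) -> Path:
    """Write the extracted text next to src with a .txt extension."""
    dst = src.with_suffix(".txt")
    dst.write_bytes(extract_text(src.read_bytes()).encode("ascii"))
    return dst
```

Looping `convert_file` over something like `Path("recipes").glob("*.wri")` (a hypothetical directory) would handle a whole batch. The prefix length is a parameter so it can be adjusted if the files turn out to differ.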

-Alex

> 
>   Of course, many people still put their critical data in that mess.
> Now *that's* scary.  
> 
> -- Ben


Re: Free Software Replacement for Maple

2008-02-20 Thread Michael Costolo
On Wed, Feb 20, 2008 at 12:40 PM, Lori Nagel <[EMAIL PROTECTED]> wrote:

> Does anyone know of a Free-Software replacement for Maple?  My husband is
> taking an electrical engineering graduate-level statistics class and says
> he needs it to do some of his homework.  Having never gotten far enough
> in the maths myself, I'm not really sure what features it needs.  All I
> know is proprietary license keys are a real pain.
>
> --
>
>
For statistics you can't go wrong with R (http://www.r-project.org/).

-- 
"America is at that awkward stage. It's too late to work within the system,
but too early to shoot the bastards."
--Claire Wolfe


Re: ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Tom Buskey
On Wed, Feb 20, 2008 at 1:18 PM, Ben Scott <[EMAIL PROTECTED]> wrote:

> On Wed, Feb 20, 2008 at 11:52 AM, Michael ODonnell
> <[EMAIL PROTECTED]> wrote:
> >  Quite the tangled mess and very hard to write compliant FOSS
> >  apps against, but (at least on the surface) apparently not
> >  the result of an actively evil intent.
>
>  A-yup.  Lots of people (me included) have been saying that for
> years.  It really comes down to Hanlon's Razor: "Never attribute to
> malice that which can be adequately explained by stupidity".  And
> let's face it, Microsoft has plenty enough stupidity to go around.
>
>  In many ways, Microsoft suffers from the result as much as others.
> Can you imagine what having to work with the Windows or Office source
> code must be like?  Code going back decades, much of it poorly
> documented, coding practices evolving with time and marketing fads,
> early stuff written by people who clearly had no clue about how to
> design proper systems... it's a wonder it works at all.  (One could


IIRC Microsoft used some of the Samba docs to document the early LAN Manager
stuff internally.

The Linux kernel can (mostly) throw out old binary interfaces/APIs/etc.
They have the source and can (mostly) regenerate the drivers, etc for the
newer stuff (mostly).  Windows doesn't have that luxury so there's lots of
cruft to support the old ways.  And there's lots more written on top of
Windows that has used that old cruft because it wasn't documented, etc.


Re: ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Jon 'maddog' Hall
>One of the original goals in Vista was to replace
>the legacy code still doing important stuff.  After struggling for two
>years, they *gave up*.

I can relate to that.  Digital, at one time, tried to eliminate one of
the arcane data addressing modes in the VAX architecture.  They checked
with all the DEC compiler groups, all the third party compiler companies
and even with the GNU, BSD and AT&T compiler groups, and none of these
compilers generated any instructions that would use that addressing
mode.

They were just about to make their changes to the chips (saving a lot of
silicon space and testing time) when I mentioned assembly-language
programs that might use it... they found several
programs in the wild that did use that addressing mode...so they had to
maintain it.

And before people say "emulation", I remember that we considered
emulation, and emulating that particular addressing mode (and I can not
remember which one it was) would have been very ugly indeed.

(sigh)

md
-- 
Jon "maddog" Hall
Executive Director   Linux International(R)
email: [EMAIL PROTECTED] 80 Amherst St. 
Voice: +1.603.672.4557   Amherst, N.H. 03031-3032 U.S.A.
WWW: http://www.li.org

Board Member: Uniforum Association
Board Member Emeritus: USENIX Association (2000-2006)

(R)Linux is a registered trademark of Linus Torvalds in several
countries.
(R)Linux International is a registered trademark in the USA used pursuant
   to a license from Linux Mark Institute, authorized licensor of Linus
   Torvalds, owner of the Linux trademark on a worldwide basis
(R)UNIX is a registered trademark of The Open Group in the USA and other
   countries.




Re: Free Software Replacement for Maple

2008-02-20 Thread Coleman Kane
Gurhan wrote:
> On Wed, Feb 20, 2008 at 12:40 PM, Lori Nagel <[EMAIL PROTECTED]> wrote:
>   
>> Does anyone know of a Free-Software replacement for Maple?  My husband is
>> taking an electrical engineering graduate-level statistics class and says
>> he needs it to do some of his homework.  Having never gotten far enough
>> in the maths myself, I'm not really sure what features it needs.  All I
>> know is proprietary license keys are a real pain.
>>
>> 
>
>   Does it have to be Maple-compatible? I mean, will he need to turn in
> Maple code for his assignments? If you are just looking for
> mathematical software, GNU Octave is an excellent choice. It's intended to be
> a free software clone of Matlab, and does the job pretty well.
>
> http://www.octave.org
>
> Thanks,
> gurhan
>   
I've used Maxima in the past, and it has proven to be pretty good for me
when I needed it.
http://maxima.sourceforge.net/

For a smaller CAS (a glorified solver), I like Mathomatic:
http://mathomatic.orgserve.de/math/

--
Coleman Kane


Re: ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Ben Scott
On Wed, Feb 20, 2008 at 11:52 AM, Michael ODonnell
<[EMAIL PROTECTED]> wrote:
>  Quite the tangled mess and very hard to write compliant FOSS
>  apps against, but (at least on the surface) apparently not
>  the result of an actively evil intent.

  A-yup.  Lots of people (me included) have been saying that for
years.  It really comes down to Hanlon's Razor: "Never attribute to
malice that which can be adequately explained by stupidity".  And
let's face it, Microsoft has plenty enough stupidity to go around.

  In many ways, Microsoft suffers from the result as much as others.
Can you imagine what having to work with the Windows or Office source
code must be like?  Code going back decades, much of it poorly
documented, coding practices evolving with time and marketing fads,
early stuff written by people who clearly had no clue about how to
design proper systems... it's a wonder it works at all.  (One could
argue it doesn't.)  One of the original goals in Vista was to replace
the legacy code still doing important stuff.  After struggling for two
years, they *gave up*.  Microsoft can afford more resources than
just about any software development effort, and they still couldn't
figure it out.

  Of course, many people still put their critical data in that mess.
Now *that's* scary.  

-- Ben


Re: Free Software Replacement for Maple

2008-02-20 Thread Gurhan
On Wed, Feb 20, 2008 at 12:40 PM, Lori Nagel <[EMAIL PROTECTED]> wrote:
> Does anyone know of a Free-Software replacement for Maple?  My husband is
> taking an electrical engineering graduate-level statistics class and says
> he needs it to do some of his homework.  Having never gotten far enough
> in the maths myself, I'm not really sure what features it needs.  All I
> know is proprietary license keys are a real pain.
>

  Does it have to be Maple-compatible? I mean, will he need to turn in
Maple code for his assignments? If you are just looking for
mathematical software, GNU Octave is an excellent choice. It's intended to be
a free software clone of Matlab, and does the job pretty well.

http://www.octave.org

Thanks,
gurhan



Free Software Replacement for Maple

2008-02-20 Thread Lori Nagel
Does anyone know of a Free-Software replacement for Maple?  My husband is
taking an electrical engineering graduate-level statistics class and says he
needs it to do some of his homework.  Having never gotten far enough in the
maths myself, I'm not really sure what features it needs.  All I know is
proprietary license keys are a real pain.

   


ARTICLE - Why the MS Office file formats is so complicated

2008-02-20 Thread Michael ODonnell

A description of the layout of the recently published Microsoft
Office file formats along with some illuminating comments
about the various historical influences that led up to their
current states:

   http://www.joelonsoftware.com/items/2008/02/19.html

Quite the tangled mess and very hard to write compliant FOSS
apps against, but (at least on the surface) apparently not
the result of an actively evil intent.
 


You thought *you* had power problems?

2008-02-20 Thread Paul Lussier


 http://tinyurl.com/2rsngr
-- 
Seeya,
Paul


Boston Linux Meeting Tonight, February 20, 2008 Rooftop WLAN Redux

2008-02-20 Thread Jerry Feldman
When: February 20, 2008 7:00PM (6:30 for Q&A)
Topic: Rooftop WLAN Redux
Moderator: Kurt Keville, Systems Admin, MIT Clinical Research Center 
Location:  MIT Building E51 Room 335 (Note room change)

Kurt discusses progress on the Rooftop WLAN project at MIT, and today's
best practices in wireless Linux. 
Wireless Access Points have become powerful embedded Linux appliances
in recent years. Much of the functionality of standalone boxes or
general servers that would reside at a layer above the wireless access
equipment has now been brought down to the "first mile" point. Access
to these features has, in turn, inspired programmers to do
considerable development in packet routing protocols and network
topology designs. Kurt Keville will give an overview of contemporary
best practices in wireless Linux with particular emphasis on his
involvement with the Rooftop WLAN project at MIT. He will put special
emphasis on advanced routing features and non-RFC compliant
functionality which has been the primary technology enabler in many
Municipal Wireless initiatives.


For much more information, including parking, please
refer to http://www.blu.org/cgi-bin/calendar/2007-dec
There is a parking lot adjacent to the E-51 at 2 Amherst St. 

Note: The after-meeting meeting will be at The Cambridge Brewery. 

--
Jerry Feldman <[EMAIL PROTECTED]>
Boston Linux and Unix
PGP key id: 537C5846
PGP Key fingerprint: 3D1B 8377 A3C0 A5F2 ECBB  CA3B 4607 4319 537C 5846




[GNHLUG] MerriLUG Nashua, Thur 21 Feb, MySQL: The Whys, Whats, and Watch-outs

2008-02-20 Thread Jim Kuzdrall

Who  : Marc Nozell, MySQL Conference presenter, "Officially" certified
What : MySQL: The Whys, Whats, and Watch-outs 
Where: Martha's Exchange
Day  : Thur 21 Feb **Tomorrow**
Time : 6:00 PM for grub, 7:30 PM for discussion (usually upstairs)

:: Overview

     Marc Nozell will review the MySQL database system complete with 
some "whys and wherefores" that reference books overlook.  Marc shares 
with us the "gotchas", under-emphasized essential facts, and 
"why-this-not-that" savvy that comes from professional applications 
experience.

 What can MySQL do for me?  What books are recommended?  What is an 
API, and do I need one?  How does command-line and in-program coding 
differ?  Under what circumstances should I use a GUI?  Which one?  What 
kind of data can I store?  When shouldn't I use MySQL?

Have you been using MySQL?  Marc may have the answer to that vexing 
question or a time-saving tip for your specific application. 

 >>> RSVP to Jim Kuzdrall for dinner to assure adequate seating. <<<
 !!! If you are not a "Regular Attendee" (50%), please let me know. !!!

Driving directions:
http://wiki.gnhlug.org/twiki2/bin/view/Www/PlaceMarthasExchange

Thanks,

Jim Kuzdrall
[EMAIL PROTECTED]

___
gnhlug-announce mailing list
[EMAIL PROTECTED]
http://mail.gnhlug.org/mailman/listinfo/gnhlug-announce/
