Re: [Mailman-Users] Text "Reappears" when MBox Archive Rebuilt

2015-11-30 Thread Mark Sapiro
On 11/30/2015 02:37 PM, Dave Arndt wrote:
> 
> The mystery is this:  How is it possible to edit out text from an mbox
> file, verify that it is NOT there with grep, then see it reappear in the
> resulting html file when bin/arch is run?
> 
> It's almost as if editing the file with VI left the original text and
> only hid it with escape sequences or something.


But vi doesn't do that.


> Whatever it is... the mbox file that got uploaded to the new site HAD to
> have had the original text, even though that text does not show up
> when running grep against that same mbox file (and is also not visible
> when editing the same file with VI)... 
> 
> Strange.


Yes it's strange, and if everything is as you say, I can't explain it.

If you want to investigate further, create a new list on your local
installation and then run

bin/arch --wipe NEW_LISTNAME /path/to/edited/mbox

and see what appears in NEW_LISTNAME's archive. If the elided text is
not there, then I suggest that the archive wasn't built on the new
server with the same mbox. If the elided text is there, it must be in
the mbox. Perhaps grep doesn't find it because it is split across lines
or for some other reason.
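
One "other reason" worth ruling out (an assumption here, not something established in this thread) is that the text sits in a base64 or quoted-printable encoded part, where a plain grep of the raw file cannot match it. The sketch below uses Python's standard mailbox module to decode every non-multipart part and collapse whitespace before searching, so it also covers the split-across-lines case. The function names find_phrase and normalize are made up for illustration.

```python
import mailbox
import re


def normalize(text):
    """Collapse every run of whitespace (newlines included) to one space."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_phrase(mbox_path, phrase):
    """Yield (message index, Message-ID, content type) for each decoded
    message part that contains the phrase."""
    needle = normalize(phrase)
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, message in enumerate(box):
            for part in message.walk():
                if part.is_multipart():
                    continue
                # decode=True undoes base64 / quoted-printable encoding
                payload = part.get_payload(decode=True)
                if payload is None:
                    continue
                charset = part.get_content_charset() or "utf-8"
                try:
                    text = payload.decode(charset, errors="replace")
                except LookupError:  # charset name Python does not know
                    text = payload.decode("utf-8", errors="replace")
                if needle in normalize(text):
                    yield index, message.get("Message-ID"), part.get_content_type()
    finally:
        box.close()
```

Looping over find_phrase("/path/to/edited/mbox", "some of the removed text") and printing each result would show which messages, if any, still carry the text in a decoded part. It searches bodies only, not headers such as Subject.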

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
--
Mailman-Users mailing list Mailman-Users@python.org
https://mail.python.org/mailman/listinfo/mailman-users
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/
Unsubscribe: 
https://mail.python.org/mailman/options/mailman-users/archive%40jab.org


Re: [Mailman-Users] Text "Reappears" when MBox Archive Rebuilt

2015-11-30 Thread Dave Arndt
In this case, there are NO existing HTML files.  It's being rebuilt on a
brand new installation.

I also just heard from the person managing the site that they did use the
--wipe option (so I guess that's all moot).

The mystery is this:  How is it possible to edit out text from an mbox
file, verify that it is NOT there with grep, then see it reappear in the
resulting html file when bin/arch is run?

It's almost as if editing the file with VI left the original text and only
hid it with escape sequences or something.

Whatever it is... the mbox file that got uploaded to the new site HAD to
have had the original text, even though that text does not show up
when running grep against that same mbox file (and is also not visible when
editing the same file with VI)...

Strange.






On Mon, Nov 30, 2015 at 5:03 PM, Mark Sapiro wrote:

> On 11/30/2015 01:22 PM, Dave Arndt wrote:
> >
> > I did not do the rebuild myself - but I would assume they just ran
> > bin/arch
> >
> > How would the text re-appear if it was removed, as per step #1?
> >
> > in other words, How would the text still be in the file after removing
> > it, and it doesn't appear with grep?
>
>
> If you have an existing HTML archive and the corresponding .mbox, and
> you run bin/arch without --wipe, every message in the mbox will be added
> to the HTML archive, but they won't be indexed because the Message-IDs
> are duplicates.
>
> For example, if there are a total of 10 messages in the archive, the HTML
> messages will have names like 00.html, 01.html, ...,
> 09.html. If you then run bin/arch without --wipe, you will add files
> 10.html, 11.html, ..., 19.html, which may or may not be a bit
> different if you modified the mbox. Now, when the archiver adds, say,
> 10.html, its Message-ID is the same as that of 00.html, so it
> won't be indexed and the index will still point to 00.html.
>
> The answer is that if you want to rebuild an archive and not just add to it,
> you have to use --wipe to remove the existing HTML archive before adding.
>
> --
> Mark Sapiro                           The highway is for gamblers,
> San Francisco Bay Area, California    better use your sense - B. Dylan
>


Re: [Mailman-Users] Text "Reappears" when MBox Archive Rebuilt

2015-11-30 Thread Mark Sapiro
On 11/30/2015 01:22 PM, Dave Arndt wrote:
> 
> I did not do the rebuild myself - but I would assume they just ran bin/arch
> 
> How would the text re-appear if it was removed, as per step #1?
> 
> in other words, How would the text still be in the file after removing
> it, and it doesn't appear with grep?


If you have an existing HTML archive and the corresponding .mbox, and
you run bin/arch without --wipe, every message in the mbox will be added
to the HTML archive, but they won't be indexed because the Message-IDs
are duplicates.

For example, if there are a total of 10 messages in the archive, the HTML
messages will have names like 00.html, 01.html, ...,
09.html. If you then run bin/arch without --wipe, you will add files
10.html, 11.html, ..., 19.html, which may or may not be a bit
different if you modified the mbox. Now, when the archiver adds, say,
10.html, its Message-ID is the same as that of 00.html, so it
won't be indexed and the index will still point to 00.html.

The answer is that if you want to rebuild an archive and not just add to it,
you have to use --wipe to remove the existing HTML archive before adding.
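
The numbering in the example above can be played through with a toy model. This is only an illustration of the behaviour described here, not pipermail's actual code. The function add_to_archive and the two-digit file names are assumptions that follow the example.

```python
def add_to_archive(articles, index, message_ids):
    """Toy model: every message becomes a new numbered HTML file, but the
    index only records the first file seen for each Message-ID."""
    for message_id in message_ids:
        filename = "%02d.html" % len(articles)
        articles.append(filename)
        # A duplicate Message-ID leaves the existing index entry untouched.
        index.setdefault(message_id, filename)
    return articles, index


ids = ["<%d@example.com>" % n for n in range(10)]

# First build: 00.html through 09.html, all indexed.
articles, index = add_to_archive([], {}, ids)

# Second run without --wipe: 10.html through 19.html are added, but the
# index still points at the files from the first build.
articles, index = add_to_archive(articles, index, ids)
print(len(articles), index["<0@example.com>"])

# Modelling --wipe: start again from an empty archive and an empty index.
articles, index = add_to_archive([], {}, ids)
print(len(articles), index["<0@example.com>"])
```

In this model the only way for the index to pick up the edited messages is to start from an empty archive, which is what --wipe is described as doing.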

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


Re: [Mailman-Users] Text "Reappears" when MBox Archive Rebuilt

2015-11-30 Thread Dave Arndt
On Mon, Nov 30, 2015 at 4:15 PM, Mark Sapiro wrote:

>
> Exactly what did you do at this step? If you used bin/arch, did you use
> the --wipe option? If not, you didn't remove anything from an existing
> archive.
>

Hi Mark,

Thanks for the speedy reply.  The steps were as outlined.  As for the last
step...

I did not do the rebuild myself - but I would assume they just ran bin/arch

How would the text re-appear if it was removed, as per step #1?

in other words, How would the text still be in the file after removing it,
and it doesn't appear with grep?


Re: [Mailman-Users] Text "Reappears" when MBox Archive Rebuilt

2015-11-30 Thread Mark Sapiro
On 11/30/2015 01:09 PM, Dave Arndt wrote:
> This is an odd one. I'm hoping there's a straightforward answer:
> 
> 1) I edited our mbox archive (with vi) to remove some offensive content.
> (x, xx)
> 
> 2) Saved the file.
> 
> 3) Did a *grep* on the text that was removed - not found.
> 
> 4) Uploaded the mbox file to a brand new server
> 
> 5) Rebuilt the html archives using that mbox file.


Exactly what did you do at this step? If you used bin/arch, did you use
the --wipe option? If not, you didn't remove anything from an existing
archive.

-- 
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan


[Mailman-Users] Text "Reappears" when MBox Archive Rebuilt

2015-11-30 Thread Dave Arndt
This is an odd one. I'm hoping there's a straightforward answer:

1) I edited our mbox archive (with vi) to remove some offensive content.
(x, xx)

2) Saved the file.

3) Did a *grep* on the text that was removed - not found.

4) Uploaded the mbox file to a brand new server

5) Rebuilt the html archives using that mbox file.

The text that was deleted in step #1 appears in the newly generated HTML
archives!!!

How could this be possible?

Out of paranoia I went back to the mbox file that I uploaded, grep'd again
- and also loaded the file into vi and searched for any of the text that
was removed.  Not found.

Any ideas?  This seems really, really strange.

- Caught in a Parallel Universe