Re: downloading web sites

2000-04-27 Thread AlphaByte

Thanks Edward,

I found another Java based application that seems to work pretty well on Linux.

One problem I found with wget was that it did not handle some directory formats,
in particular those using Windows-style backslashes (\ rather than /). And Vidiot
was partly right in that some files did not download because they appeared to be
generated by some kind of CGI script. However, this application gets around all of
these issues, and I don't have to ask anyone who uses Win to fetch things for me.
It is called WEBsaver -- it does need a Java runtime installed; I use Blackdown.

Incidentally, if you have a preference for Win, I did find a freeware app called
SiteStripper. It seems to be fast and complete.

Alan

On Wed, 26 Apr 2000, you wrote regarding Re: downloading web sites:
> Under Windoze there is a pretty good program called SiteSnagger from PC 
> Mag--it's free.  I've snagged some pretty big sites with it.

> I should add that it has an option to download multimedia and then fix the 
> links and indexes it so it will work in an independent environment.

-- 
AlphaByte: PO Box 1941, Auckland, New Zealand
Specialising in: Graphic Design, Education and Training,
Technical Documentation, Consulting.
http://www.alphabyte.co.nz


-- 
To unsubscribe: mail [EMAIL PROTECTED] with "unsubscribe"
as the Subject.




Re: downloading web sites

2000-04-25 Thread Thomas Ribbrock (Design/DEG)

On Sat, Apr 22, 2000 at 07:40:11AM -0500, Vidiot wrote:
[...] 
> There are no man pages for wget, only info pages, which I hate.  The interface
> to traverse info pages is not intuitive.  It is easy to get lost, hard to get
> back to where you were, etc.
[...]

Hint: "info2html" comes in most helpful in these cases... :-)
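In a similar spirit, GNU info itself can dump a whole manual to a flat text
file, which sidesteps the info browser entirely (a sketch, assuming the wget
info pages are on info's search path):

```shell
# Dump the entire wget manual, following all menu entries, into a
# single plain-text file that can be read in any pager or editor:
info --subnodes -o wget.txt wget
less wget.txt
```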

HTH,

Thomas
-- 
 "Look, Ma, no obsolete quotes and plain text only!"

 Thomas Ribbrock | http://www.bigfoot.com/~kaytan | ICQ#: 15839919
   "You have to live on the edge of reality - to make your dreams come true!"






Re: downloading web sites

2000-04-24 Thread AlphaByte

On Sun, 23 Apr 2000, brian davison wrote regarding Re: downloading web sites:
> Mosaic had a function to do what you are suggesting.  I saw an rpm for
> mosaic on one of the sites...  haven't tried it on Linux yet, though.
> brian ;)
> ***

Actually, I have an old, old PC with Win3.11 on it that has Mosaic; I'll have a
look. But do they still make Mosaic? ;-)

Alan







Re: downloading web sites

2000-04-24 Thread AlphaByte

On Sun, 23 Apr 2000, Vidiot wrote regarding Re: downloading web sites:
> >On the contrary, at this time most sites are HTML based and not XML. The ones I
> >am interested in are not likely to be the kind of dynamically driven sites you
> >are referring to, and anyway, if that is the case, I have another method of
> >extracting that info (which is unfortunately not available on my Linux box, and
> >so I am using a friend who uses Windows) -- I just want this because it is
> >preferable to that other method.
> >Alan
> 
> I wasn't talking about XML.  I'm talking about CGI programs that build the HTML
> on the fly.
> 
> I don't know what site you are after, so I do not know how complicated the
> HTML code is for the site.
> 
> If you were to try traversing my site and downloading it, it would take
> approximately 5 GB of space.  I wonder what other corporate sites are like.
> 
> Good luck.  You are going to need it.
> 
Umm, thanks. Well, yes, I admit these are an issue. But then I am not going
to do all my browsing in this way -- it is just to gather available material
that can be read off-line. Other material can be printed as PDFs using
Acrobat, but it is better to have the original stuff, I think.

Alan







Re: downloading web sites

2000-04-24 Thread AlphaByte

On Sun, 23 Apr 2000, Vidiot wrote regarding Re: downloading web sites:
> 
> There are no man pages for wget, only info pages, which I hate.  The interface
> to traverse info pages is not intuitive.  It is easy to get lost, hard to get
> back to where you were, etc.
> 

Oh yes there are. I read 'em, and they were very helpful at the outset.

> Who thought up that weird interface anyway?
> 
> Oh great, I just tried "info wget" and even that doesn't work.  The wget
> info pages are installed in /usr/local/info and info won't find them.  I
> probably need an environment variable to make info pick up pages from there,
> but even that won't help with the weird (a polite word for what I think of the
> interface) info command set.
> 
> I'll print a manual before I try getting lost with info.
> 









Re: downloading web sites

2000-04-23 Thread Vidiot

>Vidiot wrote:
>> There are no man pages for wget, only info pages, which I hate.  
>
>wget has a man page on my computer...
>
>wget-1.5.3-6

Must have been added, since I did not find one with wget-1.5.3.  Guess I
need to update.

Really doesn't matter, since the PostScript version is much better anyway.

MB
-- 
e-mail: [EMAIL PROTECTED]
Bart: Hey, why is it destroying other toys?  Lisa: They must have
programmed it to eliminate the competition.  Bart: You mean like
Microsoft?  Lisa: Exactly.  [The Simpsons - 12/18/99]
Visit - URL:http://www.vidiot.com/  (Your link to Star Trek and UPN)






Re: downloading web sites

2000-04-22 Thread Gordon Messmer

Vidiot wrote:
> There are no man pages for wget, only info pages, which I hate.  

wget has a man page on my computer...

wget-1.5.3-6






Re: downloading web sites

2000-04-22 Thread brian davison

Mosaic had a function to do what you are suggesting.  I saw an rpm for
mosaic on one of the sites...  haven't tried it on Linux yet, though.
brian ;)
***



At 11:27 AM 4/22/00 +1200, you wrote:
>On Sat, 22 Apr 2000, you wrote regarding Re: downloading web sites:
>
>> 
>> You have to be joking.  A properly written website will cause all kinds of
>> trouble trying to view offline.  Let's see, for starters you won't have any
>> of the images.  Even if you did download all of them, it still might not work
>> as the pages may be written to access the web site with absolute links, vs.
>> relative links.  Then there is the problem of dynamically built web sites.
>> 
>Well, I think you just answered my question -- below :-)
>
>> You might be able to get away with it downloading my web site, because I
>> have it built for speed, minimal graphics so that the user can get what they
>> are after quickly.  There are many others that I have visited that will not
>> work very well at all offline.
>> 
>> I shudder at the thought of attempting to do what you are wanting to do.
>> 
>
>On the contrary, at this time most sites are HTML based and not XML. The ones I
>am interested in are not likely to be the kind of dynamically driven sites you
>are referring to, and anyway, if that is the case, I have another method of
>extracting that info (which is unfortunately not available on my Linux box, and
>so I am using a friend who uses Windows) -- I just want this because it is
>preferable to that other method.
>
>Cheers
>Alan
---
 [EMAIL PROTECTED]






Re: downloading web sites

2000-04-22 Thread Vidiot

>On Sat, 22 Apr 2000, you wrote regarding Re: downloading web sites:
>> wget
>> 
>> read the man page so you don't accidentally try to copy the
>> entire web to your harddrive. :-)

There are no man pages for wget, only info pages, which I hate.  The interface
to traverse info pages is not intuitive.  It is easy to get lost, hard to get
back to where you were, etc.

Who thought up that weird interface anyway?

Oh great, I just tried "info wget" and even that doesn't work.  The wget
info pages are installed in /usr/local/info and info won't find them.  I
probably need an environment variable to make info pick up pages from there,
but even that won't help with the weird (a polite word for what I think of the
interface) info command set.

I'll print a manual before I try getting lost with info.
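For the /usr/local/info problem specifically, GNU info consults the INFOPATH
environment variable; a sketch of the usual fixes (directory names as in the
message above, behaviour as documented for GNU info):

```shell
# One-off: add the extra directory to info's search path for this call only:
info -d /usr/local/info wget

# Persistent: put the directory in INFOPATH; the trailing colon tells
# info to append its built-in default directories as well:
export INFOPATH=/usr/local/info:
info wget
```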

MB






Re: downloading web sites

2000-04-22 Thread Vidiot

>On the contrary, at this time most sites are HTML based and not XML. The ones I
>am interested in are not likely to be the kind of dynamically driven sites you
>are referring to, and anyway, if that is the case, I have another method of
>extracting that info (which is unfortunately not available on my Linux box, and
>so I am using a friend who uses Windows) -- I just want this because it is
>preferable to that other method.
>Alan

I wasn't talking about XML.  I'm talking about CGI programs that build the HTML
on the fly.

I don't know what site you are after, so I do not know how complicated the
HTML code is for the site.

If you were to try traversing my site and downloading it, it would take
approximately 5 GB of space.  I wonder what other corporate sites are like.

Good luck.  You are going to need it.

MB






Re: downloading web sites

2000-04-22 Thread AlphaByte

Peter and Mike,

Thanks, guys. That looks like what I am looking for.

Alan

On Sat, 22 Apr 2000, you wrote regarding Re: downloading web sites:
> wget
> 
> read the man page so you don't accidentally try to copy the
> entire web to your harddrive. :-)
> 

> I believe wget will do what you are looking for.  I've only used it a time
> or two, I'm sure someone else on the list can help you more than me.  Or of
> course check the man pages
> 
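The warning above about accidentally copying the entire web is easy to act on;
a hedged sketch of a constrained invocation (the URL is hypothetical, options
as in the wget 1.5.x manual):

```shell
# Recursive download, constrained so it cannot wander arbitrarily far:
#   -r    recursive retrieval
#   -l 3  limit recursion to 3 levels deep
#   -np   never ascend to the parent directory of the start URL
#   -w 1  wait one second between requests, to go easy on the server
wget -r -l 3 -np -w 1 http://www.example.com/docs/
```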






Re: downloading web sites

2000-04-22 Thread AlphaByte

On Sat, 22 Apr 2000, you wrote regarding Re: downloading web sites:

> 
> You have to be joking.  A properly written website will cause all kinds of
> trouble trying to view offline.  Let's see, for starters you won't have any
> of the images.  Even if you did download all of them, it still might not work
> as the pages may be written to access the web site with absolute links, vs.
> relative links.  Then there is the problem of dynamically built web sites.
> 
Well, I think you just answered my question -- below :-)

> You might be able to get away with it downloading my web site, because I
> have it built for speed, minimal graphics so that the user can get what they
> are after quickly.  There are many others that I have visited that will not
> work very well at all offline.
> 
> I shudder at the thought of attempting to do what you are wanting to do.
> 

On the contrary, at this time most sites are HTML based and not XML. The ones I
am interested in are not likely to be the kind of dynamically driven sites you
are referring to, and anyway, if that is the case, I have another method of
extracting that info (which is unfortunately not available on my Linux box, and
so I am using a friend who uses Windows) -- I just want this because it is
preferable to that other method.
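For what it's worth, wget has an option aimed squarely at the absolute-vs-relative
link problem raised in the quoted message (hypothetical URL; -k is documented as
--convert-links in wget 1.5.x):

```shell
# -k rewrites links in the downloaded pages to point at the local
# copies, so the mirrored site remains browsable offline:
wget -r -l 3 -np -k http://www.example.com/
```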

Cheers
Alan

