Re: [CODE4LIB] archiving a wiki

2012-05-29 Thread Kåre Fiedler Christiansen
One thing to be careful of when web harvesting a wiki is that it may harvest 
more than you bargained for.

Most wikis (I don't know JSPWiki, sorry) can present earlier versions of pages, 
diffs, indexes, and sometimes the same pages under different URLs. This may or 
may not be what you want.

To take a snapshot of the current versions of all pages on a wiki, I have 
had luck using an index page (that is: a list of all pages in the wiki, 
not index.html) and harvesting that page with a recursion depth of one. This 
may or may not help you :-)

$ wget --html-extension -r -k -l1 wiki-index-page-url
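
In case it helps, here is the same command spelled out with comments. The
URL is only a hypothetical example of a JSPWiki page-index page; substitute
your own wiki's index page.

# --html-extension  save pages with a .html suffix so they open from disk
# -r -l1            recurse, but only one hop away from the index page
# -k                rewrite links so the local copy browses offline
$ wget --html-extension -r -k -l1 "http://wiki.example.org/Wiki.jsp?page=PageIndex"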

Best of luck,
  Kåre

 -----Original Message-----
 From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On
 Behalf Of Tom Keays
 Sent: Wednesday, May 23, 2012 3:27 PM
 To: CODE4LIB@LISTSERV.ND.EDU
 Subject: Re: [CODE4LIB] archiving a wiki
 
 I haven't tried it on a wiki, but the command-line Unix utility wget can
 be used to mirror a website.
 
 http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html
 
 I usually call it like this:
 
 wget -m -p http://www.site.com/
 
 common flags:
    -m = mirror: recursive download with timestamping and unlimited depth
    -p = page requisites: also fetch the images, CSS, etc. each page needs
    -c = continue a partially downloaded file after an interruption
    -l5 = recursion level (depth); the default is 5
 
 On Tue, May 22, 2012 at 5:04 PM, Carol Hassler
 carol.hass...@wicourts.gov wrote:
 
  My organization would like to archive/export our internal wiki in some
  kind of end-user friendly format. The concept is to copy the wiki
  contents annually to a format that can be used on any standard computer
  in case of an emergency (i.e. saved as an HTML web-style archive, saved
  as PDF files, saved as Word files).
 
  Another way to put it is that we are looking for a way to export the
  contents of the wiki into a printer-friendly format - to a document that
  maintains some organization and formatting and can be used on any
  standard computer.
 
  Is anybody aware of a tool out there that would allow for this sort of
  automated, multi-page export? Our wiki is large and we would prefer not
  to do this type of backup one page at a time. We are using JSPwiki, but
  I'm open to any option you think might work. Could any of the web
  harvesting products be adapted to do the job? Has anyone else backed up
  a wiki to an alternate format?
 
  Thanks!
 
 
  Carol Hassler
  Webmaster / Cataloger
  Wisconsin State Law Library
  (608) 261-7558
  http://wilawlibrary.gov/
 
 


Re: [CODE4LIB] archiving a wiki

2012-05-23 Thread raffaele messuti
On Tue, May 22, 2012 at 11:04 PM, Carol Hassler
carol.hass...@wicourts.gov wrote:
 My organization would like to archive/export our internal wiki in some
 kind of end-user friendly format. The concept is to copy the wiki
 contents annually to a format that can be used on any standard computer
 in case of an emergency (i.e. saved as an HTML web-style archive, saved
 as PDF files, saved as Word files).

Take a look at WikiTeam. Their activity is mainly focused on MediaWiki, but
maybe they can help with a solution for your wiki:

http://archiveteam.org/index.php?title=WikiTeam
http://code.google.com/p/wikiteam/
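
For MediaWiki sites, WikiTeam's dumpgenerator.py is usually run roughly like
this (the API URL is a placeholder, and the flags can vary between versions,
so check their documentation):

$ python dumpgenerator.py --api=http://wiki.example.org/w/api.php --xml --images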

ciao

--
raffaele


Re: [CODE4LIB] archiving a wiki

2012-05-23 Thread Keith Jenkins
Many organizations are using Archive-It, the Internet Archive's
service for harvesting and preserving specific websites.  I think it
can be used to produce public or private archives.

http://www.archive-it.org/

Keith


On Tue, May 22, 2012 at 5:04 PM, Carol Hassler
carol.hass...@wicourts.gov wrote:
 My organization would like to archive/export our internal wiki in some
 kind of end-user friendly format. The concept is to copy the wiki
 contents annually to a format that can be used on any standard computer
 in case of an emergency (i.e. saved as an HTML web-style archive, saved
 as PDF files, saved as Word files).


Re: [CODE4LIB] archiving a wiki

2012-05-23 Thread Tom Keays
I haven't tried it on a wiki, but the command-line Unix utility wget can be
used to mirror a website.

http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html

I usually call it like this:

wget -m -p http://www.site.com/

common flags:
   -m = mirror: recursive download with timestamping and unlimited depth
   -p = page requisites: also fetch the images, CSS, etc. each page needs
   -c = continue a partially downloaded file after an interruption
   -l5 = recursion level (depth); the default is 5
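
If the goal is an offline copy that opens cleanly on any machine, adding link
conversion and .html extensions to the mirror command can help. A rough
sketch, with a placeholder URL:

$ wget -m -p -k -E http://wiki.example.org/
# -k rewrites links so the mirrored pages reference each other locally
# -E saves dynamically generated pages with a .html extension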

On Tue, May 22, 2012 at 5:04 PM, Carol Hassler
carol.hass...@wicourts.gov wrote:

 My organization would like to archive/export our internal wiki in some
 kind of end-user friendly format. The concept is to copy the wiki
 contents annually to a format that can be used on any standard computer
 in case of an emergency (i.e. saved as an HTML web-style archive, saved
 as PDF files, saved as Word files).

 Another way to put it is that we are looking for a way to export the
 contents of the wiki into a printer-friendly format - to a document that
 maintains some organization and formatting and can be used on any
 standard computer.

 Is anybody aware of a tool out there that would allow for this sort of
 automated, multi-page export? Our wiki is large and we would prefer not
 to do this type of backup one page at a time. We are using JSPwiki, but
 I'm open to any option you think might work. Could any of the web
 harvesting products be adapted to do the job? Has anyone else backed up
 a wiki to an alternate format?

 Thanks!


 Carol Hassler
 Webmaster / Cataloger
 Wisconsin State Law Library
 (608) 261-7558
 http://wilawlibrary.gov/




Re: [CODE4LIB] archiving a wiki

2012-05-23 Thread Misty De Meo
And while this is veering off-topic, it's also worth noting that the
development version of wget has support for WARC, the web archiving format
that the Wayback Machine is built around.
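
With a WARC-capable build of wget, the invocation looks roughly like this
(the archive name and URL are placeholders); it should write something like
wiki-snapshot.warc.gz alongside the mirrored files:

$ wget -m -p --warc-file=wiki-snapshot http://wiki.example.org/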


On 12-05-23 8:27 AM, Tom Keays tomke...@gmail.com wrote:

I haven't tried it on a wiki, but the command-line Unix utility wget can be
used to mirror a website.

http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html

I usually call it like this:

wget -m -p http://www.site.com/

common flags:
   -m = mirror: recursive download with timestamping and unlimited depth
   -p = page requisites: also fetch the images, CSS, etc. each page needs
   -c = continue a partially downloaded file after an interruption
   -l5 = recursion level (depth); the default is 5

On Tue, May 22, 2012 at 5:04 PM, Carol Hassler
carol.hass...@wicourts.gov wrote:

 My organization would like to archive/export our internal wiki in some
 kind of end-user friendly format. The concept is to copy the wiki
 contents annually to a format that can be used on any standard computer
 in case of an emergency (i.e. saved as an HTML web-style archive, saved
 as PDF files, saved as Word files).

 Another way to put it is that we are looking for a way to export the
 contents of the wiki into a printer-friendly format - to a document that
 maintains some organization and formatting and can be used on any
 standard computer.

 Is anybody aware of a tool out there that would allow for this sort of
 automated, multi-page export? Our wiki is large and we would prefer not
 to do this type of backup one page at a time. We are using JSPwiki, but
 I'm open to any option you think might work. Could any of the web
 harvesting products be adapted to do the job? Has anyone else backed up
 a wiki to an alternate format?

 Thanks!


 Carol Hassler
 Webmaster / Cataloger
 Wisconsin State Law Library
 (608) 261-7558
 http://wilawlibrary.gov/




Re: [CODE4LIB] archiving a wiki

2012-05-22 Thread Dave Caroline
On Tue, May 22, 2012 at 10:04 PM, Carol Hassler
carol.hass...@wicourts.gov wrote:
 My organization would like to archive/export our internal wiki in some
 kind of end-user friendly format. The concept is to copy the wiki
 contents annually to a format that can be used on any standard computer
 in case of an emergency (i.e. saved as an HTML web-style archive, saved
 as PDF files, saved as Word files).

Something like this?
http://www.mediawiki.org/wiki/Extension:DumpHTML
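
For a MediaWiki installation, DumpHTML runs as a maintenance script, roughly
like the sketch below; the script path and the -d output-directory option are
from memory of the extension's docs and may differ by version:

$ php extensions/DumpHTML/dumpHTML.php -d /path/to/html-dump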

Dave Caroline