Re: [CODE4LIB] archiving a wiki
A thing to be careful of when web harvesting a wiki is that it may harvest more than you bargained for. Most wikis (I don't know JSPWiki, sorry) can present earlier versions of pages, diffs, indexes, and sometimes the same pages under different URLs. This may or may not be what you want.

For taking a snapshot of the current versions of all pages on a wiki, I have had luck with using an index page (that is: a list of all pages in the wiki, not index.html) and harvesting that page with a recursion depth of one. This may or may not help you :-)

$ wget --html-extension -r -k -l1 wiki-index-page-url

Best of luck,
Kåre

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Tom Keays
Sent: Wednesday, May 23, 2012 3:27 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] archiving a wiki

I haven't tried it on a wiki, but the command-line Unix utility wget can be used to mirror a website.

http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html

I usually call it like this:

wget -m -p http://www.site.com/

Common flags:

-m = mirroring on/off
-p = page_requisites on/off
-c = continue - resume when a download is interrupted
-l5 = reclevel - recursion level (depth); default = 5

On Tue, May 22, 2012 at 5:04 PM, Carol Hassler <carol.hass...@wicourts.gov> wrote:
> My organization would like to archive/export our internal wiki in some kind of end-user friendly format. The concept is to copy the wiki contents annually to a format that can be used on any standard computer in case of an emergency (i.e. saved as an HTML web-style archive, saved as PDF files, saved as Word files).
>
> Another way to put it is that we are looking for a way to export the contents of the wiki into a printer-friendly format - to a document that maintains some organization and formatting and can be used on any standard computer.
>
> Is anybody aware of a tool out there that would allow for this sort of automated, multi-page export? Our wiki is large and we would prefer not to do this type of backup one page at a time. We are using JSPWiki, but I'm open to any option you think might work. Could any of the web harvesting products be adapted to do the job? Has anyone else backed up a wiki to an alternate format?
>
> Thanks!
>
> Carol Hassler
> Webmaster / Cataloger
> Wisconsin State Law Library
> (608) 261-7558
> http://wilawlibrary.gov/
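Even a depth-one harvest from an index page can still follow diff and page-history links if the index exposes them. A minimal sketch of one way to filter those out, assuming JSPWiki's default URL layout (Wiki.jsp?page=..., Diff.jsp, PageInfo.jsp) and a hypothetical PageIndex page, both of which should be checked against the actual install, and a wget recent enough (1.14 or later) to support --reject-regex:

$ # exclude diff/history/edit views so only current page versions are fetched
$ wget --html-extension -r -k -l1 \
      --reject-regex '(Diff|PageInfo|Edit)\.jsp' \
      'http://wiki.example.org/Wiki.jsp?page=PageIndex'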
Re: [CODE4LIB] archiving a wiki
On Tue, May 22, 2012 at 11:04 PM, Carol Hassler <carol.hass...@wicourts.gov> wrote:
> My organization would like to archive/export our internal wiki in some kind of end-user friendly format. The concept is to copy the wiki contents annually to a format that can be used on any standard computer in case of an emergency (i.e. saved as an HTML web-style archive, saved as PDF files, saved as Word files).

Take a look at WikiTeam. Their activity is mainly related to MediaWiki, but maybe they could help with a solution for your wiki.

http://archiveteam.org/index.php?title=WikiTeam
http://code.google.com/p/wikiteam/

ciao
-- raffaele
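For MediaWiki installations, WikiTeam's dump script is typically invoked as below (a sketch based on the project's documented usage; the api.php URL is a placeholder, and a JSPWiki site would need a different tool):

$ # full-history XML dump plus images via the MediaWiki API
$ python dumpgenerator.py --api=http://wiki.example.org/api.php --xml --images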
Re: [CODE4LIB] archiving a wiki
Many organizations are using Archive-It, the Internet Archive's service for harvesting and preserving specific websites. I think it can be used to produce public or private archives.

http://www.archive-it.org/

Keith

On Tue, May 22, 2012 at 5:04 PM, Carol Hassler <carol.hass...@wicourts.gov> wrote:
> My organization would like to archive/export our internal wiki in some kind of end-user friendly format. The concept is to copy the wiki contents annually to a format that can be used on any standard computer in case of an emergency (i.e. saved as an HTML web-style archive, saved as PDF files, saved as Word files).
Re: [CODE4LIB] archiving a wiki
I haven't tried it on a wiki, but the command-line Unix utility wget can be used to mirror a website.

http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html

I usually call it like this:

wget -m -p http://www.site.com/

Common flags:

-m = mirroring on/off
-p = page_requisites on/off
-c = continue - resume when a download is interrupted
-l5 = reclevel - recursion level (depth); default = 5

On Tue, May 22, 2012 at 5:04 PM, Carol Hassler <carol.hass...@wicourts.gov> wrote:
> My organization would like to archive/export our internal wiki in some kind of end-user friendly format. The concept is to copy the wiki contents annually to a format that can be used on any standard computer in case of an emergency (i.e. saved as an HTML web-style archive, saved as PDF files, saved as Word files).
>
> Another way to put it is that we are looking for a way to export the contents of the wiki into a printer-friendly format - to a document that maintains some organization and formatting and can be used on any standard computer.
>
> Is anybody aware of a tool out there that would allow for this sort of automated, multi-page export? Our wiki is large and we would prefer not to do this type of backup one page at a time. We are using JSPWiki, but I'm open to any option you think might work. Could any of the web harvesting products be adapted to do the job? Has anyone else backed up a wiki to an alternate format?
>
> Thanks!
>
> Carol Hassler
> Webmaster / Cataloger
> Wisconsin State Law Library
> (608) 261-7558
> http://wilawlibrary.gov/
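Given the stated goal here, an HTML snapshot that opens cleanly on any standard computer, two more wget flags combine well with the mirror options above: -k rewrites links in the downloaded pages so they resolve locally, and -E saves dynamically generated pages with an .html extension. A sketch, with the wiki URL as a placeholder:

$ # mirror with page requisites, local link rewriting, and .html suffixes
$ wget -m -p -k -E http://wiki.example.org/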
Re: [CODE4LIB] archiving a wiki
And while this is veering off-topic, it's also worth noting that the development version of wget has support for WARC, the website archiving format that the Wayback Machine is based around.

On 12-05-23 8:27 AM, Tom Keays tomke...@gmail.com wrote:
> I haven't tried it on a wiki, but the command-line Unix utility wget can be used to mirror a website.
>
> http://www.gnu.org/software/wget/manual/html_node/Advanced-Usage.html
>
> I usually call it like this:
>
> wget -m -p http://www.site.com/
>
> Common flags:
>
> -m = mirroring on/off
> -p = page_requisites on/off
> -c = continue - resume when a download is interrupted
> -l5 = reclevel - recursion level (depth); default = 5
>
> On Tue, May 22, 2012 at 5:04 PM, Carol Hassler <carol.hass...@wicourts.gov> wrote:
>> My organization would like to archive/export our internal wiki in some kind of end-user friendly format. The concept is to copy the wiki contents annually to a format that can be used on any standard computer in case of an emergency (i.e. saved as an HTML web-style archive, saved as PDF files, saved as Word files).
>>
>> Another way to put it is that we are looking for a way to export the contents of the wiki into a printer-friendly format - to a document that maintains some organization and formatting and can be used on any standard computer.
>>
>> Is anybody aware of a tool out there that would allow for this sort of automated, multi-page export? Our wiki is large and we would prefer not to do this type of backup one page at a time. We are using JSPWiki, but I'm open to any option you think might work. Could any of the web harvesting products be adapted to do the job? Has anyone else backed up a wiki to an alternate format?
>>
>> Thanks!
>>
>> Carol Hassler
>> Webmaster / Cataloger
>> Wisconsin State Law Library
>> (608) 261-7558
>> http://wilawlibrary.gov/
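With a WARC-capable wget build, the capture might look like the sketch below; --warc-file sets the output basename, so this writes wiki-snapshot.warc.gz alongside the ordinary mirror (the URL is a placeholder):

$ # mirror the site and simultaneously record a WARC of the crawl
$ wget --mirror --page-requisites --warc-file=wiki-snapshot http://wiki.example.org/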
Re: [CODE4LIB] archiving a wiki
On Tue, May 22, 2012 at 10:04 PM, Carol Hassler <carol.hass...@wicourts.gov> wrote:
> My organization would like to archive/export our internal wiki in some kind of end-user friendly format. The concept is to copy the wiki contents annually to a format that can be used on any standard computer in case of an emergency (i.e. saved as an HTML web-style archive, saved as PDF files, saved as Word files).

Something like this?

http://www.mediawiki.org/wiki/Extension:DumpHTML

Dave Caroline
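DumpHTML is MediaWiki-specific, but for comparison it runs as a server-side maintenance script rather than a crawler. A sketch, assuming the extension lives under extensions/DumpHTML and that -d names the output directory (the extension page should be checked for the current options):

$ # render every wiki page to static HTML in the given directory
$ php extensions/DumpHTML/dumpHTML.php -d /path/to/static-dump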