Re: [Wiki-research-l] [WikiEN-l] Old Wikipedia backups discovered
On Wednesday, December 15, 2010, Tim Starling wrote:
> There were some changes made to the page text that weren't represented in
> diff_log, specifically changing certain camel-case links to free links.

It appears my problems were related to some CR/LF issues not round-tripping between diff and patch, but I hope to be able to address that. And yes, in addition to some of the CamelCase issues, I expect another problem is that if a page is blanked, "Describe the new page here." will reappear outside of the diff_log.

___
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Re: [Wiki-research-l] [WikiEN-l] Old Wikipedia backups discovered
On 16/12/10 23:10, Joseph Reagle wrote:
> On Wednesday, December 15, 2010, Tim Starling wrote:
>> There were some changes made to the page text that weren't represented in
>> diff_log, specifically changing certain camel-case links to free links.
>
> It appears my problems were related to some CR/LF issues not round-tripping
> between diff and patch, but I hope to be able to address that. And yes, in
> addition to some of the CamelCase issues, I expect another problem is that
> if a page is blanked, "Describe the new page here." will reappear outside of
> the diff_log.

I don't think that will be a problem. But there are other problems that I've encountered.

UseMod had a deletion feature. It turns out to be easy enough to skip deleted pages, since they don't have a corresponding entry in rclog.

It also had an admin-only rename feature, which optionally fixed links in all pages. This accounts for the free link changes I was seeing earlier. And it had a link replacement feature which could be invoked without a page move.

These features were rarely used, due to the arcane interface; usually people just moved pages by copying and pasting. But during the free-link conversion, a lot of pages were renamed using the admin-only feature.

All these admin-only features were unlogged, but it turns out to be possible to reconstruct page moves, because when a page was moved, its name was updated in rclog but not in diff_log. By finding the first diff_log entry with the new name, you can roughly work out when the page moves were done.

Anyway, I'm developing a script which will import the dump into a modified MediaWiki instance, the idea being that I can then export XML from it. Once it works, I'll upload the XML to somewhere. I'm not sure when that will be.

-- Tim Starling
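The move-reconstruction idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the import script mentioned in the message: the `LogEntry` record, the `infer_moves` function, and the idea of matching rclog and diff_log entries by timestamp are all hypothetical, since the actual layout of the two logs is not given in this thread.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class LogEntry(NamedTuple):
    """Hypothetical record: one edit as seen in rclog or diff_log."""
    timestamp: int
    title: str


def infer_moves(
    rclog: Iterable[LogEntry], diff_log: Iterable[LogEntry]
) -> Dict[str, Tuple[str, Optional[int]]]:
    """Map old title -> (new title, approximate move time).

    Assumption: the same edit can be matched across both logs by timestamp.
    A move rewrote the title in rclog but left diff_log untouched, so a
    matched pair with different titles reveals an (old, new) rename.
    """
    diff_entries = list(diff_log)
    rc_title_by_ts = {entry.timestamp: entry.title for entry in rclog}

    # Edits whose diff_log title differs from the rclog title were made
    # before the page was moved.
    renames: Dict[str, str] = {}
    for entry in diff_entries:
        rc_title = rc_title_by_ts.get(entry.timestamp)
        if rc_title is not None and rc_title != entry.title:
            renames[entry.title] = rc_title

    # The first diff_log entry under each title; for a new title this is
    # a rough upper bound on when the move happened.
    first_seen: Dict[str, int] = {}
    for entry in sorted(diff_entries):
        first_seen.setdefault(entry.title, entry.timestamp)

    return {old: (new, first_seen.get(new)) for old, new in renames.items()}
```

With made-up data, two edits logged as "FreeLink" in diff_log but as "Free Link" in rclog, followed by a diff_log edit already titled "Free Link" at time 300, would yield `{"FreeLink": ("Free Link", 300)}`. Deleted pages would simply never match, since they have no rclog entry.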
Re: [Wiki-research-l] [WikiEN-l] Old Wikipedia backups discovered
I have the first 10K edits up reconstructed in their various pages at: http://cyber.law.harvard.edu/~reagle/wp-redux/
Re: [Wiki-research-l] [WikiEN-l] Old Wikipedia backups discovered
This is amazing!
Thanks for the work and effort, this reconstruction is a priceless resource for researchers.

Lior

On Thu, Dec 16, 2010 at 8:53 PM, Joseph Reagle joseph.2...@reagle.org wrote:
> I have the first 10K edits up reconstructed in their various pages at:
> http://cyber.law.harvard.edu/~reagle/wp-redux/
Re: [Wiki-research-l] [WikiEN-l] Old Wikipedia backups discovered
On Thursday, December 16, 2010, lior gimel wrote:
> This is amazing!

And buggy! :-)

> Thanks for the work and effort, this reconstruction is a priceless resource
> for researchers.

Thanks to Tim for providing the data, and for working on a much better version that I look forward to!
[Wiki-research-l] Google ngrams
Hi all;

I leave this link here... http://ngrams.googlelabs.com/datasets

An example: http://ngrams.googlelabs.com/graph?content=collaborative&year_start=1920&year_end=&corpus=0&smoothing=3

Regards,
emijrp
Re: [Wiki-research-l] Google ngrams
I was just playing with this... remarkable. Someone should do the same with Wikipedia's text over time, which would provide even crisper comparisons [as within categories].

http://ngrams.googlelabs.com/graph?content=art,technology,www&year_start=1950&year_end=2008&corpus=5&smoothing=4

On Thu, Dec 16, 2010 at 5:28 PM, emijrp emi...@gmail.com wrote:
> Hi all;
>
> I leave this link here... http://ngrams.googlelabs.com/datasets
>
> An example: http://ngrams.googlelabs.com/graph?content=collaborative&year_start=1920&year_end=&corpus=0&smoothing=3
>
> Regards,
> emijrp

-- Samuel Klein identi.ca:sj w:user:sj +1 617 529 4266
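As a rough illustration of what "the same with Wikipedia's text over time" might look like, here is a minimal sketch that computes the relative frequency of a single word per year. Everything in it is an assumption for illustration: the `yearly_frequency` function, the `(year, text)` snapshot format, and the simple lowercase tokenizer are hypothetical, and it handles only single words (1-grams), not the longer n-grams or smoothing that the Google viewer offers.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def yearly_frequency(snapshots: Iterable[Tuple[int, str]], term: str) -> Dict[int, float]:
    """Return {year: share of all tokens in that year that equal `term`}.

    `snapshots` is a hypothetical input: (year, text) pairs, for example
    one pair per article revision taken from a dump.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, text in snapshots:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        counts[year].update(re.findall(r"[a-z']+", text.lower()))

    wanted = term.lower()
    return {
        year: counter[wanted] / sum(counter.values())
        for year, counter in sorted(counts.items())
        if counter  # skip years whose text produced no tokens
    }
```

For example, `yearly_frequency([(2001, "wiki wiki web"), (2002, "the web is a wiki")], "wiki")` gives a share of two thirds for 2001 and one fifth for 2002. Restricting the snapshots to the articles of one category would give the within-category comparison suggested above.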
Re: [Wiki-research-l] Google ngrams
Look at this one ;)

http://ngrams.googlelabs.com/graph?content=security%2Cfreedom&year_start=1950&year_end=2008&corpus=5&smoothing=4

2010/12/17 Samuel Klein meta...@gmail.com
> I was just playing with this... remarkable. Someone should do the same
> with Wikipedia's text over time, which would provide even crisper
> comparisons [as within categories].
>
> http://ngrams.googlelabs.com/graph?content=art,technology,www&year_start=1950&year_end=2008&corpus=5&smoothing=4
>
> On Thu, Dec 16, 2010 at 5:28 PM, emijrp emi...@gmail.com wrote:
>> Hi all;
>>
>> I leave this link here... http://ngrams.googlelabs.com/datasets
>>
>> An example: http://ngrams.googlelabs.com/graph?content=collaborative&year_start=1920&year_end=&corpus=0&smoothing=3
>>
>> Regards,
>> emijrp
>
> -- Samuel Klein identi.ca:sj w:user:sj +1 617 529 4266