Re: [aur-general] AUR Maintenance
On 1 March 2013 15:07, Connor Behan connor.be...@gmail.com wrote: Except that line there is 161 characters and contains two comments (one comment deleted by its poster about Ruby and one non-deleted comment about GNOME). The line in the real file is a million characters and contains ~20k comments. And there are 28 such lines. Reading this would be like reading War And Peace 10 times but it would teach you a lot about the history of the AUR. While it is a lot of data, I agree that it shouldn't be that difficult to recover. What am I missing? Any chance those of us who aren't TU's can get access to the file?
Re: [aur-general] AUR Maintenance
On 3/3/13, Phillip Smith li...@fukawi2.nl wrote: While it is a lot of data, I agree that it shouldn't be that difficult to recover. What am I missing? Any chance those of us who aren't TU's can get access to the file? I also came close to that question, which indeed is kind of obvious. cheers! mar77i
Re: [aur-general] AUR Maintenance
On Fri, Mar 1, 2013 at 5:07 AM, Connor Behan connor.be...@gmail.com wrote: [...] INSERT INTO `PackageComments` VALUES (17,46,68,'ruby bindings for fastcgi',1113164127,68),(28,69,65,'A countdown timer applet for the GNOME panel.',1113178883,0); Except that line there is 161 characters and contains two comments (one comment deleted by its poster about Ruby and one non-deleted comment about GNOME). The line in the real file is a million characters and contains ~20k comments. And there are 28 such lines. Reading this would be like reading War And Peace 10 times but it would teach you a lot about the history of the AUR. That's why we use machines to do this kind of work for us. Also, a lovely idea to restore comments that are older than two years, that'll be extremely beneficial to the quality of the aur. Other thoughts on this, we don't need comments on packages that don't exist any more, that were deleted already or are made by users which aren't in the db any more. If I understood correctly, none of that data is currently in the aur's comments? or all? cheers! mar77i
Re: [aur-general] AUR Maintenance
On 01/03/13 06:02 AM, Martti Kühne wrote: On Fri, Mar 1, 2013 at 5:07 AM, Connor Behan connor.be...@gmail.com wrote: [...] INSERT INTO `PackageComments` VALUES (17,46,68,'ruby bindings for fastcgi',1113164127,68),(28,69,65,'A countdown timer applet for the GNOME panel.',1113178883,0); Except that line there is 161 characters and contains two comments (one comment deleted by its poster about Ruby and one non-deleted comment about GNOME). The line in the real file is a million characters and contains ~20k comments. And there are 28 such lines. Reading this would be like reading War And Peace 10 times but it would teach you a lot about the history of the AUR. That's why we use machines to do this kind of work for us. Also, a lovely idea to restore comments that are older than two years, that'll be extremely beneficial to the quality of the aur. Right, inserting this data into a db can be automated. It would just require minor syntax changes to account for the newer MySQL version. This hasn't been done, I gather, because the devs hold themselves to a high standard and don't want corrupted text littering the AUR comments. Fixing the encoding of the text is what might require reading. Loui Chang seemed to think there was a way to automate this as well but it would be nontrivial so the project got put on the back burner. I should ask him. Other thoughts on this, we don't need comments on packages that don't exist any more, that were deleted already or are made by users which aren't in the db any more. If I understood correctly, none of that data is currently in the aur's comments? or all? Whether a comment is a deleted comment is stored in the AUR database. Whether it belongs to a deleted package or a deleted user, I believe, is not. If you delete an AUR package, the PHP file will only delete the record for that package. Comments that were part of it stay in the db as orphan data. In fact, package tarballs don't even get deleted by the PHP file. 
This is done by a helper script that periodically runs a cleanup. However, if this 2010 backup does get imported into the AUR, I agree that we can take the liberty of removing such orphan data so there is less to import. cheers! mar77i
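The orphan cleanup discussed above could be sketched roughly as follows. This is only an illustration, not the AUR's actual helper script: the function name `drop_orphans` and the sets `live_packages` and `live_users` are hypothetical, and the sets are assumed to have been read from the current AUR database beforehand. Rows follow the PackageComments column order from the backup.

```python
from typing import Iterable, List, Set, Tuple

# Column order of PackageComments in the backup:
# ID, PackageID, UsersID, Comments, CommentTS, DelUsersID
Row = Tuple[int, int, int, str, int, int]


def drop_orphans(
    rows: Iterable[Row],
    live_packages: Set[int],   # hypothetical: package IDs that still exist
    live_users: Set[int],      # hypothetical: user IDs that still exist
    keep_deleted: bool = False,
) -> List[Row]:
    """Keep only comments whose package and author still exist."""
    kept: List[Row] = []
    for row in rows:
        _id, package_id, users_id, _text, _ts, del_users_id = row
        if package_id not in live_packages:
            continue  # comment belongs to a package that was removed
        if users_id not in live_users:
            continue  # comment was written by an account that is gone
        if del_users_id != 0 and not keep_deleted:
            continue  # DelUsersID != 0 marks a comment that was deleted
        kept.append(row)
    return kept


SAMPLE_ROWS: List[Row] = [
    (17, 46, 68, "ruby bindings for fastcgi", 1113164127, 68),
    (28, 69, 65, "A countdown timer applet for the GNOME panel.", 1113178883, 0),
]

if __name__ == "__main__":
    for row in drop_orphans(SAMPLE_ROWS, live_packages={46, 69}, live_users={65, 68}):
        print(row)
```

With both packages and both users still present, only comment 28 survives, because comment 17 carries a non-zero DelUsersID. Passing `keep_deleted=True` would retain it.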
Re: [aur-general] AUR Maintenance
On 28.02.2013 07:14, Connor Behan wrote: I was stupid enough not to make a backup so can someone with access please put this on nymeria? Thank-you. I've put it in your home on nymeria. You're lucky SevenL didn't yet shut down sigurd.
Re: [aur-general] AUR Maintenance
On 28/02/13 01:54 PM, Phillip Smith wrote: I'd be willing to try and assist with this too. What is the format of that backup file?

It is a 46MB text file of SQL commands; the kind you would get by running mysqldump. It only has 462 lines, but some of them are very long. The important lines are 97-117 that specify the PackageComments table:

    CREATE TABLE `PackageComments` (
      `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `PackageID` int(10) unsigned NOT NULL DEFAULT '0',
      `UsersID` int(10) unsigned NOT NULL DEFAULT '0',
      `Comments` text NOT NULL,
      `CommentTS` bigint(20) unsigned NOT NULL DEFAULT '0',
      `DelUsersID` int(10) unsigned NOT NULL DEFAULT '0',
      PRIMARY KEY (`ID`),
      KEY `UsersID` (`UsersID`),
      KEY `PackageID` (`PackageID`),
      KEY `DelUsersID` (`DelUsersID`)
    ) ENGINE=MyISAM AUTO_INCREMENT=154508 DEFAULT CHARSET=latin1;
    /*!40101 SET character_set_client = @saved_cs_client */;

    --
    -- Dumping data for table `PackageComments`
    --

    LOCK TABLES `PackageComments` WRITE;
    /*!4 ALTER TABLE `PackageComments` DISABLE KEYS */;

ID is a number identifying the comment, PackageID is the package to which it belongs, UsersID is the one who posted it, Comments is the actual text of it, CommentTS is the timestamp of when it was posted, DelUsersID is equal to the ID of the user who deleted the comment and 0 if it has not been deleted.

The next important lines are 118-146 which state the actual comment data. An example of it is:

    INSERT INTO `PackageComments` VALUES (17,46,68,'ruby bindings for fastcgi',1113164127,68),(28,69,65,'A countdown timer applet for the GNOME panel.',1113178883,0);

Except that line there is 161 characters and contains two comments (one comment deleted by its poster about Ruby and one non-deleted comment about GNOME). The line in the real file is a million characters and contains ~20k comments. And there are 28 such lines. Reading this would be like reading War And Peace 10 times but it would teach you a lot about the history of the AUR.
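To make the format described above concrete, here is a rough sketch of how one of those long INSERT lines could be split into individual comments. It is not an existing tool: the names `Comment`, `split_rows` and `parse_insert` are made up for illustration, and the parser only handles the subset of mysqldump syntax visible in the example (numeric fields, single-quoted strings, backslash escapes and doubled quotes).

```python
from typing import List, NamedTuple

SAMPLE = (
    "INSERT INTO `PackageComments` VALUES "
    "(17,46,68,'ruby bindings for fastcgi',1113164127,68),"
    "(28,69,65,'A countdown timer applet for the GNOME panel.',1113178883,0);"
)

# Common backslash escapes in dump strings; anything else maps to itself.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "0": "\0"}


class Comment(NamedTuple):
    id: int
    package_id: int
    users_id: int
    text: str
    timestamp: int      # CommentTS
    del_users_id: int   # 0 means the comment was not deleted


def split_rows(values: str) -> List[List[str]]:
    """Split "(a,'b'),(c,'d');" into rows of unquoted field strings."""
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    in_quote = False
    in_row = False
    i = 0
    while i < len(values):
        ch = values[i]
        if in_quote:
            if ch == "\\" and i + 1 < len(values):
                nxt = values[i + 1]
                field.append(ESCAPES.get(nxt, nxt))
                i += 2
                continue
            if ch == "'" and values[i + 1:i + 2] == "'":
                field.append("'")  # doubled quote inside a string
                i += 2
                continue
            if ch == "'":
                in_quote = False
            else:
                field.append(ch)
        elif ch == "'":
            in_quote = True
        elif ch == "(" and not in_row:
            in_row, row, field = True, [], []
        elif ch == "," and in_row:
            row.append("".join(field))
            field = []
        elif ch == ")" and in_row:
            row.append("".join(field))
            rows.append(row)
            in_row = False
        elif in_row:
            field.append(ch)
        i += 1
    return rows


def parse_insert(line: str) -> List[Comment]:
    """Turn one INSERT INTO `PackageComments` line into Comment tuples."""
    _, _, values = line.partition(" VALUES ")
    return [
        Comment(int(r[0]), int(r[1]), int(r[2]), r[3], int(r[4]), int(r[5]))
        for r in split_rows(values)
    ]


if __name__ == "__main__":
    for comment in parse_insert(SAMPLE):
        state = "deleted" if comment.del_users_id else "visible"
        print(comment.id, state, comment.text)
```

A character-by-character scan is used rather than a regular expression because comment text can itself contain commas, parentheses and escaped quotes.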
[aur-general] AUR Maintenance
Some of you may remember the lost comments fiasco of 2010. When this happened, a file named aur-20100205-1859.sql.fixed2.xz was created on sigurd, containing the AUR comments that had been lost. I volunteered (on the mailing list) to help manually restore them to the AUR [1]. Since I did not know SQL or how to speak the languages that were causing problems, my offer was laughed off as enthusiasm that wasn't helpful. This hasn't really changed. My help at this point would still probably be a bit useless. But that doesn't mean I don't still think about one day having a go at it. The compressed SQL file was on sigurd as recently as a few months ago. But since the move to nymeria, I can't find it anymore. I was stupid enough not to make a backup, so can someone with access please put this on nymeria? Thank you. [1] https://mailman.archlinux.org/pipermail/aur-general/2010-May/008847.html
[aur-general] AUR maintenance works
Some of you might have noticed that the AUR has been in maintenance mode for the last three hours. We did a full backup of the server and prepared everything for a drive replacement that is scheduled for tomorrow, 28. Nov 2011. The server might be down again tomorrow for a couple of minutes. Sorry for the inconvenience!
[aur-general] AUR Maintenance
I'm going to be updating the AUR in the next few minutes. Don't be alarmed. Please stand by.
Re: [aur-general] AUR Maintenance
On Sun 19 Sep 2010 20:24 -0400, Loui Chang wrote: I'm going to be updating the AUR in the next few minutes. Don't be alarmed. Please stand by. Should be good now. Let me know if there are any issues. Cheers.
Re: [aur-general] AUR Maintenance
On Sun, Sep 19, 2010 at 8:09 PM, Loui Chang louipc@gmail.com wrote: On Sun 19 Sep 2010 20:24 -0400, Loui Chang wrote: I'm going to be updating the AUR in the next few minutes. Don't be alarmed. Please stand by. Should be good now. Let me know if there are any issues. Cheers. There are no issues.. thanks for updating the aur! Although I did notice that the header is inconsistent with the other headers for archlinux. (the words up top don't seem to be as bold)
Re: [aur-general] AUR Maintenance
On Sun 19 Sep 2010 20:40 -0500, Thomas Dziedzic wrote: On Sun, Sep 19, 2010 at 8:09 PM, Loui Chang louipc@gmail.com wrote: On Sun 19 Sep 2010 20:24 -0400, Loui Chang wrote: I'm going to be updating the AUR in the next few minutes. Don't be alarmed. Please stand by. Should be good now. Let me know if there are any issues. Cheers. There are no issues.. thanks for updating the aur! Although I did notice that the header is inconsistent with the other headers for archlinux. (the words up top don't seem to be as bold) Dammmit!!
Re: [aur-general] AUR Maintenance
On Sun, 19 Sep 2010 20:40:23 -0500 Thomas Dziedzic gos...@gmail.com wrote: On Sun, Sep 19, 2010 at 8:09 PM, Loui Chang louipc@gmail.com wrote: On Sun 19 Sep 2010 20:24 -0400, Loui Chang wrote: I'm going to be updating the AUR in the next few minutes. Don't be alarmed. Please stand by. Should be good now. Let me know if there are any issues. Cheers. There are no issues.. thanks for updating the aur! Although I did notice that the header is inconsistent with the other headers for archlinux. (the words up top don't seem to be as bold) You fell for the duck, man... :) http://stackoverflow.com/questions/2349378/new-programming-jargon-you-coined/2444361#2444361
Re: [aur-general] AUR Maintenance
On Mon, May 3, 2010 at 6:53 PM, Loui Chang louipc@gmail.com wrote: Hah. Thanks for your enthusiasm, but it wouldn't be very effective to go through it manually. Most of the scrambled strings are in languages other than English, and it would take waay too much work. I do plan on restoring them, but I haven't had the chance to look into it. I can't say when that will be though. Cheers. This is really becoming a great hindrance. The comments on the AUR had a lot of very valuable information on them. It's not like it's a twitter feed of inane babble- it's documentation, QA, a changelog, brainstorming and external references. This is hurting me much more than if the wiki was completely nuked. I have PKGBUILDs and other build data in version control, but I don't have any of that other stuff in the AUR comments anywhere. Anything you could do would be a great help here. I would rather 5% of the comments be corrupted or completely deleted because of encoding issues than have all of them missing for another 2 months. Unfortunately, I have no expertise with encoding issues, otherwise I would have immediately offered help at the time. Let us know if there is anything we can help with, at all (besides stop bugging me! :) Thanks, Slash
Re: [aur-general] AUR Maintenance
On Fri 30 Apr 2010 22:04 -0400, Connor Behan wrote: I read in the archives that comments were 95% repaired by Firmicus and the copy with a few illegal characters is: /home/francois/aur-20100205-1859.sql.fixed2.xz on a server to which I do not yet have access. Since the comments are not back, I gather there is more to be done. Could I please go through the comments and repair sentences that make sense? i.e. s/Th#s wor#s for kd#4 but #ot kdemod/This works for kde4 but not kdemod/ I will do this until everything is done or the only sentences left are unintelligible. The other problem is merging this with new comments that have been posted since the update. No one knows a reliable way to do this automatically, correct? And it would require adding countless database entries by hand? I am also prepared to get started on this brute force work. I will have several hours per week to devote to it. Please give me what I need to contribute! Hah. Thanks for your enthusiasm, but it wouldn't be very effective to go through it manually. Most of the scrambled strings are in languages other than English, and it would take waay too much work. I do plan on restoring them, but I haven't had the chance to look into it. I can't say when that will be though. Cheers.
[aur-general] AUR Maintenance
I read in the archives that comments were 95% repaired by Firmicus and the copy with a few illegal characters is: /home/francois/aur-20100205-1859.sql.fixed2.xz on a server to which I do not yet have access. Since the comments are not back, I gather there is more to be done. Could I please go through the comments and repair sentences that make sense? i.e. s/Th#s wor#s for kd#4 but #ot kdemod/This works for kde4 but not kdemod/ I will do this until everything is done or the only sentences left are unintelligible. The other problem is merging this with new comments that have been posted since the update. No one knows a reliable way to do this automatically, correct? And it would require adding countless database entries by hand? I am also prepared to get started on this brute force work. I will have several hours per week to devote to it. Please give me what I need to contribute! Thanks.
Re: [aur-general] AUR Maintenance
I can help out as well, though I am not a trusted user so I wonder how much I can really help. But my offer is there. :) On 2010-04-30, at 10:04 PM, Connor Behan wrote: I read in the archives that comments were 95% repaired by Firmicus and the copy with a few illegal characters is: /home/francois/aur-20100205-1859.sql.fixed2.xz on a server to which I do not yet have access. Since the comments are not back, I gather there is more to be done. Could I please go through the comments and repair sentences that make sense? i.e. s/Th#s wor#s for kd#4 but #ot kdemod/This works for kde4 but not kdemod/ I will do this until everything is done or the only sentences left are unintelligible. The other problem is merging this with new comments that have been posted since the update. No one knows a reliable way to do this automatically, correct? And it would require adding countless database entries by hand? I am also prepared to get started on this brute force work. I will have several hours per week to devote to it. Please give me what I need to contribute! Thanks.
Re: [aur-general] AUR Maintenance
So... is there an official word on the final decision about restoring these? Is a fix still being investigated, or is it just being left? Allan
Re: [aur-general] AUR Maintenance
On Monday, 29 March 2010 01:21:09, Allan McRae wrote: Is there any progress on fixing this? There are a lot of packaging notes on those pages that would be a shame to lose. It's very likely the same issue I had updating the wiki. This is caused by a mysql packaging change which switched the default encoding from latin1 to utf8. Here are some tips: http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL But I guess we lost the chance to fix this more or less easily because the AUR content has changed since the last backup. This requires some kind of script that imports and merges the old and new comments. -- Pierre Schmitz, https://users.archlinux.de/~pierre
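As a rough idea of what such an import-and-merge script might do, the sketch below combines rows from the old backup with rows posted since, drops exact duplicates, and renumbers everything in timestamp order. The function name `merge_comments` is hypothetical, and renumbering the IDs is an assumption made here to sidestep ID collisions between the two sets; a real import might need to preserve the IDs instead.

```python
from typing import Iterable, List, Set, Tuple

# ID, PackageID, UsersID, Comments, CommentTS, DelUsersID
Row = Tuple[int, int, int, str, int, int]


def merge_comments(old_rows: Iterable[Row], new_rows: Iterable[Row]) -> List[Row]:
    """Merge two comment sets chronologically and assign fresh IDs."""
    combined = sorted(list(old_rows) + list(new_rows), key=lambda r: (r[4], r[0]))
    seen: Set[Tuple[int, int, int, str]] = set()
    merged: List[Row] = []
    for row in combined:
        # Same package, author, timestamp and text: treat as one comment.
        key = (row[1], row[2], row[4], row[3])
        if key in seen:
            continue
        seen.add(key)
        merged.append(row)
    # Assumption: IDs are renumbered from 1 so old and new rows cannot clash.
    return [(new_id,) + row[1:] for new_id, row in enumerate(merged, start=1)]


OLD: List[Row] = [
    (17, 46, 68, "ruby bindings for fastcgi", 1113164127, 68),
    (28, 69, 65, "A countdown timer applet for the GNOME panel.", 1113178883, 0),
]
NEW: List[Row] = [
    # Hypothetical post-update rows: one reuses ID 17, one repeats an old comment.
    (17, 69, 65, "Still works after the update.", 1270000000, 0),
    (18, 69, 65, "A countdown timer applet for the GNOME panel.", 1113178883, 0),
]

if __name__ == "__main__":
    for row in merge_comments(OLD, NEW):
        print(row)
```

Any merge along these lines would need to be tested against a copy of the database first, since a wrong duplicate rule could silently drop or double comments.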
Re: [aur-general] AUR Maintenance
On 29/03/2010 09:00, Pierre Schmitz wrote: On Monday, 29 March 2010 01:21:09, Allan McRae wrote: Is there any progress on fixing this? There are a lot of packaging notes on those pages that would be a shame to lose. I did it last Thursday. I've done my best to repair the mysql backup Loui pointed me at. I'd say it's 95% fixed now, but the procedure left a few isolated illegal characters in its trail (like this: �), especially within Cyrillic and CJK. The text should be legible however. You can compare the original on sigurd /srv/http/aur.archlinux.org/backup/aur-20100205-1859.sql.gz with my repaired version: /home/francois/aur-20100205-1859.sql.fixed2.xz and judge whether any further effort is needed or justified. It's very likely the same issue I had updating the wiki. This is caused by a mysql packaging change which switched the default encoding from latin1 to utf8. Here are some tips: http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL But I guess we lost the chance to fix this more or less easily because the AUR content has changed since the last backup. Indeed. Believe me, the encoding of the strings was in a terrible mess (mostly the comments, but also the names of users), so it was no longer simply a matter of doing a conversion from one charset to another. Basically what I did was to convert from windows-1252 (!) to UTF-8, and then repair all doubly-encoded UTF-8 characters using the perl module Encode::DoubleEncodedUTF8 (on CPAN). But as I said above, there is no way to automatically recover everything from that one backup alone. This requires some kind of script that imports and merges the old and new comments. The problem with that import and merge operation – unless it is done with a reliable and well-tested tool – is that it risks damaging the data more than it currently is ;) I'll leave it to Loui to decide whether it's worth the trouble. F
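The two-step repair described above (read the bytes as windows-1252, then undo doubly-encoded UTF-8) can be illustrated with a small sketch. This is not the Perl module Encode::DoubleEncodedUTF8 and does not reproduce its behaviour; the function names `decode_dump_bytes` and `fix_double_encoded` are made up here, and the fix is applied to a whole string at a time, which is a simplification.

```python
def decode_dump_bytes(raw: bytes) -> str:
    """Step 1: interpret raw dump bytes as windows-1252.

    Bytes that windows-1252 leaves undefined become U+FFFD, the
    replacement character, instead of raising an error.
    """
    return raw.decode("cp1252", errors="replace")


def fix_double_encoded(text: str) -> str:
    """Step 2: undo one round of UTF-8 that was misread as windows-1252.

    If re-encoding as windows-1252 yields valid UTF-8, the string was
    doubly encoded and the decoded form is returned. Otherwise the
    string is assumed to be fine already and is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    # Simulate the damage: correct text, encoded as UTF-8, misread as cp1252.
    broken = "Kühne".encode("utf-8").decode("cp1252")
    print(broken, "->", fix_double_encoded(broken))
```

Because the check works on a whole string, it would be applied per comment rather than per file. A comment that mixes correct and doubly-encoded characters, or that already contains replacement characters, is left untouched by this sketch, so some manual review would remain.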
Re: [aur-general] AUR Maintenance
On 23/03/2010 22:24, Loui Chang wrote: On Tue 23 Mar 2010 16:51 -0400, Daenyth Blank wrote: On Tue, Mar 23, 2010 at 16:43, Loui Chang louipc@gmail.com wrote: It may be possible to restore most of the old comments, but that's something that we'd have to look into later. What's needed for this, and what ways could someone contribute? We need someone with a keen knowledge of mysql and encodings to be able to restore the backed up comments properly in utf8. I've done encoding conversions and repairs countless times (mostly using Perl). So perhaps I could help on this... (Not today though, but probably tomorrow). Contact me off-list and give me more detailed instructions of what the issue is. I do have access to sigurd but I can't look at the data right now as I am not in the mysql group. F
[aur-general] AUR Maintenance
Hello everyone! I'm going to look into fixing some issues with the AUR right now. Please don't be alarmed if the site isn't working for a little while.
Re: [aur-general] AUR Maintenance
On Tue 23 Mar 2010 15:09 -0400, Loui Chang wrote: Hello everyone! I'm going to look into fixing some issues with the AUR right now. Please don't be alarmed if the site isn't working for a little while. I've deleted existing comments from the AUR. I ran into a problem juggling the encodings, which was the problem I was trying to fix. The aur should properly display utf8 in comments now though. It may be possible to restore most of the old comments, but that's something that we'd have to look into later. Cheers!
Re: [aur-general] AUR Maintenance
On Tue, Mar 23, 2010 at 9:43 PM, Loui Chang louipc@gmail.com wrote: On Tue 23 Mar 2010 15:09 -0400, Loui Chang wrote: Hello everyone! I'm going to look into fixing some issues with the AUR right now. Please don't be alarmed if the site isn't working for a little while. I've deleted existing comments from the AUR. I ran into a problem juggling the encodings, which was the problem I was trying to fix. The aur should properly display utf8 in comments now though. It may be possible to restore most of the old comments, but that's something that we'd have to look into later. Probably a stupid question but just to be sure : being able to look into it later supposes that there is an easy way to restore old comments by keeping the new ones ? (i.e. merging both) Or will the new ones be lost when restoring the old ones ?
Re: [aur-general] AUR Maintenance
On Tue, Mar 23, 2010 at 16:43, Loui Chang louipc@gmail.com wrote: It may be possible to restore most of the old comments, but that's something that we'd have to look into later. What's needed for this, and what ways could someone contribute?
Re: [aur-general] AUR Maintenance
On Tue 23 Mar 2010 16:51 -0400, Daenyth Blank wrote: On Tue, Mar 23, 2010 at 16:43, Loui Chang louipc@gmail.com wrote: It may be possible to restore most of the old comments, but that's something that we'd have to look into later. What's needed for this, and what ways could someone contribute? We need someone with a keen knowledge of mysql and encodings to be able to restore the backed up comments properly in utf8.
Re: [aur-general] AUR Maintenance
On Tue 23 Mar 2010 21:51 +0100, Xavier Chantry wrote: On Tue, Mar 23, 2010 at 9:43 PM, Loui Chang louipc@gmail.com wrote: On Tue 23 Mar 2010 15:09 -0400, Loui Chang wrote: Hello everyone! I'm going to look into fixing some issues with the AUR right now. Please don't be alarmed if the site isn't working for a little while. I've deleted existing comments from the AUR. I ran into a problem juggling the encodings, which was the problem I was trying to fix. The aur should properly display utf8 in comments now though. It may be possible to restore most of the old comments, but that's something that we'd have to look into later. Probably a stupid question but just to be sure : being able to look into it later supposes that there is an easy way to restore old comments by keeping the new ones ? (i.e. merging both) Or will the new ones be lost when restoring the old ones ? Old comments are mostly backed up but suffer from some encoding issues - that's the first hurdle. There should be a way to merge old and new comments. I'm not exactly sure how easy that would be however. Probably pretty easy for a real sysadmin. That I am not unfortunately. I'm not sure how much value is in the old comments, but it's not worth keeping the AUR locked down while I try to figure it out.
Re: [aur-general] AUR Maintenance
On Tue, Mar 23, 2010 at 10:39 PM, Loui Chang louipc@gmail.com wrote: I'm not sure how much value is in the old comments, but it's not worth keeping the AUR locked down while I try to figure it out. I would say there are 90% of crap and 10% that would be a shame to lose :) I hope someone more knowledgeable about mysql/encoding/sysadmin can help.
Re: [aur-general] AUR Maintenance
On Tue, Mar 23, 2010 at 17:41, Xavier Chantry chantry.xav...@gmail.com wrote: I would say there are 90% of crap and 10% that would be a shame to lose :) I hope someone more knowledgeable about mysql/encoding/sysadmin can help. I might throw up an announcement on the forums that they've been removed and a call for help on fixing it.