| While John Chambers is here, I may as well ask a couple of questions (and
| apologise for top quoting)...
|
| I am a big fan of the resource but:
|
| Is there any feasible way to avoid duplicates? I often seem to get the same
| version of a tune within search results. I'm guessing, but do people lift abc
| from one site and post it to another?

Yes, but it's not all that cheap. My search bot doesn't save all of
each tune; it only notes a few things such as titles, key, meter. And
each hostname is handled by a separate process, so identical tunes on
two machines are difficult to spot at all.

One thing that is in my index files is the size of each file that
contains an abc tune. It wouldn't take much extra space to also keep
the size of each tune. Then a separate pass could sort tunes with
identical titles by size, fetch each of them again, and compare them.
This would take a fair amount of code, solely for the purpose of
eliminating duplicates.

And I'd question whether this is a good idea. People do a lot of
copying from each other. The common term for this on the web is
"mirroring". Until we can make the entire Internet 100% reliable and
equally fast from everywhere, and guarantee that no sites will ever
die, mirroring is a good thing. If you can't get the first version of
a tune, you can try the second.

I know of a couple of abc sites that are mostly mirrors. A few people
have asked me to mirror some of their tunes, which I've done. (And one
fellow recently asked me to stop mirroring his tunes, due to his having
a fast, reliable server finally.)

| Related I'd guess, but is the time feature new? I can't remember the tune
| but was looking for a specific version (or a version close to what I knew)
| for someone the other day. I could tell in a matter of a second or two that
| I needed to listen to the next MIDI but got told to wait a bit.

Yeah; it's only a few weeks old. The problem was that the server got
hit by yet another search site that flooded it with simultaneous
requests from a small set of IP addresses. And the searcher was one of
the new ones that has some idea of how to call cgi scripts. It was
apparently running through a dictionary, requesting all tunes that
contain each word. The load average was about 30, and nothing else was
getting done.
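
For readers wondering what a time delay of this kind looks like, here is a minimal sketch in Python. It is not the code running on the server. The class name PerClientDelay, the in-memory dictionary, and the injectable clock are all assumptions made for illustration. A real CGI script starts a fresh process for every request, so it would have to keep the timestamps in a file or similar rather than in memory.

```python
import time


class PerClientDelay:
    """Refuse a request if the same client was served less than
    min_gap seconds ago. Hypothetical illustration only."""

    def __init__(self, min_gap=5.0, clock=time.monotonic):
        self.min_gap = min_gap      # required gap between requests, in seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # client address -> time of last accepted request

    def allow(self, client):
        now = self.clock()
        last = self.last_seen.get(client)
        if last is not None and now - last < self.min_gap:
            return False            # caller would send a "too soon" reply
        self.last_seen[client] = now
        return True
```

Only accepted requests reset the timer in this sketch, so a client that waits out the gap always gets through. Resetting it on refused requests as well would punish a flooding client harder, at the cost of also punishing an impatient human.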
But the machine was alive enough that I could implement the time
delay. That pretty much stopped the searcher in its tracks, and in
minutes things were back to normal.

I've discussed this with others on the machine (which is a departmental
"guest" machine shared by several dozen people, of whom typically 3 or
4 are logged in at any given time). They don't want to exclude search
bots, since search sites are obviously useful to us all. The problem
is how to get search bots to hold back. We do have a robots.txt file
that tells bots to avoid my music directories, but this searcher
ignored it.

The time delay seems to have worked. I've lowered the time to 5
seconds, and that still seems effective. The searcher mostly still
gets a lot of "too soon" replies, and it has backed off. It might just
be confused. Or the authors may have noticed and realized that we were
defending ourselves.

I've been fairly careful to make sure that my searcher can't do this
sort of thing. It is a strictly serial process that makes at most two
requests per second, and it ignores all URLs that look like CGI
scripts. But some of the search bots out there are fairly aggressive.

OTOH, Google is very unobtrusive. I've found its probes in our server
logs, and it's difficult to find two of them in the same minute. So
it's not a problem at all.

| Unrelated: I run conversion routines at
| http://www.folkinfo.org/songs/default.asp?X=1&S=&C=0&K=1 and
| a couple of days ago had my ISP reporting to me that an instance of abcm2ps
| was using nearly 100% processor resources for 20 minutes! All abc testing
| is now done off line (using the same program) but I'm wondering if anyone
| else has experienced that or other conversion routines behaving that way.

I found (and fixed) a few loops in my abc2ps clone. (And I found one
in my search bot just yesterday.) The loops that I saw were in code
that didn't handle some ill-formed abc, but I've forgotten what the
problem was. Drawing music is a complex task.
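
For anyone running a converter behind a web form, one cheap defence against a runaway loop like this is a hard time limit on the child process. Here is a minimal sketch in Python, not taken from anyone's actual setup. The function name run_with_timeout and the 30-second default are made up for illustration, and the limit is wall-clock time rather than CPU time.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds=30):
    """Run cmd (a list of arguments) and return (ok, output).

    If the command is still running after `seconds`, subprocess.run
    kills it and raises TimeoutExpired, which is reported as a failure.
    """
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=seconds)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {seconds} seconds"
    return done.returncode == 0, done.stdout


if __name__ == "__main__":
    # Stand-in for a converter that has gone into an endless loop.
    print(run_with_timeout([sys.executable, "-c", "while True: pass"],
                           seconds=1))
```

In real use, cmd would be whatever command line starts the converter. A tune that trips the limit can then be logged and looked at off line instead of tying up the processor.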
I've been quite impressed by how good a job Michael did on abc2ps.
(And then I've fixed the bugs as I found them. ;-)

To subscribe/unsubscribe, point your browser to:
http://www.tullochgorm.com/lists.html