| While John Chambers is here, I may as well ask a couple of questions (and
| apologise for top quoting)...
|
| I am a big fan of the resource but:
|
| Is there any feasible way to avoid duplicates? I often seem to get the same
| version of a tune within search results.  I'm guessing, but do people lift abc
| from one site and post it to another?

Yes, but it's not all that cheap.  My search bot doesn't save all  of
each tune; it only notes a few things such as titles, key, meter. And
each hostname is handled by a separate process, so identical tunes on
two machines are difficult to spot at all.

One thing that is in my index files is the size  of  each  file  that
contains an abc tune.  It wouldn't take much extra space to also keep
the size of each tune.  Then a separate pass could  sort  tunes  with
identical titles by size, fetch each of them again, and compare them.
This would take a fair amount of code,  solely  for  the  purpose  of
eliminating duplicates.
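
As a rough illustration only (this is not the search bot's code), the
pass described above might look something like the following Python
sketch.  The record layout (title, size, url) and the fetch() callable
are assumptions made up for the example.  It groups by title and size
rather than sorting, which amounts to the same thing, and it fetches
only tunes that share both.

```python
from collections import defaultdict


def find_duplicates(index, fetch):
    """Return lists of URLs whose tunes are byte-for-byte identical.

    index -- iterable of (title, size, url) records (hypothetical layout)
    fetch -- callable taking a URL and returning the tune text
    """
    # First pass: cheap grouping using only what the index already holds.
    candidates = defaultdict(list)
    for title, size, url in index:
        candidates[(title.strip().lower(), size)].append(url)

    duplicates = []
    for urls in candidates.values():
        if len(urls) < 2:
            continue  # unique title/size pair, so no need to fetch anything
        # Second pass: fetch the candidates again and compare the full text.
        by_text = defaultdict(list)
        for url in urls:
            by_text[fetch(url)].append(url)
        duplicates.extend(group for group in by_text.values() if len(group) > 1)
    return duplicates
```

Comparing the full text catches only exact copies; two transcriptions
that differ by a single space would still count as different tunes.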

And I'd question whether this is a good idea.  People  do  a  lot  of
copying  from  each  other.   The  common term for this on the web is
"mirroring".  Until we can make the entire Internet 100% reliable and
equally  fast  from everywhere, and guarantee that no sites will ever
die, mirroring is a good thing. If you can't get the first version of
a tune, you can try the second.  I know of a couple of abc sites that
are mostly mirrors.  A few people have asked me  to  mirror  some  of
their tunes, which I've done.  (And one fellow recently asked me to
stop mirroring his tunes, because he finally has a fast, reliable
server.)

| Related I'd guess, but is the time feature new?  I can't remember the tune
| but was looking for a specific version (or a version close to what I knew)
| for someone the other day.  I could tell in a matter of a second or two that
| I needed to listen to the next MIDI but got told to wait a bit.

Yeah; it's only a few weeks old.  The problem was that the server got
hit  by  yet  another  search  site that flooded it with simultaneous
requests from a small set of IP addresses.  And the searcher was  one
of the new ones that has some idea of how to call CGI scripts. It was
apparently running through a dictionary, requesting  all  tunes  that
contain  each  word.  The load average was about 30, and nothing else
was getting done.

But the machine was alive enough that  I  could  implement  the  time
delay.   That  pretty much stopped the searcher in its tracks, and in
minutes things were back to normal.

I've discussed this with others on the machine (which is a
departmental "guest" machine shared by several dozen people, of whom
typically 3 or 4 are logged in at any given time). They don't want to
exclude  search  bots,  since search sites are obviously useful to us
all. The problem is how to get search bots to hold back. We do have a
robots.txt  file  that  tells bots to avoid my music directories, but
this searcher ignored it.
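
For comparison, here is roughly what a bot that does honor robots.txt
would do before each request, sketched with Python's standard
urllib.robotparser module.  The rules, the /music/ path, the host name
and the bot name are all invented for the example; they are not the
contents of our actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /music/
"""


def allowed(url, agent="ExampleBot"):
    """Return True if the rules above permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

With these rules, allowed("http://host.example/music/reel.abc") is
False and allowed("http://host.example/index.html") is True.  A real
bot would fetch the file from the server once and cache the parsed
rules rather than re-parsing them on every call.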

The time delay seems to have worked.  I've  lowered  the  time  to  5
seconds,  and  that still seems effective.  The searcher mostly still
gets a lot of "too soon" replies, and it has backed  off.   It  might
just  be confused.  Or the authors may have noticed and realized that
we were defending ourselves.
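
The idea behind the delay can be sketched in a few lines of Python.
This is an illustration, not the script that is actually running: it
assumes the delay is tracked per client IP address, and that a refused
request doesn't restart the clock.  The class and parameter names are
invented.

```python
import time


class MinIntervalGate:
    """Refuse a client's request if its last accepted one was too recent."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between accepted requests
        self.clock = clock                # injectable so it can be tested
        self.last_accepted = {}           # client IP -> time of last accepted request

    def check(self, client_ip):
        """Return True to serve the request, False for a "too soon" reply."""
        now = self.clock()
        last = self.last_accepted.get(client_ip)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_accepted[client_ip] = now
        return True
```

A CGI script would call check() with the requester's address and send
the "too soon" message whenever it returns False.  A person clicking
through results loses a few seconds; a bot firing simultaneous
requests gets almost nothing but refusals.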

I've been fairly careful to make sure that my searcher can't do  this
sort of thing. It is a strictly serial process that makes at most two
requests per second, and it ignores  all  URLs  that  look  like  CGI
scripts. But some of the search bots out there are fairly aggressive.
OTOH, google is very unobtrusive. I've found its probes in our server
logs,  and it's difficult to find two of them in the same minute.  So
it's not a problem at all.
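
In outline, those two restraints look something like this Python
sketch.  It is not my searcher's code; the test for "looks like a CGI
script" in particular is a guess at a reasonable heuristic, and the
function names and the fetch() callable are invented for the example.

```python
import time
from urllib.parse import urlparse

# Assumed heuristic: path suffixes that usually mean "script", not "file".
SCRIPT_SUFFIXES = (".cgi", ".pl", ".asp", ".php")


def looks_like_cgi(url):
    """Guess whether a URL invokes a script rather than naming a file."""
    parts = urlparse(url)
    path = parts.path.lower()
    return bool(parts.query) or "/cgi-bin/" in path or path.endswith(SCRIPT_SUFFIXES)


def crawl(urls, fetch, min_gap=0.5, clock=time.monotonic, sleep=time.sleep):
    """Fetch URLs one at a time, starting at most one request per min_gap seconds."""
    results = {}
    last_start = None
    for url in urls:
        if looks_like_cgi(url):
            continue  # never trigger someone else's scripts
        if last_start is not None:
            wait = min_gap - (clock() - last_start)
            if wait > 0:
                sleep(wait)  # hold back until the gap has passed
        last_start = clock()
        results[url] = fetch(url)
    return results
```

With min_gap at 0.5 seconds this makes at most two requests per second,
and because the loop is serial there is never more than one request
outstanding.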

| Unrelated:  I run conversion routines at
| http://www.folkinfo.org/songs/default.asp?X=1&S=&C=0&K=1 and
| a couple of days ago had my ISP reporting to me that an instance of abcm2ps
| was using nearly 100% processor resources for 20 minutes! All abc testing
| is now done off line (using the same program) but I'm wondering if anyone
| else has experienced that or other conversion routines behaving that way.

I found (and fixed) a few loops in my abc2ps clone.  (And I found one
in  my  search bot just yesterday.) The loops that I saw were in code
that didn't handle some ill-formed abc, but I've forgotten  what  the
problem was.

Drawing music is a complex task.  I've been quite  impressed  by  how
good a job Michael did on abc2ps.  (And then I've fixed the bugs as I
found them. ;-)

