Re: [slim] help with script to scrape album year data

2007-10-29 Thread vrobin

bklaas;238755 Wrote: 
> Digging my old CDs out for this would take WAY more time than just
> manually searching Google for album release year. CDDB is not a good
> solution here, esp. because I don't have DiscIDs saved into the tags
> either. I started ripping music well before I understood why verbose
> tag metadata was a good idea. I'm trying to come up with a solution
> that doesn't involve the physical media.
> 
> I will give Musicbrainz a shot, though my prior experience with that
> service has not been good.
> 
> cheers,
> #!/ben

You can "fuzz search" CDDB or MusicBrainz using the media files themselves,
even if the results are less exact than with the real disc/DiscID.

If I remember correctly, the release-date information in MusicBrainz and
Discogs is not that bad; they may even include an "original release
date". If your albums are not too rare, you could also look at Wikipedia.

But if I were you, I would write a bot to query Google with an algorithm
like this:


Code:

  search google for "full album name"
  N = 1
  do:
    fetch the Nth result page
    in that page, look for date patterns near the album name
    collect all dates found in the page
    N = N + 1
  until at least XX date fields are collected

  for each collected list of dates, analyze statistically:
    date present 95% of the time -> keep it silently
    date present 75% of the time -> keep it with a notice
    date present 50% of the time -> keep it with a warning



This algorithm can be fooled by re-release dates, but with good pattern
detection you can get good results.
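The "look for date patterns near the album name" and "analyze statistically" steps can be sketched in Python as below. This is a minimal sketch, assuming the result pages have already been fetched as plain text (fetching Google results is not shown). The 1950-2029 year range, the 100-character window, the function names `years_near` and `classify_dates`, and the "reject" status for anything under 50% are all assumptions, not part of the original algorithm.

```python
import re
from collections import Counter

# Assumption: only years 1950-2029 count as plausible release years.
YEAR_RE = re.compile(r"\b(19[5-9]\d|20[0-2]\d)\b")


def years_near(text, album, window=100):
    """Collect years found within `window` characters of each
    case-insensitive mention of the album name. A year near two mentions
    is counted twice, which simply gives it more weight in the vote."""
    found = []
    for m in re.finditer(re.escape(album), text, re.IGNORECASE):
        start = max(0, m.start() - window)
        found.extend(YEAR_RE.findall(text[start:m.end() + window]))
    return found


def classify_dates(dates):
    """Return (most common year, status) using the 95/75/50% thresholds."""
    if not dates:
        return None, "no data"
    year, count = Counter(dates).most_common(1)[0]
    share = count / len(dates)
    if share >= 0.95:
        return year, "keep"       # keep silently
    if share >= 0.75:
        return year, "notice"     # keep with a notice
    if share >= 0.50:
        return year, "warning"    # keep with a warning
    return year, "reject"         # assumption: the post does not cover < 50%


# Made-up page snippets standing in for fetched search results.
pages = [
    "Beck - Sea Change (2002)",
    "Sea Change was released in 2002.",
    "Review of Sea Change, the 2002 album.",
    "Sea Change: 2002",
    "Sea Change reissue, 2009.",
]
dates = [y for p in pages for y in years_near(p, "Sea Change")]
print(classify_dates(dates))  # ('2002', 'notice')
```

A single reissue mention drops the share to 80%, so the result is kept with a notice rather than silently, which is the kind of re-release noise the thresholds are meant to surface.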


-- 
vrobin

vrobin's Profile: http://forums.slimdevices.com/member.php?userid=11705
View this thread: http://forums.slimdevices.com/showthread.php?t=39760

___
discuss mailing list
discuss@lists.slimdevices.com
http://lists.slimdevices.com/lists/listinfo/discuss


Re: [slim] help with script to scrape album year data

2007-10-29 Thread bklaas

Discogs has an API. Doesn't look too bad.

http://www.discogs.com/help/api
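For comparison, a query against the present-day Discogs API (not necessarily the 2007 version linked above) might look like the sketch below. This is a hedged sketch, assuming the `https://api.discogs.com/database/search` endpoint with `artist`, `release_title`, `type` and `token` parameters, and a JSON response whose `results` entries carry a `year` field; the User-Agent string is a hypothetical placeholder. Check the current API documentation before relying on any of it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint of the current Discogs database search.
SEARCH_URL = "https://api.discogs.com/database/search"


def build_search_url(artist, album, token):
    """Build a database-search URL for one artist/album pair."""
    params = {
        "artist": artist,
        "release_title": album,
        "type": "master",  # masters group all versions of a release
        "token": token,    # personal access token (assumed auth method)
    }
    return SEARCH_URL + "?" + urlencode(params)


def fetch(url):
    """Download a response body; Discogs expects a User-Agent header."""
    req = Request(url, headers={"User-Agent": "YearTagger/0.1"})  # hypothetical
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def years_from_response(body):
    """Return the non-empty 'year' fields from a search response body."""
    data = json.loads(body)
    return [r["year"] for r in data.get("results", []) if r.get("year")]
```

Usage would be `years_from_response(fetch(build_search_url("Beck", "Sea Change", "YOUR_TOKEN")))`. Searching masters rather than individual releases is a design choice: a master's year usually reflects the first release rather than a reissue.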


-- 
bklaas

"the Nokia770 skin guy"
http://www.last.fm/user/bklaas/

bklaas's Profile: http://forums.slimdevices.com/member.php?userid=58


Re: [slim] help with script to scrape album year data

2007-10-29 Thread radish

Thinking about it, Amazon Web Services might work well too. I know a
lot of apps use it for getting cover art, and you can give it some pretty
vague queries and it will do its best to match. The advantage is that,
in my experience, parsing neat XML is typically easier than scraping HTML.

http://www.amazon.com/E-Commerce-Service-AWS-home-page/b/ref=sc_fe_l_2/105-5797087-3222059?ie=UTF8&node=12738641&no=342430011&me=A36L942TSJ2AJA
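To illustrate why XML is easier to work with than scraped HTML, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element name `ReleaseDate` and the sample document are assumptions for illustration only; the real response schema should be checked, and real responses may use an XML namespace, which the sketch tolerates by ignoring the `{namespace}` prefix.

```python
import xml.etree.ElementTree as ET


def release_years(xml_text, tag="ReleaseDate"):
    """Return the 4-digit year from every <tag> element in the document.

    `tag` defaults to an assumed element name; adjust it to the real schema.
    """
    root = ET.fromstring(xml_text)
    years = []
    for elem in root.iter():
        # Strip any "{namespace}" prefix so the match works either way.
        if elem.tag.rsplit("}", 1)[-1] == tag and elem.text:
            years.append(elem.text.strip()[:4])
    return years


sample = (
    "<Items><Item><ItemAttributes>"
    "<Title>Sea Change</Title><ReleaseDate>2002-09-24</ReleaseDate>"
    "</ItemAttributes></Item></Items>"
)
print(release_years(sample))  # ['2002']
```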


-- 
radish

radish's Profile: http://forums.slimdevices.com/member.php?userid=77


Re: [slim] help with script to scrape album year data

2007-10-29 Thread bklaas

Fantastico, radish! That looks like it might just work.

cheers,
#!/ben


-- 
bklaas

"the Nokia770 skin guy"
http://www.last.fm/user/bklaas/

bklaas's Profile: http://forums.slimdevices.com/member.php?userid=58


Re: [slim] help with script to scrape album year data

2007-10-29 Thread radish

Discogs.com? They have a fairly Google-like search engine and the DB is
pretty damn extensive. In fact, the biggest problem you're likely to run
into is narrowing results down: even a pretty specific query like
this:

http://www.discogs.com/search?type=all&q=change+or+die+artist%3Asunscreem+country%3AUK+format%3Acd

returns 3 results. I guess you'd have to write something that pulls
each result and makes a best guess at the correct year.
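One way to make that best guess is sketched below in Python. It assumes each search result is a dict whose `title` reads "Artist - Album" and whose `year` is a string or number; both the title format and the function name `best_guess_year` are assumptions. It keeps only results whose artist and album match exactly (ignoring case) and returns the earliest year, on the theory that later years are usually reissues.

```python
def best_guess_year(results, artist, album):
    """Earliest year among results whose title matches artist and album."""
    candidates = []
    for r in results:
        # Assumed title format "Artist - Album"; artist names that
        # themselves contain " - " would be split incorrectly.
        res_artist, _, res_album = r.get("title", "").partition(" - ")
        year = str(r.get("year", ""))
        if (res_artist.strip().lower() == artist.lower()
                and res_album.strip().lower() == album.lower()
                and year.isdigit()):
            candidates.append(int(year))
    return min(candidates) if candidates else None


# Hypothetical results for a hypothetical album.
results = [
    {"title": "Some Band - Some Album", "year": "1996"},
    {"title": "Some Band - Some Album", "year": "2009"},
    {"title": "Other Band - Some Album", "year": "1990"},
    {"title": "Some Band - Some Album"},
]
print(best_guess_year(results, "some band", "Some Album"))  # 1996
```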


-- 
radish

radish's Profile: http://forums.slimdevices.com/member.php?userid=77


Re: [slim] help with script to scrape album year data

2007-10-29 Thread bklaas

Benway;238717 Wrote: 
> CPAN has lots of modules for querying FreeDB.
> 
> The problem you'll find is that they tend to need the DiscID or
> the CD-ROM available as /dev/cdrom, etc.
> 
> Searching CPAN modules for "freedb id3" finds WebService::FreeDB,
> which looks like it will do the trick.
> 
> However, accuracy may be a problem when multiple records are found.

Digging my old CDs out for this would take WAY more time than just
manually searching Google for album release year. CDDB is not a good
solution here, esp. because I don't have DiscIDs saved into the tags
either. I started ripping music well before I understood why verbose
tag metadata was a good idea. I'm trying to come up with a solution
that doesn't involve the physical media.

I will give Musicbrainz a shot, though my prior experience with that
service has not been good.

cheers,
#!/ben


-- 
bklaas

"the Nokia770 skin guy"
http://www.last.fm/user/bklaas/

bklaas's Profile: http://forums.slimdevices.com/member.php?userid=58


Re: [slim] help with script to scrape album year data

2007-10-29 Thread Benway

CPAN has lots of modules for querying FreeDB.

The problem you'll find is that they tend to need the DiscID or
the CD-ROM available as /dev/cdrom, etc.

Searching CPAN modules for "freedb id3" finds WebService::FreeDB,
which looks like it will do the trick.

However, accuracy may be a problem when multiple records are found.
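When a text search returns several candidate records, a simple fuzzy score can pick the closest one or refuse to guess. The Python sketch below uses the standard library's `difflib.SequenceMatcher`; the record shape (dicts with `artist` and `album` keys), the averaging of the two scores and the 0.8 threshold are all assumptions chosen for illustration, not anything FreeDB or WebService::FreeDB provides.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pick_record(records, artist, album, threshold=0.8):
    """Return the record closest to artist/album, or None if nothing
    scores at least `threshold` (assumed cut-off)."""
    best, best_score = None, 0.0
    for rec in records:
        score = (similarity(rec["artist"], artist)
                 + similarity(rec["album"], album)) / 2
        if score > best_score:
            best, best_score = rec, score
    return best if best_score >= threshold else None


records = [
    {"artist": "Beck", "album": "Sea Change (Deluxe Edition)"},
    {"artist": "Beck", "album": "Sea Change"},
]
print(pick_record(records, "beck", "sea change"))
```

Returning None below the threshold lets the script flag an album for manual checking instead of writing a wrong year into the tag.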


-- 
Benway


Benway's Profile: http://forums.slimdevices.com/member.php?userid=8944


Re: [slim] help with script to scrape album year data

2007-10-29 Thread snarlydwarf

MusicBrainz often has that information; the catch is that you may not
agree with their definition of release date.

(i.e., reissues and remasters are supposed to get their own release
date.)
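Because reissues and remasters get their own dates, one workaround is to collect the dates of every release of an album and keep the earliest year. The Python sketch below assumes the dates arrive as strings in the partial forms "YYYY", "YYYY-MM" or "YYYY-MM-DD", possibly empty or missing; how they are fetched is left out, and the function name `original_year` is hypothetical.

```python
import re


def original_year(dates):
    """Earliest year among partial date strings; None if there are none."""
    years = []
    for d in dates:
        # Only the leading 4-digit year matters; month/day may be absent.
        m = re.match(r"(\d{4})", d or "")
        if m:
            years.append(int(m.group(1)))
    return min(years) if years else None


print(original_year(["2009-03-10", "1998", "", None, "2002-09"]))  # 1998
```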

Allmusic.com may also have it, but they want a signed license before
you use their data.


-- 
snarlydwarf

snarlydwarf's Profile: http://forums.slimdevices.com/member.php?userid=1179


[slim] help with script to scrape album year data

2007-10-29 Thread bklaas

Over the weekend I spent a little time trying to figure out a way to
populate the year field in ID3 tags where it's missing.

I wrote a Perl script (on Linux, but I think it's portable to other
OSes) to zip through my collection, find files with missing year tags,
and output the results, one line per artist/album, to a file.
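The collection step described above (the original is a Perl script that is not shown) might look roughly like this Python sketch. It assumes the tags have already been read into dicts with `artist`, `album` and optional `year` keys by some tag library; the tab-separated report format and both function names are hypothetical.

```python
def missing_year_albums(tracks):
    """Sorted, de-duplicated (artist, album) pairs with an empty year tag."""
    pairs = {(t["artist"], t["album"]) for t in tracks if not t.get("year")}
    return sorted(pairs)


def write_report(pairs, path):
    """Write one tab-separated artist/album pair per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for artist, album in pairs:
            fh.write(f"{artist}\t{album}\n")


tracks = [
    {"artist": "Beck", "album": "Sea Change", "year": ""},
    {"artist": "Beck", "album": "Sea Change"},
    {"artist": "Beck", "album": "Odelay", "year": "1996"},
]
print(missing_year_albums(tracks))  # [('Beck', 'Sea Change')]
```

Using a set collapses the many tracks of one album into a single line, so each album is only looked up once.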

Now I'm looking for a way to scrape the missing year data from somewhere
on the web by supplying an artist-album tuple. For example, I send
"artist=Beck&album=Sea+Change" to a TBD website/service, parse the year
from the result, and write the tag accordingly.
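Building a query string like "artist=Beck&album=Sea+Change" safely is a one-liner with Python's `urllib.parse.urlencode`, which escapes characters such as "&" and "/" that would otherwise break the query. This minimal sketch assumes the tab-separated, one-pair-per-line report format used for illustration here; the parameter names simply mirror the example above and do not belong to any particular site.

```python
from urllib.parse import urlencode


def build_query(artist, album):
    """Encode an artist/album pair as a URL query string."""
    return urlencode({"artist": artist, "album": album})


def queries_from_report(lines):
    """Yield one query string per non-empty 'artist<TAB>album' line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        artist, album = line.split("\t", 1)
        yield build_query(artist, album)


print(build_query("Beck", "Sea Change"))  # artist=Beck&album=Sea+Change
```

For example, `build_query("Simon & Garfunkel", "Bookends")` yields `artist=Simon+%26+Garfunkel&album=Bookends`, so the ampersand in the artist name is not mistaken for a parameter separator.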

Has anyone done this, or does anyone have advice on where to look for this
data? I'm looking for something that returns structured data in as simple
a format as possible for parsing.

cheers,
#!/ben

BTW, I'd be happy to share what I've got so far, but I figure I'll
wait until I've added the scraping code.


-- 
bklaas

"the Nokia770 skin guy"
http://www.last.fm/user/bklaas/

bklaas's Profile: http://forums.slimdevices.com/member.php?userid=58