Thank you for your hard work on get_iplayer.

Alan.
On 1 November 2014 00:45, dinkypumpkin <dinkypump...@gmail.com> wrote:
> get_iplayer has been more or less repaired, but there are still some wounds.
> I'm going to release what I have on Sunday. I'm on the road next week, so
> I've run out of time to do more for the time being. Consider it a stopgap
> until progress can be made on other fronts. This is where things are:
>
> 1. I've disabled code related to the discontinued feeds, so you shouldn't
> get any more bogus values in your metadata tags. You should also see
> thumbnails again in files < 7 days old downloaded via PID.
>
> 2. The new release will support entry of multiple PIDs.
>
> 3. I've more or less restored the 7-day cache for TV and radio. There are
> still some holes in it:
>
> a. It is not possible to search for audiodescribed versions of programmes.
> I haven't been able to source that information. If anyone has any clues on
> the subject, chime in - but not if your suggestion is to scrape the iPlayer
> site. That isn't on the table just yet.
>
> You can still download audiodescribed versions, but you'll have to look for
> them on the iPlayer site. Signed versions should still be flagged in the
> get_iplayer cache, but some may be missing. Again, check the iPlayer site
> if in doubt.
>
> I've changed get_iplayer to always scrape the related episode page to look
> for audiodescribed/signed versions when requested, so hopefully more
> downloads will be successful. I found a number of cases where the playlist
> data for recent programmes didn't contain identifiers for audiodescribed
> versions even though they existed on the iPlayer site.
>
> b. It is not possible to search radio programmes by category. TV programmes
> still have category information. There is a source for radio category
> information, but it consistently failed on Radio 4 and Radio 4 Extra, which
> is where the categories are most meaningful.
> I know that is going to break some PVR searches, but the alternative is a
> support headache I can't absorb.
>
> c. I can't vouch that every programme from the previous 7 days will show up
> in the cache. As always, you can use the PID for any programme not in the
> cache. By the same token, I can't vouch that every programme in the cache
> will be downloadable. The new feeds contain noticeably more programmes,
> some due to the inclusion of web-only stuff. With the heavier load, cache
> refreshes are noticeably slower than with the old feeds, ca. 90 seconds for
> me for tv+radio.
>
> 4. The more-or-less restored cache depends on some old data feeds lingering
> at the BBC. Recent events have taught us that they could disappear without
> warning, so I've implemented a fallback mechanism. There will be a new
> option that will switch the cache to refresh from the channel schedule pages
> instead of the old data feeds. However, this fallback is also limited:
>
> a. It is not possible to search for audiodescribed or signed versions of
> programmes. That information isn't in the schedule pages.
>
> b. It is not possible to search TV or radio programmes by category. Again,
> that information isn't in the schedule pages.
>
> c. Cache refresh is slow, ca. 4+ minutes for a full TV and radio refresh for
> me. The time could be cut by about 1/3 by removing regional TV channel
> variations, but that cuts out 50+ programmes, so I've left them in for the
> present.
>
> d. It appears that fewer programmes from the previous 7 days get cached
> compared to the feeds. Part of that is because the schedule pages don't
> show most web-only programmes. Part of it may also be because I'm checking
> availability info in the schedule pages more strictly than whatever produces
> the data feeds. Again, you can use the PID for anything not in the cache.
>
> e. The only plus to using the schedule pages to populate the cache is that
> it becomes possible to expand your cache out to 30 days.
> It seems to work OK, if you have 10-15 minutes to refresh your cache.
> There will be an option for this.
>
> f. I've given you enough rope to hang yourself, but don't put this fallback
> option into regular use unless it becomes necessary - seriously. It's only
> there to avoid weeks like this one. I won't be interested in hearing how
> slow it is or how it doesn't locate some particular programme. And for
> Pete's sake *don't* use it with the Web PVR. If you insist on playing
> around with it, you'll probably want to bump up --expiry to some gigantic
> number and refresh your cache manually as needed.
>
> 5. Looking further ahead
>
> Some things that have been floated here in the past few days:
>
> a. Programme data services: If somebody implements something along these
> lines, I'm sure get_iplayer could be integrated with it. It's clear that
> get_iplayer would never be able to access Nitro if and when it's ever opened
> up. But, if somebody can repackage Nitro data for wider use, that would be
> pretty useful.
>
> b. iPlayer site scraping: This could also be the foundation of a programme
> data service instead of Nitro. It is also the only real hope for
> get_iplayer to regain a full-featured desktop cache, though I'm not sure it
> will be practical. A full scrape is out of the question for local caching -
> there are just too many programmes on the radio side. However, even caching
> just the previous 7 days will be much, much slower than with the old data
> feeds. The number of requests and the amount of data to move over the wire
> and parse would be vastly greater. Some sort of parallelisation might help.
> The trick will be to figure out the right way to filter the listings down to
> a practical volume.
>
> I started down this road, but it was way too slow for radio and it was going
> to be too much work for the time available. Plus, it didn't seem worth
> leaving get_iplayer crippled any longer than necessary.
> To do this properly will likely mean adding some dependencies to
> get_iplayer as well as some major reworking. I'm going to keep working in
> that direction just to see if it can be done, but I have no idea if it will
> be of practical use.
>
> Also see Steven Maude's recent post for his take on the problem.
>
> c. External search/indexing applications: To my mind, it seems like a good
> idea for some energetic person to split this out. get_iplayer badly needs
> to lose weight, not gain it, and there is a pretty clear functional
> separation between searching and downloading. get_iplayer needs a lot of
> work in handling metadata that could make it a better downloader, so it
> would be no bad thing to get out of the caching business. I'll have my pony
> now, thanks.
>
>
> _______________________________________________
> get_iplayer mailing list
> get_iplayer@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/get_iplayer