Aman and I have now launched CBFC Watch, an archive and explorer for over 1 lakh censorship records across close to 18k movies released in India since 2017: https://cbfc.watch
The code for the site itself and related analysis is up on GitHub: https://github.com/diagram-chasing/cbfc-watch

On Thursday, 26 June 2025 at 09:45:01 UTC+5:30 Aman Bhargava wrote:

> Update on this: The CBFC has blocked any such access to the data. While we
> have the data from 2017 to June 2025, it is no longer possible to scrape,
> since the URLs have been changed from sequential IDs to encrypted
> strings.
>
> Aroon Deep of The Hindu wrote about this in today's paper:
> https://www.thehindu.com/entertainment/movies/censor-board-discontinues-full-access-to-cuts-on-website/article69736377.ece
>
> On Wednesday, April 9, 2025 at 9:26:14 PM UTC+5:30 Aman Bhargava wrote:
>
>> Hello!
>> Yes, there is. The cuts are posted on the ecinepramaan website (for
>> example, here is Aavesham
>> <https://www.ecinepramaan.gov.in/cbfc/?a=Certificate_Detail&i=100090292400000155>).
>>
>> The IDs in the URL are sequential and not hashed, so we've figured out a
>> hacky way to scrape in a brute-force manner. This also means that we've
>> not yet been able to figure out a way to get a specific movie ID: we
>> scrape whatever we can and see if the movie we're interested in fell within
>> that range (you can narrow it a little by year and such). We've been
>> working on scraping data and cleaning it up for the last few months.
>>
>> Our work-in-progress repository is here:
>> https://github.com/diagram-chasing/censor-board-cuts
>>
>> The data scraping logic is explained here:
>> https://github.com/diagram-chasing/censor-board-cuts/tree/master/data-scripts/scrape
>>
>> This is not complete by any means. Issues are listed in the repository,
>> as are our TODO items. But we've made some progress on scraping and
>> structuring small samples, which you can see here
>> <https://flatgithub.com/diagram-chasing/censor-board-cuts?filename=data%2Fmodifications.csv&sha=7ec2784f8dc7818a6fec27c38c2f1d2016290e0f>.
>>
>> The data sample is also only *a small subset* of what can be gotten and
>> was scraped a few months ago. The end goal is to automate this and create a
>> regularly updated explorer with some basic trend analysis (modifications,
>> types of modifications etc.).
>>
>> On Wednesday, April 9, 2025 at 8:19:07 PM UTC+5:30 Thejesh GN wrote:
>>
>>> https://cbfcindia.gov.in/cbfcAdmin/search-film.php
>>>
>>> You can search for a specific film and its certification, but it doesn't
>>> list censor changes.
>>>
>>> Is there a place I can get that? Maybe every day for the latest films.
>>> This is for trend analysis.
>>>
>>>
>>> --
>>> Thejesh GN ⏚ ತೇಜೇಶ್ ಜಿ.ಎನ್
>>> http://thejeshgn.com
>>> GPG ID : 0xBFFC8DD3C06DD6B0
>>>
>>

--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
To view this discussion visit https://groups.google.com/d/msgid/datameet/b8662626-40f9-41dc-9e84-b45834876481n%40googlegroups.com.
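
For readers wondering what "sequential IDs" meant in practice in the April message, here is a minimal sketch of how candidate certificate URLs could be enumerated. It is not the scraper from the repository: the function names are hypothetical, and it assumes the `i` parameter can be treated as a plain integer and incremented by one, which the thread implies but does not spell out. It only builds URLs and makes no network requests. Per the June update, the site now uses encrypted strings, so this kind of enumeration no longer works.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base path taken from the Aavesham example URL quoted in the thread.
BASE_URL = "https://www.ecinepramaan.gov.in/cbfc/"


def certificate_url(cert_id: int) -> str:
    """Build a certificate detail URL for one numeric ID (hypothetical helper)."""
    query = urlencode({"a": "Certificate_Detail", "i": cert_id})
    return f"{BASE_URL}?{query}"


def candidate_urls(start_id: int, count: int):
    """Yield URLs for `count` consecutive IDs starting at `start_id`.

    Assumption: neighbouring certificates differ by 1 in the `i` parameter.
    """
    for cert_id in range(start_id, start_id + count):
        yield certificate_url(cert_id)


def extract_id(url: str) -> int:
    """Recover the numeric ID from a certificate URL."""
    return int(parse_qs(urlparse(url).query)["i"][0])


if __name__ == "__main__":
    # Start from the ID in the example URL and list the next few candidates.
    for url in candidate_urls(100090292400000155, 3):
        print(url)
```

Because there was no lookup from a film title to its ID, a scraper built this way could only walk a range and check afterwards whether the film of interest fell inside it, which matches the limitation described in the thread.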
