I have a Python script for Mission Antyodaya. If anyone wants it, I can help: I scraped the data GP-wise, along with the questionnaire.
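For anyone who would rather start from scratch, here is a minimal sketch (not the script mentioned above) of the payload approach described in the quoted message below. The helper name `build_payload` and the `FIELDS` list are made up for illustration; the field names and sample values are the ones quoted in this thread. Nothing is sent over the network here, it only builds and encodes the form body.

```python
from urllib.parse import urlencode

# Form fields as listed in the quoted message; unused levels stay empty.
FIELDS = [
    "stateCode", "stateName",
    "districtCode", "districtName",
    "blockCode", "blockName",
    "gpCode", "gpName",
]


def build_payload(state=None, district=None, block=None, gp=None):
    """Build the form payload for one level of the hierarchy.

    Each argument is a (code, name) pair, or None if that level is not
    selected yet. (Hypothetical helper, not part of the site or any library.)
    """
    payload = dict.fromkeys(FIELDS, "")
    levels = (("state", state), ("district", district), ("block", block), ("gp", gp))
    for prefix, pair in levels:
        if pair is not None:
            code, name = pair
            payload[prefix + "Code"] = str(code)
            payload[prefix + "Name"] = name
    return payload


# Block-level example using the values quoted in the thread.
block_level = build_payload(
    state=(27, "MAHARASHTRA"),
    district=(469, "AURANGABAD"),
    block=(4315, "KHULTABAD"),
)

# Body as it would look with content type application/x-www-form-urlencoded.
body = urlencode(block_level)
print(body)
```

Passing the dict as `data=` to `requests.post` should send it in the same form-encoded shape. Which URL to post to at each level is best confirmed in the browser's network tab.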
On Sun, 6 Feb, 2022, 17:48 Nikhil VJ, <nikhil...@gmail.com> wrote:

> Hi,
>
> I don't think Selenium is required - this looks like it can be done by
> just varying the request payload of one POST API call.
> POST API call to URL:
> https://missionantyodaya.nic.in/preloginVillageInfrastructureReports2020.html
> The POST request content type is application/x-www-form-urlencoded
>
> At *state level*, the request payload is like:
> stateCode: 27
> stateName: MAHARASHTRA
> districtCode:
> districtName:
> blockCode:
> blockName:
> gpCode:
> gpName:
>
> At *district level* it becomes:
> stateCode: 27
> stateName: MAHARASHTRA
> districtCode: 469
> districtName: AURANGABAD
> blockCode:
> blockName:
> gpCode:
> gpName:
>
> then *block level*:
> stateCode: 27
> stateName: MAHARASHTRA
> districtCode: 469
> districtName: AURANGABAD
> blockCode: 4315
> blockName: KHULTABAD
> gpCode:
> gpName:
>
> then *GP level*:
> stateCode: 27
> stateName: MAHARASHTRA
> districtCode: 469
> districtName: AURANGABAD
> blockCode: 4315
> blockName: KHULTABAD
> gpCode: 170584
> gpName: BODKHA
>
> If in Python, one can use BeautifulSoup to capture the table data as
> well as get the (code + name) pairs for the next level.
>
> --
> Cheers,
> Nikhil VJ
> https://nikhilvj.co.in
>
>
> On Fri, Feb 4, 2022 at 1:42 PM Sanjay Bhangar <sanjaybhan...@gmail.com>
> wrote:
>
>> Piyush -
>>
>> You could write a Python (or your preferred language) script that just
>> requests the HTML, parses it, and follows the hierarchy, without using
>> Selenium. This could be a bunch of work, as the site doesn't use regular
>> links with GET requests. Instead, when you click on a state in the table,
>> it uses JavaScript to fill hidden form fields with the state code, etc.
>> and then does a form submit, causing a POST request to be made with those
>> values.
>>
>> For example,
>> you can see the links in the table have an onClick handler like
>> "selectState(2,'HIMACHAL PRADESH','preloginDistrictInfrastructureReports2020.html')".
>>
>> Then, in the JavaScript, you can see the selectState function defined
>> like so:
>>
>> function selectState(stateCode,stateName,action){
>>     $("#stateCode").val(stateCode);
>>     $("#stateName").val(stateName);
>>     $("#reportForm").attr('action', action);
>>     $("#reportForm").submit();
>> }
>>
>> In this JS file:
>> https://missionantyodaya.nic.in/resources/antyodaya/js/custom/prelogin/reports/preloginReport.js
>>
>> So this will make a POST request to
>> preloginDistrictInfrastructureReports2020.html
>> with stateCode=2, stateName=HIMACHAL PRADESH
>>
>> Similarly, there are different onClick handlers defined for selecting
>> districts, etc. that you can follow down to see what URLs they are calling
>> with what parameters. And in theory, you could write some HTML parsing code
>> and some regex to go through the items in each table, parse out the
>> parameters and URLs to call, and follow things down.
>>
>> So, in theory you could write this without mucking around with Selenium,
>> but it also seems like a lot more work than if the site was structured
>> "normally" with unique URLs and GET requests.
>>
>> For the page numbering, this seems okay: the HTML outputs all the items
>> across all the pages, and the actual pagination on the page is purely
>> client-side JavaScript - so if you were to read the HTML on the page via
>> Python or so, you would just get all the items in the table without having
>> to worry about pagination.
>>
>> Unfortunately, this does seem like a lot of work and I don't really have
>> the time to do anything, but it seemed like an interesting problem and I
>> was curious so I took a look. Hope it could help a bit.
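A rough sketch of the "HTML parsing plus regex" idea described above, using only the Python standard library. It assumes each link carries a single call in its onClick attribute, shaped like the selectState example, with plain numbers and quoted strings as arguments. The class and function names (`OnClickCollector`, `extract_calls`) are made up for illustration, and the sample HTML is hand-written, not copied from the site.

```python
import ast
import re
from html.parser import HTMLParser

# Matches one call such as: selectState(2,'HIMACHAL PRADESH','page.html')
CALL_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*;?\s*$", re.DOTALL)


class OnClickCollector(HTMLParser):
    """Collects (function_name, args) for every tag with an onclick attribute."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so onClick arrives as "onclick".
        for name, value in attrs:
            if name != "onclick" or not value:
                continue
            match = CALL_RE.match(value)
            if not match:
                continue
            try:
                # Assumption: the arguments are numbers and quoted strings,
                # which are also valid Python literals.
                args = ast.literal_eval(f"({match.group(2)},)")
            except (ValueError, SyntaxError):
                continue  # skip handlers that do not fit the assumed shape
            self.calls.append((match.group(1), args))


def extract_calls(html):
    """Return a list of (function_name, args_tuple) found in the HTML."""
    collector = OnClickCollector()
    collector.feed(html)
    return collector.calls


# Hand-written sample built around the handler quoted in the thread.
sample = (
    "<table><tr><td><a href=\"#\" onClick=\"selectState(2,'HIMACHAL PRADESH',"
    "'preloginDistrictInfrastructureReports2020.html')\">HIMACHAL PRADESH</a>"
    "</td></tr></table>"
)
calls = extract_calls(sample)
print(calls)
```

For selectState, the three arguments are the state code, the state name, and the form action, matching the function definition quoted above. Handlers for districts, blocks and GPs may take different arguments, so those would need checking against the JS file.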
>>
>> All the best,
>> Sanjay
>>
>> On Fri, Feb 4, 2022 at 1:03 PM Piyush Kumar <psh.kumar1...@gmail.com>
>> wrote:
>>
>>> Could folks here suggest how to go about this?
>>>
>>> https://missionantyodaya.nic.in/preloginStateInfrastructureReports2020.html
>>>
>>> When we click this link, we get data on village-level infrastructure put
>>> within multiple HTML tables across many pages (separated into state,
>>> district, block, etc.)
>>>
>>> Suppose I want to scrape data up to the village level for a particular
>>> state, is there any way I can get it done without too much back and forth
>>> over Selenium WebDriver? Please note that to access village-level data you
>>> have to go through a nested hierarchy of links (gram panchayat within
>>> block, which is within a district, and so on). To make matters more
>>> complicated, the pages have also not been numbered.
>>>
>>> Can someone in the know help me figure this out?
>>>
>>> Thanks in advance
>>> Piyush
>>>
>>> --
>>> Datameet is a community of Data Science enthusiasts in India. Know more
>>> about us by visiting http://datameet.org
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "datameet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to datameet+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/datameet/CAFtOtdujRhq36O4SW%3Dtie%2BSDH_6Pq1R87B6nVerzU4giQVka%3Dw%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/datameet/CAFtOtdujRhq36O4SW%3Dtie%2BSDH_6Pq1R87B6nVerzU4giQVka%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .