Hi,

I don't think Selenium is required. It looks like this can be done by just
varying the request payload of a single POST call to this URL:
https://missionantyodaya.nic.in/preloginVillageInfrastructureReports2020.html
The POST request content type is application/x-www-form-urlencoded.

At *state level*, the request payload looks like:
stateCode: 27
stateName: MAHARASHTRA
districtCode:
districtName:
blockCode:
blockName:
gpCode:
gpName:

At *district level* it becomes:
stateCode: 27
stateName: MAHARASHTRA
districtCode: 469
districtName: AURANGABAD
blockCode:
blockName:
gpCode:
gpName:

then *block level*:
stateCode: 27
stateName: MAHARASHTRA
districtCode: 469
districtName: AURANGABAD
blockCode: 4315
blockName: KHULTABAD
gpCode:
gpName:

then *GP level*:
stateCode: 27
stateName: MAHARASHTRA
districtCode: 469
districtName: AURANGABAD
blockCode: 4315
blockName: KHULTABAD
gpCode: 170584
gpName: BODKHA
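
The four payloads above differ only in how many (code, name) pairs are filled in, so they can be built by one helper. Here is a minimal sketch using only the Python standard library. The names build_payload and fetch_report are made up for illustration, and whether every level really posts to the same URL is an assumption based on the description above, so the URL is left as a parameter.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# URL as given in this message. That all levels share it is an assumption.
REPORT_URL = (
    "https://missionantyodaya.nic.in/"
    "preloginVillageInfrastructureReports2020.html"
)


def build_payload(state, district=None, block=None, gp=None):
    """Build the form fields for one level of the hierarchy.

    Each argument is a (code, name) tuple, or None to leave that level's
    fields empty, mirroring the payloads listed above.
    """
    payload = {}
    levels = (("state", state), ("district", district),
              ("block", block), ("gp", gp))
    for prefix, pair in levels:
        code, name = pair if pair is not None else ("", "")
        payload[prefix + "Code"] = str(code)
        payload[prefix + "Name"] = name
    return payload


def fetch_report(payload, url=REPORT_URL, timeout=30):
    """POST the payload as application/x-www-form-urlencoded, return HTML."""
    body = urlencode(payload).encode("ascii")
    request = Request(
        url,
        data=body,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, build_payload((27, "MAHARASHTRA"), (469, "AURANGABAD")) gives the district-level payload, and passing that to fetch_report would return the page HTML if the server accepts the request as described.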

If using Python, one can use BeautifulSoup to capture the table data as well
as to get the (code + name) pairs for the next level.
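
A rough sketch of that parsing step, written with the standard library's html.parser and re so it runs without extra installs (an HTML parsing library would make it shorter). The SAMPLE markup is invented for illustration and the real page will differ. The selectState(...) pattern follows the handler quoted further down in this thread; handler names for the district, block and GP levels are not shown in the thread, so only the state level is covered here.

```python
import re
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Matches calls like selectState(2,'HIMACHAL PRADESH','some.html').
SELECT_STATE = re.compile(
    r"selectState\(\s*(\d+)\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def state_links(html):
    """Return (code, name, action) for each selectState(...) call found."""
    return [
        (int(code), " ".join(name.split()), action)
        for code, name, action in SELECT_STATE.findall(html)
    ]


# Invented sample markup, only to exercise the two helpers offline.
SAMPLE = """
<table>
<tr><th>S.No.</th><th>State</th></tr>
<tr><td>1</td><td><a href="#" onclick="selectState(2,'HIMACHAL PRADESH','preloginDistrictInfrastructureReports2020.html')">HIMACHAL PRADESH</a></td></tr>
</table>
"""

if __name__ == "__main__":
    print(parse_rows(SAMPLE))
    print(state_links(SAMPLE))
```

The (code, name) pairs recovered this way are what would go into the next level's payload.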

--
Cheers,
Nikhil VJ
https://nikhilvj.co.in


On Fri, Feb 4, 2022 at 1:42 PM Sanjay Bhangar <sanjaybhan...@gmail.com>
wrote:

> Piyush -
>
> You could write a python (or your preferred language) script that just
> requests the HTML, parses it, and follows the hierarchy, without using
> selenium. This could be a bunch of work as the site doesn't use regular
> links with GET requests, but rather when you click on a state in the table,
> it uses Javascript to fill up hidden form fields with the state code, etc.
> and then does a form submit, causing a POST request to be made with those
> values.
>
> For example, you can see the links in the table have an onClick handler like
> "selectState(2,'HIMACHAL PRADESH','preloginDistrictInfrastructureReports2020.html')".
>
> Then, in the javascript, you can see the selectState function defined like
> so:
>
> function selectState(stateCode, stateName, action) {
>     $("#stateCode").val(stateCode);
>     $("#stateName").val(stateName);
>     $("#reportForm").attr('action', action);
>     $("#reportForm").submit();
> }
>
> In this JS file:
> https://missionantyodaya.nic.in/resources/antyodaya/js/custom/prelogin/reports/preloginReport.js
>
> So this will make a POST request to
> preloginDistrictInfrastructureReports2020.html
> with stateCode=2, stateName=HIMACHAL PRADESH
>
> Similarly, there are different onClick handlers defined for selecting
> districts, etc. that you can follow down to see what URLs they are calling
> with what parameters. And in theory, you could write some HTML parsing code
> and some regex to go through the items in each table, parse out the
> parameters and URLs to call, and follow things down.
>
> So, in theory you could write this without mucking around with selenium,
> but it also seems like a lot more work than if the site was structured
> "normally" with unique URLs and GET requests.
>
> For the page numbering, this seems okay: the HTML outputs all the items
> across all the pages, and then the actual pagination on the page is purely
> client-side javascript - so if you were to read the HTML on the page via
> python or so, you would just get all the items in the table without having
> to worry about pagination.
>
> Unfortunately, this does seem like a lot of work and I don't really have
> the time to do anything, but it seemed like an interesting problem and I
> was curious so I took a look. Hope it could help a bit.
>
> All the best,
> Sanjay
>
> On Fri, Feb 4, 2022 at 1:03 PM Piyush Kumar <psh.kumar1...@gmail.com>
> wrote:
>
>> Could folks here suggest how to go about this?
>>
>>
>> https://missionantyodaya.nic.in/preloginStateInfrastructureReports2020.html
>>
>> When we click this link, we get data on village-level infrastructure put
>> within multiple HTML tables across many pages (separated into state, dist.,
>> block etc.)
>>
>> Suppose I want to scrape data up to the village level for a particular
>> state, is there any way I can get it done without too much back and forth
>> over Selenium webdriver? Please note that to access village level data you
>> have to go through a nested hierarchy of links (gram panchayat within block,
>> which is within a district and so on). To make matters more complicated,
>> the pages have also not been numbered.
>>
>> Can someone in the know help me figure this out?
>>
>> Thanks in advance
>> Piyush
>>
>> --
>> Datameet is a community of Data Science enthusiasts in India. Know more
>> about us by visiting http://datameet.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "datameet" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to datameet+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/datameet/CAFtOtdujRhq36O4SW%3Dtie%2BSDH_6Pq1R87B6nVerzU4giQVka%3Dw%40mail.gmail.com
>> <https://groups.google.com/d/msgid/datameet/CAFtOtdujRhq36O4SW%3Dtie%2BSDH_6Pq1R87B6nVerzU4giQVka%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
