I have a Python script for Mission Antyodaya. If anyone wants it, I can help
you: it scrapes the data GP-wise, along with the questionnaire.

On Sun, 6 Feb, 2022, 17:48 Nikhil VJ, <nikhil...@gmail.com> wrote:

> Hi,
>
> I don't think Selenium is required - this looks like it can be done by
> just varying the request payload of one POST API call.
> The POST call goes to this URL:
> https://missionantyodaya.nic.in/preloginVillageInfrastructureReports2020.html
> and the request content type is application/x-www-form-urlencoded.
>
> at *state level*, request payload is like:
> stateCode: 27
> stateName: MAHARASHTRA
> districtCode:
> districtName:
> blockCode:
> blockName:
> gpCode:
> gpName:
>
> At *district level* it becomes:
> stateCode: 27
> stateName: MAHARASHTRA
> districtCode: 469
> districtName: AURANGABAD
> blockCode:
> blockName:
> gpCode:
> gpName:
>
> then *block level*:
> stateCode: 27
> stateName: MAHARASHTRA
> districtCode: 469
> districtName: AURANGABAD
> blockCode: 4315
> blockName: KHULTABAD
> gpCode:
> gpName:
>
> then *GP level*:
> stateCode: 27
> stateName: MAHARASHTRA
> districtCode: 469
> districtName: AURANGABAD
> blockCode: 4315
> blockName: KHULTABAD
> gpCode: 170584
> gpName: BODKHA
>
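The four payloads above differ only in how many (code, name) pairs are filled in, so they can be built by one small helper. This is a minimal sketch, not a tested scraper: `build_payload` and `FIELDS` are names made up for illustration, and it assumes the server expects exactly the eight fields listed above, with empty strings for levels not yet selected.

```python
from urllib.parse import urlencode

# Hierarchy levels, in the order the fields appear in the payload above.
FIELDS = ("state", "district", "block", "gp")


def build_payload(state=None, district=None, block=None, gp=None):
    """Return the form fields for one level of the hierarchy.

    Each argument is a (code, name) pair, or None for a level that has
    not been selected yet (sent as empty strings).
    """
    payload = {}
    for level, pair in zip(FIELDS, (state, district, block, gp)):
        code, name = pair if pair else ("", "")
        payload[f"{level}Code"] = str(code)
        payload[f"{level}Name"] = name
    return payload


# Block-level request, using the values quoted in the message above.
block_level = build_payload(
    state=(27, "MAHARASHTRA"),
    district=(469, "AURANGABAD"),
    block=(4315, "KHULTABAD"),
)

# application/x-www-form-urlencoded body for the POST request.
body = urlencode(block_level)
print(body)
```

With the third-party requests library, passing the same dict as `requests.post(url, data=block_level)` should produce an equivalent form-encoded body.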
> If in Python, one can use BeautifulSoup to capture the table data as
> well as get the (code + name) pairs for the next level.
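As a rough illustration of capturing the table data, here is a sketch that uses only the standard library's `html.parser` in place of BeautifulSoup, so it runs without extra installs. The `TableRows` class, the `table_rows` helper and the `SAMPLE` markup are all made up for this example; the real pages' markup is not shown in this thread and may need different handling.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up markup standing in for one report table.
SAMPLE = """
<table>
  <tr><th>Code</th><th>Name</th></tr>
  <tr><td>469</td><td><a href="#">AURANGABAD</a></td></tr>
</table>
"""

rows = table_rows(SAMPLE)
print(rows)
```

The (code + name) pairs for the next level would then come from the data rows, assuming the table actually exposes both columns.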
>
> --
> Cheers,
> Nikhil VJ
> https://nikhilvj.co.in
>
>
> On Fri, Feb 4, 2022 at 1:42 PM Sanjay Bhangar <sanjaybhan...@gmail.com>
> wrote:
>
>> Piyush -
>>
>> You could write a python (or your preferred language) script that just
>> requests the HTML, parses it, and follows the hierarchy, without using
>> selenium. This could be a bunch of work as the site doesn't use regular
>> links with GET requests, but rather when you click on a state in the table,
>> it uses Javascript to fill up hidden form fields with the state code, etc.
>> and then does a form submit, causing a POST request to be made with those
>> values.
>>
>> For example, you can see the links in the table have an onClick handler like
>> "selectState(2,'HIMACHAL PRADESH','preloginDistrictInfrastructureReports2020.html')".
>>
>> Then, in the javascript, you can see the selectState function defined
>> like so:
>>
>> function selectState(stateCode, stateName, action) {
>>     $("#stateCode").val(stateCode);
>>     $("#stateName").val(stateName);
>>     $("#reportForm").attr('action', action);
>>     $("#reportForm").submit();
>> }
>>
>> In this JS file:
>> https://missionantyodaya.nic.in/resources/antyodaya/js/custom/prelogin/reports/preloginReport.js
>>
>> So this will make a POST request to
>> preloginDistrictInfrastructureReports2020.html
>> with stateCode=2, stateName=HIMACHAL PRADESH
>>
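The handler-to-request step described above can be sketched with a regex and the standard library. This only builds the request object and does not send it. `parse_select_state` and `build_request` are hypothetical names, and the sketch assumes the relative action resolves against the state report page and that only `stateCode` and `stateName` need to be sent; the real form may submit other hidden fields as well.

```python
import re
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Page the handler appears on; the relative action is resolved against it.
STATE_PAGE = (
    "https://missionantyodaya.nic.in/"
    "preloginStateInfrastructureReports2020.html"
)

# Matches selectState(<code>,'<name>','<action>') inside an onClick attribute.
SELECT_STATE = re.compile(
    r"selectState\(\s*(\d+)\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def parse_select_state(onclick):
    """Return (code, name, action), or None if the handler does not match."""
    match = SELECT_STATE.search(onclick)
    if match is None:
        return None
    code, name, action = match.groups()
    # Collapse whitespace in case the name was wrapped across lines.
    return int(code), " ".join(name.split()), action


def build_request(onclick):
    """Build (but do not send) the POST that the form submit would make."""
    parsed = parse_select_state(onclick)
    if parsed is None:
        raise ValueError("not a selectState handler: %r" % onclick)
    code, name, action = parsed
    body = urlencode({"stateCode": code, "stateName": name}).encode("ascii")
    return Request(
        urljoin(STATE_PAGE, action),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


handler = (
    "selectState(2,'HIMACHAL PRADESH',"
    "'preloginDistrictInfrastructureReports2020.html')"
)
req = build_request(handler)
print(req.get_method(), req.full_url)
print(req.data)
```

Passing `req` to `urllib.request.urlopen` would perform the actual call.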
>> Similarly, there are different onClick handlers defined for selecting
>> districts, etc. that you can follow down to see what URLs they are calling
>> with what parameters. And in theory, you could write some HTML parsing code
>> and some regex to go through the items in each table, parse out the
>> parameters and URLs to call, and follow things down.
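"Following things down" amounts to a depth-first walk over the state, district, block and GP pages. The sketch below shows only that traversal, with the site-specific parts passed in as functions. `walk`, `fetch` and `children` are hypothetical names, and the fake tree at the bottom stands in for real pages so the example runs without touching the network.

```python
def walk(fetch, children, url, payload, path=()):
    """Yield (path, page) for this level and every level below it.

    fetch(url, payload)      -> page content for one POST request
    children(page, payload)  -> iterable of (name, url, payload) tuples,
                                one per link to the next level down
    """
    page = fetch(url, payload)
    yield path, page
    for name, child_url, child_payload in children(page, payload):
        yield from walk(fetch, children, child_url, child_payload,
                        path + (name,))


# Stand-in data: each "page" is simply the list of names one level down.
TREE = {
    "ROOT": ["MAHARASHTRA"],
    "MAHARASHTRA": ["AURANGABAD"],
    "AURANGABAD": [],
}


def fake_fetch(url, payload):
    return TREE[payload["name"]]


def fake_children(page, payload):
    for name in page:
        yield name, "next-level.html", {"name": name}


visited = [path for path, _ in
           walk(fake_fetch, fake_children, "top.html", {"name": "ROOT"})]
print(visited)
```

In a real run, `fetch` would make the POST request and `children` would parse the onClick handlers out of the returned HTML; a pause between requests would be kinder to the server.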
>>
>> So, in theory you could write this without mucking around with selenium,
>> but it also seems like a lot more work than if the site was structured
>> "normally" with unique URLs and GET requests.
>>
>> For the page numbering, this seems okay: the HTML outputs all the items
>> across all the pages, and then the actual pagination on the page is purely
>> client-side javascript - so if you were to read the HTML on the page via
>> python or so, you would just get all the items in the table without having
>> to worry about pagination.
>>
>> Unfortunately, this does seem like a lot of work and I don't really have
>> the time to do anything, but it seemed like an interesting problem and I
>> was curious so I took a look. Hope it could help a bit.
>>
>> All the best,
>> Sanjay
>>
>> On Fri, Feb 4, 2022 at 1:03 PM Piyush Kumar <psh.kumar1...@gmail.com>
>> wrote:
>>
>>> Could folks here suggest how to go about this?
>>>
>>>
>>> https://missionantyodaya.nic.in/preloginStateInfrastructureReports2020.html
>>>
>>> When we click this link, we get data on village-level infrastructure put
>>> within multiple HTML tables across many pages (separated into state, dist.,
>>> block etc.)
>>>
>>> Suppose I want to scrape data up to the village level for a particular
>>> state, is there any way I can get it done without too much back and forth
>>> over Selenium webdriver? Please note that to access village level data you
>>> have to go through a nested hierarchy of links (gram panchayat within block,
>>> which is within a district and so on). To make matters more complicated,
>>> the pages have also not been numbered.
>>>
>>> Can someone in the know help me figure this out?
>>>
>>> Thanks in advance
>>> Piyush
>>>
>>> --
>>> Datameet is a community of Data Science enthusiasts in India. Know more
>>> about us by visiting http://datameet.org
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "datameet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to datameet+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/datameet/CAFtOtdujRhq36O4SW%3Dtie%2BSDH_6Pq1R87B6nVerzU4giQVka%3Dw%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/datameet/CAFtOtdujRhq36O4SW%3Dtie%2BSDH_6Pq1R87B6nVerzU4giQVka%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
