Piyush,

Took a look, and it looks like you can use these APIs. I'm providing the curl
requests. You can copy-paste them into https://curlconverter.com/ to convert
them into the language of your choice :)

1. Get the blocks of a district

curl 'https://missionantyodaya.nic.in/getPreLoginAnalyticsData.html?stateCode=6&districtCode=61' \
  -X 'POST' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 0' \
  -H 'Pragma: no-cache' \
  -H 'Cache-Control: no-cache' \
  -H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"' \
  -H 'Accept: */*' \
  -H 'Content-Type: application/json' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36' \
  -H 'sec-ch-ua-platform: "Linux"' \
  -H 'Origin: https://missionantyodaya.nic.in' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Referer: https://missionantyodaya.nic.in/preloginAnalytics2020.html' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'Cookie: JSESSIONID=obT6zCBsqbClJdpkAhrHxIbVaNog5IcQNt1WerzF.nqj1p-lxapp8-001' \
  --compressed


2. Get all the metrics for a block

curl 'https://missionantyodaya.nic.in/getPreLoginAnalyticsData.html?stateCode=6&districtCode=61&blockCode=469' \
  -X 'POST' \
  -H 'Connection: keep-alive' \
  -H 'Content-Length: 0' \
  -H 'Pragma: no-cache' \
  -H 'Cache-Control: no-cache' \
  -H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"' \
  -H 'Accept: */*' \
  -H 'Content-Type: application/json' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36' \
  -H 'sec-ch-ua-platform: "Linux"' \
  -H 'Origin: https://missionantyodaya.nic.in' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Referer: https://missionantyodaya.nic.in/preloginAnalytics2020.html' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'Cookie: JSESSIONID=obT6zCBsqbClJdpkAhrHxIbVaNog5IcQNt1WerzF.nqj1p-lxapp8-001' \
  --compressed


I basically went to the analytics tab and looked for the APIs being called.
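
In case it saves a trip to curlconverter, here is a rough Python sketch of the
same two calls using only the standard library. The function names are my own,
and the trimmed header set is a guess: the server may insist on a fresh
JSESSIONID cookie or on more of the browser headers, and I have not checked
what format the response body comes back in.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://missionantyodaya.nic.in/getPreLoginAnalyticsData.html"


def build_url(state_code, district_code, block_code=None):
    """Build the analytics URL; blockCode is only added for the per-block call."""
    params = {"stateCode": state_code, "districtCode": district_code}
    if block_code is not None:
        params["blockCode"] = block_code
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def build_request(state_code, district_code, block_code=None, cookie=None):
    """Prepare a POST with an empty body, as in the curl commands."""
    # Assumption: this subset of the browser headers is enough for the server.
    headers = {
        "Accept": "*/*",
        "Content-Type": "application/json",
        "Origin": "https://missionantyodaya.nic.in",
        "Referer": "https://missionantyodaya.nic.in/preloginAnalytics2020.html",
    }
    if cookie:
        headers["Cookie"] = cookie
    return urllib.request.Request(
        build_url(state_code, district_code, block_code),
        data=b"",
        headers=headers,
        method="POST",
    )


def fetch(request, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Something like fetch(build_request(6, 61)) would correspond to request 1, and
fetch(build_request(6, 61, 469)) to request 2.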

On Fri, Feb 4, 2022 at 12:12 AM Sanjay Bhangar <sanjaybhan...@gmail.com>
wrote:

> Piyush -
>
> You could write a python (or your preferred language) script that just
> requests the HTML, parses it, and follows the hierarchy, without using
> selenium. This could be a bunch of work as the site doesn't use regular
> links with GET requests, but rather when you click on a state in the table,
> it uses Javascript to fill up hidden form fields with the state code, etc.
> and then does a form submit, causing a POST request to be made with those
> values.
>
> For example, you can see that the links in the table have an onClick handler like
> "selectState(2,'HIMACHAL PRADESH','preloginDistrictInfrastructureReports2020.html')".
>
> Then, in the javascript, you can see the selectState function defined like
> so:
>
> function selectState(stateCode, stateName, action) {
>     $("#stateCode").val(stateCode);
>     $("#stateName").val(stateName);
>     $("#reportForm").attr('action', action);
>     $("#reportForm").submit();
> }
>
> In this JS file:
> https://missionantyodaya.nic.in/resources/antyodaya/js/custom/prelogin/reports/preloginReport.js
>
> So this will make a POST request to
> preloginDistrictInfrastructureReports2020.html
> with stateCode=2, stateName=HIMACHAL PRADESH
>
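
A minimal sketch of that form submit in Python, for what it's worth. It assumes
the hidden fields are submitted under the names stateCode and stateName (the
jQuery above only shows their ids), that the action resolves relative to the
site root, and that reportForm carries no other required fields; none of that
is verified. build_state_request is a hypothetical helper name.

```python
import urllib.parse
import urllib.request

SITE_ROOT = "https://missionantyodaya.nic.in/"


def build_state_request(state_code, state_name, action):
    """Prepare the form-encoded POST that selectState() would trigger."""
    # Assumption: field names match the element ids used in the JS.
    form = urllib.parse.urlencode(
        {"stateCode": state_code, "stateName": state_name}
    ).encode("ascii")
    # Passing data makes urllib send it as application/x-www-form-urlencoded.
    return urllib.request.Request(
        urllib.parse.urljoin(SITE_ROOT, action), data=form, method="POST"
    )
```

Passing the result to urllib.request.urlopen() should return the district-level
HTML, if the assumptions hold.
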
> Similarly, there are different onClick handlers defined for selecting
> districts, etc. that you can follow down to see what URLs they are calling
> with what parameters. And in theory, you could write some HTML parsing code
> and some regex to go through the items in each table, parse out the
> parameters and URLs to call, and follow things down.
>
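
The regex part could look roughly like this. It only handles the selectState
handler quoted above, and it assumes the arguments appear in the raw HTML with
plain single quotes (not as HTML entities); the handlers for districts, blocks
and so on would each need their own pattern. parse_select_state_calls is a
hypothetical name.

```python
import re

# Matches selectState(<number>, '<name>', '<action>') with optional whitespace.
SELECT_STATE_RE = re.compile(
    r"selectState\(\s*(\d+)\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)


def parse_select_state_calls(html):
    """Return (state_code, state_name, action) for each selectState() call."""
    calls = []
    for code, name, action in SELECT_STATE_RE.findall(html):
        # Collapse runs of whitespace in case the name is wrapped in the markup.
        calls.append((int(code), " ".join(name.split()), action))
    return calls
```

Each tuple gives the parameters and the URL to call for the next level down.
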
> So, in theory you could write this without mucking around with selenium,
> but it also seems like a lot more work than if the site was structured
> "normally" with unique URLs and GET requests.
>
> For the page numbering, this seems okay: the HTML outputs all the items
> across all the pages, and then the actual pagination on the page is purely
> client-side javascript - so if you were to read the HTML on the page via
> python or so, you would just get all the items in the table without having
> to worry about pagination.
>
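
If the full table really is in the HTML, pulling out every row needs nothing
beyond the standard library. The sketch below collects the text of each cell,
row by row; it assumes a single flat table (nested tables would confuse it) and
knows nothing about the site's actual column layout.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                # Normalise whitespace inside the cell text.
                self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return all table rows in the document as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Because the pagination is client-side, extract_rows() on the page source should
return the rows from every "page" at once.
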
> Unfortunately, this does seem like a lot of work and I don't really have
> the time to do anything, but it seemed like an interesting problem and I
> was curious so I took a look. Hope it could help a bit.
>
> All the best,
> Sanjay
>
> On Fri, Feb 4, 2022 at 1:03 PM Piyush Kumar <psh.kumar1...@gmail.com>
> wrote:
>
>> Could folks here suggest how to go about this?
>>
>>
>> https://missionantyodaya.nic.in/preloginStateInfrastructureReports2020.html
>>
>> When we click this link, we get data on village-level infrastructure put
>> within multiple HTML tables across many pages (separated into state, dist.,
>> block etc.)
>>
>> Suppose I want to scrape data up to the village level for a particular
>> state, is there any way I can get it done without too much back and forth
>> over Selenium webdriver? Please note that to access village level data you
>> have to go through a nested hierarchy of links (gram panchayat within block,
>> which is within a district and so on). To make matters more complicated,
>> the pages have also not been numbered.
>>
>> Can someone in the know help me figure this out?
>>
>> Thanks in advance
>> Piyush
>>
>> --
>> Datameet is a community of Data Science enthusiasts in India. Know more
>> about us by visiting http://datameet.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "datameet" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to datameet+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/datameet/CAFtOtdujRhq36O4SW%3Dtie%2BSDH_6Pq1R87B6nVerzU4giQVka%3Dw%40mail.gmail.com
>> <https://groups.google.com/d/msgid/datameet/CAFtOtdujRhq36O4SW%3Dtie%2BSDH_6Pq1R87B6nVerzU4giQVka%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>

