[datameet] Re: Data digitization using python

2024-04-15 Thread Pradeep Vanga
Hi Shiv,

Do you mind sharing a couple of sample pdfs? Do they contain structured 
data like tables or some other type of data?

On Tuesday, April 9, 2024 at 11:33:28 PM UTC+5:30 Shiv Hastawala wrote:

> Hi data enthusiasts
>
> I have a lot of publicly available data which are pdf scans of old 
> publications. I wish to digitize them as a public service. I found that the 
> following python package is pretty efficient at doing this job:
>
> https://layout-parser.readthedocs.io/en/latest/
>
>
> However, since I am python-illiterate, I was wondering if any of you 
> python enthusiasts would be interested in writing the code for this 
> project? Obviously, this is voluntary work. 
>
> Please reply to me personally if you are interested. Thanks!
>
> Thanks and regards.
>
>
> Yours sincerely
>
> *Shiv Hastawala*
>
> (He/His/Him)
> Doctoral Candidate
> Department of Economics
> Binghamton University (State University of New York)
>
> Email ID: shastaw1[at]binghamton[dot]edu
>
> Zoom ID: 201 717 2613
>
> www.shivhastawala.com
>

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org
--- 
You received this message because you are subscribed to the Google Groups 
"datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to datameet+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/datameet/52428962-6dc3-4bdf-b82d-25e7f4ecdcfcn%40googlegroups.com.


[datameet] Re: MoSPI Unit level data

2023-11-27 Thread Pradeep Vanga
Hi Parvathi,

I suppose you are referring to NSS rounds such as the 49th and 64th. Are 
you looking for information on how to access/analyze the unit-level data or 
for any published research on this data?

On Monday, November 27, 2023 at 1:24:49 AM UTC+5:30 Parvathi Benu wrote:

> Hello
>
> I'm Parvathi Benu, a journalist from BusinessLine. 
> Just checking whether anyone here has accessed the unit-level data of 
> Migration in India from MoSPI?
> The insights could be interesting.
>
>
> Take care and stay safe,
> Parvathi
> Twitter: @ParBen24
>



Re: [datameet] Re: Looking for list of census village names for CG and MP in Devanagari

2023-02-24 Thread Pradeep Vanga
That makes sense Vivek.

On Monday, February 13, 2023 at 8:15:17 AM UTC+5:30 Vivek Matthew wrote:

> Hi Pradeep,
>
> The 12k that you're seeing on GitHub seems to be a limitation with the 
> number of lines of a single gist that can render on the web interface. The 
> raw CSV file when downloaded should contain the 22k+ lines you're looking 
> for: 
> https://gist.githubusercontent.com/Vonter/dde3c47dfd3ca11e678cea61821aa099/raw/ba6b45cfcd8f1e7f17ef038ac1a08a70410630e0/villages.csv
>
> Regards,
> Vivek
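One way to confirm the row count without relying on the gist's web rendering is to count the rows of the raw file directly. What follows is only a minimal sketch using the standard library. The helper name `count_csv_rows` is hypothetical, the inline sample and its column names are placeholders, and it assumes the file has a single header row, which the thread does not confirm.

```python
import csv
import io


def count_csv_rows(text, has_header=True):
    """Count data rows in CSV text, optionally skipping one header row."""
    reader = csv.reader(io.StringIO(text))
    rows = [row for row in reader if row]  # ignore completely blank lines
    if has_header and rows:
        rows = rows[1:]
    return len(rows)


# Tiny inline sample standing in for the downloaded villages.csv
# (placeholder columns, not the real file's header).
sample = "district,village\nBhopal,A\nBhopal,B\n"
print(count_csv_rows(sample))  # prints 2
```

To run it on the real file, download the raw URL (for example with `urllib.request.urlopen(url).read().decode("utf-8")`) and pass the resulting text to the function.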
>
> On Monday, 13 February 2023 at 03:08:09 UTC+5:30 Pradeep Vanga wrote:
>
>> Hi Vivek, it looks like the csv file contains only about 12k+ entries.
>>
>> (It looks like I replied to the author and not to this thread earlier. I 
>> have also scraped the data and uploaded it here :)  
>> https://www.kaggle.com/datasets/vangap/madhya-pradesh-village-list )
>>
>> On Monday, February 6, 2023 at 9:49:48 PM UTC+5:30 Vivek Matthew wrote:
>>
>>> Hi Sharad,
>>>
>>> Nice catch regarding the switch to Hindi. The choice of English/Hindi 
>>> names returned by the server is based on the cookie sent with the request.
>>>
>>> I've scraped the village list and put it as a CSV and JSON here: 
>>> https://gist.github.com/Vonter/dde3c47dfd3ca11e678cea61821aa099
>>>
>>> There are 23170 villages in there, but based on my count it looks like 
>>> there are about a dozen of them without Devanagari names.
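As an illustration of the cookie point: the same handler URL can be requested with a different `Cookie` header to see which language comes back. This is a minimal sketch using only the standard library. The cookie `lang=hi` is a placeholder, not the site's real cookie, which the thread does not name; inspect the browser's request after switching the site to Hindi to find the actual one. The UTF-8 decoding is also an assumption.

```python
import urllib.request

DISTRICTS_URL = "https://www.prd.mp.gov.in/Handlers/Districts.ashx?DivisionID=0"


def build_request(url, cookie):
    """Return a GET request that carries the given Cookie header."""
    return urllib.request.Request(url, headers={"Cookie": cookie})


def fetch_text(url, cookie, timeout=30):
    """Fetch the URL with the cookie attached and return the decoded body."""
    with urllib.request.urlopen(build_request(url, cookie), timeout=timeout) as resp:
        return resp.read().decode("utf-8")  # assumes a UTF-8 response


# "lang=hi" is a hypothetical cookie; replace it with the one the browser sends.
req = build_request(DISTRICTS_URL, "lang=hi")
print(req.get_header("Cookie"))
```

Calling `fetch_text(DISTRICTS_URL, cookie)` once with the English cookie and once with the Hindi one would show whether the names in the response change.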
>>>
>>> On Saturday, 4 February 2023 at 11:01:58 UTC+5:30 shara...@gmail.com 
>>> wrote:
>>>
>>>> [image: mndagofkncnplmoa.png]
>>>> But btw, there is an option on the main PRD website to switch to Hindi, 
>>>> and when I do that, then when I go searching for specific Gram Panchayats, 
>>>> I do get this search menu, which suggests that at the backend somewhere 
>>>> the Hindi lists also exist? Any ideas? 
>>>> On Saturday, February 4, 2023 at 10:32:59 AM UTC+5:30 Sharad Lele wrote:
>>>>
>>>>> Dear Nikhil,
>>>>>
>>>>> Thanks for your help and yes, I assumed (incorrectly) that if the 
>>>>> menus are in Hindi then the data will also be in Hindi/Devanagari! 
>>>>> Unfortunately, as you pointed out, the data are still in English/Roman 
>>>>> script. 
>>>>>
>>>>> Which means I have to expand my request: any one who can find a 
>>>>> website that has village name lists in Hindi/Devanagari (for MP in 
>>>>> particular), please flag. If someone has the data already in Devanagari, 
>>>>> great!
>>>>>
>>>>> Sreeram pointed out that the list on the govt of India's LGDIR website 
>>>>> has Devanagari names for some states, but in the case of MP, the column 
>>>>> for names in Devanagari is very sporadically filled!
>>>>>
>>>>> Sharad
>>>>>
>>>>> On 04-Feb-23 10:11, Nikhil VJ wrote:
>>>>>
>>>>> Hi Sharad, 
>>>>>
>>>>> The site you linked is quite easy to scrape with basic GET api calls 
>>>>> (aka you can open the url in browser also) giving the data in a proper 
>>>>> structure that can be directly used by a program.
>>>>>
>>>>> *But : the data is all in English only.*
>>>>>
>>>>> Anyway, in case you want to scrape, you can get someone to do it using:
>>>>>
>>>>> Districts list:
>>>>> https://www.prd.mp.gov.in/Handlers/Districts.ashx?DivisionID=0
>>>>>
>>>>> Take district ID from there to get local bodies list:
>>>>> https://www.prd.mp.gov.in/Handlers/localbodies.ashx?DistrictID=*45*
>>>>> _Rural=1
>>>>>
>>>>> Take "LBID" from there to get GP Zones:
>>>>> https://www.prd.mp.gov.in/Handlers/GpZones.ashx?LbId=*24319*
>>>>>
>>>>>
>>>>> Note to freshers in Python coding who are looking for real-world use 
>>>>> cases to learn and apply their skills: this is a good starting project. 
>>>>> Write three nested for loops and append all the results to a list of 
>>>>> dicts (JSON records). At the end, convert it to a pandas DataFrame and 
>>>>> output to CSV.
>>>>> https://www.prd.mp.gov.in/Handlers/Districts.ashx?DivisionID=0
>>>>> --
>>>>> Cheers,
>>>>> Nikhil VJ
>>>>> https://nikhilvj.co.in
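The three nested loops described in the note above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested scraper: the JSON field names `DistrictID` and `LBID` are guesses based on the wording of the message, the `&Rural=1` query string is an assumption because the URL is garbled in the archive, and the handlers are assumed to return JSON lists of objects. The standard-library `csv` module is used here instead of pandas to keep the sketch dependency-free.

```python
import csv
import io
import json
import urllib.request

BASE = "https://www.prd.mp.gov.in/Handlers"


def fetch_json(url):
    """GET a URL and parse the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def crawl(fetch=fetch_json, district_key="DistrictID", lb_key="LBID"):
    """Walk districts -> local bodies -> GP zones and return flat rows.

    district_key and lb_key are assumed field names; check the real
    responses and adjust them.
    """
    rows = []
    for district in fetch(f"{BASE}/Districts.ashx?DivisionID=0"):
        d_id = district[district_key]
        # Query string below is an assumption; the thread's URL is garbled here.
        for body in fetch(f"{BASE}/localbodies.ashx?DistrictID={d_id}&Rural=1"):
            lb_id = body[lb_key]
            for zone in fetch(f"{BASE}/GpZones.ashx?LbId={lb_id}"):
                # Prefix keys so fields from the three levels cannot collide.
                row = {f"district_{k}": v for k, v in district.items()}
                row.update({f"lb_{k}": v for k, v in body.items()})
                row.update({f"zone_{k}": v for k, v in zone.items()})
                rows.append(row)
    return rows


def to_csv(rows):
    """Serialise the rows to CSV text, using the union of all keys as header."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Passing the fetcher in as a parameter makes it possible to try the loop logic on canned responses before hitting the live site. With pandas installed, `pandas.DataFrame(rows).to_csv("villages.csv", index=False)` would write the same rows to a file.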

Re: [datameet] Re: Looking for list of census village names for CG and MP in Devanagari

2023-02-12 Thread Pradeep Vanga
Hi Vivek, it looks like the csv file contains only about 12k+ entries.

(It looks like I replied to the author and not to this thread earlier. I have 
also scraped the data and uploaded it here :)  
https://www.kaggle.com/datasets/vangap/madhya-pradesh-village-list )

On Monday, February 6, 2023 at 9:49:48 PM UTC+5:30 Vivek Matthew wrote:

> Hi Sharad,
>
> Nice catch regarding the switch to Hindi. The choice of English/Hindi 
> names returned by the server is based on the cookie sent with the request.
>
> I've scraped the village list and put it as a CSV and JSON here: 
> https://gist.github.com/Vonter/dde3c47dfd3ca11e678cea61821aa099
>
> There are 23170 villages in there, but based on my count it looks like 
> there are about a dozen of them without Devanagari names.
>
> On Saturday, 4 February 2023 at 11:01:58 UTC+5:30 shara...@gmail.com 
> wrote:
>
>> [image: mndagofkncnplmoa.png]
>> But btw, there is an option on the main PRD website to switch to Hindi, 
>> and when I do that, then when I go searching for specific Gram Panchayats, 
>> I do get this search menu, which suggests that at the backend somewhere the 
>> Hindi lists also exist? Any ideas? 
>> On Saturday, February 4, 2023 at 10:32:59 AM UTC+5:30 Sharad Lele wrote:
>>
>>> Dear Nikhil,
>>>
>>> Thanks for your help and yes, I assumed (incorrectly) that if the menus 
>>> are in Hindi then the data will also be in Hindi/Devanagari! Unfortunately, 
>>> as you pointed out, the data are still in English/Roman script. 
>>>
>>> Which means I have to expand my request: any one who can find a website 
>>> that has village name lists in Hindi/Devanagari (for MP in particular), 
>>> please flag. If someone has the data already in Devanagari, great!
>>>
>>> Sreeram pointed out that the list on the govt of India's LGDIR website 
>>> has Devanagari names for some states, but in the case of MP, the column for 
>>> names in Devanagari is very sporadically filled!
>>>
>>> Sharad
>>>
>>> On 04-Feb-23 10:11, Nikhil VJ wrote:
>>>
>>> Hi Sharad, 
>>>
>>> The site you linked is quite easy to scrape with basic GET api calls 
>>> (aka you can open the url in browser also) giving the data in a proper 
>>> structure that can be directly used by a program.
>>>
>>> *But : the data is all in English only.*
>>>
>>> Anyway, in case you want to scrape, you can get someone to do it using:
>>>
>>> Districts list:
>>> https://www.prd.mp.gov.in/Handlers/Districts.ashx?DivisionID=0
>>>
>>> Take district ID from there to get local bodies list:
>>> https://www.prd.mp.gov.in/Handlers/localbodies.ashx?DistrictID=*45*
>>> _Rural=1
>>>
>>> Take "LBID" from there to get GP Zones:
>>> https://www.prd.mp.gov.in/Handlers/GpZones.ashx?LbId=*24319*
>>>
>>>
>>> Note to freshers in Python coding who are looking for real-world use 
>>> cases to learn and apply their skills: this is a good starting project. 
>>> Write three nested for loops and append all the results to a list of 
>>> dicts (JSON records). At the end, convert it to a pandas DataFrame and 
>>> output to CSV.
>>> https://www.prd.mp.gov.in/Handlers/Districts.ashx?DivisionID=0
>>> --
>>> Cheers,
>>> Nikhil VJ
>>> https://nikhilvj.co.in
>>>
>>>
>>> On Fri, Feb 3, 2023 at 12:10 AM Sharad Lele wrote:
>>>
>>>> For instance, if someone can scrape the names from this website: 
>>>> https://www.prd.mp.gov.in/GramSearch/SearchPanchayat.aspx  
>>>> (sequentially, so as to get the district, block and GP tags also)
>>>>
>>>> On Thursday, February 2, 2023 at 9:47:01 PM UTC+5:30 Sharad Lele wrote:
>>>>
>>>>> I am looking for the census village list for Chhattisgarh and Madhya 
>>>>> Pradesh (for starters) in Devanagari (Hindi script). Preferably with 
>>>>> Census 2011 codes, so that I can quickly match them to the Census 
>>>>> dataset, which is in English. But even if no codes are attached, an 
>>>>> accurate list with tehsil/block and district tags in digital format 
>>>>> (not pdf hopefully) will be a big help.
>>>>>
>>>>> Any suggestions, folks?
>>>>>
>>>>> Sharad
>>>>>
>

Re: [datameet] Re: Looking for list of census village names for CG and MP in Devanagari

2023-02-12 Thread Pradeep Vanga
(Looks like I replied to the author earlier, instead of replying in the 
thread)

I have also scraped this data a few days back btw 
https://www.kaggle.com/datasets/vangap/madhya-pradesh-village-list

Vivek, it looks like the CSV file in the gist contains only 12k entries 
while there should be around 22k+ I think?

