Hey Avinash et al,

Using population figures for matching is a neat idea, great!

Meanwhile, I made both legal (licensing issues) and mathematical
progress on matching 2001 Census villages to 2014 polling booths. I have
a large conference next week which might delay things, but I expect to
bring out an openly licensed dataset with the resulting matching table
soon after that. Of course, the matching quality with my strategy depends
entirely on the accuracy of the GIS data, which varies from district to district
(in some districts, the officers concerned clearly decided to photoshop
rather than visit each station, resulting in a neat artificial grid -
quite funny to see, but quite useless otherwise). Theoretically, one
could combine my algorithm with a fuzzy "name proximity" measure, but I
am not sure yet whether this will improve accuracy or just add confusion.
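
For illustration only, a fuzzy "name proximity" measure could be as simple as the sketch below. It uses `difflib` from the Python standard library; the function names and the 0.8 threshold are assumptions made for this example, and it is not part of the GIS-based matching described above.

```python
from difflib import SequenceMatcher


def name_proximity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two place names."""
    # Normalise case and whitespace so "Rampur " and "rampur" compare equal.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def best_name_match(name, candidates, threshold=0.8):
    """Return the candidate closest to `name`, or None if none is close enough.

    The threshold is a hypothetical cut-off and would need tuning per state.
    """
    scored = [(name_proximity(name, c), c) for c in candidates]
    if not scored:
        return None
    score, match = max(scored)
    return match if score >= threshold else None
```

A score like this would probably work best as a tie-breaker among candidates that are already close geographically, since transliterated village names vary a lot between sources.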

Anyway, it will be interesting to combine my approach with yours, and
with that of others going down similar roads.

Which still does not solve the 2001 to 2011 Census mapping of course,

Best,
Raphael

On 15.03.2014 08:43, Avinash Celestine wrote:
> Re the issue of mapping of wards to AC/PC boundaries etc raised by
> Siddharth and the original subject of this thread, here's my two bits.
> warning, longish and extremely boring email follows! :
> 
> firstly, you might be better off using the delimitation xls files rather
> than pdfs. they are here:
> http://eci.nic.in/delim/paper1to7/finalpapers.asp
> 
> The delimitation papers map the AC and PC to census 2001 codes and
> boundaries. So it's actually a two stage process: one, get the AC, PC
> census area/village/ward etc to line up neatly in a usable table or
> database, and two, map the 2001 census codes to the 2011 census codes
> 
> both are non trivial steps
> 
> one, the census categories which are specified in the AC/PC delimitation
> documents do not map neatly to the areas in the census data files. In
> the first place, there are no census codes for individual areas in the
> delim docs making it that much harder. secondly, the level of
> aggregation across the delim papers and the census docs itself is
> different. To take UP for example, many constituencies in the delim
> papers are mapped to what in UP are called 'KCs' (kanungoo circles) or
> PCs (patwari circles). They do not go down further and say which villages
> make up those KCs or PCs. ditto with bihar and other states. The census
> tables on the other hand, have data about villages, tehsils, and blocks.
> they do not go into KCs and PCs. So there's often a mismatch between
> aggregation levels which will take work to resolve.
> 
> two. mapping census 2001 codes to 2011 codes. if you do that, the
> starting point is here:
> https://egovstandards.gov.in/mapping_land_region_codification (the site
> will throw a security warning in many browsers but i think that's because
> nic has not updated its security certificates or whatever. it's not been
> a problem for me, but you proceed at your own risk)
> 
> this maps 2001 codes to 2011 codes. Here's the problem:
> for urban areas, the coding goes down to the town level. So there is one
> town code for the whole of mumbai for example, which maps 2001 to 2011.
> What you cannot do with this table, and which is a big problem for urban
> areas, is map wards which exist as of 2001 census, to wards which exist
> as of 2011. Many city municipalities have rejigged ward boundaries over
> the last decade or so (i know delhi has). So even if you can match town
> codes, you still need to match wards from 2001 to 2011. All this is less
> of a problem for rural areas though its still present to some extent.
> This problem also makes 2001 and 2011 census data non comparable at a
> ward level because, if I recall correctly, census 2011 uses newly
> delimited wards whereas 2001 will (obviously) use the old ward boundaries. 
> 
> If you are interested in only a specific city or area, here's a
> suggestion. bypass the delim papers altogether. Start with the pdf
> electoral rolls which are now online for most states. The first page of
> the roll for each polling station has a standard format which has the
> area and ward boundaries, the AC, PC data as well as the aggregate number of
> electors. Write a scraper to parse just the first page of each roll. Of
> course if you are doing this for UP, you are totally screwed, because as
> Raphael pointed out earlier in this thread, there is a problem with the
> pdf unicode mapping so you'll basically get gibberish. But I think that
> pdfs for other cities may be more scrape-friendly. I tried it out with a
> couple of pdf rolls for delhi as a test case, and it worked reasonably
> well. The ward data from the scrape should line up easily with the
> census data. hopefully.  
> 
> Having said that, I did take a shot at doing the mapping at least to the
> census 2001 codes. attached is the result. this is an excel (within zip
> file) with about 54,000 rows, of which 20,000 rows are 'not matched' for
> reasons described above, so it's very much a work in progress. Different
> states have been matched to different extents. UP, Bihar have big gaps;
> states like delhi and gujarat less so. A few more points :
> * the left side of the table is from the delim papers, the right side is
> census 2001.
> * where the delim papers specify that only a 'part' of a ward or area
> is contained within that AC, i have worked out the proportion of the
> entire ward population that is covered, in the right most column. This
> column is not complete either.
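
As a rough illustration of the 'part' proportion mentioned in the last bullet, here is a minimal sketch. It assumes the delim papers give a population figure for the part of the ward inside the AC and that the census gives the figure for the whole ward; the function and parameter names are hypothetical.

```python
def covered_share(part_population: int, ward_population: int) -> float:
    """Share of a ward's census population that falls inside one AC.

    part_population: population of the 'part' listed in the delim papers
    ward_population: population of the entire ward in the census table
    """
    if ward_population <= 0:
        raise ValueError("ward population must be positive")
    if not 0 <= part_population <= ward_population:
        raise ValueError("part population must lie between 0 and the ward total")
    return part_population / ward_population
```

A share computed this way can then be used to apportion other ward-level census figures to the AC, on the (strong) assumption that they are spread evenly across the ward.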
> 
> A word about the methodology and my 'big breakthrough' :-)) in matching
> the two datasets even to this extent. The delim papers have
> population/SC/ST data from the census. It struck me that given the
> district and state, these population numbers are actually a kind of
> unique identifier of their own. As in, the population figure the delim
> papers give for village/ward (x), given the state and district, should
> match the population figure from the census _down to the last
> individual_ - in other words, exactly. So the matching field is some
> form of: statecode-districtcode-population total. This actually worked
> far better than i had hoped, though obviously not completely. As a cross
> check on the above, i re-ran the match, using
> state-districtcode-SC population/ST population. The possibility that two
> areas in a district have exactly the same total population _and_ the
> same SC and ST population is, i hope, quite small.
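
The matching idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the procedure used to build the attached table: the field names (`state`, `district`, `total`, `sc`, `st`) are hypothetical, and the SC/ST cross check is applied here as a filter on the candidates rather than as a separate re-run of the match.

```python
from collections import defaultdict


def build_index(census_rows):
    """Index census rows by (state code, district code, total population)."""
    index = defaultdict(list)
    for row in census_rows:
        index[(row["state"], row["district"], row["total"])].append(row)
    return index


def match_row(delim_row, index):
    """Return (census_row, status) for one row from the delim papers.

    status is 'matched', 'ambiguous' or 'not matched'.
    """
    key = (delim_row["state"], delim_row["district"], delim_row["total"])
    candidates = index.get(key, [])
    # Cross check: SC and ST populations must agree exactly as well.
    confirmed = [
        c for c in candidates
        if c["sc"] == delim_row["sc"] and c["st"] == delim_row["st"]
    ]
    if len(confirmed) == 1:
        return confirmed[0], "matched"
    if len(confirmed) > 1:
        return None, "ambiguous"
    return None, "not matched"
```

Rows that come back as 'ambiguous' or 'not matched' are the ones that would still need manual checking, for example where the delim papers list a KC or PC rather than a village.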
> 
> Anyway, I am hoping people can add to this ... The main caveat is that
> the possibility of error is definitely there. So if you do use this for
> analysis, please, please, please do random cross checks. It'll take
> time, but it will save potential embarrassment :-) and wrong data. And
> if you do find errors please fix and reupload.
> 
> 
> regards
> 
> Avinash
> 
> 
> 
> 
> 
> On Sat, Mar 15, 2014 at 11:28 AM, Avinash Celestine
> <avinash.celest...@gmail.com <mailto:avinash.celest...@gmail.com>> wrote:
> 
>     oh J&K is there after all. but would also be grateful if someone
>     could do a random check to see if the matches between PC/AC are correct.
> 
>     I took these from the delimitation final papers if someone wants to
>     know the source
> 
>     A
> 
> 
>     On Sat, Mar 15, 2014 at 11:27 AM, Avinash Celestine
>     <avinash.celest...@gmail.com <mailto:avinash.celest...@gmail.com>>
>     wrote:
> 
>         hi
> 
>         attached an excel with AC-PC-district-state matching along
>         with codes for AC-PC. I can add census district codes if you
>         like...give me a day or two
> 
>         some states are not present - like J&K... if someone could add
>         those that would be great
> 
>         Avinash
> 
> 
>         On Fri, Mar 14, 2014 at 10:27 PM, indro ray
>         <rayindro....@gmail.com <mailto:rayindro....@gmail.com>> wrote:
> 
>             Hi Anand (Chitipothu),
>             Can I know the source from where you get the polling booth
>             and ward data? Is it individual for each state and does it
>             provide the lat-long for the polling booths?
> 
>             Thanks,
>             Indro
> 
> 
>             On Wed, Mar 12, 2014 at 9:45 AM, Anand Chitipothu
>             <anandol...@gmail.com <mailto:anandol...@gmail.com>> wrote:
> 
> 
> 
>                 On Wed, Mar 12, 2014 at 8:19 AM, Siddarth Raman
>                 <thriddas.ano...@gmail.com
>                 <mailto:thriddas.ano...@gmail.com>> wrote:
> 
>                     Hi All,
> 
>                     In line with the discussions on elections, this is
>                     something I'd started working on a while back (and
>                     dropped). I was essentially hoping for a PC to AC to
>                     Ward mapping. As far as I understand, census 2011
>                     has population data either at the level of the ward
>                     or the district, so if we had to run even
>                     rudimentary data analysis on a parliamentary or
>                     assembly constituency (like total population)
>                     accurately, I'm guessing we need to go bottom up.
> 
>                     I had started this by attempting to convert
>                     http://eci.nic.in/eci_main/CurrentElections/CONSOLIDATED_ORDER%20_ECI%20.pdf
>                     into excel (using a mixture of pattern matching in
>                     notepad++ and a bit of excel vb). It's time
>                     consuming (largely because each state follows its
>                     own convention - not standardized)
> 
>                     Any suggestions on how one might go about this? If I
>                     wanted to estimate the population in a parliamentary
>                     constituency, or the total households, or the
>                     urban/rural split, how would I go about it? Is there
>                     a better method than looking at the above
>                     demarcation notification? Are there datasets on this
>                     already?
> 
>                     New to the group, didn't find any prior discussions
>                     on Parliamentary to Assembly to Ward/Village
>                     demarcations. 
> 
> 
>                 Hi Siddarth,
> 
>                 The voter list PDFs have the ward info for each polling
>                 booth. The PDFs have the number of voters, but not the
>                 population. So it is possible to sum up those numbers to
>                 get a count of the number of voters in a PC or AC.
> 
>                 If you want polling booth to ward mapping, I'll be able
>                 to provide it.
> 
>                 Anand
> 
>                 -- 
>                 For more details about this list
>                 http://datameet.org/discussions/
>                 ---
>                 You received this message because you are subscribed to
>                 the Google Groups "datameet" group.
>                 To unsubscribe from this group and stop receiving emails
>                 from it, send an email to
>                 datameet+unsubscr...@googlegroups.com
>                 <mailto:datameet+unsubscr...@googlegroups.com>.
>                 For more options, visit https://groups.google.com/d/optout.
> 
> 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
      Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)
