Dear Avinash and all, I will try to make some time this week to scrape the pincodes from electoral rolls for all polling booths in my electoral GIS shapefiles.
Since pincodes are in Latin script, this should not be affected by the much-discussed PDF scraping issues with electoral rolls.

We could then either go down the Voronoi route, or alternatively use the heatmap processing chain that I used to generate AC boundaries. The latter would have the advantage of dealing with wrong coordinates in the booth point dataset: not all electoral booth coordinates are correct, so if we only use Voronoi, we would quite frequently get a blip of pincode B within a sea of pincode A. The heatmap approach takes care of this.

Since I am not familiar with postal boundaries: can anyone here confirm whether pincode areas are contiguous, and whether each pincode has only one area? Or can it be that several non-contiguous areas have the same pincode, interspersed with other pincodes? (In which case Voronoi would perhaps be the better solution after all.)

In any case, I hope to give you the pincode for each polling booth by the end of the week or so (based on the all-India 2014 electoral rolls).

Best,
Raphael

On 28.03.2016 06:33, Avinash Celestine wrote:
> perhaps one way is to avoid using postal data altogether.
>
> All header pages in electoral rolls(the first page) contain the name of
> the polling station related to that roll, the PS number, and importantly
> the pin code.
>
> A site like psleci.nic.in <http://psleci.nic.in> has geog coordinates
> of polling stations (though Raphael had collected the data earlier*).
> Matching the two will give a fairly dense scattering of points - in
> fact much more dense than if we used some of the methods earlier in this
> thread.
>
> We thus have a way of associating a pin code with a geo coordinate. We
> can then use the voronoi method.
>
> Electoral rolls are mostly in pdf which make them difficult to scrape.
> But from what i have seen, for any given state, the location on the
> header page, of the pincode number is more or less constant, making it
> possible to target just that part of the page with any pdf parser.
>
> Electoral rolls have become difficult to download in bulk( a good
> thing!) but i understand different people on this group have the pdfs
> for different states. Putting this stuff together should give us
> comprehensive data on header pages for atleast some states.
> Alternatively, we can file RTIs for just the header pages of electoral
> rolls, though i dont know how successful that would be.
>
> * Raphael's data is
> at https://github.com/raphael-susewind/india-election-data
>
> On Sun, Mar 27, 2016 at 12:07 PM, srinivas kodali <iota.kod...@gmail.com
> <mailto:iota.kod...@gmail.com>> wrote:
>
> Well, There were postal delivery zones in the past and the postal
> department even used to make maps of these zones. The Delhi postal
> delivery zone map
> <https://drive.google.com/file/d/0B1RcWLku0ZOWWVBHMldrZWdfZEU/view?usp=sharing>
> had boundaries for delhi. I am not sure if other cities had them or how
> long the postal department was doing this, but it certainly can help
> with the boundaries for cities.
>
> Regards,
> Srinivas Kodali
> www.lostprogrammer.com <http://www.lostprogrammer.com>
> /"Not everyone who wanders is lost, I am probably a bit"/
>
> On Tue, Mar 22, 2016 at 9:29 PM, Arun Ganesh <arungra...@gmail.com
> <mailto:arungra...@gmail.com>> wrote:
>
> Shravan, crowdsourcing the boundaries of pincodes is not as
> trivial as you think. To start with, an area does not fall under
> a pincode, rather a street does based on the post office that
> services it.
> Read this: http://www.georeference.org/doc/zip_codes_are_not_areas.htm
>
> You may also want to do some background reading of existing
> research that has been done by the group
> here: https://datameet.hackpad.com/M4hPFJVV2Gm?eid=v4YoXN4tTw5
>
> To sum up, nobody has precise pincode boundaries like how you
> imagine them, not even the postal department. Any existing
> datasets are an estimate at best using some data processing on a
> large volume of address data.
>
> --
> Datameet is a community of Data Science enthusiasts in India.
> Know more about us by visiting http://datameet.org
> ---
> You received this message because you are subscribed to the
> Google Groups "datameet" group.
> To unsubscribe from this group and stop receiving emails from
> it, send an email to datameet+unsubscr...@googlegroups.com
> <mailto:datameet+unsubscr...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

--
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
Snail Mail | Melanchthonstr.
4a, 33615 Bielefeld, Germany
Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
Impact | https://impactstory.org/raphael-susewind
Please consider https://www.gnupg.org for encryption (key id 10AEE42F)
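
For illustration only, here is a minimal Python sketch of the trade-off discussed in this message: a plain Voronoi assignment gives every location the pincode of its single nearest booth, so one booth with a wrong coordinate creates a "blip", while a smoothed assignment can outvote it. The smoothing shown here is a simple majority vote among the k nearest booths. That is an assumption chosen for brevity and is not the heatmap processing chain mentioned above. The booth records, the coordinates, the pincodes and the function names `nearest_pincode` and `majority_pincode` are all hypothetical.

```python
from collections import Counter
from math import hypot

# Hypothetical booth records: (x, y, pincode). Coordinates are planar toy
# values, not real longitude/latitude, and the pincodes are made up.
BOOTHS = [
    (0.0, 0.0, "110001"),
    (1.0, 0.0, "110001"),
    (0.0, 1.0, "110001"),
    (1.0, 1.0, "110001"),
    (0.5, 0.5, "110002"),  # simulated wrong coordinate inside the 110001 cluster
    (5.0, 5.0, "110002"),
    (6.0, 5.0, "110002"),
    (5.0, 6.0, "110002"),
]


def nearest_pincode(x, y, booths):
    """Voronoi-style lookup: pincode of the single nearest booth."""
    return min(booths, key=lambda b: hypot(b[0] - x, b[1] - y))[2]


def majority_pincode(x, y, booths, k=3):
    """Smoothed lookup: most common pincode among the k nearest booths.

    Assumption: a k-nearest majority vote stands in for a heatmap-style
    smoothing step; ties are broken by whichever pincode is seen first.
    """
    nearest = sorted(booths, key=lambda b: hypot(b[0] - x, b[1] - y))[:k]
    return Counter(b[2] for b in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # A point next to the misplaced booth: the two lookups disagree.
    print(nearest_pincode(0.5, 0.6, BOOTHS))
    print(majority_pincode(0.5, 0.6, BOOTHS, k=3))
```

If pincodes really can consist of several small non-contiguous areas, any such smoothing would also erase genuine enclaves, which is why the contiguity question above matters for choosing between the two.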