Re: [datameet] Pincode Boundaries of India

2016-04-01 Thread Raphael Susewind
Dear all,

Following up on my earlier email, I have just pushed a list of pincodes for
all electoral booths across India to GitHub and made a pull request to
the datameet repository:

https://github.com/datameet/pincodes/pull/2

Please note that this may be incomplete and is based on a rather
brutish, quick-and-dirty hack (see the comments in rolls2pincode.pl). But
it does use the same IDs as the 2014 elections, and hence can be
combined with my GIS shapefiles for polling booths:

http://dx.doi.org/10.4119/unibi/2674065
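Because the pincode list reuses the 2014 booth IDs, attaching pincodes to the booth points is a plain key join. A minimal Python sketch, assuming each booth is keyed by a hypothetical (assembly constituency number, booth number) pair; the real ID columns in the datameet list and in the shapefile attributes may be named and structured differently. Booths with no pincode are returned separately, since the list may be incomplete.

```python
from typing import Dict, List, Tuple

# Hypothetical booth key: (assembly constituency number, booth number).
# The actual ID columns in the pincode list and shapefiles may differ.
BoothKey = Tuple[int, int]


def join_pincodes(
    pincodes: Dict[BoothKey, str],
    booths: Dict[BoothKey, Tuple[float, float]],
) -> Tuple[List[Tuple[float, float, str]], List[BoothKey]]:
    """Attach a pincode to every booth point whose ID appears in the pincode list.

    Returns (labelled points as (lon, lat, pincode), IDs of booths without a pincode).
    """
    labelled: List[Tuple[float, float, str]] = []
    missing: List[BoothKey] = []
    for key in sorted(booths):
        lon, lat = booths[key]
        pin = pincodes.get(key)
        if pin is None:
            missing.append(key)  # the scraped list may be incomplete
        else:
            labelled.append((lon, lat, pin))
    return labelled, missing


if __name__ == "__main__":
    # Made-up IDs, coordinates and pincodes, for illustration only.
    pins = {(1, 1): "226001", (1, 2): "226002"}
    booths = {(1, 1): (80.94, 26.85), (1, 2): (80.95, 26.86), (1, 3): (80.96, 26.87)}
    points, missing = join_pincodes(pins, booths)
    print(points)   # [(80.94, 26.85, '226001'), (80.95, 26.86, '226002')]
    print(missing)  # [(1, 3)]
```

Keeping the unmatched IDs makes it easy to see how much of the booth layer the pincode list actually covers before building any maps.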

I leave it to others to double-check accuracy and create actual pincode
maps. I hope this is useful,

Best,
Raphael

On 28.03.2016 07:50, Raphael Susewind wrote:

> Dear Avinash and all,
> 
> I will try to make some time this week to scrape the pincodes from
> electoral rolls for all polling booths in my electoral GIS shapefiles.
> 
> Since the pincode is in Latin script, this should not be affected by
> the much-discussed PDF scraping issues with electoral rolls.
> 
> We could then either go down the Voronoi route, or alternatively use the
> heatmap processing chain that I used to generate AC boundaries. The
> latter would have the advantage of dealing with wrong coordinates in the
> booth point dataset (basically, not all electoral booth coordinates are
> correct; consequently, if we only used Voronoi, we would quite often get
> a blip of pincode B within a sea of pincode A. The heatmap approach
> takes care of this).
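The blip problem can be sketched in a few lines. Labelling a location with the pincode of its nearest booth is the same as asking which booth's Voronoi cell it falls in, so a single mislocated booth carves out its own small cell. As a simpler stand-in for smoothing, the sketch below uses a k-nearest-neighbour majority vote; this is a swapped-in technique, not the heatmap processing chain mentioned above. It assumes planar distance on raw (lon, lat), which is only reasonable over small areas, and k=5 is an arbitrary choice.

```python
from collections import Counter
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float, str]  # (x, y, pincode)


def nearest_label(points: List[Point], x: float, y: float) -> str:
    """Voronoi-style labelling: the query takes the pincode of its nearest booth."""
    return min(points, key=lambda p: hypot(p[0] - x, p[1] - y))[2]


def knn_majority_label(points: List[Point], x: float, y: float, k: int = 5) -> str:
    """Smoothed labelling: majority pincode among the k nearest booths.

    A single mislocated booth can win the nearest-neighbour test but is
    usually outvoted here. On a tie, Counter.most_common returns the label
    first encountered, i.e. the one belonging to the closer booth.
    """
    nearest = sorted(points, key=lambda p: hypot(p[0] - x, p[1] - y))[:k]
    return Counter(p[2] for p in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy layout: four "A" booths around the origin, one "B" booth wrongly
    # placed among them, and the real "B" cluster further away.
    booths = [
        (0.0, 1.0, "A"), (1.0, 0.0, "A"), (0.0, -1.0, "A"), (-1.0, 0.0, "A"),
        (0.1, 0.1, "B"),
        (10.0, 10.0, "B"), (10.0, 11.0, "B"), (11.0, 10.0, "B"),
    ]
    print(nearest_label(booths, 0.2, 0.2))       # B (the blip)
    print(knn_majority_label(booths, 0.2, 0.2))  # A (the blip is outvoted)
```

The trade-off is that a vote also erases genuinely small or isolated pincode areas, which is exactly the case raised in the question about non-contiguous pincodes.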
> 
> Since I am not familiar with postal boundaries: can anyone here confirm
> whether pincode areas are contiguous, and whether each pincode has only
> one area? Or can several non-contiguous areas have the same pincode,
> interspersed with other pincodes? (In that case Voronoi would perhaps be
> the better solution after all.)
> 
> In any case, I hope to give you the pincode for each polling booth by
> the end of the week or so (based on the all-India 2014 electoral rolls),
> 
> Best,
> Raphael
> 
> On 28.03.2016 06:33, Avinash Celestine wrote:
> 
>> Perhaps one way is to avoid using postal data altogether.
>>
>> All header pages of electoral rolls (the first page) contain the name
>> of the polling station related to that roll, the PS number and,
>> importantly, the pin code.
>>
>> A site like psleci.nic.in has geographic coordinates of polling
>> stations (though Raphael had already collected this data*). Matching
>> the two will give a fairly dense scattering of points, in fact much
>> denser than with the methods discussed earlier in this thread.
>>
>> We thus have a way of associating a pin code with a geo coordinate. We
>> can then use the Voronoi method.
>>
>> Electoral rolls are mostly PDFs, which makes them difficult to scrape.
>> But from what I have seen, for any given state the location of the
>> pincode on the header page is more or less constant, making it
>> possible to target just that part of the page with any PDF parser.
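Targeting a fixed part of the header page amounts to filtering positioned words by a bounding box and then matching a PIN pattern. A minimal sketch, assuming the parser yields word dictionaries shaped like those from pdfplumber's `extract_words()` (keys `x0`, `x1`, `top`, `bottom`, `text`); adapt the keys to whatever parser is actually used. The per-state bounding box is hypothetical and would have to be measured from sample header pages, and the regex assumes the six-digit PIN format with a non-zero first digit.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Six-digit PIN code whose first digit is not 0.
PIN_RE = re.compile(r"\b[1-9]\d{5}\b")

# (x0, top, x1, bottom) in PDF points; per-state values are hypothetical.
BBox = Tuple[float, float, float, float]


def pincode_in_region(words: Iterable[Dict], bbox: BBox) -> Optional[str]:
    """Return the first PIN-like token among the words lying fully inside bbox."""
    x0, top, x1, bottom = bbox
    inside = [
        w for w in words
        if w["x0"] >= x0 and w["x1"] <= x1 and w["top"] >= top and w["bottom"] <= bottom
    ]
    # Read in layout order: top to bottom, then left to right.
    inside.sort(key=lambda w: (w["top"], w["x0"]))
    for w in inside:
        match = PIN_RE.search(w["text"])
        if match:
            return match.group(0)
    return None


if __name__ == "__main__":
    # Made-up words from a header page; only the first two lie in the box.
    words = [
        {"x0": 50, "top": 100, "x1": 120, "bottom": 110, "text": "Pin"},
        {"x0": 130, "top": 100, "x1": 180, "bottom": 110, "text": "226001"},
        {"x0": 50, "top": 300, "x1": 100, "bottom": 310, "text": "999999"},
    ]
    print(pincode_in_region(words, (40, 90, 200, 120)))  # 226001
```

Restricting the search to the box matters because other six-digit numbers on the page would otherwise match the same pattern.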
>>
>> Electoral rolls have become difficult to download in bulk (a good
>> thing!), but I understand different people on this group have the PDFs
>> for different states. Putting these together should give us
>> comprehensive header-page data for at least some states.
>> Alternatively, we could file RTIs for just the header pages of
>> electoral rolls, though I don't know how successful that would be.
>>
>> * Raphael's data is
>> at https://github.com/raphael-susewind/india-election-data
>>
>>
>>
>> On Sun, Mar 27, 2016 at 12:07 PM, srinivas kodali wrote:
>>
>> Well, there were postal delivery zones in the past, and the postal
>> department even used to make maps of these zones. The Delhi postal
>> delivery zone map
>> (https://drive.google.com/file/d/0B1RcWLku0ZOWWVBHMldrZWdfZEU/view?usp=sharing)
>> had boundaries for Delhi. I am not sure whether other cities had them
>> or how long the postal department was doing this, but it can certainly
>> help with the boundaries for cities.
>>
>> Regards,
>> Srinivas Kodali
>> www.lostprogrammer.com 
>> /"Not everyone who wanders is lost, I am probably a bit"/
>>
>> On Tue, Mar 22, 2016 at 9:29 PM, Arun Ganesh wrote:
>>
>> Shravan, crowdsourcing the boundaries of pincodes is not as
>> trivial as you think. To start with, an area does not fall under
>> a pincode; rather, a street does, based on the post office that
>> services it. Read
>> this: http://www.georeference.org/doc/zip_codes_are_not_areas.htm
>>
>> You may also want to do some background reading on existing
>> research that has been done by the group
>> here: https://datameet.hackpad.com/M4hPFJVV2Gm?eid=v4YoXN4tTw5
>>
>> To sum up, nobody has precise pincode boundaries as you
>> imagine them, not even the postal department. Any existing
>> datasets are an esti

2016-04-01 Thread Avinash Celestine
Thanks very much, Raphael. This is great.
