Re: [datameet] Re: MP/MLA Shapes

2013-12-14 Thread Raphael Susewind
Hey there - one alternative route is to recreate them from the Polling Station 
Point data that the ECI has recently put up - see here for UP: 
http://data.raphael-susewind.de/content/gis-shapefiles. Unfortunately, most 
states are still not cleaned up, but around the general elections one should be 
able to get decent data - Raphael
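
As a very rough illustration of the polling-station route, one could group station points by assembly constituency (AC) and take the convex hull of each group. This is a minimal sketch, not the method used on the site above. It assumes you already have (AC number, longitude, latitude) triples and treats lon/lat as flat x/y coordinates. Hulls of neighbouring ACs will overlap, so a real workflow would need a proper partition (for example a Voronoi tessellation dissolved by AC) plus manual cleaning.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means o -> a -> b turns counter-clockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # lower hull, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # upper hull, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other, so drop it.
    return lower[:-1] + upper[:-1]


def hulls_by_constituency(stations):
    """stations: iterable of (ac_no, lon, lat) -> {ac_no: hull vertices}."""
    groups = defaultdict(list)
    for ac_no, lon, lat in stations:
        groups[ac_no].append((lon, lat))
    return {ac: convex_hull(pts) for ac, pts in groups.items()}


if __name__ == "__main__":
    # Hypothetical toy data for two ACs; these are not real station coordinates.
    toy = [(1, 0, 0), (1, 2, 0), (1, 2, 2), (1, 0, 2), (1, 1, 1),
           (2, 5, 5), (2, 6, 5), (2, 5, 6)]
    print(hulls_by_constituency(toy))
```

The interior point (1, 1) of AC 1 drops out of its hull. That is why hulls alone lose the detail near boundaries where two constituencies interleave.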

On Saturday, 9 November 2013 19:43:10 UTC+1, Gautam John wrote:

> Sadly, only from commercial sources.

-- 
For more details about this list
http://datameet.org/discussions/
--- 
You received this message because you are subscribed to the Google Groups 
"datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to datameet+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [datameet] Polling stations

2014-01-09 Thread Raphael Susewind
Dear Anand,

I should probably read properly first and then respond... If you are not
after the GIS data per se, you should be able to get the data from either

a) the Form 20 returns of past elections on each CEO site

b) [perhaps more useful] the "Download Electoral Roll as PDF" databases
of each CEO - you don't have to scrape the actual PDFs, but could just
use the information in the dropdown lists those pages usually provide
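
For option (b), the dropdowns can usually be read without touching the PDFs. Below is a minimal sketch using Python's standard-library HTML parser. It assumes you have saved the page HTML and that the constituency dropdown is a `<select>` with a known `id`. The `ddlAC` id is hypothetical, since each CEO site names its controls differently. Many of these pages fill the next dropdown through a form postback, and this sketch does not handle that.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, label) pairs from the <option>s of one <select>, identified by id."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.value = None  # value attribute of the option currently open
        self.text = []     # text chunks seen inside that option
        self.options = []

    def _flush(self):
        # </option> is optional in HTML, so also flush on the next <option> or </select>.
        if self.value is not None:
            self.options.append((self.value, "".join(self.text).strip()))
            self.value, self.text = None, []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self._flush()
            self.value = attrs.get("value", "")
            self.text = []

    def handle_data(self, data):
        if self.value is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            self._flush()
        elif tag == "select" and self.in_select:
            self._flush()
            self.in_select = False


if __name__ == "__main__":
    # Hypothetical page fragment; real CEO pages use their own ids and labels.
    page = """
    <select id="ddlAC">
      <option value="0">--Select--</option>
      <option value="154">154 - Constituency A</option>
      <option value="155">155 - Constituency B
      <option value="156">156 - Constituency C
    </select>
    <select id="ddlOther"><option value="x">ignored</option></select>
    """
    parser = OptionCollector("ddlAC")
    parser.feed(page)
    parser.close()
    # Drop the "--Select--" placeholder before using the list.
    print([opt for opt in parser.options if opt[0] != "0"])
```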

Best,
Raphael

On 09.01.2014 09:08, Anand Chitipothu wrote:
> Hi,
> 
> I'm looking for information like name, constituency, number of voters
> etc. of all polling stations in India. Has someone already scraped this
> data?
> 
> The names of the polling stations are available at:
> 
> http://www.eci-polldaymonitoring.nic.in/psl/default.aspx
> 
> Are there any other places where this information is available?
> 
> Anand
> 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)


Re: [datameet] Polling stations

2014-01-09 Thread Raphael Susewind
Hey Anand,

I have done quite a bit of work on this. One problem is that there are two
datasets - one cleaned up, one preliminary:

http://www.eci-polldaymonitoring.nic.in/psl/default.aspx
http://www.eci-polldaymonitoring.nic.in/psleci/default.aspx

For the preliminary data for UP, have a look at my website here:
http://data.raphael-susewind.de/content/gis-shapefiles

For all-India data, I have the preliminary raw point data, but can't
make up my mind whether I should clean it up and make it available now,
or hope that the EC themselves will clean it up further in the current
run-up to the general elections, in which case I could save myself the
trouble and just wait a few months longer.

Also, I am wary of the polling booth ID codes at the moment; for UP, for
instance, they changed slightly with the current roll revision - booth
IDs from 2011-13 are not necessarily the same as those in 2014.
Currently, my website operates with 2011-13 IDs, and I intend to wait a
little longer until I upgrade to 2014 IDs...
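
One way to check how far booth IDs moved between roll revisions is to match booths by name within each AC. This is a minimal sketch that assumes you have (AC number, booth number, booth name) rows from both revisions. The function names are hypothetical. It only matches exact names after case and whitespace normalisation, so renamed, split or duplicate-named booths come back as `None` for manual review.

```python
def normalise(name):
    """Case- and whitespace-insensitive key for a booth name."""
    return " ".join(name.lower().split())


def booth_crosswalk(old_rows, new_rows):
    """Map (ac_no, old_booth_no) -> (ac_no, new_booth_no), or None when the
    booth name has no unique exact match in the same AC of the new revision."""
    index = {}
    for ac_no, booth_no, name in new_rows:
        index.setdefault((ac_no, normalise(name)), []).append(booth_no)
    crosswalk = {}
    for ac_no, booth_no, name in old_rows:
        candidates = index.get((ac_no, normalise(name)), [])
        crosswalk[(ac_no, booth_no)] = (ac_no, candidates[0]) if len(candidates) == 1 else None
    return crosswalk


if __name__ == "__main__":
    # Hypothetical rows, not real booth lists.
    old = [(154, 1, "Govt School Room 1"), (154, 2, "Govt School Room 2"),
           (154, 3, "Panchayat Bhawan")]
    new = [(154, 1, "Govt  School Room 1"), (154, 2, "Community Hall"),
           (154, 3, "Govt School Room 2"), (154, 4, "Panchayat Bhawan")]
    cw = booth_crosswalk(old, new)
    # Booths whose number changed between the two revisions.
    print({k: v for k, v in cw.items() if v is not None and v != k})
```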

And nope, there are no other places where this data is available to my
knowledge (unless you know somebody deep inside the NIC or EC),

Best,
Raphael


Re: [datameet] Polling stations

2014-01-09 Thread Raphael Susewind
Dear Anand,

I don't mind sharing at all; I have sent you a private email (the
attachment is too large for the Google Group),

Best,
Raphael

On 09.01.2014 10:48, Anand Chitipothu wrote:
> 
> On Thu, Jan 9, 2014 at 2:21 PM, Raphael Susewind
> <m...@raphael-susewind.de> wrote:
> 
> Dear Anand,
> 
> I should probably first read properly then respond... If you are not
> after the GIS data per se, you should be able to get the data in either
> 
> 
> Dear Raphael,
> 
> Thanks for your inputs. Yes, I'm not looking for GIS data. As of now, I
> just want the list of polling stations and their constituencies. More
> information, such as the number of voters or the polling percentage in
> the last elections, will be valuable too.
> 
> a) the Form 20 returns of past elections on each CEO site
> 
> b) [perhaps more useful] the "Download Electoral Roll as PDF" databases
> of each CEO - you don't have to scrape the actual PDFs but could just
> use the information in the dropdown lists they usually use
> 
> 
> I don't mind scraping a website or two, but doing that for every state
> is too much work, as each website is completely different.
> 
> If you have id, name and constituency for all polling booths, I would
> love to have a look at it. Would you mind sharing it?
> 
> Anand
> 

Re: [datameet] Re: MP/MLA Shapes

2014-01-21 Thread Raphael Susewind
Dear Krishnan,

I will publish some on said website, but only later this year - the
cleaning up process just takes a lot of attention and time which I don't
have for the coming months,

Best,
Raphael

On 21.01.2014 11:52, Krishna Prasanth wrote:
> Hi everyone
> 
> Has anyone finally managed to get AC shapefiles post-delimitation? Does
> anyone have shapefiles just for Delhi, if not other states? I am a novice
> and would be really glad if someone could help.
> 
> Thank you
> 

Re: Election data (Was: [datameet] [Bangalore] Open Data Camp 2014 Confirmed March 22nd and 23rd - Google Offices)

2014-02-18 Thread Raphael Susewind
Hey everybody,

I am working on PDF electoral rolls, but am struggling with Unicode
conversion issues. A Crystal Reports bug in the version the ECI currently
uses, at least in some states such as UP or Gujarat, corrupts the
ToUnicodeCMap, so you cannot properly copy and paste from the PDF or
otherwise extract proper UTF-8. If your 'free the pdf' event finds a way
around this, do let me know - likewise, I shall send any progress from my
side...

Best,
Raphael

On 18.02.2014 10:22, Nisha Thompson wrote:
> This is awesome Anand. We're going to have a free the pdf event the
> weekend before.
> 
> Let's get together and make sure we don't duplicate effort. 
> 
> Will add your work to the pdf event wiki page. 
> 
> Anyone else with election data?
> 
> On Feb 18, 2014 2:49 PM, "Anand Chitipothu" <anandol...@gmail.com> wrote:
> 
> Hello everyone,
> 
> Looks like there is a lot of interest in extracting election data from
> PDFs. I've started extracting a lot of data from voter lists, esp. for
> Bangalore. 
> 
> All the data is still sitting on my server. A couple of samples:
> 
> http://d.anandology.com/karnataka/pdfs/AC154/AC1540111.json
> 
> I started a project to put all of this into good shape and expose it
> on a website with a nice API.
> 
> https://github.com/anandology/opendata-ge2014
> 
> If any of you are already working on such things, please let me know.
> There is no point duplicating each other's efforts.
> 
> Anand
> 
> On Mon, Feb 17, 2014 at 1:13 PM, Nisha Thompson
> <nisha.thomp...@gmail.com> wrote:
> 
> Hey Everyone,
> 
> I would like to invite everyone to the 2014 Bangalore Open Data
> Camp. March 22nd and 23rd!  Please register and get more
> information on the link below!
> 
> This is our 3rd Camp!  We are excited to be doing a deep dive
> into election data this year!  The Lok Sabha election is
> providing us a great opportunity to look at how this information
> is accessed and used, and at ways to improve how we use data during
> elections. We will also be looking at accountability data and
> exploring data used in election coverage. On the 2nd day there will
> be a hackathon with Oorvani Foundation and others. 
> 
> http://odc.datameet.org/odcblr2014
> 
> This year we will be doing pre-events:
> Free the PDF - A one day hackathon that will continue the 2nd
> day of the conference to convert election data from PDFs to open
> formats.
> From District to Constituencies - A workshop with Geo Bangalore
> on dealing with the shape file issues around MP/MLA constituencies. 
> 
> Please register today; space is limited.  Also, the first day of talks
> will be more restricted this year, while the 2nd day
> will be more open.  So if you want to speak, please take a look
> at the tracks and contact Thej or me directly.  
> 
> There will be more information to come!  Also we are looking for
> volunteers to help run the event and pre events. Please dm me if
> you have some time and want to help. 
> 
> Thanks to our Sponsors Google, Akshara and Oorvani Foundation!
> 
> I hope to see you all there!
> 
> Nisha
> 
> -- 
> Nisha Thompson
> Mobile: 962-061-2245
> 
> 
> -- 
> Anand
> http://anandology.com/

Re: [datameet] India 2001 census data village-wise

2014-03-03 Thread Raphael Susewind
Hey,

this was sent over the list a few days ago:

Hi all,

I know some census 2011 data has been posted already, but I thought I'd
share the primary abstract data I have down to the town/village level.
You can download it here: http://journeyman-data.com/census2011/. Please
see the variable list/readme for details.

Best,
Eric Dodge

On 01.03.2014 06:23, Fenella C wrote:
> Hello everyone, 
> 
> I am wondering if any of you have the village-wise 2001 Indian census
> data in a spreadsheet (or similar) format? I am basically looking for
> information at the village level from the 2001 census (e.g., population
> of the village, number of households in the village, etc.)
> 
> The data is available online at the census website here:
> http://www.censusindia.gov.in/Census_Data_2001/Village_Directory/View_data/Village_Profile.aspx
> but it is not available in a spreadsheet. I have already tried web
> scraping the data, but it is painfully slow, so I'm wondering if I can
> find it elsewhere.
> 
> Many thanks,
> Fenella
> 

Re: [datameet] [Bangalore] Open Data Camp 2014 - Pre Events

2014-03-11 Thread Raphael Susewind
Hey everybody,

this sounds fascinating, and though I cannot be in India at the time, I
wanted to let the GIS group know that I am almost done with AC and PC
shapefiles for the whole country, which will be published under an open
license soon (so as to avoid double and triple work).

These are based on polling station localities published by the EC, and
transformed into constituency polygons using a rather complex
classification algorithm (and cross-checks against GADM district
boundaries). More to be announced soon - perhaps even before the ODC,

Best,
Raphael

On 11.03.2014 13:46, Nisha Thompson wrote:
> Hey Everyone!
> 
> Just a reminder Open Data Camp is less than 2 Weeks Away!!  I hope you
> have registered!
> 
> This weekend we are doing some pre-events to get the election data flowing!
> 
> We will be working for a few hours on liberating some Election Data. 
> 
> Saturday is a workshop on Drawing Constituency Shapefiles.
> Sunday is a Liberate PDF-a-thon for election data.  
> 
> These events will be at CIS and there will be Lunch.
> 
> You can sign up here!
> 
> http://odc.datameet.org/odcblr2014:georeferencing101
> http://odc.datameet.org/odcblr2014:liberating-pdf
> 
> I hope to see you all then!
> 
> Nisha
> 

Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-11 Thread Raphael Susewind
Hey Siddarth and Anand,

I, too, am really interested in this, but have not made much progress
yet. I think there are two ways to do this, neither of which is
straightforward.

The "extract ward/village mentioned in roll PDF" strategy is one option.
Depending on the raw data, however, this can be cumbersome (one source in
the vernacular, one in Latin script, etc.); I know a couple of scholars
who are attempting this and they get stuck all the time, having to match
manually rather frequently (which is a pain given that there are 800,000
or so polling stations).
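
A first pass at the name-matching strategy can be automated with fuzzy string matching, so that only low-scoring cases are left for manual work. The sketch below uses Python's `difflib`. It assumes both name lists are already in Latin script, since transliterating vernacular names is a separate step not shown here. The similarity cutoff of 0.85 is an assumption that would need tuning on real data.

```python
import difflib
import re


def normalise(name):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())


def match_names(roll_names, census_names, cutoff=0.85):
    """Map each roll name to its closest census name, or None if nothing scores >= cutoff."""
    lookup = {}
    for census_name in census_names:
        lookup.setdefault(normalise(census_name), census_name)
    keys = list(lookup)
    result = {}
    for roll_name in roll_names:
        hits = difflib.get_close_matches(normalise(roll_name), keys, n=1, cutoff=cutoff)
        result[roll_name] = lookup[hits[0]] if hits else None
    return result


if __name__ == "__main__":
    roll = ["Rampur Kalan", "Ram Nagar", "Xyz"]        # hypothetical roll-PDF names
    census = ["RAMPUR KALAN", "Ramnagar", "Sitapur"]   # hypothetical Census names
    print(match_names(roll, census))
```

Names that map to `None` are the ones that still need the manual matching described above.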

Currently, we have the additional problem that many of the current roll
PDFs - for instance in UP - are broken: one cannot copy-paste (or
pdftotext, or extract through whatever means) from them, chiefly because
the ToUnicodeCMap is corrupted by the version of Crystal Reports the ECI
is using. There is no real workaround other than reverse-OCR, which is a
pain-in-the-a**. Let me know if you figure out another way...

The second option would be a very different strategy, namely GIS
matching through nearest-neighbour analysis: "what is the closest Census
village/ward to that particular polling booth?" (or the other way
round - the computational challenge is to match ALL booths to at least
one ward AND vice versa). Unfortunately, Census village/ward lat/long is
not in the public domain, as far as I can see - and using proprietary data
to do the matching is legally complicated (even if one redistributes
only the matching result and not the proprietary data).
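
The nearest-neighbour idea, including the requirement that every booth AND every village end up with at least one link, can be sketched as follows. This assumes you have coordinates for both sets, which is the hard part for Census villages. It uses a brute-force search that is O(n*m). For roughly 800,000 booths one would switch to a spatial index such as a k-d tree.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, assuming a spherical Earth of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest(points, targets):
    """For each (id, lat, lon) in points, the id of the closest (id, lat, lon) target."""
    return {
        pid: min(targets, key=lambda t: haversine_km(lat, lon, t[1], t[2]))[0]
        for pid, lat, lon in points
    }


def match_both_ways(booths, villages):
    """Set of (booth_id, village_id) links, so that every booth and every village has at least one."""
    links = set()
    for booth_id, village_id in nearest(booths, villages).items():
        links.add((booth_id, village_id))
    for village_id, booth_id in nearest(villages, booths).items():
        links.add((booth_id, village_id))
    return links


if __name__ == "__main__":
    # Hypothetical coordinates, not real booths or villages.
    booths = [("B1", 26.0, 80.0), ("B2", 26.0, 80.1)]
    villages = [("V1", 26.0, 80.01), ("V2", 26.0, 80.09), ("V3", 26.0, 80.06)]
    print(sorted(match_both_ways(booths, villages)))
```

Village V3 is nobody's nearest village, but the reverse pass still links it to its closest booth (B2). That is the "vice versa" half of the requirement.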

My 5 cents,
Let us know of any progress,

Raphael

On 12.03.2014 05:17, Anand Chitipothu wrote:
> 
> On Wed, Mar 12, 2014 at 9:45 AM, Anand Chitipothu <anandol...@gmail.com> wrote:
> 
> 
> 
> On Wed, Mar 12, 2014 at 8:19 AM, Siddarth Raman
> <thriddas.ano...@gmail.com> wrote:
> 
> Hi All,
> 
> In line with the discussions on elections, this is something I'd
> started working on a while back (and dropped). I was essentially
> hoping for a PC to AC to Ward mapping. As far as I understand,
> census 2011 has population data either at the level of the ward
> or the district, so if we had to run even rudimentary data
> analysis on a parliamentary or assembly constituency (like total
> population) accurately, I'm guessing we need to go bottom up.
> 
> I had started this by attempting to convert
> http://eci.nic.in/eci_main/CurrentElections/CONSOLIDATED_ORDER%20_ECI%20.pdf
> into Excel (using a mixture of pattern matching in Notepad++ and a
> bit of Excel VB). It's time-consuming (largely because each
> state follows its own convention - not standardized)
> 
> Any suggestions on how one might go about this? If I wanted to
> estimate the population in a parliamentary constituency, or the
> total households, or the urban/rural split, how would I go about
> it? Is there a better method than looking at the above
> demarcation notification? Are there datasets on this already?
> 
> New to the group, didn't find any prior discussions on
> Parliamentary to Assembly to Ward/Village demarcations. 
> 
> 
> Hi Siddarth,
> 
> The voter list PDFs have the ward info for each polling booth. The
> PDFs have the number of voters, but not the population. So it is
> possible to sum up those numbers to get the number of voters
> in a PC or AC.
> 
> If you want a polling booth to ward mapping, I'll be able to provide it.
> 
> 
> btw, Anand Doshi has already parsed that PDF. The results are available at:
> 
> https://gist.github.com/anandpdoshi/9448203
> 
> Anand
> P.S: uff, so many Anands on this list
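
Summing booth-level electorate figures up to AC level and then to PC level is a simple group-by once two inputs are available: the booth data and an AC-to-PC mapping, such as the parsed delimitation order. This is a minimal sketch that assumes hypothetical (AC number, booth number, voters) rows and that each AC lies wholly within one PC. It yields registered voters, not population.

```python
from collections import Counter


def voters_by_ac(booths):
    """booths: iterable of (ac_no, booth_no, voters) -> Counter {ac_no: total voters}."""
    totals = Counter()
    for ac_no, _booth_no, voters in booths:
        totals[ac_no] += voters
    return totals


def voters_by_pc(ac_totals, ac_to_pc):
    """Roll AC totals up to PCs using an {ac_no: pc_name} mapping."""
    totals = Counter()
    for ac_no, voters in ac_totals.items():
        totals[ac_to_pc[ac_no]] += voters
    return totals


if __name__ == "__main__":
    # Hypothetical booth rows and mapping, not real figures.
    booths = [(1, 1, 900), (1, 2, 1100), (2, 1, 750), (3, 1, 1200)]
    ac_totals = voters_by_ac(booths)
    print(dict(ac_totals))
    print(dict(voters_by_pc(ac_totals, {1: "PC-A", 2: "PC-A", 3: "PC-B"})))
```

A missing AC in the mapping raises a `KeyError`, which is deliberate. It surfaces gaps in the mapping table instead of silently dropping voters.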

Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-13 Thread Raphael Susewind
Hey Avinash,

yep - that's what I figured, too. Not only misplaced matras (those could
be rearranged), but a real garbling, which cannot be resolved as far as
I can see. Worse, there isn't even a clear pattern - for a few
constituencies, I fed the Voter ID (which is in Latin script) to the
"search roll details by voter ID" function on the CEO website, which
returns the properly written Unicode name. I then compared the garbled
name and the Unicode name to see if there were any statistical
regularities - yet unfortunately, there are a thousand ways of garbling
"Avinash" - it's not always "Abniszhaa".
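
The garbled-versus-Unicode comparison described above amounts to tallying which character runs get substituted for which. Below is a minimal sketch with `difflib.SequenceMatcher`. It assumes you have collected (garbled, correct) name pairs, for example via the voter-ID lookup. The toy pairs below are Latin-script placeholders rather than real Devanagari output. A long tail of one-off substitutions in the tally would be consistent with the finding that there is no clean pattern.

```python
import difflib
from collections import Counter


def substitution_counts(pairs):
    """pairs: iterable of (garbled, correct) strings.
    Counts how often each garbled run was aligned against each correct run."""
    counts = Counter()
    for garbled, correct in pairs:
        matcher = difflib.SequenceMatcher(None, garbled, correct, autojunk=False)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op != "equal":  # 'replace', 'delete' or 'insert'
                counts[(garbled[i1:i2], correct[j1:j2])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical placeholder pairs; real input would be extracted vs. looked-up names.
    pairs = [("kamla", "kamala"), ("vimla", "vimala"), ("Abniszhaa", "Avinash")]
    for (bad, good), n in substitution_counts(pairs).most_common():
        print(repr(bad), "->", repr(good), n)
```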

The only solution I can think of is the following (but I have not
implemented it): train Tesseract (an OCR engine with Indic script support)
with the exact font used in the PDF reports, so that it almost perfectly
recognizes something written in this font (this was a stumbling block for
me, rather complicated work), then extract images of the text areas of
interest, and run them through OCR. If you want to give it a shot...

Otherwise, we could only try to convince the EC to fix the bug in
Crystal Reports, and re-generate all PDFs - which is highly unlikely,
they have more important things to do right now (the PDFs display and
print all right, after all; only text extraction does not work - they
would perhaps even consider it a feature rather than a bug).

It might be useful to compile a list of states where this problem occurs
- I have seen it in Gujarat and UP for sure, but don't know whether it
happens everywhere,
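
To compile that list of affected states, a cheap heuristic is to flag extracted Devanagari words in which a dependent vowel sign (matra) or the virama does not follow a consonant. This is what misplaced matras look like in logical order. The sketch below is rough and assumes the text has already been extracted, for instance with pdftotext. It only catches ordering errors, so garbling that yields well-formed but wrong syllables will pass unnoticed. The threshold for calling a PDF "broken" is left open.

```python
import re

# Devanagari consonants (U+0915-U+0939) and precomposed nukta consonants (U+0958-U+095F).
CONSONANT = re.compile("[\u0915-\u0939\u0958-\u095F]")
# Dependent vowel signs (matras, U+093E-U+094C) plus the virama (U+094D).
SIGNS = {chr(cp) for cp in range(0x093E, 0x094E)}
NUKTA = "\u093C"


def is_devanagari_word(word):
    return any("\u0900" <= ch <= "\u097F" for ch in word)


def suspicious(word):
    """True if a matra or virama is not directly preceded by a consonant or nukta."""
    prev = None
    for ch in word:
        if ch in SIGNS and (prev is None or not (CONSONANT.match(prev) or prev == NUKTA)):
            return True
        prev = ch
    return False


def garbled_share(text):
    """Fraction of Devanagari words in `text` that fail the ordering check."""
    words = [w for w in text.split() if is_devanagari_word(w)]
    if not words:
        return 0.0
    return sum(suspicious(w) for w in words) / len(words)


if __name__ == "__main__":
    # "kamla", a garbled "kitab" with the i-matra stored first, and the correct "kitab".
    sample = "\u0915\u092E\u0932\u093E \u093F\u0915\u0924\u093E\u092C \u0915\u093F\u0924\u093E\u092C"
    print(garbled_share(sample))
```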

Best,
Raphael

On 13.03.2014 05:35, Avinash Celestine wrote:
> Well, I checked out the Unicode table and it only confirms what we knew
> anyway... that the same Unicode hex values are duplicated for different
> characters... 
> 
> So I guess it's back to the drawing board.
> 
> 
> On Thu, Mar 13, 2014 at 9:43 AM, Avinash Celestine
> <avinash.celest...@gmail.com> wrote:
> 
> Hi Raphael
> 
> In fact, the problem with the UP rolls is exactly what I am grappling
> with now. It seems to me that one way is to look at the exact
> mapping of Unicode characters embedded within the files. One way of
> generating such maps is to use a plugin like PDFlib's FontReporter,
> which works with Adobe Acrobat
> (http://www.pdflib.com/products/fontreporter/). Have you
> tried out this method, and did it work for you? Do tell me if you (or
> anyone else) have given it a shot. I am planning to give it a go
> at least...
> 
> I have attached a sample roll (of an AC in Agra), along with the
> generated font report if anyone wants to give it a look
> 
> A closer look at the roll shows that the main problem seems to be
> with the Devanagari 'matras' which are not rendering correctly when
> you cut and paste
> 
> regards
> 
> Avinash
> 

Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-13 Thread Raphael Susewind
Hi all,

apropos Anand Doshi's https://gist.github.com/anandpdoshi/9448203 -
does somebody have the same table including AC constituency ID codes
(rather than just names)?

Best,
Raphael

On 14.03.2014 03:09, Siddarth Raman wrote:
> Hi Anand,
> 
> Thanks, but the CSV link only has the PC to AC mapping (still
> awesomely useful!).
> 
> Also hoping for ward level details. My intent isn't necessarily focused
> on Voter Rolls. It's on the larger census data itself. What % of the
> population is enrolled at the PC (or AC or Ward) level? Was looking to
> calculate that data, and then overlay voter enrollment data on top as
> and when required. 
> 
> Regards,
> Siddarth
> 

Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-13 Thread Raphael Susewind
Looks great - all states would be even better... perhaps at the ODC
hackathon next weekend? R
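Once the voter-list PDFs are parsed into a booth table and an AC-to-PC table with ID codes is available (as discussed in this thread), summing electors up to AC and PC level is a short script. A minimal sketch, assuming two hypothetical comma-separated files with header rows: `booths.csv` (`ac_no,booth_no,electors`) and `ac_pc.csv` (`ac_no,pc_no`). Neither layout comes from the ECI or from the tables linked below; adjust the column positions to the real data.

```sh
#!/bin/sh
# Sketch: aggregate booth-level elector counts to AC and PC level.
# Hypothetical inputs (comma-separated, header row, no embedded commas):
#   booths.csv  ac_no,booth_no,electors
#   ac_pc.csv   ac_no,pc_no

# Print "ac_no,electors" for every AC, sorted by AC number.
electors_by_ac() {
  tail -n +2 "$1" \
    | awk -F, '{ total[$1] += $3 } END { for (ac in total) print ac "," total[ac] }' \
    | sort -t, -k1,1n
}

# Print "pc_no,electors"; $1 is the AC-to-PC lookup, $2 the booth file.
electors_by_pc() {
  awk -F, '
    FNR == 1 { next }                  # skip the header row of each file
    NR == FNR { pc[$1] = $2; next }    # first file: remember ac_no -> pc_no
    ($1 in pc) { total[pc[$1]] += $3; next }
    { print "no PC found for AC " $1 > "/dev/stderr" }
    END { for (p in total) print p "," total[p] }
  ' "$1" "$2" | sort -t, -k1,1n
}
```

Usage: `electors_by_ac booths.csv` and `electors_by_pc ac_pc.csv booths.csv`. Joining on AC codes rather than names avoids spelling mismatches between sources, which is one reason to want the ID codes.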

On 14.03.2014 07:50, Anand Chitipothu wrote:
> 
> On Fri, Mar 14, 2014 at 12:14 PM, Raphael Susewind
> <li...@raphael-susewind.de> wrote:
> 
> Hi all,
> 
> apropos Anand Doshi's  https://gist.github.com/anandpdoshi/9448203 -
> does somebody have the same table including AC constituency ID codes
> (rather than just names)?
> 
> 
> I have that for some states.
> 
> https://github.com/anandology/opendata-ge2014/tree/master/data
> 
> If you want any other state, I can try to generate it.
> 
> Anand
> 


Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-15 Thread Raphael Susewind
Hi Avinash and all,

I realized that each constituency falls within only one district in your
file, but there are constituencies that span several districts and vice
versa (rare, but it happens). I attached a list of those, extracted from
polling-station data on eci-polldaymonitoring.nic.in. These are ACs only;
naturally, the problem would grow if you aggregate to PCs.

Hope it helps,
Raphael
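The check described above (which ACs have polling stations in more than one district) can be reproduced from any polling-station table that records both the AC and the district of each station. A minimal sketch, assuming a hypothetical comma-separated file with a header row and columns `state,ac_no,district,station_name`; the actual layout of the eci-polldaymonitoring.nic.in data may differ.

```sh
#!/bin/sh
# List ACs whose polling stations fall in more than one district.
# Hypothetical input: header row, then state,ac_no,district,station_name
# (one row per polling station, no embedded commas).
# Output: state,ac_no,number_of_districts
multi_district_acs() {
  tail -n +2 "$1" \
    | awk -F, '{ print $1 "," $2 "," $3 }' \
    | sort -u \
    | awk -F, '{ n[$1 "," $2]++ } END { for (k in n) if (n[k] > 1) print k "," n[k] }' \
    | sort
}
```

The `sort -u` step keeps one row per distinct (state, AC, district) triple, so the second `awk` simply counts districts per AC. Running the same pipeline with the AC column swapped for a PC column would show how the problem grows at PC level.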

On 15.03.2014 06:57, Avinash Celestine wrote:
> hi
> 
> attached an Excel file with AC-PC-district-state matching along with codes
> for AC-PC. I can add census district codes if you like...give me a day
> or two
> 
> some states are not present - like J&K... if someone could add those
> that would be great
> 
> Avinash
> 
> 
> On Fri, Mar 14, 2014 at 10:27 PM, indro ray <rayindro@gmail.com> wrote:
> 
> Hi Anand (Chitipothu),
> Can I know the source from where you get the polling booth and ward
> data? Is it individual for each state and does it provide the
> lat-long for the polling booths?
> 
> Thanks,
> Indro
> 
> 
> On Wed, Mar 12, 2014 at 9:45 AM, Anand Chitipothu
> <anandol...@gmail.com> wrote:
> 
> 
> 
> On Wed, Mar 12, 2014 at 8:19 AM, Siddarth Raman
> <thriddas.ano...@gmail.com> wrote:
> 
> Hi All,
> 
> In line with the discussions on elections, this is something
> I'd started working on a while back (and dropped). I was
> essentially hoping for a PC to AC to Ward mapping. As far as
> I understand, census 2011 has population data either at the
> level of the ward or the district, so if we had to run even
> rudimentary data analysis on a parliamentary or assembly
> constituency (like total population) accurately, I'm
> guessing we need to go bottom up.
> 
> I had started this by attempting to convert
> http://eci.nic.in/eci_main/CurrentElections/CONSOLIDATED_ORDER%20_ECI%20.pdf
> into Excel (using a mixture of pattern matching in Notepad++ and
> a bit of Excel VB). It's time-consuming (largely because
> each state follows its own convention; nothing is standardized).
> 
> Any suggestions on how one might go about this? If I wanted
> to estimate the population in a parliamentary constituency,
> or the total households, or the urban/rural split, how would
> I go about it? Is there a better method than looking at the
> above demarcation notification? Are there datasets on this
> already?
> 
> New to the group, didn't find any prior discussions on
> Parliamentary to Assembly to Ward/Village demarcations. 
> 
> 
> Hi Siddarth,
> 
> The voter list PDFs have the ward info for each polling booth.
> The PDFs have the number of voters, but not the population, so it is
> possible to sum up those numbers to get a count of the number of
> voters in a PC or AC.
> 
> If you want a polling booth to ward mapping, I'll be able to
> provide it.
> 
> Anand

Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-15 Thread Raphael Susewind


Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-15 Thread Raphael Susewind
Might well be the rule (I remember having read something like this,
too), but the reality apparently differs (at least in the EC's own
data)... Never depend on rules, check them! ;-)

On 15.03.2014 08:58, Avinash Celestine wrote:
> thanks. the rule, as far as i remember, is that ACs are entirely
> contained within a district boundary. PCs, on the other hand, can span
> across district boundaries.
> 
> A

Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-03-16 Thread Raphael Susewind
Hi Siddarth,

For my UP dataset, I spatially matched polling booth locations against
the 2002 MODIS urban extent satellite layer, though that tends to capture
only the larger urban centres. Another option is to look at how many
polling stations have multiple booths [polling stations being defined
as booths with almost the same name in almost the same location]. This
turned out to be a rather accurate (and up-to-date) representation of the
"urban" as well as the "small town": in my experience (UP), only genuinely
rural stations have just one booth...

Best,
Raphael
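The booth-count heuristic above can be sketched as follows. This is a simplification under stated assumptions, not Raphael's actual procedure: "almost the same name" is approximated by lowercasing and stripping non-alphanumeric characters (so it only suits Latin-script names), "almost the same location" by rounding coordinates to three decimal places (roughly 100 m), and the input is a hypothetical file with header `ac_no,booth_no,booth_name,lat,lon`. The final column applies the >50% threshold Siddarth suggests below, here to the share of an AC's booths that sit in multi-booth stations.

```sh
#!/bin/sh
# Approximate polling stations by grouping booths with a normalised name and
# rounded coordinates, then report per AC:
#   ac_no,booths,share_of_booths_in_multi_booth_stations,label
booth_station_profile() {
  tail -n +2 "$1" | awk -F, '
    {
      name = tolower($3)
      gsub(/[^a-z0-9]/, "", name)     # crude stand-in for fuzzy name matching
      # Rounding to 3 decimals (about 100 m) is an assumed location tolerance.
      key = $1 SUBSEP name SUBSEP sprintf("%.3f", $4) SUBSEP sprintf("%.3f", $5)
      booths[key]++
    }
    END {
      for (k in booths) {
        split(k, part, SUBSEP)
        ac = part[1]
        total[ac] += booths[k]
        if (booths[k] > 1) multi[ac] += booths[k]
      }
      for (ac in total) {
        share = multi[ac] / total[ac]
        label = (share > 0.5) ? "urban" : "rural"
        printf "%s,%d,%.2f,%s\n", ac, total[ac], share, label
      }
    }
  ' | sort -t, -k1,1n
}
```

Booths whose coordinates straddle a rounding boundary will be split into separate "stations", so a distance-based or fuzzy-string match would be more faithful; the fixed grid just keeps the sketch short.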

On 16.03.2014 06:03, Siddarth Raman wrote:
> Hi Avinash,
> 
> Thanks a ton for pointing out the excel files with delimitation. I read
> what you wrote. Will take a look at the zip file and cross-check. I too
> had hoped the district mapping was coterminous with some political
> boundaries, but they aren't. Bangalore, funnily, has a ward (44 I think)
> which is split across three different patches of land which don't share
> a boundary! 
> 
> For those interested in more background regarding the why of it all...
> 
> I was curious to understand what, according to anyone, counts as an urban
> parliamentary constituency. Mint had done a study a while back
> (http://www.livemint.com/Specials/XovcjYRkWCBLJSwQwxY6wN/India-has-only-53-predominantly-urban-constituencies.html);
> their main source was the million-plus cities of India as per the census.
> That sparked off the thought, and I wanted to dig deeper. I thought that
> while one might disagree with the census definition of urban, it's a
> basis to begin with. I was hoping to look at the % urban of every PC and
> AC; above 50% would imply an urban constituency (perhaps not the best
> method, but it seemed like a good start).
> 
> I guess it isn't as easy as I imagined, but still would be good to
> figure out. Do let me know if anyone has other ideas.
> 
> Regards,
> Siddarth
> 
> 
> On Saturday, March 15, 2014 2:31:34 PM UTC+5:30, Avinash Celestine wrote:
> 
> hmm yes that's true. it's basically an inefficient way to engineer
> seat gains - there are many other more efficient ways! 
> 
> A
> 
> 
> 
> 
> On Sat, Mar 15, 2014 at 2:00 PM, Srinivasan Ramani wrote:
> 
> Interjecting in a fantastic conversation... (Kudos to Avinash &
> Raphael and others for the efforts to mix/match AC-PC and
> administrative jurisdictions)..
> 
> There is no direct containment of ACs within a district. Case in
> point is Delhi, where ACs don't fit single districts at all. 
> 
> Avinash, 
> 
> Trouble with the kind of political delimitation that you talk
> about is that it doesn't really serve any purpose. With
> cross-determination of powers at various levels - blocks, wards,
> districts under the bureaucracy vis-a-vis MLAs, changing
> administrative jurisdictions doesn't make as much sense as
> doing direct gerrymandering for political vote-gaining. In other
> words, the administrative powers of an MLA are much too nebulous
> compared to those of district officials across the bureaucracy and the
> third tier of democracy.
> 
> 
> On Sat, Mar 15, 2014 at 1:49 PM, Avinash Celestine wrote:
> 
> unfortunately you may be right... so that's another layer of
> complexity...
> 
> On a slightly related note, I have often wondered, though I
> don't know if it's actually possible in practice, whether
> governments could do some delimitation on their own (for
> political purposes). For instance, if a village/area is near
> the border of a constituency, it's possible through an order
> to bring it under the administrative jurisdiction of a
> neighbouring district. If that district is then served by a
> different AC, you have effectively done some delimitation of
> your own, without actually calling it that.
> 
> given that delimitation papers don't specify individual
> villages in many cases, it seems entirely possible to do...
> 
> looking forward to your dataset, Raphael!
> 
> avinash

Re: [datameet] PDF scraping

2014-04-08 Thread Raphael Susewind
With Linux and xpdf-tools, it's as easy as

pdftotext xyz.pdf
wc -w xyz.txt

Best,
Raphael
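`wc -w` gives the total number of words; if what is wanted is a count for every distinct word across a batch of PDFs, the extracted text can be piped through the standard `tr`, `sort` and `uniq` tools. A minimal sketch, assuming `pdftotext` (from xpdf-tools or poppler-utils) is on the PATH; passing `-` as the output file makes it write to stdout. Many `tr` implementations (GNU included) work on bytes, so splitting on `[:alpha:]` suits Latin-script text better than, say, Devanagari.

```sh
#!/bin/sh
# Count how often each word occurs in plain text read from stdin.
# Output: "count word", most frequent first.
word_frequencies() {
  tr -cs '[:alpha:]' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -v '^$' \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{ print $1, $2 }'
}

# Extract the text of every PDF given as an argument and count words across all of them.
pdf_word_frequencies() {
  for f in "$@"; do
    pdftotext "$f" -
  done | word_frequencies
}
```

Usage: `pdf_word_frequencies *.pdf > word_counts.txt`. The final `awk` only strips the padding that `uniq -c` adds, so the output is easy to open as a space-separated table.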

On 08.04.2014 20:44, Eric Dodge wrote:
> Seems like there are 2 steps here, getting the text into a more usable
> format and then getting the word counts. There are programs that let you
> dump pdf into text (http://pdf2txt.software.informer.com/3.2/ for
> example) in batches. Then paste the text into a tool like this
> (http://www.textfixer.com/tools/online-word-counter.php) to get the word
> counts.
> 
> Eric
> 
> 
> On Tue, Apr 8, 2014 at 12:54 AM, Suren Makkar <suren.mak...@gmail.com> wrote:
> 
> Hey guys,
> 
> Quick Rookie question, I'm trying to get total word counts for all
> occurring words in a bunch of PDFs, and I am lost. Help?
> 


[datameet] Mapping elections: open GIS shapefile drafts

2014-04-09 Thread Raphael Susewind
Dear all,

Krishna Prasnth's plea for AC shapefiles made me decide to start pushing
mine out there ahead of time, at least in draft form. I would have loved
to have them ready before the Bangalore hackathon, but such things take
time and I am quite busy.

Still, here they come at last: draft GIS shapefiles of parliamentary
constituencies, assembly constituencies and polling booth localities,
published under an open license (CC-BY-NC-SA 4.0):

http://www.raphael-susewind.de/blog/2014/mapping-indias-election

Unlike the hackathon files, these were created using an automated
algorithm (described in the blog post above). I intend to release (and
long-time archive) them by end of the month, and would welcome comments
and feedback until then: if you are familiar with both GIS and a
specific state, it would help me a lot if you could have a look.
Likewise, comments on the general method are very welcome.

So far, the smaller states are online, but I will add more on a rolling
basis - computing takes a few hours per constituency (longer for the
larger states). I hope to complete the set by end of the week.

Let me know if you find them useful,

Best,
Raphael



Re: [datameet] Security Issues with the Voter List

2014-04-10 Thread Raphael Susewind
 age and voters ID number for every single registered voter of India
>  3. Nearly 25% of the Voter IDs assigned within Delhi alone fail to
> conform to the government format, and fail the Luhn checksum
> test used to validate them. It is likely that other states are
> in a similar, if not worse, condition.
> 
> 
> Regards,
> 
> Devdatta Tengshe
> 


Re: [datameet] Security Issues with the Voter List

2014-04-11 Thread Raphael Susewind
Chandrashekhar,

Just on the specific issue of targeting communities, which I have
thought about a great deal (my first book was on post-2002 Gujarat), my
tentative conclusion is this:

The fact that electoral rolls were used in riots in the past, before
they were available online, shows that rioters, if they want to, can
access this data already. As Gautam pointed out, it IS public by law.
What changes is merely the scale of data availability. Large-scale data
would only be 'more useful' for large-scale targeting, however
(small-scale targeting is possible already), which I don't see happening
at this time (with the troublesome exception of Gujarat, particularly
troublesome now that Mr Modi runs for PM - but here, too, the targeting
happened in small units on the ground, even though coordination took
place higher up). On the other hand, fine-grained large-scale data is
absolutely necessary to understand a range of issues about the economic
position of religious and caste groups. So in this specific case, we have
additional benefits but no additional risk (beyond the worrisome risk
already out there)...

More detailed arguments about this in a forthcoming paper of mine at
http://pub.uni-bielefeld.de/publication/2631138

Best,
Raphael

On 11.04.2014 08:49, Chandrashekhar Raman wrote:
> Raphael, you raise very pertinent issues.
> 
> We as a community love open data and in this country there is a lot that
> can be done to free all kinds of data so that it can be made use of in a
> good way (election data in an aggregated form is one example). But at
> the same time there are certain kinds of data which are not open (I
> mean not open in a machine-readable format) for a good reason. I believe
> voter rolls data is one such type. In the past voter lists have been
> used to pinpoint members of specific communities which were then
> targeted with gruesome effect. Shudder to think what happens if it is
> automated, a 'riot app'?
> 
> As Raphael points out this is not just about privacy, but could be much
> worse.
> 
> This group is a fantastic initiative and as it evolves, it would be
> great for us to involve more social scientists and policy experts - so
> as we advocate vociferously to free more data and make it open - we can
> also bring in the technical expertise here to recommend where data needs
> to be better protected and how.
> 
> cs
> 
> 
> On Fri, Apr 11, 2014 at 11:44 AM, Raphael Susewind
> <li...@raphael-susewind.de> wrote:
> 
> Hi Devdatta and Avinash,
> 
> yes, I, too, am frankly surprised at the ease with which one can access
> sensitive data in bulk. Not only PDF rolls and voter details, but also
> things such as land records, BPL lists, and much more - I think we are
> in an exciting as well as dangerous phase of fairly uncontrolled,
> nascent e-Governance practices. But I think the ethical issues here are
> a little more complex than mere privacy concerns.
> 
> Upfront, I must admit that I use all the above sources for academic
> research (in UP and across India). What Avinash described in principle
> and at the example of Delhi can indeed be done on an all-India scale,
> and I am sure there are more people than just me who do it.
> 
> But then the social sciences have long dealt with sensitive data and
> developed protocols to protect it. Even though the data is publicly
> available, I for instance have my own copy on a secure workstation with
> full disk encryption and two factor authentication. Whenever possible, I
> also work on anonymized subsets of data. Yet there are other potential
> uses - some of the more worrisome you pointed out - which are not bound
> by such data protection standards.
> 
> To me, this once more highlights the nascent stage of ethical standards
> around Big Data and eGovernance. On the plus side, I am happy to have
> that kind of access to conduct research which will ultimately be
> ethically beneficial, leading to better understanding of social issues
> and potentially to better policy advice. Also, there is a point to be
> made that transparency is an important asset in elections in particular,
> not only in terms of individual electoral search functions, but also in
> terms of publicly accessible (and cross-checkable, publicly verifiable)
> PDF rolls. Finally, a lot of this data had been available in the past as
> well, only in distributed and/or commercial form, which means there had
> been a hierarchy of access: small-time crooks could not use it, but
> big-time crooks were always able to use it. Likewise, scholars at
> large (often foreign) universities were able to use it, but not smaller
> ones (this is still 

Re: [datameet] Please Comment on Copyright License for DataMeet Work

2014-04-11 Thread Raphael Susewind
Hi all,

there is a good comparison of CC vs ODbL when applied to data at
http://www.dcc.ac.uk/resources/how-guides/license-research-data

also, any specific reason to use CC 2.0? There are CC 4.0 licenses
already, arguably more developed (and also more suitable for data, see
link above)...

My five cents,
Raphael

On 11.04.2014 09:24, Thejesh GN wrote:
> This is for the work related to DataMeet, Produced by DataMeet as part
> of events, hackathons or general work, for what sits on one of the
> DataMeet accounts. Like
> https://github.com/datameet
> https://www.youtube.com/user/datameet
> 
> _This doesn't apply to work by individuals themselves._
> 
> I am listing the license and thought process behind them. Please do comment.
> 
> ---
> *For artifacts: CC BY-SA 2.0*
> https://creativecommons.org/licenses/by-sa/2.0/
> *Idea:* Allow everyone to use it, in any way they want, as long as they
> attribute and share in similar way
> 
> Share — copy and redistribute the material in any medium or format
> Adapt — remix, transform, and build upon the material for any purpose,
> even commercially.
> 
> Attribution — You must give appropriate credit, provide a link to the
> license, and indicate if changes were made. You may do so in any
> reasonable manner, but not in any way that suggests the licensor
> endorses you or your use.
> ShareAlike — If you remix, transform, or build upon the material, you
> must distribute your contributions under the same license as the original. 
> 
> 
> *For code: GNU/GPL*
> https://www.gnu.org/copyleft/gpl.html
> Allows commercial use and requires share-alike, just like (but not the
> same as) CC BY-SA 2.0
> 
> - Allows remix, share, distribute (all four freedoms)
> - Allows commercial usage
> - Makes attribution and share-alike compulsory
> 
> 
> 
> *For Data : Open Data Commons Open Database License (ODbL)*
> 
> If we want to use a specific license for data, then we can use this. This
> is similar to CC BY-SA 2.0: http://opendatacommons.org/licenses/odbl/summary/
> 
> You are free:
> To Share: To copy, distribute and use the database.
> To Create: To produce works from the database.
> To Adapt: To modify, transform and build upon the database.
> As long as you:
> Attribute: You must attribute any public use of the database, or works
> produced from the database, in the manner specified in the ODbL. For any
> use or redistribution of the database, or works produced from it, you
> must make clear to others the license of the database and keep intact
> any notices on the original database.
> Share-Alike: If you publicly use any adapted version of this database,
> or works produced from an adapted database, you must also offer that
> adapted database under the ODbL.
> Keep open: If you redistribute the database, or an adapted version of
> it, then you may use technological measures that restrict the work (such
> as DRM) as long as you also redistribute a version without such measures.
> -
> 
> 
> Note: If we are extending someone's code/data/artifact, we can continue
> to use the license the original author used. It's easier that
> way. If we start a fresh one, we can use one of ours.
> 
> Let's discuss this on the list. I will blog the conclusions/results on
> datameet.org/blog <http://datameet.org/blog> next Wednesday for future
> reference.
> 
> 
> Thanks a lot for your time.
> 
> 
> Thej
> --
> Thejesh GN *⏚* ತೇಜೇಶ್ ಜಿ.ಎನ್
> http://thejeshgn.com
> GPG ID :  0xBFFC8DD3C06DD6B0


Re: [datameet] Please Comment on Copyright License for DataMeet Work

2014-04-11 Thread Raphael Susewind
An additional advantage of ODbL is that different parts of a compound
dataset can have different licenses, which makes it easier to pull
together material from different sources.

On 11.04.2014 09:50, Thejesh GN wrote:
> 
> We can use CC-BY-SA-4.0 for artifacts. It looks better and has
> everything CC-BY-SA-2.0 has.
>  
> https://creativecommons.org/licenses/by-sa/4.0/
> 
> Share — copy and redistribute the material in any medium or format
> Adapt — remix, transform, and build upon the material
> for any purpose, even commercially.
> 
> As long as
> Attribution — You must give appropriate credit, provide a link to the
> license, and indicate if changes were made. You may do so in any
> reasonable manner, but not in any way that suggests the licensor
> endorses you or your use.
> 
> ShareAlike — If you remix, transform, or build upon the material, you
> must distribute your contributions under the same license as the original.
> 
> No additional restrictions — You may not apply legal terms or
> technological measures that legally restrict others from doing anything
> the license permits.
> 
> 
> 
> I think ODC-ODbL is a good choice for data. It allows all kinds of usage,
> along with the attribution, share-alike and keep-open conditions. Unless
> we have a better choice, I think we can go with ODC-ODbL.
> 
> 
> 
> 
> Thej
> --
> Thejesh GN *⏚* ತೇಜೇಶ್ ಜಿ.ಎನ್
> http://thejeshgn.com
> GPG ID :  0xBFFC8DD3C06DD6B0
> 
> 
> On Fri, Apr 11, 2014 at 12:57 PM, Raphael Susewind
> <li...@raphael-susewind.de> wrote:
> 
> Hi all,
> 
> there is a good comparison of CC vs ODbL when applied to data at
> http://www.dcc.ac.uk/resources/how-guides/license-research-data
> 
> also, any specific reason to use CC 2.0? There are CC 4.0 licenses
> already, arguably more developed (and also more suitable for data, see
> link above)...
> 
> My five cents,
> Raphael
> 
> On 11.04.2014 09:24, Thejesh GN wrote:
> > This is for work related to DataMeet: produced by DataMeet as part
> > of events, hackathons or general work, and sitting on one of the
> > DataMeet accounts, like
> > https://github.com/datameet
> > https://www.youtube.com/user/datameet
> >
> > _This doesn't apply to work by individuals themselves._
> >
> > I am listing the licenses and the thought process behind them. Please
> > do comment.
> >
> > ---
> > *For artifacts: CC BY-SA 2.0*
> > https://creativecommons.org/licenses/by-sa/2.0/
> > *Idea:* Allow everyone to use it in any way they want, as long as they
> > attribute and share in a similar way
> >
> > Share — copy and redistribute the material in any medium or format
> > Adapt — remix, transform, and build upon the material for any purpose,
> > even commercially.
> >
> > Attribution — You must give appropriate credit, provide a link to the
> > license, and indicate if changes were made. You may do so in any
> > reasonable manner, but not in any way that suggests the licensor
> > endorses you or your use.
> > ShareAlike — If you remix, transform, or build upon the material, you
> > must distribute your contributions under the same license as the original.
> >
> > 
> > *For code: GNU/GPL*
> > https://www.gnu.org/copyleft/gpl.html
> > Allows commercial use and requires sharing alike, similar to (but not
> > the same as) CC BY-SA 2.0
> >
> > - Allows remixing, sharing and distribution (all four freedoms)
> > - Allows commercial usage
> > - Makes attribution and share-alike compulsory
> >
> > 
> >
> > *For Data : Open Data Commons Open Database License (ODbL)*
> >
> > If we want to use a specific license for data, then we can use this. It
> > is similar to CC BY-SA 2.0:
> > http://opendatacommons.org/licenses/odbl/summary/
> >
> > You are free:
> > To Share: To copy, distribute and use the database.
> > To Create: To produce works from the database.
> > To Adapt: To modify, transform and build upon the database.
> > As long as you:
> > Attribute: You must attribute any public use of the database, or works
> > produced from the database, in the manner specified in the ODbL. For any
> > use or redistribution of the database, or works produced from it, you
> > must make clear to others the license

Re: [datameet] Security Issues with the Voter List

2014-04-14 Thread Raphael Susewind
As a follow-up to this discussion:

electoralsearch.in began to implement rate limiting and selective IP
blocking yesterday. Sad as this is for my own research purposes, I
welcome the step from a privacy point of view...

Raphael

On 11.04.2014 10:56, Chandrashekhar Raman wrote:
> Raphael, To clarify, i am not trying to make a case against availability
> of fine grained data, far from it i'm with you on this argument among
> others that are made spuriously to restrict access. I might have
> stretched the point but then again - killing is just one extreme form of
> discrimination - there are others that are less visible
> 
> You summed it up very well: it's good to have a healthy caution and
> unease when dealing with some of this data, and there are probably no
> simple answers here.
> 
> Will read the paper at leisure.
> 
> cs.
> 
> 
> On Fri, Apr 11, 2014 at 12:37 PM, Raphael Susewind
> <li...@raphael-susewind.de> wrote:
> 
> Chandrashekhar,
> 
> just on the specific issue of targeting communities, which I have
> thought about a great deal (my first book was on post-2002 Gujarat), my
> tentative conclusion is this:
> 
> The fact that electoral rolls had been used in the past in riots before
> they were available online shows that rioters, if they want to, can
> access this data already. As Gautam pointed out, it IS public by law.
> What changes is merely the scale of data availability. Large-scale data
> would only be 'more useful' for large-scale targeting, however
> (small-scale targeting is possible already), which I don't see happening
> at this time (with the troublesome exception of Gujarat, particularly
> troublesome now that Mr Modi runs for PM - but here, too, the targeting
> happened in small units on the ground, even though coordination took
> place higher up). On the other hand, fine-grained large-scale data is
> absolutely necessary to understand a range of issues about (religious,
> caste) economic position. So in this specific case, we have
> additional benefits but no additional risk (beyond the worrisome risk
> already out there)...
> 
> More detailed arguments about this in a forthcoming paper of mine at
> http://pub.uni-bielefeld.de/publication/2631138
> 
> Best,
> Raphael
> 
> On 11.04.2014 08:49, Chandrashekhar Raman wrote:
> > Raphael, you raise very pertinent issues.
> >
> > We as a community love open data, and in this country there is a lot
> > that can be done to free all kinds of data so that it can be put to
> > good use (election data in an aggregated form is one example). But at
> > the same time there are certain kinds of data which are not open (I
> > mean not open in a machine-readable format) for a good reason. I
> > believe voter roll data is one such type. In the past, voter lists have
> > been used to pinpoint members of specific communities, who were then
> > targeted with gruesome effect. Shudder to think what happens if it is
> > automated, a 'riot app'?
> >
> > As Raphael points out, this is not just about privacy; it could be much
> > worse.
> >
> > This group is a fantastic initiative, and as it evolves it would be
> > great for us to involve more social scientists and policy experts, so
> > that as we advocate vociferously to free more data and make it open, we
> > can also bring in the technical expertise here to recommend where data
> > needs to be better protected and how.
> >
> > cs
> >
> >
> On Fri, Apr 11, 2014 at 11:44 AM, Raphael Susewind
> <li...@raphael-susewind.de> wrote:
> >
> > Hi Devdatta and Avinash,
> >
> > yes, I, too, am frankly surprised at the ease with which one can access
> > sensitive data in bulk. Not only PDF rolls and voter details, but also
> > things such as land records, BPL lists, and much more. I think we are
> > in an exciting as well as dangerous phase of fairly uncontrolled,
> > nascent e-Governance practices. But I think the ethical issues here are
> > a little more complex than mere privacy concerns.
> >
> > Upfront, I must admit that I use all the above sources for academic
> > research (in UP and across India). What Avinash 

[datameet] Polling station names and roll part names

2014-04-14 Thread Raphael Susewind
Dear all,

I am trying to find a list that links polling station names (usually
something like "City Montessori School Room 1") and roll part names
(usually something like "Mohalla XY"), preferably in Latin script.

The PDF rolls have both on the front page, but a) in regional
scripts and b) usually not extractable (the encoding bug we already
discussed on this list).

electoralsearch.in shows both if one searches with an EPIC ID from
that particular booth, but they shut out mass queries (and rightly so).

Does anybody know of any other scrapable data source for this?

Best,
Raphael



Re: [datameet] Polling station names and roll part names

2014-04-14 Thread Raphael Susewind
Hi Anand,

thanks, I have the psleci data already (it's the basis for my electoral
maps). As for the part names, I have looked around on the UP CEO site
and found that the BLO detail search function contains both part name
and station name. But for an all-India solution, I think I will have to
query electoralsearch.in slowly, so that the rate limiting does not
kick in...
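Querying slowly enough that rate limiting does not kick in simply means spacing requests out. A minimal, generic Python sketch of such pacing follows; the interval in the usage comment and the `fetch_booth` name are hypothetical, and the actual limits electoralsearch.in enforces are not known from this thread.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def throttled(
    items: Iterable[T],
    min_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield items no faster than one per `min_interval` seconds.

    `clock` and `sleep` are injectable so the pacing can be tested
    without actually waiting.
    """
    last_sent = None
    for item in items:
        if last_sent is not None:
            remaining = min_interval - (clock() - last_sent)
            if remaining > 0:
                sleep(remaining)
        last_sent = clock()
        yield item


# Hypothetical usage: fetch_booth() stands in for one single lookup;
# it is not a real API.
# for epic_id in throttled(epic_ids, min_interval=5.0):
#     fetch_booth(epic_id)
```

Time the caller spends processing each result counts towards the interval, so the loop only sleeps for whatever is left.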

Let's see,
Raphael

On 14.04.2014 10:54, Anand Chitipothu wrote:
> Hi Raphael,
> 
> It is possible to get the polling station names from:
> http://www.eci-polldaymonitoring.nic.in/psleci/Default.aspx
> 
> I have a scrape of that data I can share with you if you want it.
> 
> But if you are looking for part names etc., I can't think of any other
> way than hitting the election commission website with one query per
> polling booth. I don't think one query per booth should be considered
> mass querying. Did you try searching on the state election commission
> website with a voter ID?
> 
> Anand
> 
> 
> 
> -- 
> Anand
> http://anandology.com/
> 



Re: [datameet] Polling station names and roll part names

2014-04-14 Thread Raphael Susewind
Hi Anand,

short update: electoralsearch.in does return wrong part names (basically
they copy the station name field), at least for UP. So it's back to the
CEO sites...

Raphael



Re: [datameet] Parliamentary Constituency to Assembly Constituency to Ward linkages

2014-04-17 Thread Raphael Susewind
Dear all,

just a follow-up to this oldish thread: I recently switched to the
newest version of Tesseract OCR to transform buggy PDF rolls to text,
and it works surprisingly well. Small typos here and there, but those
can be rectified. In case anyone else is looking for a solution to this...

Best,
Raphael
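The small OCR typos can often be fixed afterwards by snapping each recognised word to the closest entry in a list of known spellings. A minimal Python sketch using the standard library's `difflib`, assuming such a reference list exists (for instance names or locality names taken from another source); the function name and the 0.8 similarity cutoff are illustrative choices, not part of Raphael's workflow.

```python
import difflib
from typing import Iterable, List


def correct_tokens(ocr_tokens: Iterable[str], vocabulary: Iterable[str],
                   cutoff: float = 0.8) -> List[str]:
    """Replace each OCR token with its closest vocabulary entry.

    Tokens with no entry at least `cutoff` similar (difflib ratio) are kept as-is.
    """
    vocab = list(vocabulary)
    corrected = []
    for token in ocr_tokens:
        best = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        corrected.append(best[0] if best else token)
    return corrected


print(correct_tokens(["Avinsh", "Raphael", "Xyz"], ["Avinash", "Raphael"]))
# ['Avinash', 'Raphael', 'Xyz']
```

A higher cutoff is more conservative: fewer typos get fixed, but similar-looking names are less likely to be merged by mistake.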

On 13.03.2014 08:03, Raphael Susewind wrote:
> Hey Avinash,
> 
> Yep, that's what I figured, too. Not only misplaced matras (those could
> be rearranged), but real garbling, which cannot be resolved as far as
> I can see. Worse, there isn't even a clear pattern: for a few
> constituencies, I fed the Voter ID (which is in Latin script) to the
> "search roll details by voter ID" function on the CEO website, which
> returns the properly written Unicode name. I then compared the garbled
> name and the Unicode name to see if there are any statistical
> regularities, yet unfortunately there are a thousand ways of garbling
> "Avinash"; it's not always "Abniszhaa".
> 
> The only solution I can think of is the following (but I have not
> implemented it): train TesserAct (an IndicScript OCR) with the exact
> font used in the PDF reports, so that it almost perfectly recognizes
> something written in this font (this was a stumblestone for me, rather
> complicated work), then extract images of text areas of interest, and
> run them through OCR. If you want to give it a shot...
> 
> Otherwise, we could only try to convince the EC to fix the bug in
> Crystal Reports, and re-generate all PDFs - which is highly unlikely,
> they have more important things to do right now (the PDFs display and
> print alright, after all, just text extraction does not work - they
> would perhaps even consider it a feature rather than a bug).
> 
> It might be useful to compile a list of states where this problem occurs.
> I have seen it in Gujarat and UP for sure, but don't know whether it
> happens everywhere.
> 
> Best,
> Raphael
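The comparison described above, between a garbled name and the properly encoded name returned by the voter-ID search, can be made systematic by tallying which parts of the correct name turn into what. A hypothetical Python sketch using the standard library's `difflib`; the (garbled, correct) pairs are assumed to have been collected already, and the function name is illustrative.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def substitution_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Tally how stretches of the correct name change in the garbled name.

    `pairs` holds (garbled, correct) strings. Each non-matching stretch is
    counted as (operation, correct_substring, garbled_substring).
    """
    counts: Counter = Counter()
    for garbled, correct in pairs:
        matcher = SequenceMatcher(None, correct, garbled, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                counts[(tag, correct[i1:i2], garbled[j1:j2])] += 1
    return counts


print(substitution_counts([("Abinash", "Avinash"), ("Abinash", "Avinash")]).most_common(3))
# [(('replace', 'v', 'b'), 2)]
```

A few frequent, consistent entries would hint at a reversible mapping; a long tail of one-off entries is what "a thousand ways of garbling" looks like in this tally.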
> 
> On 13.03.2014 05:35, Avinash Celestine wrote:
>> Well, I checked out the Unicode table and it only confirms what we knew
>> anyway: the same Unicode hex values are duplicated for different
>> characters...
>>
>> So I guess it's back to the drawing board.
>>
>>
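Once the glyph-to-Unicode table is available in machine-readable form (for example exported from a font report), spotting the duplication described above is mechanical. A minimal Python sketch, assuming the table has been loaded as a dict from glyph ID to code point; the glyph IDs in the example are made up.

```python
from collections import defaultdict
from typing import Dict, List


def colliding_codepoints(glyph_to_unicode: Dict[int, int]) -> Dict[int, List[int]]:
    """Return {code point: glyph IDs} for code points that several glyphs map to."""
    by_codepoint = defaultdict(list)
    for glyph, codepoint in glyph_to_unicode.items():
        by_codepoint[codepoint].append(glyph)
    return {cp: sorted(glyphs) for cp, glyphs in by_codepoint.items() if len(glyphs) > 1}


# Made-up example: glyphs 12 and 13 both claim U+093F (DEVANAGARI VOWEL SIGN I).
mapping = {12: 0x093F, 13: 0x093F, 20: 0x0915}
for cp, glyphs in colliding_codepoints(mapping).items():
    print(f"U+{cp:04X}: glyphs {glyphs}")
```

Some duplication is legitimate (alternate glyph forms of the same character), so each hit still needs a look at the rendered glyphs.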
>> On Thu, Mar 13, 2014 at 9:43 AM, Avinash Celestine
>> <avinash.celest...@gmail.com> wrote:
>>
>> Hi Raphael
>>
>> In fact, the problem with the UP rolls is exactly what I am grappling
>> with now. It seems to me that one way is to look at the exact
>> mapping of Unicode characters embedded within the files. One way of
>> generating such maps is to use a plugin like PDFlib's FontReporter,
>> which works with Adobe Acrobat
>> (http://www.pdflib.com/products/fontreporter/). Have you tried out this
>> method, and did it work for you? Do tell me if you (or anyone else)
>> have given it a shot. I am planning to give it a go at least...
>>
>> I have attached a sample roll (of an AC in Agra), along with the
>> generated font report if anyone wants to give it a look
>>
>> A closer look at the roll shows that the main problem seems to be
>> with the Devanagari 'matras' which are not rendering correctly when
>> you cut and paste
>>
>> regards
>>
>> Avinash
>>
>>
>> On Wed, Mar 12, 2014 at 12:19 PM, Raphael Susewind
>> <li...@raphael-susewind.de> wrote:
>>
>> Hey Siddhart, and Anand,
>>
>> I, too, am really interested in this, but have not made much progress
>> yet. I think there are two ways to do this, neither of which is
>> straightforward.
>>
>> The "extract ward/village mentioned in roll PDF" strategy is one option.
>> Depending on the raw data, this can however be cumbersome (one source in
>> the vernacular, one in Latin script, etc.); I know a couple of scholars
>> who attempt to do this, and they are stuck all the time, having to match
>> manually rather frequently (which is a pain given that there are
>> 800,000 or so polling stations).
>>
>> Currently, we have the additional problem that many of the current roll
>> PDFs, for instance in UP, are broken: one cannot copy-paste (or
>> pdftotext, or extract through whatever means) from them, chiefly because
>> the ToUnicode CMap is corrupted by the version of Crystal Reports the ECI
>>   

[datameet] Census village to polling booth matching

2014-04-17 Thread Raphael Susewind
Dear all,

I vaguely remember that some people are working on matching census
villages to polling booths, and wonder what progress they made. As some
of you know, I am currently doing this India-wide through an automated
spatial matching algorithm - but before releasing the result, it would
be nice to assess accuracy of this procedure more thoroughly.
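The email does not describe the matching algorithm itself. Purely as a hypothetical illustration of the simplest form of spatial matching, the Python sketch below assigns each polling booth point to the nearest census village centroid within a distance cap. The coordinates, IDs and the 5 km cap are assumptions, and nearest-centroid is a stand-in technique, not Raphael's method.

```python
import math
from typing import Dict, Optional, Tuple

LatLon = Tuple[float, float]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two WGS84 points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_booths(booths: Dict[str, LatLon], villages: Dict[str, LatLon],
                 max_km: float = 5.0) -> Dict[str, Optional[str]]:
    """Assign each booth to its nearest village centroid within max_km, else None."""
    result: Dict[str, Optional[str]] = {}
    for booth_id, (blat, blon) in booths.items():
        best_id, best_d = None, max_km
        for village_code, (vlat, vlon) in villages.items():
            d = haversine_km(blat, blon, vlat, vlon)
            if d <= best_d:
                best_id, best_d = village_code, d
        result[booth_id] = best_id
    return result
```

This brute-force loop compares every booth with every village; at India-wide scale a spatial index would be needed.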

The key problem I face is that polling stations are often not named
"village X" but "primary school founded by Y" - so that name matching
does not help too much in validation (certainly not in urban areas).

It would be better to check against roll part names (thus my email about
those a few days ago), but best would be if anyone has a manually
matched table of polling stations (2014 IDs) against PCLN (2011) or MDDS
(2011) census codes with which I could compare my results - if only at
the example of one state, or a few districts.

Alternatively, if somebody has too much time to offer and is familiar
with any specific district in greater detail, I could send along a
matching table for this district to see how well it fits. Please get in
touch by direct mail in this case...

Any other ideas how to validate the matching table welcome,

Best,
Raphael



Re: [datameet] Re: Looking for KML file of India's parliamentary constituencies, need it urgently!

2014-04-28 Thread Raphael Susewind
Ravi, alternatively you might want to try mapbox.com tilemill /
tilestream to render the files once, and then serve tiles only...

Best,
Raphael
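Pre-rendering means generating the map once as small image tiles and serving only the tiles in view, so the browser never has to parse the whole 35 MB KML. Tiled web maps commonly use the Web Mercator z/x/y ("slippy map") scheme. A minimal Python sketch of that tile arithmetic, assuming the XYZ convention, which helps estimate how many tiles a region needs per zoom level; the function names and the bounding box are illustrative.

```python
import math
from typing import Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) index of the Web Mercator XYZ tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east edge or beyond ~85 degrees latitude stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_count(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
               zoom: int) -> int:
    """Number of tiles needed to cover a bounding box at one zoom level."""
    x0, y0 = lonlat_to_tile(min_lon, max_lat, zoom)  # north-west corner
    x1, y1 = lonlat_to_tile(max_lon, min_lat, zoom)  # south-east corner
    return (x1 - x0 + 1) * (y1 - y0 + 1)


# Rough bounding box around India (approximate, for illustration only).
for z in range(4, 11, 2):
    print(z, tile_count(68.0, 6.0, 97.5, 37.5, z))
```

The count roughly quadruples with each zoom level, which is why pre-rendered maps usually stop at a moderate maximum zoom.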

On 28.04.2014 10:48, Thejesh GN wrote:
> Ravi,
> It generates a KML of 35MB, which is huge for web apps.
> 
> The Google Maps API has limitations on the size of KML files:
> https://developers.google.com/kml/documentation/mapsSupport
> 
> --
> But it works in Leaflet using
> https://gis.stackexchange.com/questions/33513/how-do-i-overlay-a-kml-on-leaflet-0-4-4
> 
> But even rendering it locally takes about 5 minutes.
> 
> 
> So I agree with Srinivasan Ramani: try simplifying it.
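Simplifying a boundary means dropping vertices that barely change its shape. The classic technique is the Ramer-Douglas-Peucker algorithm. A minimal, self-contained Python sketch for one line or ring of coordinates follows; the tolerance is in the same units as the coordinates (e.g. degrees), and the function names are hypothetical. GIS tools ship their own implementations; this only illustrates the idea.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring's endpoints
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: Sequence[Point], tolerance: float) -> List[Point]:
    """Ramer-Douglas-Peucker: keep only vertices that deviate more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the vertex farthest from the line joining the endpoints.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Otherwise keep that vertex and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

Simplifying each constituency polygon independently can open slivers or overlaps along shared borders; topology-aware simplification avoids that.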
> 
> Thej
> --
> Thejesh GN *⏚* ತೇಜೇಶ್ ಜಿ.ಎನ್
> http://thejeshgn.com
> GPG ID :  0xBFFC8DD3C06DD6B0
> 
> 
> On Mon, Apr 28, 2014 at 11:16 AM, Ravi Bajpai <bajpair...@gmail.com> wrote:
> 
> Ok, so I converted the .shp file to .kml. Fusion Tables refused to
> parse the file. So then I uploaded the .kml file to Google Drive.
> When I open it in Google Maps from there, the application says it
> encountered a problem with some data, and doesn't show anything on the
> map.
> 
> Please help!
> 
> Best,
> 
> Ravi
> 
> On Sunday, April 27, 2014 7:20:28 PM UTC+5:30, Ravi Bajpai wrote:
> 
> Hey all,
> 
> I work for Hindustan Times here in Delhi. I am trying to prepare
> backend logistics to build an interactive map to be published on
> our website on the election counting day. I need KML file of
> India's parliamentary constituencies, but I can't seem to find
> it anywhere.
> 
> Please help.
> 
> Thanks a lot.
> 
> Best,
> 
> Ravi Bajpai
> Multimedia Editor
> Hindustan Times
> 
> 
> 



Re: [datameet] Mapping elections: open GIS shapefile drafts

2014-04-30 Thread Raphael Susewind
Dear all,

just a heads up that the final version of these shapefiles just went
online, using a slightly adjusted algorithm which works better in
crowded urban areas:

http://dx.doi.org/10.4119/unibi/2674065

Hope they are useful,
Raphael

On 09.04.2014 17:53, Raphael Susewind wrote:

>
> Dear all,
> 
> Krishna Prasnth's plea for AC shapefiles made me decide to start pushing
> mine out there ahead of time, in draft form at least. I would have loved
> to have them ready before the Bangalore hackathon, but such things take
> time and I am quite busy.
> 
> Still, here they come at last: draft GIS shapefiles of parliamentary
> constituencies, assembly constituencies and polling booth localities,
> published under an open license (CC-BY-NC-SA 4.0):
> 
> http://www.raphael-susewind.de/blog/2014/mapping-indias-election
> 
> Unlike the hackathon files, these were created using an automated
> algorithm (described in the blog post above). I intend to release (and
> long-time archive) them by end of the month, and would welcome comments
> and feedback until then: if you are familiar with both GIS and a
> specific state, it would help me a lot if you could have a look.
> Likewise, comments on the general method are very welcome.
> 
> So far, the smaller states are online, but I will add more on a rolling
> basis - computing takes a few hours per constituency (longer for the
> larger states). I hope to complete the set by end of the week.
> 
> Let me know if you find them useful,
> 
> Best,
> Raphael
> 



Re: [datameet] [Article] Limitations of the PDF

2014-05-12 Thread Raphael Susewind
Let's hope the Election Commission reads this before declaring results...

On 12.05.2014 10:46, Sriram Karra wrote:
> 
> http://www.thehindu.com/opinion/op-ed/limitations-of-the-pdf/article5998841.ece
> 
> == snip ==
> 
> 
> The basic format doesn’t include any requirement that text be
> selectable or searchable, while data presented as charts and tables
> is often impossible to export in any useable way.
> 
> It’s the standard file format for nearly every academic paper, political
> briefing and research note. But a new report by the World Bank suggests
> that the venerable pdf is keeping valuable information buried in
> servers, unread and unloved.
> 
> == /snip ==
> 



Re: [datameet] Urban constituencies

2014-05-13 Thread Raphael Susewind
Hi Ravi,

there are various options here, depending on what you want.

I am not aware of an official list of rural/urban constituencies as
such. But booths are classified as either urban or rural, at least on
the electoral rolls, and probably elsewhere, too. This could be used in
a simple counting game to see where more than a certain threshold of the
electorate votes in urban areas according to the official ECI definition.
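The counting game could look like the following Python sketch. It assumes booth-level records of constituency, the rural/urban flag from the rolls, and the number of electors; the tuple layout is an assumption, and the 75% default mirrors the definition in Ravi's question.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def urban_constituencies(booths: Iterable[Tuple[str, bool, int]],
                         threshold: float = 0.75) -> List[str]:
    """Constituencies where more than `threshold` of electors vote in urban booths.

    Each booth record is (constituency, is_urban, number_of_electors).
    """
    total = defaultdict(int)
    urban = defaultdict(int)
    for constituency, is_urban, electors in booths:
        total[constituency] += electors
        if is_urban:
            urban[constituency] += electors
    return sorted(c for c, n in total.items() if n > 0 and urban[c] / n > threshold)


booths = [("PC1", True, 800), ("PC1", False, 200), ("PC2", True, 500), ("PC2", False, 500)]
print(urban_constituencies(booths))  # ['PC1']
```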

If you are less interested in the official definition, you could try a
GIS-based alternative and overlay the polling booth point layer
(http://dx.doi.org/10.4119/unibi/2674065) with the MODIS-derived urban
area polygons from Natural Earth, which follow actual habitation patterns
quite accurately, irrespective of official designation
(http://www.naturalearthdata.com/downloads/50m-cultural-vectors/50m-urban-areas/).
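The overlay boils down to a point-in-polygon test for each booth. In practice a GIS such as QGIS or PostGIS does this with spatial indexing; the Python sketch below only shows the underlying even-odd ray-casting test. It assumes booth points and urban-area rings share the same lon/lat coordinates, ignores polygon holes, and uses hypothetical function names.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting: is (x, y) inside the polygon `ring`?

    `ring` is a list of (x, y) vertices; repeating the first vertex is optional.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_booths(booths: Dict[str, Point],
                    urban_rings: List[Sequence[Point]]) -> Dict[str, bool]:
    """Flag each booth (lon, lat) as urban if it falls inside any urban ring."""
    return {bid: any(point_in_polygon(lon, lat, r) for r in urban_rings)
            for bid, (lon, lat) in booths.items()}
```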

Best,
Raphael

On 13.05.2014 10:08, Ravi Krishnan wrote:

> Hi,
> 
> Does anyone have a list of urban constituencies - defined here as those
> with over 75% urban population?
> 
> Thanks and regards
> 
> -- 
> Ravi Krishnan
> 
> Mint
> Tower 3, 9th Floor, India Bulls Finance Centre,
> Senapati Bapat Marg, Elphinstone Road (W),
> Mumbai - 400 013
> Ph:+91-22-6613 4000/4001
> Mob: +91-97691-72938
> 

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Papers & Blog | http://www.raphael-susewind.de

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)


Re: [datameet] Urban constituencies

2014-05-13 Thread Raphael Susewind
Hi Ravi,

I did the matching against the MODIS data, but I don't have electorate
counts at hand, so there are no percentages of urban electorate yet. But
I am sure you can take it further from the CSV here:

http://www.raphael-susewind.de/ruralurban.csv.tgz

This table shows the rural/urban classification as well as the "urban
rank" (an indicator MODIS uses for "how urban is it?") across India at
booth level. Be aware that not all booths are covered by the matching,
though (and some states, notably Uttarakhand, are terribly inaccurate),
so you have to aggregate with care.

For fun, I have also added a list of "urban booth count share" for
parliamentary constituencies, which should give you a very rough idea of
the urban electorate share as well, since all booths are supposed to
have a similar number of electors.
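Because matching coverage varies so much by state, one cautious way to aggregate is to compute the urban booth share only over matched booths and to report coverage alongside it, so poorly covered constituencies can be dropped. This sketch assumes hypothetical rows of (constituency, label), where label is 'urban', 'rural' or None for an unmatched booth; the 50% minimum coverage is an arbitrary choice. As noted above, a booth count share is only a rough proxy for electorate share.

```python
from collections import Counter

def urban_booth_share(rows, min_coverage=0.5):
    """Return {constituency: (urban_share, coverage)} for sufficiently covered constituencies.

    rows: iterable of (constituency, label), label in {'urban', 'rural', None}.
    urban_share is computed over matched booths only; coverage = matched / all booths.
    """
    all_booths, matched, urban = Counter(), Counter(), Counter()
    for constituency, label in rows:
        all_booths[constituency] += 1
        if label is not None:
            matched[constituency] += 1
            if label == "urban":
                urban[constituency] += 1
    result = {}
    for c, n in all_booths.items():
        coverage = matched[c] / n
        if matched[c] and coverage >= min_coverage:
            result[c] = (urban[c] / matched[c], coverage)
    return result

# Hypothetical rows: PC-2 has too few matched booths to be reported.
rows = [("PC-1", "urban"), ("PC-1", "urban"), ("PC-1", "rural"), ("PC-1", None),
        ("PC-2", None), ("PC-2", None), ("PC-2", "urban")]
print(urban_booth_share(rows))
```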

Hope it is useful,
Raphael

On 13.05.2014 10:53, Ravi Krishnan wrote:
> Hi Raphael,
> 
> Thanks for your prompt answer. While there is not any official list, I
> am told that the EC gives the percentage of urban population in each
> constituency. I couldn't find it on their website, though.
> 
> As for the method you suggest, I just don't have the technical skills to
> pull that off.
> 
> Thanks and regards
> 
> Ravi   

Re: [datameet] Urban constituencies

2014-05-13 Thread Raphael Susewind
Hi Gilles,

nice to see you over here ;-)

This is not based on the Census at all, but on 2002/3 images from the
MODIS satellite, processed by NASA to classify land cover as inhabited or
not, rural or urban (funnily enough, one of the criteria is "light at
night" - they must have come up with something else for India, though; at
least in UP there is no power at 3am ;-)...). As such, it reflects the
rural/urban divide in terms of physical geography a decade ago pretty
accurately - but not necessarily by the GoI definition. It's a rough fix
until we get a nice, easily browsable list of the ECI's own booth-wise
rural/urban classification for 2014...

On Tonk and Sawai Madhopur: this is an odd slip-up in the AC-to-PC
conversion in my scripts, thanks for noticing. To correct it, take the
raw data in ruralurban.csv.tgz and recalculate from the AC list. In the
meantime, I shall check what went wrong with the PC classification.
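Recalculating PC figures from the AC list mostly means summing booth counts through an AC-to-PC lookup rather than averaging AC percentages, since averaging would give small ACs too much weight. Here is a sketch with hypothetical AC codes and a hand-made mapping. The real AC/PC correspondence would have to come from an official list, not from this snippet.

```python
from collections import defaultdict

def ac_to_pc(ac_counts, ac_to_pc_map):
    """Aggregate per-AC (urban_booths, total_booths) into per-PC urban booth shares.

    Summing counts (rather than averaging AC percentages) weights each AC by its
    booths. ACs missing from the mapping are returned rather than silently dropped.
    """
    urban = defaultdict(int)
    total = defaultdict(int)
    unmapped = []
    for ac, (u, t) in ac_counts.items():
        pc = ac_to_pc_map.get(ac)
        if pc is None:
            unmapped.append(ac)
            continue
        urban[pc] += u
        total[pc] += t
    shares = {pc: urban[pc] / total[pc] for pc in total if total[pc]}
    return shares, unmapped

# HYPOTHETICAL codes: AC-201 is deliberately left out of the mapping.
ac_counts = {"AC-101": (30, 200), "AC-102": (170, 200), "AC-201": (10, 100)}
mapping = {"AC-101": "PC-A", "AC-102": "PC-A"}
print(ac_to_pc(ac_counts, mapping))  # ({'PC-A': 0.5}, ['AC-201'])
```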

Best,
Raphael

On 13.05.2014 22:52, gilles.verni...@sciencespo.fr wrote:
> Is this the 2001 Census, by the way? Is it valid to juxtapose it with
> the 2014 constituencies? 
> Thanks!
> 
> Gilles 

Re: [datameet] Security Issues with the Voter List

2014-05-18 Thread Raphael Susewind
Dear Gautam,

thanks for the link - this discussion is overdue.

After some discussion a few weeks back on this list, the ECI at least
introduced rate limiting to electoralsearch.in (though probably for QoS
reasons rather than privacy). Chhattisgarh is the only state with a
CAPTCHA to prevent mass downloading, while Uttarakhand does not have the
rolls online at all. Rolls for all other states are freely available,
though there are some technical challenges in terms of extracting data
from corrupted PDFs (but this CAN be done).

While I am happy to be able to use electoral roll data for academic
research, this commodification was exactly what I worried about from the
beginning. Let's hope the ECI changes its access policies soon - though
arguably the damage is already done, with an "almost population register"
online long enough for anyone to scrape and use. Still, privacy laws that
prohibit what is technically possible could at least limit the damage.

My five cents,
Raphael

On 19.05.2014 07:28, Snehashish Ghosh wrote:
> Dear Gautam,
> 
> Thank you. This is very interesting. I wrote a piece on this issue right
> after the failed Google-ECI deal in February <http://goo.gl/e9Xea0>.
> The UK approach seems to be a good one. In the UK there are two voter
> lists, the "full list" and the "edited list". You can choose to be removed
> from the edited list at the time of registration or at any time thereafter.
> The edited list is available in the public domain and the full list is
> safeguarded by purpose limitation and UK Data Protection Law.
> 
> ~Snehashish
> 
> 
> On Mon, May 19, 2014 at 10:36 AM, Gautam John <gkj...@gmail.com> wrote:
> 
> Something I read today:
> 
> http://www.medianama.com/2014/05/223-modak-marketing-election-voter-india/

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind

Please do consider http://www.gnupg.org for encryption (key id A5ED49AE)

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org
--- 
You received this message because you are subscribed to the Google Groups 
"datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to datameet+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [datameet] Re: Data Sharing Guidelines

2014-05-25 Thread Raphael Susewind
Dear all,

for complex files, I would suggest SQLite (sqlite.org). It is open,
scalable, and very expressive thanks to SQL queries. I use it for all my
more complex datasets, interlinked tables, etc.
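To illustrate the linked-tables point, here is a small sketch using Python's built-in `sqlite3` module. The schema (a `constituency` table keyed by a numeric `ac_no`, and a `result` table referencing it), the place names and the party labels are all hypothetical. Joining on a stable numeric ID rather than on place names also sidesteps the name-spelling differences between sources.

```python
import sqlite3

# In-memory database for illustration; pass a file path (e.g. "elections.db") to persist.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
CREATE TABLE constituency (
    ac_no INTEGER PRIMARY KEY,  -- stable numeric ID, used instead of place names for joins
    name  TEXT NOT NULL
);
CREATE TABLE result (
    ac_no INTEGER NOT NULL REFERENCES constituency(ac_no),
    party TEXT NOT NULL,
    votes INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO constituency VALUES (?, ?)", [(1, "Alpha"), (2, "Beta")])
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?)",
    [(1, "P1", 600), (1, "P2", 400), (2, "P1", 300), (2, "P2", 700)],
)

# Query across the linked tables: winner and vote share per constituency.
rows = conn.execute("""
SELECT c.name, r.party, r.votes * 1.0 / t.total AS share
FROM result r
JOIN constituency c ON c.ac_no = r.ac_no
JOIN (SELECT ac_no, SUM(votes) AS total FROM result GROUP BY ac_no) t ON t.ac_no = r.ac_no
WHERE r.votes = (SELECT MAX(votes) FROM result WHERE ac_no = r.ac_no)
ORDER BY c.ac_no
""").fetchall()
print(rows)  # [('Alpha', 'P1', 0.6), ('Beta', 'P2', 0.7)]
```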

My five cents,
Raphael

On 26.05.2014 06:50, Dilip Damle wrote:
> Hello,
> 
> I think we need to discuss the following:
> 
> 1. When is data eligible to go into the repository?
> 
> There could be several factors here, mainly cleanliness and completeness.
> 
> 2. A place other than the repository for temporary data.
> I think it should surely not be "only an attachment to a post here";
> then it becomes difficult to find later.
> Administrators should decide on a suitable place.
> 
> 3. The particular formats themselves
> 
> This could vary based on the type of data.
> 
> My observation is that for many types of data, multiple linked tables
> serve better than a single CSV file, which is more common.
> In this case, is .mdb acceptable, or is there any other open format for
> linked tables?
> 
> This could be a long topic...
> 
> 4. Compressing multiple files into one file
> 
> Unless there is a reason not to, multiple files that go together should
> be bundled into one file.
> This should also be true for the repository.
> 
> 5. About the content itself
> 
> Since multiple people will contribute to and edit the data, we will need
> some rules.
> Example: when there is a unique identifier for the data, it should always
> be used; otherwise combining and comparing the data becomes difficult.
> (Presently I am trying to collate the election results data and find
> there are differences between the different sources, especially in the
> names of places. I will be putting up the collated data in .mdb format in
> a few days.)
> 
> On Friday, May 23, 2014 10:06:35 AM UTC+5:30, Nisha Thompson wrote:
> 
> In the discussion guidelines thread, Dilip suggested we have some
> data sharing guidelines and a place to store some of the more casual
> datasets people are cleaning up.
> 
> I think it's a good idea.
> 
> Can we use this thread as a place to discuss formats, procedure, and
> a good place to put it?
> 
> We already have a GitHub set up; we can start with that, maybe
> create a project called "Data that needs to be cleaned up".
> 
> Any other suggestions?
> 
> Nisha
> 
> -- 
> Nisha Thompson
> DataMeet.org
> ni...@datameet.org 
> skype: nishaqt
> mobile: 962-061-2245


[datameet] Booth-wise elector count (male/female/other/total)

2014-06-06 Thread Raphael Susewind
Dear all,

now that Form20 results are starting to come out, some of you might be
interested in booth-wise elector counts, which make it possible to
calculate fine-grained turnout rates. They are not contained in Form20,
but are available in the electoral rolls; as a side effect of my ongoing
academic work, I have extracted them.
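As a rough sketch of the turnout calculation: join booth-wise elector counts with booth-wise votes polled (for example summed from Form20) on a (constituency, booth number) key, and divide. The key format and toy numbers are hypothetical. Booth numbering in the rolls and in Form20 may not always line up one-to-one, so the sketch also reports unmatched keys. A turnout above 1 would point to an extraction or matching problem.

```python
def booth_turnout(electors, votes_polled):
    """Compute booth-wise turnout = votes polled / registered electors.

    Both arguments map a (constituency, booth_no) key to an integer count.
    Returns (turnout dict, keys missing from votes_polled, keys missing from electors).
    """
    turnout = {
        key: votes_polled[key] / n
        for key, n in electors.items()
        if key in votes_polled and n > 0
    }
    missing_votes = sorted(set(electors) - set(votes_polled))
    missing_electors = sorted(set(votes_polled) - set(electors))
    return turnout, missing_votes, missing_electors

# Hypothetical toy numbers, not real data.
electors = {("AC-1", 1): 1000, ("AC-1", 2): 800, ("AC-1", 3): 900}
votes = {("AC-1", 1): 650, ("AC-1", 2): 600, ("AC-1", 4): 500}
print(booth_turnout(electors, votes))
```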

Here is my pull request to the datameet github:

https://github.com/datameet/india-election-data/pull/8

Note that this is based on a quick-hack automated extraction, so it comes
with no guarantees. Also, some states and UTs are missing, notably:

Uttarakhand - PDF rolls not available
Chhattisgarh - PDF rolls behind CAPTCHA
Lakshadweep - problem with parsing
Chandigarh - problem with parsing
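Since the extraction comes with no guarantees, a cheap consistency check before using the counts is to confirm that the male, female and other counts add up to the total for each booth. The column names below (`ac`, `booth`, `male`, `female`, `other`, `total`) are assumptions for illustration; adjust them to the actual CSV header.

```python
import csv
import io

def check_elector_rows(csv_text):
    """Return (ac, booth, problem) for rows where male + female + other != total.

    Assumes HYPOTHETICAL column names: ac, booth, male, female, other, total.
    """
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            male, female, other, total = (
                int(row[k]) for k in ("male", "female", "other", "total")
            )
        except (TypeError, ValueError):  # empty cell, text, or short row (None)
            problems.append((row["ac"], row["booth"], "non-numeric count"))
            continue
        if male + female + other != total:
            problems.append(
                (row["ac"], row["booth"], f"sum {male + female + other} != total {total}")
            )
    return problems

# Hypothetical sample: booth 2 does not add up, booth 3 has an empty cell.
sample = (
    "ac,booth,male,female,other,total\n"
    "1,1,500,480,2,982\n"
    "1,2,400,390,0,800\n"
    "1,3,,300,0,650\n"
)
print(check_elector_rows(sample))
```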

I hope this is useful to some,

Best,
Raphael

-- 
Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind

Please do consider http://www.gnupg.org for encryption (key id 10AEE42F)


Re: [datameet] Booth-wise elector count (male/female/other/total)

2014-06-07 Thread Raphael Susewind
As far as I know, UP is not out yet.
Gujarat is, as are some other states...

Best,
Raphael

On 07.06.2014 09:43, Avinash Celestine wrote:
> great. thanks
> 
> Have the form 20s for UP been put out? I know Bihar and Bengal are out...
> 
> Avinash
> 


Re: [datameet] Booth-wise elector count (male/female/other/total)

2014-06-07 Thread Raphael Susewind
Hi Anand,

done - it's a Perl script which extracts the text from the PDFs using
Poppler and feeds an SQLite database, from which the CSV is then created...
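For anyone wanting to reproduce the general shape of that pipeline (PDF text via Poppler, then SQLite, then CSV), here is a Python sketch. It is not the Perl script in the repo. It calls Poppler's `pdftotext` command-line tool, and the summary-line regex, table layout and file name are hypothetical. Real roll layouts differ by state and would each need their own parsing.

```python
import csv
import io
import re
import sqlite3
import subprocess

# HYPOTHETICAL summary-line pattern; real roll layouts differ by state and year.
SUMMARY = re.compile(
    r"Male\s+(\d+)\s+Female\s+(\d+)\s+Third Gender\s+(\d+)\s+Total\s+(\d+)"
)

def pdf_to_text(path):
    """Extract text with Poppler's pdftotext ('-layout' keeps columns, '-' writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def load(conn, ac, booth, text):
    """Store the first elector summary found in `text`; return False if none matched."""
    m = SUMMARY.search(text)
    if m is None:
        return False
    conn.execute(
        "INSERT INTO electors VALUES (?, ?, ?, ?, ?, ?)",
        (ac, booth, *map(int, m.groups())),
    )
    return True

def export_csv(conn, out):
    """Write the electors table as CSV to any file-like object."""
    writer = csv.writer(out)
    writer.writerow(["ac", "booth", "male", "female", "other", "total"])
    writer.writerows(conn.execute("SELECT * FROM electors ORDER BY ac, booth"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE electors (ac INTEGER, booth INTEGER, male INTEGER,"
    " female INTEGER, other INTEGER, total INTEGER)"
)

# With real rolls this would be: load(conn, ac, booth, pdf_to_text("some_roll.pdf"))
load(conn, 1, 1, "... Male 512 Female 498 Third Gender 1 Total 1011 ...")
buf = io.StringIO()
export_csv(conn, buf)
print(buf.getvalue())
```

Keeping SQLite as the intermediate store lets several extraction passes (or states) accumulate before a single CSV export, which mirrors the order described above.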

Best,
Raphael

On 07.06.2014 14:10, S Anand wrote:
> Nice one Raphael.
> 
> Just wondering if you would like to add the automated portions of the
> scripts to the repo? For most of the other files, the repo has web
> scrapers
> <https://github.com/datameet/india-election-data/blob/master/parliament-elections/election2014/eci-constituency-wise.py>
> or PDF scrapers
> <http://nbviewer.ipython.org/github/datameet/india-election-data/blob/master/assembly-elections/election.ipynb>
> committed, and it would be instructive to include these as well.
> 
> Thanks,
> Anand
> 


[datameet] Form20 results for UP and Gujarat

2014-06-09 Thread Raphael Susewind
Dear all,

I just added booth-wise results for UP and Gujarat to the datameet
github - if anybody is working on other states, please contribute, too:

https://github.com/datameet/india-election-data/pull/10

Best,
Raphael


Re: [datameet] Urban constituencies

2014-06-17 Thread Raphael Susewind
Dear Srini,

actually I don't know exactly, as I don't use the indicator myself - you
will have to read the papers by Schneider et al. to find out. They are
referenced here:
http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-urban-area/

Best,
Raphael

On 18.06.2014 04:45, Srinivasan Ramani wrote:
> Dear Raphael, 
> 
> Apropos MODIS' urban ranking, can you please clarify the following? A
> higher rank (say 3 over 9) suggests greater urbanisation, right? Or is
> it the other way around? 9 suggests greater urbanisation as compared to 3? 
> 
> Thanks,
> Srini

Re: [datameet] Form20 results for UP and Gujarat

2014-06-17 Thread Raphael Susewind
Dear all,

Matt Lowe pointed me to crawling errors in the original version - they
are now corrected on the datameet github.

Sorry for that,
Raphael


Re: [datameet] Mumbai/Thane assembly constituency

2014-09-23 Thread Raphael Susewind
Hi Saurabh,

you might have a look at my dataset - it is of varying quality (because
the raw data from the ECI was), but perhaps it does what you need:

http://dx.doi.org/10.4119/unibi/2674065
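Since that dataset contains booth points rather than constituency polygons, one crude way to get rough AC outlines for a visualisation is to draw the convex hull around each AC's booths. The sketch below uses Andrew's monotone chain algorithm and assumes booth points as hypothetical (ac, lon, lat) tuples, treating lon/lat as planar coordinates. Hulls ignore concave boundaries and will overlap neighbouring ACs, so they are visual approximations only, not official boundaries.

```python
from collections import defaultdict

def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices counter-clockwise, no repeated end point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hulls_by_ac(booths):
    """Group (ac, lon, lat) booth points by AC and return {ac: hull vertex list}."""
    groups = defaultdict(list)
    for ac, lon, lat in booths:
        groups[ac].append((lon, lat))
    return {ac: convex_hull(pts) for ac, pts in groups.items()}

# Hypothetical toy points: four corners plus one interior booth.
booths = [("AC-1", 0, 0), ("AC-1", 2, 0), ("AC-1", 2, 2), ("AC-1", 0, 2), ("AC-1", 1, 1)]
print(hulls_by_ac(booths))  # {'AC-1': [(0, 0), (2, 0), (2, 2), (0, 2)]}
```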

Best,
Raphael

On 23.09.2014 09:09, Saurabh Datar wrote:
> Hi all,
> 
> Is there any shapefile/SVG file for assembly constituencies of Mumbai
> and adjoining Thane? I couldn't find one anywhere. I wanted to make some
> visualisations for my own blog. Please help if possible.
> 
> thank you 


[datameet] Data on Muslim electorate in UP and Gujarat

2014-09-27 Thread Raphael Susewind
Dear all,

I am happy to inform you that today's EPW carries a piece on "Spatial
variation in the 'Muslim vote' in Gujarat and Uttar Pradesh, 2014",
which I have co-authored with Raheel Dhattiwala:

http://www.epw.in/ejournal/show/1/_/3024

We demonstrate that Muslims' electoral choices vary a lot from
constituency to constituency, implying that "vote banks" operate on a
much more local level than hitherto assumed. We also explore a few
factors that might shape this variation: minority concentration, riot
history, and ethnic coordination.

More relevant to this list: we also published interactive maps and a
replication dataset under an open license, which contains booth-wise
estimates of the Muslim electorate. Those of you working on religion and
politics might be interested to play with it:

http://www.raphael-susewind.de/blog/2014/

Let me know if you find that data useful,
and/or if you have any questions about it,

Best,
Raphael



Re: [datameet] Re: Data on Muslim electorate in UP and Gujarat

2014-11-03 Thread Raphael Susewind
Hi Shankesh,

sorry, it's meant to be

http://www.raphael-susewind.de/blog/2014/data-epw

the dataset itself is at

http://dx.doi.org/10.4119/unibi/2694082

Best,
Raphael

On 03.11.2014 11:36, shankesh singh wrote:
> Hi Raphael,
> The link http://www.raphael-susewind.de/blog/2014/ to the data is not
> working. Can you please post it again?
> Thanks!


[datameet] Form 20 for Mumbai

2014-11-06 Thread Raphael Susewind
Dear all,

does anyone have access to booth-level results for Maharashtra,
especially Mumbai, both general and assembly elections? Or any
information as to whether and when it might be available? On the CEO
website, one finds links to general election form 20, but those links
are dead. No mention of assembly data (either now or earlier):

https://www.ceo.maharashtra.gov.in/Results/Form20.aspx

Any hint appreciated,
Raphael



Re: [datameet] Form 20 for Mumbai

2014-11-07 Thread Raphael Susewind
Hi Avinash,

Thanks - I did not check all PDFs systematically, should have done...
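A systematic check is easy to script. The following is a minimal Python sketch, not something used in this thread: it flags PDFs at or below a size cut-off as probable dead-link placeholders, following Avinash's observation in the quoted message below that those files are only about 2 KB. The 10 KB cut-off is an assumption.

```python
from pathlib import Path

# Assumption: genuine Form 20 PDFs are far larger than 10 KB, while the
# placeholders saved from dead links are only a couple of KB.
DEAD_LINK_MAX_BYTES = 10 * 1024


def pdf_sizes(folder):
    """Map each PDF file name in `folder` to its size in bytes."""
    return {p.name: p.stat().st_size for p in sorted(Path(folder).glob("*.pdf"))}


def split_dead_links(sizes, max_dead_bytes=DEAD_LINK_MAX_BYTES):
    """Split a {name: size} mapping into (usable, probably_dead) name lists."""
    usable = sorted(name for name, size in sizes.items() if size > max_dead_bytes)
    dead = sorted(name for name, size in sizes.items() if size <= max_dead_bytes)
    return usable, dead
```

Usage, assuming the attached zip has been unpacked into a hypothetical folder mhGE2014-incomplete: usable, dead = split_dead_links(pdf_sizes("mhGE2014-incomplete")).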

Any idea when and/or whether assembly form 20 will be available?

Best,
Raphael

On 07.11.2014 08:52, Avinash Celestine wrote:
> I should mention that these are for the parliamentary elections of May
> 2014, not the recent assembly elections.
> 
> A
> 
> On Fri, Nov 7, 2014 at 1:20 PM, Avinash Celestine
> <avinash.celest...@gmail.com> wrote:
> 
> Hi Raphael,
> 
> Some of those links are dead, but not all; it seems not all Form 20s
> for each constituency have been uploaded yet. I have the ones for
> which it is (downloaded some time back), attached. As far as Mumbai
> is concerned, I think the South Mumbai data is not there yet.
> Ignore the PDF files which are too small in size (2 KB etc.);
> those are the ones for which the links were dead.
> 
> mhGE2014-incomplete.zip
> <https://docs.google.com/file/d/0BxAgA1sHG2dMcDVrWFd1Vkozb2M/edit?usp=drive_web>


Re: [datameet] Form 20 for Mumbai

2014-11-07 Thread Raphael Susewind
Dear all,

there is this one, though: http://electionmsd.blogspot.in/ - can anybody
confirm its accuracy?

Best,
Raphael

On 07.11.2014 10:42, Shafeeq Rahman wrote:
> They have not published Form 20 data for any of the assembly elections
> (2009 and 2014) so far.
> 
> For the 2009 assembly election, data for some ACs may be traced on the
> respective district websites.
> 
> Regards,
> 
> Shafeeq

Re: [datameet] Form 20 for Mumbai

2014-11-07 Thread Raphael Susewind
Dear all,

as to the files published on http://electionmsd.blogspot.in, at least AC
176 has an Election Commission stamp on it (it's a scan; the others are
Excel-to-PDF conversions), so it looks legitimate...

Best,
Raphael


Re: [datameet] Form 20 for Mumbai

2014-11-07 Thread Raphael Susewind
Dear all,

Mumbai city itself is online here:

http://www.electionmumbaicity.org/assembly2014boothwiseresults.html

Best,
Raphael



Re: [datameet] Any privacy issue in publishing names of voters?

2015-03-01 Thread Raphael Susewind
Hi Anand,

as someone who has worked with the voter lists, including an analysis of
trends (http://www.raphael-susewind.de/blog/2012/noor-mohd-ali), I would
personally NOT put them online in disaggregated form. I would only share
aggregate data (e.g. the 50 most frequent names in state X and their
prominence over time, or some such). If you do put them online, I would
do so at state level only, not further disaggregated. But I DO think
there are big privacy issues here. There was a discussion on this on the
list a few months back as well, spurred by this post by Snehashish
Ghosh:
http://cis-india.org/internet-governance/blog/electoral-databases-2013-privacy-and-security-concerns
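To illustrate the aggregate option, here is a minimal Python sketch (hypothetical, not the code behind the blog post above). It reduces individual roll entries to the n most frequent names per birth decade, with counts and shares, so only aggregates leave the function. It assumes each entry is a (name, age) pair and approximates birth year as roll year minus age.

```python
from collections import Counter, defaultdict


def top_names_by_birth_decade(entries, roll_year, n=50):
    """Aggregate (name, age) roll entries into the n most frequent names
    per birth decade, each with its count and share of that decade.

    Assumption: birth year is approximated as roll_year - age.
    """
    counts = defaultdict(Counter)
    for name, age in entries:
        decade = (roll_year - age) // 10 * 10
        # Normalise spacing and case so "mohd " and "Mohd" are counted together.
        counts[decade][name.strip().title()] += 1

    summary = {}
    for decade in sorted(counts):
        total = sum(counts[decade].values())
        summary[decade] = [
            (name, count, round(count / total, 4))
            for name, count in counts[decade].most_common(n)
        ]
    return summary
```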

My 5 cents,
Raphael

On 02.03.2015 05:04, Anand Chitipothu wrote:
> Hi,
> 
> I have voter data for a couple of states. I'm thinking of publishing
> the gender, age and name of all voters in these states. Do you see any
> privacy issue in this? Is there any other issue I should be careful about?
> 
> I'm planning to sort the names before publishing so that the original
> order is lost.
> 
> I think it'll be very interesting to study the patterns of how names are
> changing over time.
> 
> Anand


Re: [datameet] How to get polling station/booth lat-long data for an assembly constituency

2015-05-15 Thread Raphael Susewind
Dear all,

the whole dataset is also available at
http://dx.doi.org/10.4119/unibi/2674065 (the raw data was scraped at a
time when psleci.nic.in did not have a copyright disclaimer)...
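For anyone repeating Nikhil's Firebug route quoted below, the last step (pasting the copied Points JSON into konklone.io to get a CSV) can also be done offline. A minimal Python sketch, assuming the copied text is a JSON array of flat objects, or an object wrapping such an array under a "Points" key; the field names in the test data are hypothetical, not the actual psleci.nic.in schema.

```python
import csv
import io
import json


def points_json_to_csv(points_json: str) -> str:
    """Convert a JSON array of flat objects into CSV text.

    Columns are the union of all keys, in order of first appearance;
    missing values become empty cells.
    """
    points = json.loads(points_json)
    if isinstance(points, dict):
        # Assumption: the array may arrive wrapped as {"Points": [...]}.
        points = points.get("Points", [])

    fieldnames = []
    for point in points:
        for key in point:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(points)
    return buffer.getvalue()
```

Writing the result to a file opened with encoding="utf-8" keeps any Devanagari station names intact.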

Best
Raphael

On 15.05.2015 20:23, Nikhil VJ wrote:
> I'm not sure about the legality of the data itself, but here is a general
> method that can be used with any web mapping interface, and it is working
> here as well on my end.
> 
> Install Firebug extension in Firefox browser.
> https://addons.mozilla.org/en-US/firefox/addon/firebug/
> 
> Go to psleci.nic.in <http://psleci.nic.in>
> Activate Firebug. Console appears at bottom (or wherever you've
> positioned it)
> Go on "Net" tab
> Under that, "XHR" tab
> 
> Select State, then District, then AC (Assembly Constituency)
> Then press one of the Search buttons.
> 
> All the polling stations in that constituency come up.
> 
> Now in the console, click on "POST GetGoogleObject" to expand it (this
> should be around 100 KB in size now, while one step earlier it was much
> smaller).
> Go to the "JSON" tab
> Click on "d" to expand it
> Right-click on "Points", and select "Copy Points as JSON"
> 
> Now go to http://konklone.io/json/
> Paste the JSON stuff there
> It gets converted to a table, and you can see it.
> Download the CSV linked.
> Open up the CSV in Calc/Excel, and edit as per your needs.
> 
> -
> Why Firebug is needed: the regular "Inspect Element" tools of
> Chrome and Firefox do help you see the incoming JSON objects, but
> Firebug also lets you copy them out.
> 
> --
> Cheers,
> Nikhil
> +91-966-583-1250
> Pune, India
> Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
> http://nikhilsheth.blogspot.in
> 


Re: [datameet] Parsing Voters List : Glyph to Unicode issue

2015-09-21 Thread Raphael Susewind
Hi Siddarth and Nikhil,

sorry for the delay; I was travelling for the past few weeks. I have
worked extensively with the electoral rolls, and ultimately the only
solution I found for the problem of corrupted text is OCR: Tesseract was
the most accurate in my experiments (and also relatively the fastest). It
can also be automated, though scaling up would require vast resources.
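As a rough illustration of what automating this can look like, here is a minimal Python sketch (not Raphael's actual pipeline) that runs the tesseract command-line tool over page images in parallel. It assumes tesseract is installed with the relevant language data (e.g. hin for Hindi, mar for Marathi) and that the PDF pages have already been rendered to images; the file names are hypothetical.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def tesseract_cmd(image: Path, lang: str = "hin") -> list[str]:
    """Build a tesseract call that prints the recognised text."""
    # Using "stdout" as the output base makes tesseract write the text to
    # standard output instead of creating <outputbase>.txt.
    return ["tesseract", str(image), "stdout", "-l", lang]


def ocr_pages(images: list[Path], lang: str = "hin", workers: int = 4) -> dict[Path, str]:
    """OCR page images concurrently; each tesseract run is its own process."""

    def run(image: Path) -> tuple[Path, str]:
        result = subprocess.run(
            tesseract_cmd(image, lang),
            capture_output=True,
            encoding="utf-8",  # decode Devanagari output independently of locale
            check=True,
        )
        return image, result.stdout

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, images))
```

Threads suffice on one machine because the heavy lifting happens in separate tesseract processes; covering whole states would still mean spreading pages across many machines.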

Let us know if you find an alternative (though I am sceptical),

Best,
Raphael

On 19.09.2015 11:51, Nikhil VJ wrote:
> Hi Siddharth,
> 
> Sorry I missed this earlier.
> In April this year I converted a budget PDF with Marathi content in a
> legacy font (similar to ShreeDev) to Excel. It was a two-step process:
> first extract to Excel, then replace all the text after passing it
> through a legacy-font-to-Unicode converter (an HTML file with JavaScript).
> http://nikhilsheth.blogspot.in/2015/05/diy-pdf-to-excel-spreadsheet-conversion.html
> 
> Just check your document or send me a copy. If it has legacy fonts, then
> copy-pasting from it gives random English letters and punctuation.
> If it's Unicode, then copy-pasting gives Unicode text, but inaccurate.
> Someone might already have made a converter for this; if not, and if
> you have enough content, you could make your own converter.
> 
> If the PDF has Unicode font in it, then my method fails.
> 
> I wasn't aware of the stackoverflow questions you've linked to. Great
> insights here into why Unicode extraction is failing.
> 
> If there are only a few pages, this free online multi-language OCR tool
> might help: http://www.i2ocr.com/free-online-hindi-ocr
> (it is a slow, page-by-page process, so only advisable if there is little
> content or if you have a slave army of interns at your disposal :P)
> 
> On Tue, Sep 1, 2015 at 7:37 PM, Siddharth Vijayakrishnan
> <svija...@gmail.com> wrote:
> 
> Hi,
> 
> I downloaded a few files containing voter rolls and tried to parse
> the PDFs using pdfminer, but ran straight into a problem[1] where the
> glyphs are converted to Unicode using the wrong character map. Before
> I try to solve this on my own, I wonder if anyone in this community
> has a ready-made solution?
> 
> [1]
> 
> http://stackoverflow.com/questions/31876415/parsing-a-pdfdevanagari-script-using-pdfminer-gives-incorrect-output
> 

-- 
Dr. Raphael Susewind | Political anthropologist, Associate CSASP Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind

Please do consider http://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Parsing Voters List : Glyph to Unicode issue

2015-09-29 Thread Raphael Susewind
Hi Nikhil and all,

I had the best results with a Python tool called pdf-table-extract:

https://github.com/ashima/pdf-table-extract

You have to tweak the parameters a bit, but then it rather nicely
extracts the coordinates of each cell (defined as anything enclosed by a
black rectangle), which you can then feed into Ghostscript or a similar
tool to extract the cell image (gs is faster than pdftoppm, IMHO). In most
cases, pdf-table-extract -i FILE -p PAGE -r 300 -l 0.7 -t cells_xml
worked nicely for electoral rolls...
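A minimal Python sketch of the Ghostscript step, under stated assumptions: each cell is given as a page number plus x, y, width and height in pixels at the rendering resolution (-r 300 above), measured from the top-left corner of the page; how these values are read out of the cells_xml output is left open here. The page height in points (842 for A4) is also an assumption, and the sketch only builds the gs argument list without running it.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    page: int      # 1-based page number
    x: float       # left edge in pixels at `dpi`, from the page's left side
    y: float       # top edge in pixels at `dpi`, from the page's top (assumed)
    width: float   # pixels
    height: float  # pixels


def gs_crop_args(pdf: str, cell: Cell, page_height_pt: float,
                 dpi: int = 300, out: str = "cell.png") -> list:
    """Build a Ghostscript command that renders only `cell` to a PNG."""
    # Convert pixel coordinates to PDF points (1 pt = 1/72 inch).
    left_pt = cell.x * 72 / dpi
    # PostScript's origin is bottom-left, so flip the y axis.
    bottom_pt = page_height_pt - (cell.y + cell.height) * 72 / dpi
    return [
        "gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pnggray", f"-r{dpi}",
        # Output canvas exactly as large as the cell, in pixels.
        f"-g{round(cell.width)}x{round(cell.height)}",
        f"-dFirstPage={cell.page}", f"-dLastPage={cell.page}",
        f"-sOutputFile={out}",
        # Shift the page so the cell's bottom-left corner sits at the origin.
        "-c", f"<</PageOffset [{-left_pt:.2f} {-bottom_pt:.2f}]>> setpagedevice",
        "-f", pdf,
    ]
```

The list can be passed to subprocess.run(args, check=True). Whether PageOffset crops exactly as intended may depend on the Ghostscript version and the PDF's MediaBox/CropBox, so it is worth checking the first few images by eye.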

Just my 5 cents,

Raphael

On 29.09.2015 21:19, Nikhil VJ wrote:

> Hi Raphael,
> 
> Thanks for sharing about Tesseract: it always helps to know what's in
> the engines ~:)
> 
> I wish we had a way of OCR'ing tabular documents. Tabula's interface
> combined with OCR.
> I created a feature request on Tabula for this :
> https://github.com/tabulapdf/tabula/issues/409
> Let's hope it gets some love! Please +1 it!
> 
> Siddharth, you should share at least a one page PDF sample of what
> you're working with, we'll be able to see which way is best for what
> you've got.
> 
> If one goes the OCR way, we might need to convert the target PDF to
> image format. There are quite a few online sites for doing that, but it
> gets tricky with non-English scripts. If you're on a Linux OS, then
> *pdftoppm* is a good command line tool to use.
> 
> Sample command: pdftoppm -rx 200 -ry 200 -png b.pdf b
> (200 sets the DPI; I found this worked best with the documents I was doing)



Re: [datameet] Form 20 for Mumbai

2016-01-16 Thread Raphael Susewind
Dear all,

I am reviving this old thread to ask whether anyone has Form 20 election
results for Maharashtra 2014 Assembly Polls in a usable format (csv,
json, xml, whatever - but not scanned pdf...)?

The PDFs are online on the CEO website, but before I go to the trouble
of extracting the data, I wonder whether someone has done it already.

If not, I shall put them on the datameet github in a few weeks...

Thanks,
Raphael

On 07.11.2014 12:16, Avinash Celestine wrote:
> No, unfortunately not. My impression is that Maharashtra is slower at
> putting these out than some other states.
> 
> A
> 
> On Nov 7, 2014 2:09 PM, "Raphael Susewind"  <mailto:li...@raphael-susewind.de>> wrote:
> 
> Hi Avinash,
> 
> Thanks - I did not check all PDFs systematically, should have done...
> 
> Any idea when and/or whether assembly form 20 will be available?
> 
> Best,
> Raphael
> 
> On 07.11.2014 08:52, Avinash Celestine wrote:
> > I should mention that these are for the parliamentary elections of May
> > 2014, not the recent assembly elections.
> >
> > A
> >
> > On Fri, Nov 7, 2014 at 1:20 PM, Avinash Celestine
> > mailto:avinash.celest...@gmail.com>
> <mailto:avinash.celest...@gmail.com
> <mailto:avinash.celest...@gmail.com>>> wrote:
> >
> > Hi Raphael,
> >
> > some of those links are dead, but not all. seems not all form 20s
> > for each constituency have been uploaded yet. I have the ones for
> > which it is (downloaded sometime back) ...attached. as far as
> mumbai
> > is concerned, I think the south mumbai data is not there
> > yet...ignore the pdf files which are too small in size (2KB etc).
> > Those are the ones for which the links were dead.
> >
> >
> >
> >
> > ​
> >  mhGE2014-incomplete.zip
> >   
>  
> <https://docs.google.com/file/d/0BxAgA1sHG2dMcDVrWFd1Vkozb2M/edit?usp=drive_web>
> > ​
> >
> > On Fri, Nov 7, 2014 at 12:09 PM, Raphael Susewind
> > mailto:li...@raphael-susewind.de>
> <mailto:li...@raphael-susewind.de
> <mailto:li...@raphael-susewind.de>>> wrote:
> >
> > Dear all,
> >
> > does anyone have access to booth-level results for Maharashtra,
> > especially Mumbai, both general and assembly elections? Or any
> > information as to whether and when it might be available? On the CEO
> > website, one finds links to general election Form 20, but those links
> > are dead. No mention of assembly data (either now or earlier):
> >
> > https://www.ceo.maharashtra.gov.in/Results/Form20.aspx
> >
> > Any hint appreciated,
> > Raphael
> >
> > --
> > Raphael Susewind | BGHS Bielefeld University, CSASP University of Oxford
> > Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
> > Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind
> >
> > Please do consider http://www.gnupg.org for encryption (key id
> > 10AEE42F)
> >
> > --
> > Datameet is a community of Data Science enthusiasts in India.
> > Know more about us by visiting http://datameet.org
> > ---
> > You received this message because you are subscribed to the
> > Google Groups "datameet" group.
> > To unsubscribe from this group and stop receiving emails from
> > it, send an email to datameet+unsubscr...@googlegroups.com
> <mailto:datameet%2bunsubscr...@googlegroups.com>
> > <mailto:datameet%2bunsubscr...@googlegroups.com
> <mailto:datameet%252bunsubscr...@googlegroups.com>>.
> > For more options, visit https://groups.google.com/d/optout.

Re: [datameet] Form 20 for Mumbai

2016-01-24 Thread Raphael Susewind
Dear all,

following my earlier email, I have now compiled booth-level results for
the 2014 assembly polls in Mumbai (ACs 152-187) and put them in the
datameet github repo (pull request pending) in case anyone is
interested: https://github.com/datameet/india-election-data/pull/16

Happy coding,
Raphael

On 16.01.2016 18:26, Raphael Susewind wrote:

> Dear all,
> 
> I am reviving this old thread to ask whether anyone has Form 20 election
> results for Maharashtra 2014 Assembly Polls in a usable format (csv,
> json, xml, whatever - but not scanned pdf...)?
> 
> The PDFs are online on the CEO website, but before I go to the trouble
> of extracting data, I wonder whether someone has done it already?
> 
> If not, I shall put them on the datameet github in a few weeks...
> 
> Thanks,
> Raphael

-- 
Dr. Raphael Susewind | Political anthropologist, Associate CSASP Oxford
  Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
   Web & Twitter | http://www.raphael-susewind.de | @RaphaelSusewind

Please do consider http://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Bhuvan Derived Data Copyright Question (Shapefile data)

2016-01-31 Thread Raphael Susewind
> -- 
> Nisha Thompson
> DataMeet.org
> ni...@datameet.org <mailto:ni...@datameet.org>
> skype: nishaqt
> mobile: 962-061-2245

-- 
Dr Raphael Susewind | Political anthropologist, Associate CSASP Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



[datameet] Data on religion and politics in India now on GitHub

2016-02-22 Thread Raphael Susewind
Dear all,

over the last weeks, I moved my comprehensive dataset on religion and
politics in India to GitHub. This should make it more easily accessible
and also makes it easier for others - you - to add content.

So far, it includes Uttar Pradesh data, namely booth-level election
results, booth-level estimates of religious demography, candidate names,
GIS data, and a table that (partially) links booth-level data from 2007
through 2009 and 2012 to 2014. Detailed information on all the tables and
variables, licenses, etc. is online here:

https://github.com/raphael-susewind/india-religion-politics

Do play around with it, give it a GitHub star if you like it, and
improve upon it! In a couple of weeks, I will create a formal first
release, provided no major bugs crop up before then.

This group has been a tremendous resource for me; I am glad to give back
what I can in the spirit of open data sharing and research.

All the best,
Raphael

-- 
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Pincode Boundaries of India

2016-03-19 Thread Raphael Susewind
Hi Shravan,

another option, depending on what you are after, could be to use
Devdatta's point data for post offices, turn it into Voronoi polygons, and
aggregate those by pincode. That might not match the official boundaries,
but it is the closest you can get (each locality in India would be
assigned to the nearest post office...)
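A minimal sketch of the idea, assuming the post office points are available as (pincode, latitude, longitude) tuples (the three sample points below are invented for illustration). A locality lies in a post office's Voronoi cell exactly when that post office is its nearest one, so for assigning individual localities one can skip building polygons and just do a nearest-neighbour lookup. With great-circle distance this corresponds to a Voronoi diagram on the sphere; polygons built in projected coordinates may differ slightly near cell edges.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_pincode(lat, lon, post_offices):
    """Return the pincode of the post office closest to (lat, lon).

    Being in a post office's Voronoi cell means "this post office is the
    nearest one", so brute-force nearest-neighbour search gives the same
    assignment as the Voronoi polygons would.
    """
    best = min(post_offices, key=lambda po: haversine_km(lat, lon, po[1], po[2]))
    return best[0]

# Hypothetical sample data: (pincode, latitude, longitude)
POST_OFFICES = [
    ("400001", 18.938, 72.835),
    ("400050", 19.055, 72.830),
    ("400076", 19.120, 72.905),
]

print(nearest_pincode(19.06, 72.84, POST_OFFICES))  # → 400050
```

Brute force is fine for a city; for all of India a spatial index (k-d tree or similar) would be the obvious next step.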

Best,
Raphael

On 17.03.2016 06:18, Jaisen Nedumpala wrote:
> Hi Shravan,
> 
> I don't think you will get it that easily. I have been searching for this
> data since 2008. Eventually I came to understand that even the
> Department of Posts doesn't have this data. We could build it as a
> community project. Not easy, but not impossible.
> 
> 
> 2016-03-17 10:32 GMT+05:30 shravan <shravan.s...@gmail.com>:
> 
> Hey everyone,
> 
> I am looking for pin code boundaries of India, preferably in any of
> the GIS file formats ( kml, kmz, shp, geojson or any other ). It
> would be nice if someone can point me in the right direction, where
> I can get this data from.
> 
> Thanks,
> Shravan
> 
> 
> 
> 
> 
> -- 
> ~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
>  - ജയ്സെനോവ് നെടുമ്പാലോവിച്ച് പഹയനോവ്സ്കി -
> ~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~
> (`'·.¸(`'·.¸^¸.·'´)¸.·'´)
> «´¨`·* . Jaisenov. *..´¨`»
> (¸.·'´(`'·.¸ ¸.·'´)`'·.¸)
> ¸.·´^.`'·.¸ ¸.·'´
>  ( `·.¸`·.¸
>   `·.¸ )`·.¸
>  ¸.·(´ `·.¸
> ¸.·(.·´)`·.¸
>   ( `v´ )
> `v´
> 

-- 
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Pincode Boundaries of India

2016-03-27 Thread Raphael Susewind
Dear Avinash and all,

I will try to make some time this week to scrape the pincodes from
electoral rolls for all polling booths in my electoral GIS shapefiles.

Since pincodes are in Latin script, this should not be affected by the
much-discussed PDF scraping issues with electoral rolls.
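Because the pincode is a plain run of Latin digits, once the header page has been converted to text (with whatever PDF-to-text tool one prefers; that step is not shown), a regular expression is enough to pick it out. A sketch, assuming the extracted text is already a Python string; the sample header text below is invented. Indian PIN codes are six digits and never start with 0; the pattern also accepts the common "400 050" spelling.

```python
import re

# Six-digit PIN code, first digit 1-9, optionally written with a space after
# the third digit ("400 050"). The lookarounds stop it matching inside longer
# digit runs. Any other six-digit number on the page would also match, so in
# practice restrict the search to the area near the "Pin" label.
PIN_RE = re.compile(r"(?<!\d)([1-9]\d{2}) ?(\d{3})(?!\d)")

def find_pincodes(header_text):
    """Return all PIN codes found in the text, in order, with spaces removed."""
    return [a + b for a, b in PIN_RE.findall(header_text)]

# Invented example of text extracted from an electoral roll header page
sample = "Part No. 12  Polling Station: Municipal School, Dadar  Pin Code: 400 028  Total electors 1234"
print(find_pincodes(sample))  # → ['400028']
```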

We could then either go down the Voronoi route, or alternatively use the
heatmap processing chain that I used to generate AC boundaries. The
latter would have the advantage of dealing with wrong coordinates in the
booth point dataset (basically, not all electoral booth coordinates are
correct; consequently, if we only use Voronoi polygons, we would quite
frequently get a blip of pincode B within a sea of pincode A. The heatmap
approach takes care of this).
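The heatmap chain itself is not described in this thread. As a much simpler stand-in for the same goal (suppressing isolated "blips" caused by misplaced booth coordinates), one could relabel each booth by majority vote among its k nearest neighbours before building Voronoi polygons. This is a swapped-in technique, not the heatmap method; the value of k and the use of planar (projected) coordinates are assumptions.

```python
import math
from collections import Counter

def smooth_labels(points, k=5):
    """Relabel each point by strict majority vote among its k nearest neighbours.

    points: list of (x, y, label) in a projected, planar coordinate system.
    Votes always use the ORIGINAL labels, so the result does not depend on
    processing order. A point only changes label if more than half of its k
    neighbours agree on a different one, so genuine clusters are kept.
    Brute force O(n^2): fine for a sketch, use a spatial index at scale.
    """
    smoothed = []
    for i, (x, y, label) in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        others.sort(key=lambda p: math.hypot(p[0] - x, p[1] - y))
        votes = Counter(p[2] for p in others[:k])
        top_label, top_count = votes.most_common(1)[0]
        if top_label != label and top_count > k // 2:
            smoothed.append((x, y, top_label))
        else:
            smoothed.append((x, y, label))
    return smoothed

# A 3x3 grid of pincode "A" booths with one misplaced "B" booth in the middle
grid = [(x, y, "A") for x in range(3) for y in range(3) if (x, y) != (1, 1)] + [(1, 1, "B")]
print([lbl for _, _, lbl in smooth_labels(grid)])
```

Note the trade-off this makes visible: if pincode areas can legitimately be small enclaves inside other pincodes (the contiguity question raised in this email), a majority filter would wrongly erase them.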

Since I am not familiar with postal boundaries: can anyone here confirm
whether pincode areas are contiguous, and whether each pincode has only
one area? Or can several non-contiguous areas share the same pincode,
interspersed with other pincodes? (In which case Voronoi would perhaps
be the better solution after all.)

In any case, I hope to give you the pincode for each polling booth by
the end of the week or so (based on all-India 2014 electoral rolls).

Best,
Raphael

On 28.03.2016 06:33, Avinash Celestine wrote:

> Perhaps one way is to avoid using postal data altogether.
> 
> All header pages of electoral rolls (the first page) contain the name of
> the polling station related to that roll, the PS number, and importantly
> the pin code.
> 
> A site like psleci.nic.in has geographic coordinates of polling
> stations (though Raphael had collected the data earlier*). Matching the
> two will give a fairly dense scattering of points, in fact much denser
> than if we used some of the methods earlier in this thread.
> 
> We thus have a way of associating a pin code with a geo coordinate. We
> can then use the Voronoi method.
> 
> Electoral rolls are mostly in PDF, which makes them difficult to scrape.
> But from what I have seen, for any given state, the location of the
> pincode on the header page is more or less constant, making it possible
> to target just that part of the page with any PDF parser.
> 
> Electoral rolls have become difficult to download in bulk (a good
> thing!), but I understand different people on this group have the PDFs
> for different states. Putting these together should give us
> comprehensive header-page data for at least some states.
> Alternatively, we can file RTIs for just the header pages of electoral
> rolls, though I don't know how successful that would be.
> 
> * Raphael's data is
> at https://github.com/raphael-susewind/india-election-data
> 
> 
> 
> On Sun, Mar 27, 2016 at 12:07 PM, srinivas kodali <iota.kod...@gmail.com> wrote:
> 
> Well, there were postal delivery zones in the past, and the postal
> department even used to make maps of these zones. The Delhi postal
> delivery zone map
> <https://drive.google.com/file/d/0B1RcWLku0ZOWWVBHMldrZWdfZEU/view?usp=sharing>
> had boundaries for Delhi. I am not sure whether other cities had them or
> how long the postal department kept doing this, but it can certainly
> help with boundaries for cities.
> 
> Regards,
> Srinivas Kodali
> www.lostprogrammer.com
> /"Not everyone who wanders is lost, I am probably a bit"/
> 
> On Tue, Mar 22, 2016 at 9:29 PM, Arun Ganesh <arungra...@gmail.com> wrote:
> 
> Shravan, crowdsourcing the boundaries of pincodes is not as
> trivial as you think. To start with, an area does not fall under
> a pincode, rather a street does based on the post office that
> services it. Read
> this: http://www.georeference.org/doc/zip_codes_are_not_areas.htm
> 
> You may also want to do some background reading of existing
> research that has been done by the group
> here: https://datameet.hackpad.com/M4hPFJVV2Gm?eid=v4YoXN4tTw5
> 
> To sum up, nobody has precise pincode boundaries like how you
> imagine them, not even the postal department. Any existing
> datasets are an estimate at best using some data processing on a
> large volume of address data.
> 

Re: [datameet] Pincode Boundaries of India

2016-04-01 Thread Raphael Susewind
Dear all,

following up on my earlier email, I just pushed a list of pincodes for
all electoral booths across India to GitHub and made a pull request to
the datameet repository:

https://github.com/datameet/pincodes/pull/2

Please note that this may be incomplete and is based on a rather
brutish, quick-and-dirty hack (see comments in rolls2pincode.pl). But it
does use the same IDs as those in the 2014 elections, and hence can be
combined with my GIS shapefiles for polling booths:

http://dx.doi.org/10.4119/unibi/2674065
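Since both datasets share booth IDs, combining them is a plain key join. A minimal sketch, assuming the pincode list is a CSV with columns named `booth_id` and `pincode` and that the shapefile's attribute rows have already been read into dicts with the same `booth_id` field; these column names are hypothetical, so check the actual headers in the pull request before use. Because the list may be incomplete, the sketch also reports booths without a pincode.

```python
import csv
import io

def join_pincodes(booths, pincode_csv_text, id_field="booth_id", pin_field="pincode"):
    """Attach a pincode to each booth record by shared ID.

    booths: list of dicts, e.g. attribute rows read from the booth shapefile.
    pincode_csv_text: CSV text containing id_field and pin_field columns
    (hypothetical names; adjust to the real file).
    Returns (joined_rows, unmatched_ids); unmatched booths get pincode None.
    """
    pins = {row[id_field]: row[pin_field]
            for row in csv.DictReader(io.StringIO(pincode_csv_text))}
    joined, unmatched = [], []
    for booth in booths:
        pin = pins.get(booth[id_field])
        if pin is None:
            unmatched.append(booth[id_field])
        joined.append({**booth, pin_field: pin})  # new dict; input left untouched
    return joined, unmatched

# Invented sample rows
booths = [{"booth_id": "1", "ac": "152"}, {"booth_id": "2", "ac": "152"}]
csv_text = "booth_id,pincode\n1,400050\n"
joined, unmatched = join_pincodes(booths, csv_text)
print(unmatched)
```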

I leave it to others to double-check accuracy and create actual pincode
maps. I hope this is useful,

Best,
Raphael


Re: [datameet] Pincode Boundaries of India

2016-04-02 Thread Raphael Susewind
Hi Dev,

there are state/state.boothraw.* shapefiles; these should contain the
raw polling booth locations.

The heatmap scripts are heavily customized - I would have to look into
them myself, I am afraid, which could take some time (I am very busy).

You would have to go with Voronoi polygons for now, sorry,

Best,
Raphael

On 02.04.2016 09:02, Devdatta Tengshe wrote:
> Hi Raphael,
> 
> Firstly, thanks a lot for extracting this information.
> 
> I was looking at http://dx.doi.org/10.4119/unibi/2674065, but I could
> find only the boundaries for the constituencies.
> 
> Can you tell us where we can find the locations of the polling booths
> that you had extracted?
> 
> Secondly, can you also share (if you still have it) the heatmap code
> that you used to create the constituency boundaries? I think that is
> what will be required to create the pincode boundaries as well.
> 
> Regards,
> Dev
> 
> Regards,
> Devdatta
> 

[datameet] Village to AC mapping

2016-04-03 Thread Raphael Susewind
Dear all,

some time ago, we had a discussion of linking Census villages to
assembly constituencies; there is also a dataset in the datameet
catalog: https://github.com/datameet/catalog

Since this is not complete, though, and lacks Census ID codes, I have
generated a new table (through spatial matching); pull request here:

https://github.com/datameet/india-election-data/pull/16

Hope this is useful,
Raphael

-- 
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



[datameet] Census EB coordinates (esp for Mumbai)?

2016-05-18 Thread Raphael Susewind
Dear all,

we had this discussion some time ago, and I fear the situation hasn't
changed - but I wonder whether anyone here can share lat/long
coordinates or maps of Census Enumeration Blocks, i.e. the smallest level
of Census operations?

I am particularly interested in Mumbai...

Best,
Raphael

-- 
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Help with schools location data extracting

2016-05-21 Thread Raphael Susewind
Hi Nikhil,

most likely the Flash application loads something like a JSON file (or
CSV, if they are bad programmers ;-) ) from a specified API address. Use a
network sniffer to intercept the traffic that the Flash player generates,
and see whether you can replicate the API calls.

If you are lucky, you will see HTTP requests to a URL along the lines
of http://schoolgis.nic.in/state_x/data.json?school=001, with the school
number running up to 14986. In that case, you can fetch the JSON files
directly (if need be by emulating a Flash player's HTTP headers, though I
doubt that they check for this).
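A sketch of the "lucky" case, assuming the URL pattern really is the illustrative one from this message (`state_x`, the `school` parameter, zero-padding to three digits and the 1-14986 range are all taken from the example above or assumed, not confirmed). It only uses the standard library: generate one URL per ID, and fetch each with a browser-like User-Agent and a pause between requests. Check the legal position mentioned below before running it against a real server.

```python
import json
import time
import urllib.request

# Hypothetical endpoint, copied from the illustrative URL above; the real API
# (if any) must first be discovered with a network sniffer.
BASE_URL = "http://schoolgis.nic.in/state_x/data.json?school={:03d}"

def school_urls(first=1, last=14986):
    """Yield one URL per school ID, zero-padded to at least three digits (001, 002, ...)."""
    for school_id in range(first, last + 1):
        yield BASE_URL.format(school_id)

def fetch_json(url, user_agent="Mozilla/5.0", delay_s=1.0):
    """Fetch one URL, decode the body as JSON, then pause to go easy on the server."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.load(resp)
    time.sleep(delay_s)
    return data

if __name__ == "__main__":
    # Print a few URLs first; call fetch_json(url) only once the endpoint is confirmed.
    for url in school_urls(1, 3):
        print(url)
```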

If you are unlucky, it's a more complex API - some stateful frontends for
SQL databases can be very nasty to replicate, for instance. One brute-force
solution in such cases would be to write a custom proxy server (there are
Python/Perl/... modules for this), i.e. a kind of customized sniffer, and
route your browser traffic through it. Then automate the browser (again,
there are plugins for Firefox and Chrome with corresponding Python or Perl
interfaces) and intercept the traffic it generates. That's the solution I
found to scrape polling station localities from the ECI server (before
they put a bold copyright disclaimer on it; now this kind of scraping
would probably be illegal, so do check these issues as well).

Let us know what you find out about the API,

Best of luck,
Raphael

On 21.05.2016 11:43, Nikhil VJ wrote:

> Hi friends,
> 
> Is there any way to extract data from a Flash player platform such as
> the following?
> 
> schoolgis.nic.in <http://schoolgis.nic.in/>
> 
> --regards,
> Nikhil VJ
> Pune
> 

-- 
Dr Raphael Susewind | Associate, Contemporary South Asia Studies, Oxford
 Snail Mail | Melanchthonstr. 4a, 33615 Bielefeld, Germany
  Web & Twitter | https://www.raphael-susewind.de | @RaphaelSusewind
 Impact | https://impactstory.org/raphael-susewind

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)



Re: [datameet] Re: Village to AC mapping

2016-07-12 Thread Raphael Susewind
Dear Naveen,

yes, sure - I just pushed these to GitHub, pull request here:

https://github.com/datameet/india-election-data/pull/17

Best,
Raphael

On 11.07.2016 08:21, Naveen Bharathi wrote:
> 
> For my research I wanted the list of Assembly constituencies and their
> corresponding villages (from census directories) pre-delimitation and
> post-delimitation for Karnataka. 
> By any chance, would you have the list of pre-delimitation assembly
> segments and their corresponding census villages in the same format?
> 

-- 
Raphael Susewind | Melanchthonstr. 4a, 33615 Bielefeld, Germany
 | https://www.raphael-susewind.de

Please consider https://www.gnupg.org for encryption (key id 10AEE42F)





Re: Private message regarding: [datameet] Re: Village to AC mapping

2016-07-13 Thread Raphael Susewind
Dear Shafeeq,

thanks for noting this. Could you post your findings to the GitHub
README? I think it is important that others who might want to use that
data know about its limitations (I dont use it myself, merely posted in
on popular request).

The thing is: this is done through automated spatial matching, so there
is really nothing I can do about accuracy - I have a point cloud of
villages and AC polygon shapefiles, and merely let QGIS merge the two. I
cannot vouch for the accuracy of either the point cloud or the polygons
- they are proprietary, not my own creation...
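At its core, letting QGIS merge a village point layer with AC polygons is a point-in-polygon join: each point is assigned to whichever polygon contains it. Below is a minimal pure-Python sketch of that idea using the even-odd ray-casting test. It is not what QGIS runs internally, the village IDs, AC names and square polygons are made up, and real shapefiles would also need multipolygons, holes and a spatial index. It does show why errors in either input propagate directly: a misplaced village point simply lands in the wrong polygon.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (lon, lat)
Polygon = List[Point]         # outer ring, vertices in order


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Even-odd ray casting: count edges crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_villages(villages: Dict[str, Point],
                    acs: Dict[str, Polygon]) -> Dict[str, Optional[str]]:
    """Map each village ID to the first AC polygon containing it (None if none)."""
    return {
        vid: next((ac for ac, poly in acs.items() if point_in_polygon(pt, poly)), None)
        for vid, pt in villages.items()
    }


if __name__ == "__main__":
    # Hypothetical data: two adjacent unit squares and three village points
    acs = {
        "AC-A": [(0, 0), (1, 0), (1, 1), (0, 1)],
        "AC-B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    }
    villages = {"v1": (0.5, 0.5), "v2": (1.5, 0.2), "v3": (3.0, 3.0)}
    print(assign_villages(villages, acs))
```

Running it prints `{'v1': 'AC-A', 'v2': 'AC-B', 'v3': None}`; the unmatched `v3` is the kind of case worth flagging rather than silently dropping.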

Regards,
Raphael

On 13.07.2016 11:57, Shafeeq Rahman wrote:
> Dear Raphael
> 
> Thanks for sharing such useful information.
> 
> I just cross-checked this list for Uttar Pradesh and found errors in the
> pre- and post-delimitation mappings: in Ghaziabad district, for example,
> the pre-delimitation mapping places villages (like 909000201085000) in
> Pilibhit AC, which is far from Ghaziabad.
> 
> Kindly check this; if required, I can cross-check other states as well.
> 
> Regards,
> 
> Shafeeq 
> 
> On Tuesday, 12 July 2016 18:18:39 UTC+5:30, Raphael Susewind wrote:
> 
> Dear Naveen,
> 
> yes, sure - I just pushed these to GitHub, pull request here:
> 
> https://github.com/datameet/india-election-data/pull/17
> 
> Best,
> Raphael
> 
> On 11.07.2016 08:21, Naveen Bharathi wrote:
> >
> > For my research I wanted the list of assembly constituencies and their
> > corresponding villages (from census directories), pre-delimitation and
> > post-delimitation, for Karnataka.
> > By any chance would you have the list for pre-delimitation assembly
> > segments and their corresponding census villages in the same way?
> >
> > On Monday, April 4, 2016 at 11:58:26 AM UTC+5:30, Raphael Susewind wrote:
> >
> > Dear all,
> >
> > some time ago, we had a discussion of linking Census villages to
> > assembly constituencies; there is also a dataset in the datameet
> > catalog: https://github.com/datameet/catalog
> >
> > Since this is not complete, though, and lacks Census ID codes, I have
> > generated a new table (through spatial matching); pull request here:
> >
> > https://github.com/datameet/india-election-data/pull/16
> >
> > Hope this is useful,
> > Raphael



Re: [datameet] Library to read tables in scanned PDFs

2017-01-23 Thread Raphael Susewind
Hi Mohit,

just to add - a hacky but working workflow that extracts the table
structure and OCRs bits and pieces as needed can be found on my GitHub,
for instance here (at the bottom of the Perl file):

https://github.com/raphael-susewind/india-religion-politics/blob/master/rajrolls2014/run-in-arc/pdf2list.pl

It boils down to running

pdf-table-extract -i $file -p $page -r 300 -l 0.7 -t cells_xml

for each page and parsing the results to extract cell coordinates, then

gs -q -r300 -dFirstPage=$page -dLastPage=$page -sDEVICE=tiffgray -sCompression=lzw -o $temp.tif -g${width}x${height} -c '<> setpagedevice' -f $file

to get a TIFF of each cell, to be fed into

tesseract -psm 4 -l hin $temp.tif stdout

(in the case of Devanagari).
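The cell-by-cell chain above can be sketched in Python as a function that builds the Ghostscript and Tesseract argument lists for one cell. Two things here are assumptions, since the PostScript passed via -c in the original command did not survive archiving intact: the cell box is taken to be a hypothetical (x, y, w, h) in pixels at 300 dpi with the origin at the top-left of the page, and the crop uses the commonly cited `<</Install {... translate}>> setpagedevice` Ghostscript idiom, which shifts the page (in points) so the cell sits at the output origin while -g limits the output to the cell size. Verify both against your Ghostscript version and your table extractor's real output.

```python
from typing import List, Tuple

DPI = 300  # same resolution as the -r 300 / -r300 options above


def px_to_pt(px: float, dpi: int = DPI) -> float:
    """Convert a length in pixels rendered at `dpi` to PDF points (1 pt = 1/72 inch)."""
    return px * 72.0 / dpi


def cell_commands(pdf: str, page: int, cell: Tuple[int, int, int, int],
                  page_height_px: int, out_tif: str,
                  dpi: int = DPI) -> Tuple[List[str], List[str]]:
    """Build (ghostscript, tesseract) argument lists for one table cell.

    `cell` is a hypothetical (x, y, w, h) box in pixels with the origin at the
    top-left of the rendered page; adapt it to your extractor's actual output.
    """
    x, y, w, h = cell
    # PDF user space starts at the bottom-left, so flip y before converting.
    x_pt = px_to_pt(x, dpi)
    y_pt = px_to_pt(page_height_px - (y + h), dpi)
    # Assumed cropping idiom: translate the page so the cell's lower-left
    # corner sits at the output origin; -g then clips the output to w x h px.
    install = f"<</Install {{{-x_pt:.2f} {-y_pt:.2f} translate}}>> setpagedevice"
    gs = ["gs", "-q", f"-r{dpi}", f"-dFirstPage={page}", f"-dLastPage={page}",
          "-sDEVICE=tiffgray", "-sCompression=lzw", "-o", out_tif,
          f"-g{w}x{h}", "-c", install, "-f", pdf]
    # Tesseract 3.x syntax as used above; newer releases expect --psm instead.
    tess = ["tesseract", "-psm", "4", "-l", "hin", out_tif, "stdout"]
    return gs, tess


if __name__ == "__main__":
    # Hypothetical cell on an A4 page rendered at 300 dpi (3508 px tall)
    gs_args, tess_args = cell_commands("roll.pdf", 3, (300, 600, 900, 150),
                                       3508, "cell.tif")
    print(" ".join(gs_args))
    print(" ".join(tess_args))
```

Each list can then be passed to `subprocess.run(..., check=True)`; for the Tesseract step, adding `capture_output=True, text=True` returns the recognised text as a string.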

Best of luck,
Raphael
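For PDFs that do have a text layer (or once OCR has produced one), Amanbir's suggestion in the quoted reply below, converting to text with Xpdf and then parsing it, could look like this when the text comes from `pdftotext -layout`, which keeps columns roughly aligned. This sketch treats runs of two or more spaces as column gaps; that heuristic is an assumption and will break on empty cells or cells that themselves contain double spaces.

```python
import re
from typing import List

# Heuristic: two or more consecutive whitespace characters separate columns,
# while single spaces stay inside a cell (e.g. "Ram Nagar").
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_layout_text(text: str) -> List[List[str]]:
    """Split layout-preserving text (e.g. from `pdftotext -layout`) into rows of cells."""
    rows: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:          # skip blank lines between table rows
            continue
        rows.append(COLUMN_GAP.split(line))
    return rows


if __name__ == "__main__":
    # Made-up sample of what layout-mode text for a small table might look like
    sample = ("Booth  Name            Votes\n"
              " 1     Ram Nagar       512\n"
              "\n"
              " 2     Shiv Colony     430\n")
    for row in parse_layout_text(sample):
        print(row)
```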

On 01/23/2017 09:20 AM, Amanbir Singh wrote:
> Hi Mohit,
> 
> You'll have to run OCR on the PDF before any other method can be
> applied. This obviously makes it more complicated, but still manageable.
> 
> You could use Tesseract, a popular OCR package
> (https://github.com/tesseract-ocr/tesseract), and then try tabula
> or the other packages mentioned. I've also had success using Xpdf
> (http://www.foolabs.com/xpdf/) to convert PDFs to text and then parsing
> the text.
> 
> Aman
> 
> 
> On Friday, 20 January 2017 18:18:59 UTC+5:30, mohit ranjan wrote:
> 
> I tried Tabula, but again it's for PDFs that have all the metadata
> embedded. I need it for scanned paper PDFs/JPGs, and it fails with:
> 
> "Sorry, your PDF file is image-based; it does not have any embedded
> text. It might have been scanned from paper... Tabula isn't able to
> extract any data from image-based PDFs. Click the Help button for
> more information."
> 
> - Mohit
> 
> On Fri, Jan 20, 2017 at 6:14 PM, Srinivasan Ramani wrote:
> 
> Tabula - http://tabula.technology/ works great with table
> extraction from PDFs. 
> 
> On Fri, Jan 20, 2017 at 5:51 PM, mohit ranjan wrote:
> 
> Thanks for the response, Johnson.
> 
> Is this the pdf-table-extract
> (https://github.com/ashima/pdf-table-extract) you are
> referring to? It says it reads table metadata from the PDF.
> 
> My query was for scanned PDF/JPG images.
> 
> - Mohit
> 
> On Fri, Jan 20, 2017 at 4:37 PM, Johnson Chetty wrote:
> 
> Hello,
> 
> I have had some reasonable success with 'pdfquery'
> if you like Python. It works with regional-language text as
> well.
> Also, for tabular data, do try pdf-table-extract if
> quick and dirty works for you.
> 
> Java folks should try pdfbox.
> 
> On 20 January 2017 at 15:23, mohit ranjan wrote:
> 
> Sorry if this is off-topic, but I have seen
> threads here about liberating data from PDFs.
> Most likely there will be a lot of scanned PDFs
> among them.
> 
> Do we have any in-house expert on this, and which
> library/tool (preferably free) can extract
> tables from scanned PDFs/JPGs?
> 
> CVision
> (http://www.cvisiontech.com/library/ocr/file-ocr/ocr-table-recognition.html)
> does a decent job, but it's paid.
> 
> 
> 
> - Mohit
> 

Re: [datameet] Oriya and Malayalam readers...

2017-03-03 Thread Raphael Susewind
Dear Sitansu,

thank you so much - this is very helpful. Keralites, you out there?

Best,
Raphael

On 03/03/2017 04:29 PM, Sitansu Mahapatra wrote:
> Dear Raphael
> 
> I can read and write Odia. Please find the translated text attached.
> 
> Regards,
> Sitansu
> OGD Platform India (data.gov.in)
> 
> On Fri, Mar 3, 2017 at 7:46 PM, Raphael Susewind <li...@raphael-susewind.de> wrote:
> 
> Dear all,
> 
> I am currently extracting front page information from electoral rolls -
> village, taluk, district, station name, station address, pincodes etc.
> Some people here are also interested in this as far as I remember...
> 
> Since I don't read Malayalam or Oriya, could somebody here help me out
> and translate the Oriya bits from the attached image? And could someone
> else tell me where on Kerala's rolls these variables (station name,
> address, village, taluk, etc.) can be found (Kerala uses a somewhat
> different frontpage layout - see second attachment)?
> 
> Much appreciated,
> and expect the results on GitHub soon,
> 
> Best,
> Raphael
> 

-- 
Dr Raphael Susewind | Lecturer in Social Anthropology & Development
| Department of International Development
| King's College London, London WC2R 2LS, UK
| https://www.raphael-susewind.de

Please consider PGP for encryption: https://keybase.io/raphaelsusewind




Re: [datameet] Re: Oriya and Malayalam readers...

2017-03-06 Thread Raphael Susewind
Hi Naveen,

thanks for alerting me to the changes - I am interested in whatever was
the case during the 2014 elections, but will add a note to the published
dataset to alert others that things change constantly...

Best,
Raphael

On 03/04/2017 02:58 AM, Naveen Francis wrote:
> Hi Raphael,
> 
> The electoral rolls contain very old data, even those published in 2017:
> http://ceo.kerala.gov.in/electoralrolls.html
> 
> The taluk/local government mapping has changed a lot.
> The data ECI publishes is based on the 2005 local government delimitation.
> The number of taluks has increased from 63 to 75.
> 
> The State Election Commission had updated data, but their list no longer seems to be available:
> sec.kerala.gov.in
> 
> Thanks,
> Naveen


Re: [datameet] Re: Oriya and Malayalam readers...

2017-03-06 Thread Raphael Susewind
Hi Nishadh (?) and George (?),

thanks very much - this is exactly what I hoped for. The complete
dataset with frontpage details will be on my GitHub sometime next week;
will let the group know when and where,

Best,
Raphael

On 03/04/2017 03:39 AM, nishadh wrote:
> For Malayalam, the translated words are mostly on the right and always
> above the Malayalam words.


Re: [datameet] data request

2017-03-06 Thread Raphael Susewind
Dear Roshan,

I have them booth-wise, which should be easy to aggregate to AC level:

https://github.com/raphael-susewind/india-religion-politics/tree/master/upvidhansabha2012

https://github.com/raphael-susewind/india-religion-politics/tree/master/uploksabha2014

Otherwise have a look at the CEO Uttar Pradesh website...
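Aggregating booth-wise results to AC level is a group-by-and-sum. A minimal sketch, assuming the booth data have first been reshaped into long format with hypothetical columns `ac`, `booth`, `party` and `votes` (the actual files in the repository may be laid out differently, for example with one column per party, in which case they would need melting first):

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def aggregate_to_ac(rows: Iterable[Mapping[str, str]]) -> Dict[str, Dict[str, int]]:
    """Sum booth-level vote counts per assembly constituency and party.

    Expects long-format rows with the hypothetical columns ac, booth, party, votes.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[row["ac"]][row["party"]] += int(row["votes"])
    # Convert nested defaultdicts to plain dicts for printing/serialising
    return {ac: dict(parties) for ac, parties in totals.items()}


if __name__ == "__main__":
    # Made-up booth-level sample for two ACs
    sample = io.StringIO(
        "ac,booth,party,votes\n"
        "1,1,BJP,100\n"
        "1,2,BJP,50\n"
        "1,1,SP,70\n"
        "2,1,BSP,30\n"
    )
    print(aggregate_to_ac(csv.DictReader(sample)))
```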

Best,
Raphael

On 03/06/2017 12:45 PM, roshan kishore wrote:
> Would anyone have party-wise (BJP, BSP, Congress, SP) votes for each
> assembly segment in UP for the 2012 and 2014 elections?
> 
> Best
> Roshan 
> 
> Data Journalist 
> Mint 
> 
> official id: rosha...@htlive.com


Re: [datameet] Pincode Boundaries of India

2017-03-07 Thread Raphael Susewind
Hi Palash,

no need to - have just pushed it all to GitHub (see separate
announcement)...

Best,
Raphael

On 03/07/2017 11:22 AM, Palash Kulshrestha wrote:
> Hi Veena
> I may be able to help if you can clearly define the steps (I am not able to
> understand Kannada).
> As far as I can see, the pincode in the PDF is a 6-digit number which can easily be
> grepped from the PDF. The question is where the polling booth name is.
> 



[datameet] Electoral roll frontpage details for Andhra, Delhi, Haryana, Karnataka, Kerala, Maharashtra, MP, Orissa, Rajasthan and UP

2017-03-07 Thread Raphael Susewind
Dear all,

I just updated my GitHub repo on electoral data with the various details
on the electoral roll frontpages of the above-mentioned states (Gujarat to be
added later this week). This is basically (in vernacular script) the
district, taluk, village, ward, address, name, pincode, etc. of each
booth. The data are in the respective *id tables of my dataset, so for
Karnataka it would be in the karid table. Hope this is useful:

https://github.com/raphael-susewind/india-religion-politics

This is based on the 2014 rolls, used during the last general election.

Feel free to experiment with it and please alert me to any mistakes -
it's a semi-automated process, so mistakes can always happen.

Incidentally, does anybody know of a source that links Census ID codes to
village names in vernacular script? The lgov directory only maps them to
Latin script. Ultimately, the goal is of course to link electoral and
census data together at finer levels - something many here have been
interested in over the years...

Best,
Raphael



Re: [datameet] Electoral roll frontpage details for Andhra, Delhi, Haryana, Karnataka, Kerala, Maharashtra, MP, Orissa, Rajasthan and UP

2017-03-07 Thread Raphael Susewind
Hi Sutirtha,

happy to hear - and yes, of course: if you or somebody else here takes
some time to merge in Census 2011 IDs into my dataset - perhaps starting
with one state for the time being, such as UP - that would be very
helpful to many people.

I expect there will be quite a bit of fuzzy matching involved, though -
which is one reason why it would be good if somebody were to test
whether the workflow you suggest works out, so that we get an idea of
how much effort would be involved to do this at scale...
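A first pass at that matching might look roughly like this: a sketch of the workflow suggested in the quoted reply (vernacular name to Latin name via the NREGA listing, then Latin name to Census 2011 ID), using an exact match after normalisation and falling back to `difflib.get_close_matches` for fuzzy matches. The lookup tables, village names, IDs and the 0.85 cutoff are all hypothetical. In practice one would restrict candidates to the same district or tehsil before fuzzy matching, to avoid cross-district false positives.

```python
import difflib
import re
from typing import Dict, Optional


def normalise(name: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())


def match_census_id(latin_name: str, census: Dict[str, str],
                    cutoff: float = 0.85) -> Optional[str]:
    """Return the Census ID whose (Latin) village name best matches, else None."""
    lookup = {normalise(n): cid for n, cid in census.items()}
    target = normalise(latin_name)
    if target in lookup:                      # exact match after normalising
        return lookup[target]
    best = difflib.get_close_matches(target, list(lookup), n=1, cutoff=cutoff)
    return lookup[best[0]] if best else None


def link_vernacular(vern_name: str, vern_to_latin: Dict[str, str],
                    census: Dict[str, str], cutoff: float = 0.85) -> Optional[str]:
    """Vernacular name -> Latin name (e.g. from the NREGA listing) -> Census ID."""
    latin = vern_to_latin.get(vern_name)
    return match_census_id(latin, census, cutoff) if latin else None


if __name__ == "__main__":
    # Hypothetical lookup tables for one tehsil; names and IDs are invented
    census = {"Rampur Kalan": "100001", "Shivpur": "100002"}
    vern_to_latin = {"रामपुर कलां": "Rampur Kalaan"}
    print(link_vernacular("रामपुर कलां", vern_to_latin, census))  # fuzzy match
    print(match_census_id("Nowhere", census))                     # no match
```

Logging every fuzzy (non-exact) hit together with its similarity score would give a quick estimate of how much manual checking a full state would need.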

Feel free to submit pull requests to github and I'll include it...

Best,
Raphael


On 03/07/2017 12:32 PM, Sutirtha Roy wrote:
> Hi Raphael -- The village names in vernacular script are available. Towns I am
> not sure about.
> 
> 1. Go to http://164.100.129.6/netnrega/secc_list.aspx
> 2. Choose the local language radio button
> 3. Choose the state>district>tehsil>gram_panchayat
> 4. It will give you the vernacular<>Latin map
> 5. Match the Latin names with Census 2011 IDs from lgdirectory
> 
> 
> Let me know if you need help; your poll booth data compilation has been
> immensely useful to me -- and I would be very happy to contribute to your efforts.
> 
> 
> Best,
> SR


Re: [datameet] Polling Station Locations

2017-03-13 Thread Raphael Susewind
Hi Joseph,

have a look here (using 2014 booth IDs):

https://dx.doi.org/10.4119/unibi/2674065

Best,
Raphael

On 03/12/2017 09:59 AM, Joseph Sebastian wrote:
> Hi, 
> 
> The Election Commission has polling station location data here:
> http://psleci.nic.in/
> 
> Has anyone extracted this?
> 
> I am specifically looking for polling station data for Kerala.
> 
> Regards,
> 
> Joseph Sebastian 
> 


Re: [datameet] Re: State Election data

2017-03-14 Thread Raphael Susewind
Hi Srinivas & Shantanu,

for 2007, 2009 (GE), 2012 and 2014 (GE), I already have the Form 20 for
UP scraped here:

https://github.com/raphael-susewind/india-religion-politics/tree/master/upvidhansabha2007

https://github.com/raphael-susewind/india-religion-politics/tree/master/uploksabha2009

https://github.com/raphael-susewind/india-religion-politics/tree/master/upvidhansabha2012

https://github.com/raphael-susewind/india-religion-politics/tree/master/uploksabha2014

The Form 20 for 2017 will be added as soon as it is available. Be
careful, as the booth ID codes change between elections (and 2007 was
pre-delimitation anyway).

Cheers,
Raphael
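The shell loop in Shantanu's quoted message below can also be written as a small Python script, which makes it easy to skip files already downloaded and to pause between requests. The URL pattern is taken from that message and may no longer resolve; the one-second pause and the output directory are arbitrary choices.

```python
import time
import urllib.request
from pathlib import Path
from typing import List

# URL pattern taken from the quoted message; it may have moved since 2017.
URL_PATTERN = "http://ceouttarpradesh.nic.in/Form20_12/{}.xls"
N_ACS = 403  # number of assembly constituencies in Uttar Pradesh


def form20_urls(n: int = N_ACS) -> List[str]:
    """One Form 20 spreadsheet URL per assembly constituency, 1..n."""
    return [URL_PATTERN.format(i) for i in range(1, n + 1)]


def download_all(dest: Path, pause: float = 1.0) -> None:
    """Fetch every file into `dest`, skipping ones already on disk."""
    dest.mkdir(parents=True, exist_ok=True)
    for url in form20_urls():
        target = dest / url.rsplit("/", 1)[1]      # e.g. dest/17.xls
        if target.exists():
            continue
        try:
            urllib.request.urlretrieve(url, str(target))
        except OSError as exc:                     # URLError/HTTPError subclass OSError
            target.unlink(missing_ok=True)         # drop any partial file
            print(f"failed: {url}: {exc}")
        time.sleep(pause)                          # go easy on the server
```

Calling `download_all(Path("form20_2012"))` fetches the files; re-running it only retries the ones that failed.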

On 03/14/2017 08:40 AM, shantanu choudhary wrote:
> 
> 
> On Tue, Mar 14, 2017 at 1:51 PM, Bhanu Kamapantula <talk2k...@gmail.com> wrote:
> 
> Hi Srinivas,
> 
> Electors per constituency data has to be scraped out of the state
> PDFs at ECI website. Not sure if anyone's done it already.
> 
> 
> For UP, there are polling-booth-wise results available
> here: http://ceouttarpradesh.nic.in/Form20.aspx - it has data for the 2012
> assembly elections as one Excel sheet per assembly constituency, and
> processing those might be easier than parsing PDFs. I am not sure if the same is
> available for other states (I didn't find any such listing for Punjab
> at least). If you are using Linux, I was able to get all the Excel sheets for the
> 403 constituencies using this shell command:
> for i in {1..403}; do wget http://ceouttarpradesh.nic.in/Form20_12/$i.xls; done
> 
> -- 
> Regards
> Shantanu
> 


Re: [datameet] Need some Guidence on Parsing Electoral Roles.

2017-08-19 Thread Raphael Susewind
Hi Devdatta,

I had run into the same issue, and indeed the only workaround is OCR.
It's not just a different encoding than Unicode - it's actually garbled
CMaps, which is much worse (i.e. not recoverable).

See my comments here for starters (and the badly written scripts):

https://github.com/raphael-susewind/india-religion-politics/tree/master/maharolls2014

As for Soundex, you might want to take a look at the IndicSoundex
collection, which is more accurate than transliteration into latin
followed by English soundex:

http://libindic.org/Soundex
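For comparison, the transliterate-then-Soundex route relies on the classic American Soundex code: keep the first letter, map the remaining consonants to digit classes, drop vowels, collapse adjacent duplicates and pad to four characters. A minimal sketch follows. It does merge common transliteration variants such as "Srinivas" and "Shrinivas", but because it was designed for English names it cannot distinguish sounds that matter in Indic scripts (aspirated versus unaspirated, retroflex versus dental consonants), which is one reason a script-aware algorithm like IndicSoundex can do better.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # Vowels (and Y) separate repeated codes; H and W do not.
        if ch not in "HW":
            prev = code
    return result.ljust(4, "0")


if __name__ == "__main__":
    print(soundex("Srinivas"), soundex("Shrinivas"))  # both S651
```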

Good news is that I have done the whole exercise for Maharashtra 2014,
and may be able to share depending on what your project is about.
Perhaps send me a PM and we can discuss further,

Best,
Raphael

On 08/19/2017 06:14 PM, Devdatta Tengshe wrote:
> I'm attempting to read names, ages & genders from electoral rolls, so
> that I can create a database of names, to figure out the general spread
> of specific names across locations and ages.
> 
> I began working with Mumbai's rolls, and am running into the following
> issues:
> 
> 1) The electoral rolls are not in English, but in Devanagari. This is
> not a major issue, because I could transliterate them into English for
> comparison (I need the names to be in English, so that I can use Soundex
> to remove misspellings etc.). I know libraries for transliteration that
> work with Devanagari (Hindi & Marathi). Is there anything similar for
> other scripts such as Kannada, Tamil etc.?
> 
> 2) While the rolls are in Devanagari, the text is not actually in
> Unicode. It is in some other font, and hence when I get the text out,
> it's garbage. Since others have worked with the rolls before, is there a
> better way to get the text out?
> 
> 3) If it's not possible to get the text out, can we use OCR? Which OCR
> library is best at working with Indic scripts?
> If anyone has some experience to share on these issues, it will be much
> appreciated.
> 


Re: [datameet] Need some Guidence on Parsing Electoral Roles.

2017-08-21 Thread Raphael Susewind
Hi Nikhil and Devdatta,

very useful references.

Just to jump in on table conversion: there is a Python script called
pdf-table-extract that is quite capable of detecting tables in PDFs.
It uses a graphical approach rather than a logical one, so it doesn't
matter how bad the PDF is - even scans work in principle.

Importantly, with the right options, the script gives you bounding box
coordinates for each cell, which you can feed into ghostscript (or
whatever you like) to extract an image of just that cell prior to OCRing
- which indeed saves a lot of time.

The whole processing chain is referenced in my GitHub scripts (see
below), namely in most versions of pdf2list.pl, where pdf-table-extract
is called towards the bottom and the output then fed to tesseract...

Best,
Raphael
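The legacy-font problem Nikhil describes in the quoted reply below (plain ASCII characters rendered as Devanagari glyphs) is essentially a substitution table plus some reordering, since pre-base vowel signs such as ि are typically stored in visual order, before the consonant they belong to. Below is a toy converter sketch. Its mapping table is purely hypothetical and does not reproduce any real font, so for actual documents use the full tables from the technical-hindi converters he mentions. It also ignores conjuncts, which a real converter must handle when moving ि.

```python
from typing import Dict

# Toy legacy-font table: hypothetical character sequences -> Unicode Devanagari.
# Real fonts (Kruti Dev, Shree-Dev, ...) need their full published tables.
TOY_MAP: Dict[str, str] = {
    "a": "\u0915",   # क
    "kh": "\u0916",  # ख (multi-character sequence)
    "m": "\u092E",   # म
    "l": "\u0932",   # ल
    "A": "\u093E",   # ा  (aa matra)
    "i": "\u093F",   # ि  (short i matra, stored BEFORE its consonant)
}
PRE_BASE = "\u093F"  # matra kept in visual order ahead of the consonant


def legacy_to_unicode(text: str, table: Dict[str, str] = TOY_MAP) -> str:
    """Substitute legacy sequences (longest match first), then fix matra order."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:
            out.append(text[i])   # pass unmapped characters through unchanged
            i += 1
    # Unicode stores the matra after its consonant (logical order): swap pairs.
    j = 0
    while j < len(out) - 1:
        if out[j] == PRE_BASE:
            out[j], out[j + 1] = out[j + 1], out[j]
            j += 2
        else:
            j += 1
    return "".join(out)


if __name__ == "__main__":
    print(legacy_to_unicode("aml"))   # कमल
    print(legacy_to_unicode("ia"))    # कि
```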

On 08/20/2017 11:00 AM, Nikhil VJ wrote:
> Hi Devdatta,
> 
> I had come across the legacy Devanagari fonts issue earlier when I started
> working on budget data. The fonts are Shree-Dev, Kruti-Dev, Shivaji, etc.:
> legacy fonts used in an era before Unicode Devanagari existed; to get
> around that, they used simple substitution, like a = क etc. I've put up
> a graphic that shows this mapping for a few fonts:
> http://i.imgur.com/ICUC6Wk.png
> 
> I found a group named technical-hindi who have been working on simple
> JavaScript pages that convert these fonts to Unicode Devnagri (and
> back!). I used them; for the content I had, I needed to add some
> extra conversions, and then it worked like a charm.
> 
> Their site where many converters are shared :
> https://sites.google.com/site/technicalhindi/home/converters
> Their Google group: https://groups.google.com/forum/#!forum/technical-hindi
> 
> I've shared the modified converters I used here:
> http://ourpuneourbudget.in/tools/
> (only had those limited use cases)
> 
> In the process of studying these, I came upon an unexpected finding:
> if the document you are extracting data from is a PDF (which I also
> refer to as a "digital graveyard"), then it is PREFERABLE for the text
> to be in a legacy Devnagri font rather than a Unicode font!
> 
> That's because as of today (or 2015, when I came across it), PDF
> technology doesn't handle Unicode Devnagri well. Some transformations are
> applied to make the glyphs "print" properly, and these permanently distort
> the original characters. The issue is described here:
> https://stackoverflow.com/questions/30756193/unable-to-copy-exact-hindi-content-from-pdf
> 
> So if the text in the PDFs you're working on is in legacy Devnagri
> instead of Unicode Devnagri, then you're actually lucky :P
> 
> If it's in Unicode, then that PDF is a true digital graveyard :P. OCR can
> work, yes, but please tell me if you find a way to OCR a page table cell
> by table cell instead of jumbling everything together. I had also come
> across a project like yours a year ago, but I backed out because I could
> not get around this issue: the fonts in the PDF were in Unicode.
> 
> Here's an issue I filed in the Tabula project related to this; they
> fixed it, at least for legacy-font extraction.
> https://github.com/tabulapdf/tabula/issues/303
> 
> 
> 
> --
> Cheers,
> Nikhil VJ
> +91-966-583-1250
> Pune / Mandangad, India
> DataMeet Pune chapter <https://datameet-pune.github.io/>
> Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
> Blog <http://nikhilsheth.blogspot.in>
> 
> On Sat, Aug 19, 2017 at 11:21 PM, Raphael Susewind
> <li...@raphael-susewind.de> wrote:
> 
> Hi Devdatta,
> 
> I had run into the same issue, and indeed the only workaround is OCR.
> It's not just a different encoding from Unicode - it's actually garbled
> CMaps, which is much worse (i.e. not recoverable).
> 
> See my comments here for starters (and the badly written scripts):
> 
> 
> https://github.com/raphael-susewind/india-religion-politics/tree/master/maharolls2014
> 
> As for Soundex, you might want to take a look at the IndicSoundex
> collection, which is more accurate than transliteration into Latin
> script followed by English Soundex:
> 
> http://libindic.org/Soundex
> 
> The good news is that I have done the whole exercise for Maharashtra
> 2014 and may be able to share it, depending on what your project is
> about. Perhaps send me a PM and we can discuss further.
> 
> Best,
> Raphael
> 
> On 08/19/2017 06:14 PM, Devdatta Tengshe wrote:
> > I'm attempting to read Names, Ages & Genders from Electoral Rolls, so
> > that I can 
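For comparison with the IndicSoundex suggestion in the quoted thread, the baseline it is said to improve on, transliterating a name into Latin script and then applying English Soundex, can be sketched as below. This is the classic American Soundex code written as an illustration; it is not IndicSoundex, which operates on Indic-script text, and it assumes the transliteration has already been done.

```python
# American Soundex digit groups; vowels, y, h and w get no digit.
_GROUPS = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
           ("l", "4"), ("mn", "5"), ("r", "6"))
CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code of a Latin-script name."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:  # adjacent letters with the same digit count once
                result += code
            prev = code
        elif ch not in "hw":  # vowels (and y) separate repeats; h and w do not
            prev = ""
        if len(result) == 4:
            break
    return result.ljust(4, "0")


print(soundex("Mohammad"), soundex("Muhammad"))  # M530 M530
```

Because the code depends on how a name was romanised, two transliterations of the same Devanagari name can still end up with different codes, which is the weakness an Indic-aware algorithm targets.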

Re: [datameet] Visualization

2017-12-08 Thread Raphael Susewind
Dear Thej,

there is not much of a how-to - I took my polling booth locality
shapefiles (https://pub.uni-bielefeld.de/data/2674065) as well as the
MODIS data that is available in a convenient format from Natural Earth
(https://pub.uni-bielefeld.de/data/2674065), put both into QGIS, and
used its 'Join attributes by location' function. That's that. A
somewhat more involved processing chain underpins the polling booth
shapefile itself - all described in the link above...
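QGIS's 'Join attributes by location' does this through the GUI; for point booths and polygon urban areas the underlying idea is a point-in-polygon test per booth. The following Python sketch shows that idea with an even-odd (ray casting) test. It assumes simple polygons without holes and treats longitude/latitude as planar coordinates, which is tolerable for small areas; the field name `urban_class` and the sample data are hypothetical. For real shapefiles, QGIS itself or a library such as GeoPandas is the practical route.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]


def point_in_polygon(pt: Point, ring: Ring) -> bool:
    """Even-odd (ray casting) test: cast a ray to the right of `pt` and
    count how many polygon edges it crosses. Points exactly on an edge
    may fall either way."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_by_location(points: Dict[str, Point],
                     polygons: List[Tuple[dict, Ring]],
                     field: str) -> Dict[str, Optional[object]]:
    """Attach to each point the `field` of the first polygon containing it
    (None if no polygon does, e.g. a booth outside every urban area)."""
    joined: Dict[str, Optional[object]] = {}
    for pid, pt in points.items():
        joined[pid] = None
        for attrs, ring in polygons:
            if point_in_polygon(pt, ring):
                joined[pid] = attrs[field]
                break
    return joined


# Hypothetical data: one square "urban area" and two booths (lon, lat).
urban_areas = [({"urban_class": 1}, [(0, 0), (10, 0), (10, 10), (0, 10)])]
booths = {"booth_1": (5, 5), "booth_2": (15, 5)}
print(join_by_location(booths, urban_areas, "urban_class"))
# {'booth_1': 1, 'booth_2': None}
```

Booths that match no polygon come back as `None`, which in this kind of analysis would correspond to the rural category.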

Srini could probably chip in with more detail on how he ran the analysis
per se?

Hope that helps,
Raphael

On 12/08/2017 07:30 AM, Thejesh GN wrote:
> 
> 
> http://www.thehindu.com/elections/gujarat-2017/voting-trends-show-a-clear-rural-urban-divide-for-cong-bjp-in-gujarat/article21285328.ece
> 
> 
> 
> Interesting, considering both Susewind and Ramani are part of the DataMeet
> community. It would be great to have a how-to, either audio or text.
> 
> 
> Quote from the article:
> 
> Social anthropologist Raphael Susewind’s work on Gujarat was used to
> arrive at this. Dr. Susewind merges NASA’s urban-rural classifications
> (MODIS data) based on satellite information and the Election
> Commission’s polling booth data to identify if a booth is located in a
> rural or an urban setting. MODIS data classifies urban areas into highly
> urban, semi-urban, etc. in a scale of 1 to 9 (the lower number
> corresponds to higher urbanity). Sixty five per cent of the electorate
> voted in booths in rural areas while the rest in various urban
> classifications.
> 
> Thej
> --
> Thejesh GN *⏚* ತೇಜೇಶ್ ಜಿ.ಎನ್
> http://thejeshgn.com
> GPG ID :  0xBFFC8DD3C06DD6B0
> 



Re: [datameet] Any project on Name Commonality

2018-03-06 Thread Raphael Susewind
Dear Pradeep,

it is possible in principle, though with complications (including
ethical ones). Have a look at my GitHub for starters on how to
extract names from the electoral rolls:

https://github.com/raphael-susewind

What is definitely possible is something like this:

https://www.raphael-susewind.de/blog/2012/noor-mohd-ali
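Once names have been extracted from the rolls, a howmanyofme-style count is essentially a frequency table. A minimal Python sketch, assuming the names are already available as strings (the function names are hypothetical): it only normalises Unicode form, whitespace and case, so spelling variants of the same name, common in transliterated rolls, are still counted separately.

```python
import unicodedata
from collections import Counter
from typing import Iterable


def normalize_name(name: str) -> str:
    """Canonical form for counting: NFC, single spaces, case-folded."""
    name = unicodedata.normalize("NFC", name)
    return " ".join(name.split()).casefold()


def count_names(names: Iterable[str]) -> Counter:
    """Count how often each normalised full name occurs."""
    return Counter(normalize_name(n) for n in names)


sample = ["Yuvraj Singh", "yuvraj  singh", "Priyanka Chopra"]
counts = count_names(sample)
print(counts["yuvraj singh"])  # 2
```

Grouping spelling variants would need an extra matching step on top of this, for example a phonetic key computed per name.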

Best,
Raphael

On 03/05/2018 05:29 PM, Pradeep Bhatt wrote:
> Hi All,
> 
> Is there any work done on name commonality in India, something like this
> site
> 
> http://howmanyofme.com/
> 
> Finding how many "Yuvraj Singh" or "Priyanka Chopra" are there in India.
> 
> To those who have scraped voter ID data: do you think it's possible?
> 
> Regards,
> Pradeep
> 



Re: [datameet] Parliamentary constituency boundaries 2019

2019-03-26 Thread Raphael Susewind
Dear all,

just a warning that my polling booth locality data is quite useless for
2019 in most states - booth IDs will have changed... For UP I might get
around to updating them, but not for other states at the moment. So
don't use this data to map the ongoing elections!

You have been warned ;-)
Raphael

On 3/25/19 2:12 PM, Arun Ganesh wrote:
> Hey all, I've made a map of the electoral boundaries and polling booths
> using available data: Electoral Map of India
> <https://api.mapbox.com/styles/v1/planemad/cjoescdh20cl62spey0zj3v19.html?fresh=true&title=true&access_token=pk.eyJ1IjoicGxhbmVtYWQiLCJhIjoiemdYSVVLRSJ9.g3lbg_eN0kztmsfIPxa9MQ#4.56/22.34/75.08>
> 
> Data sources:
> - Assembly constituency boundaries: Datameet
> https://github.com/datameet/maps/tree/master/assembly-constituencies
> - Polling booths: Raphael Susewind
> (2014) https://pub.uni-bielefeld.de/data/2674065
> 
> I have used the assembly boundaries as they have a higher definition than
> the parliamentary boundaries. When you zoom in, the PC name should be
> visible.
> 
> Is there a more recent dataset of the polling booths available anywhere?
> I found scraped data from the ECI site at
> https://github.com/aaronrudkin/IndianPollingStations from 5 months ago,
> but it looks like the lat/long values were not scraped; the booths have
> been geocoded to the town centre instead, which is not very useful. The
> scraper also does not seem to work anymore.
> 
> On Tue, Mar 19, 2019 at 7:43 AM Avinash Celestine
> <avinash.celest...@gmail.com> wrote:
> 
> Constituency boundaries were last delimited in 2008, and have not
> changed since.
> 
> On Sat, Mar 16, 2019 at 12:06 AM Arun Ganesh <arungra...@gmail.com> wrote:
> 
> With the upcoming elections, this will be a hot dataset that
> everyone will be looking for. The best available dataset on the
> web right now is on the datameet repository
> <https://github.com/datameet/maps/tree/master/parliamentary-constituencies>,
> updated during the previous elections in 2014.
> 
> Does anyone know if there have been changes in the constituency
> boundaries since 2014? Also, the existing boundaries are fairly
> generalized, resulting in an accuracy of only around a km.
> 
> See this comparison for Bengaluru: 1) PC shapes from datameet, 2) AC
> shapes from datameet, 3) PC shapes from Karnataka KSRSAC.
> [image: new1.gif]
> 
> The KSRSAC boundaries were queried from their geoserver
> <https://stg1.ksrsac.in/maps/rest/services/Polling/Polling_PC/MapServer/0/query?where=OBJECTID+is+not+null&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=4326&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&returnDistinctValues=false&resultOffset=&resultRecordCount=&f=geojson>
> and are super accurate, up to the street level, but are limited to
> Karnataka. Does anyone know how we can source this for the
> entire country?
> 
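The Karnataka geoserver query quoted above is an ArcGIS REST "query" request on a MapServer layer. Rebuilding it from its parameters makes it easier to adapt, for example to page through large layers with `resultOffset`/`resultRecordCount` (both present, but empty, in the original URL). Below is a minimal Python sketch using only the standard library; the helper name is hypothetical, `outFields=*` (all attributes) is an addition not present in the original URL, and whether a given server honours paging or `f=geojson` depends on its ArcGIS version.

```python
from typing import Optional
from urllib.parse import urlencode


def arcgis_query_url(layer_url: str,
                     where: str = "OBJECTID is not null",
                     offset: Optional[int] = None,
                     count: Optional[int] = None) -> str:
    """Build an ArcGIS REST /query URL returning GeoJSON in EPSG:4326."""
    params = {
        "where": where,           # SQL filter; the default mirrors the original query
        "outFields": "*",         # all attribute fields (added here)
        "returnGeometry": "true",
        "outSR": "4326",          # reproject output to lon/lat (WGS84)
        "f": "geojson",
    }
    if offset is not None:
        params["resultOffset"] = str(offset)
    if count is not None:
        params["resultRecordCount"] = str(count)
    return f"{layer_url.rstrip('/')}/query?{urlencode(params)}"


LAYER = "https://stg1.ksrsac.in/maps/rest/services/Polling/Polling_PC/MapServer/0"
print(arcgis_query_url(LAYER, offset=0, count=1000))
```

The resulting URL can be fetched with any HTTP client; looping over increasing offsets until a page comes back empty is one way to collect a layer larger than the server's per-request limit.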