This is very interesting.

Can this be used for commercial purposes? Where can I read about the
data policy on this?

Regards,
Pradeep

On Mon, Nov 14, 2016 at 9:21 AM, Nikhil VJ <nikhil...@gmail.com> wrote:

> Hi friends,
>
> I've created some shell scripts to aggregate the data from downloaded
> 7/12 records (html files) into two csv's. Sharing a github link having
> the code and instructions:
> https://github.com/answerquest/mahabhulekh-7-12-aggregating
>
> Still no luck on automated scraping from the site, but this
> aggregating was the next step and has really simplified the process of
> inspecting multiple records at once.
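[Editor's note: the aggregation loop the scripts perform can be sketched in Python. This is a minimal sketch, not the actual shell scripts from the repo; the two columns written here are placeholders, since the real scripts extract the 7/12 record fields into two CSVs.]

```python
import csv
import glob
import os

def aggregate_records(html_dir, out_csv):
    """Write one CSV row per downloaded 7/12 record file.

    Placeholder schema: (filename, byte size). A real run would parse
    each file's tables and emit the record fields instead.
    """
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "bytes"])
        # sorted() keeps the output stable across runs
        for path in sorted(glob.glob(os.path.join(html_dir, "*.html"))):
            writer.writerow([os.path.basename(path), os.path.getsize(path)])
```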
>
> -Nikhil
>
> On 10/27/16, Nikhil VJ <nikhil...@gmail.com> wrote:
> > Hi Ankit,
> >
> > Thanks for the R lead! I checked it out. I'm already doing something
> > similar using some quick shell/bash commands and a python script that
> > converts any html table to csv (http://stackoverflow.com/a/16697784).
> > Once we have the data down as HTML files it's fairly straightforward.
> > This part comes after the scraping.
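[Editor's note: the table-to-CSV step can be done with only the Python standard library. A minimal sketch along those lines; the linked Stack Overflow script is more thorough:]

```python
import csv
import io
from html.parser import HTMLParser

class TableToCSV(HTMLParser):
    """Collect the text of <td>/<th> cells, row by row."""
    def __init__(self):
        super().__init__()
        self.rows, self.row, self.cell = [], [], None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row = []
        elif tag in ("td", "th"):
            self.cell = []          # start buffering cell text

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row:
            self.rows.append(self.row)

    def handle_data(self, data):
        if self.cell is not None:   # only record text inside a cell
            self.cell.append(data)

def table_html_to_csv(html):
    parser = TableToCSV()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()
```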
> >
> > The data in this case is not in permanent HTML files that we can just
> > save in batch. It's generated server-side on the Mahabhulekh server,
> > depending on form inputs in an authenticated user session, and then
> > rendered as HTML at one constant URL. So what I'm looking for is
> > something that would simulate / automate the calls to the mahabhulekh
> > server (with due time intervals between each call, of course; we must
> > not overload the server) and capture the output it returns.
> >
> > So far I'm not able to programmatically capture the HTML coming in the
> > popup window it is generating. The POST request returns a generic null
> > response or the site's main webpage in all the wget and curl commands
> > I've tried. Folks who have done some scraping earlier might be able to
> > help.
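[Editor's note: for reference, the failing POST can be reproduced outside the browser roughly like this. A sketch only: the payload and cookie values come from the capture quoted further down in this thread, the session cookie is long stale, and the request is built but deliberately not sent.]

```python
import json
import urllib.request

# Field values copied from the browser's network inspector; they are
# specific to one village and one session.
payload = {
    "sno": "1", "vid": "273200030398260000",
    "dn": "रत्नागिरी", "tn": "खेड", "vn": "वाळंजवाडी",
    "tc": "3", "dc": "32", "did": "32", "tid": "3",
}

req = urllib.request.Request(
    "https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx/call712",
    data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
    headers={
        "Content-Type": "application/json;charset=utf-8",
        # An ASP.NET session cookie from a live browser session is
        # probably required; this captured one has expired.
        "Cookie": "ASP.NET_SessionId=3ozsnwd3nhh4py4hmiqcjeoc",
        "Referer": "https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx",
    },
)
# urllib.request.urlopen(req)  # deliberately not called here
```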
> >
> > Another track worth exploring might be iMacros or other ways to
> > automate browser sessions. Folks working in the testing departments of
> > ticketing / booking sites etc. might know and could help, so please
> > share this with your friends working on such projects!
> >
> > I've read in some places that R can be used to simulate this, so yes,
> > it's worth continuing to explore. But I know shell scripting better,
> > so I'm hoping something comes through there.
> >
> > --
> > --
> > Cheers,
> > Nikhil
> > +91-966-583-1250
> > Pune, India
> > Self-designed learner at Swaraj University
> > <http://www.swarajuniversity.org>
> > Blog <http://nikhilsheth.blogspot.in> | Contribute
> > <https://www.payumoney.com/webfronts/#/index/NikhilVJ>
> >
> >
> >
> > On 10/25/16, Ankit Gaur <gauran...@gmail.com> wrote:
> >> Though I am not very well conversant with Data Sciences and web
> >> scraping, we had a recent DataKind meetup in Bangalore
> >> (https://www.meetup.com/DataKind-Bangalore/events/234855978/),
> >> where Bargava talked about using R's rvest library
> >> <https://blog.rstudio.org/2014/11/24/rvest-easy-web-scraping-with-r/>.
> >> We were able to do some basic scraping on goodreads with this. See if
> >> this fits your needs.
> >>
> >> Thanks,
> >> Ankit
> >>
> >> On Mon, Oct 24, 2016 at 10:09 PM, Nikhil VJ <nikhil...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm looking at Maharashtra's land records portal :
> >>> https://mahabhulekh.maharashtra.gov.in
> >>>
> >>> .. and wondering if it's possible to scrape data from here?
> >>>
> >>> Here's the workflow:
> >>> choose 7/12 (७/१२) > select any जिल्हा > तालुका > गाव
> >>> select शोध :  सर्वे नंबर / गट नंबर (first option)
> >>> type 1 in the text box and press the "शोधा" button
> >>> Then we get a dropdown with options like 1/1 , 1/2, 1/3 etc.
> >>>
> >>> On selecting any and clicking "७/१२ पहा",
> >>> a new window/tab opens up (you have to enable popups), having static
> >>> HTML content (some tables). I need to capture this content.
> >>>
> >>> The URL is always the same:
> >>> https://mahabhulekh.maharashtra.gov.in/Konkan/pg712.aspx
> >>> ..but the content changes depending on the options chosen.
> >>>
> >>> Using the browser's Inspect Element > Network tab and clicking the
> >>> final button, there is a request to this URL:
> >>>
> >>> https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx/call712
> >>>
> >>> and the request Params / Payload is like:
> >>>
> >>> {'sno':'1','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
> >>>
> >>> when you change the survey/gat number to 1/10, the params change like so:
> >>> {'sno':'1#10','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
> >>>
> >>> for 1/1अ:
> >>> {'sno':'1#1अ','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
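[Editor's note: the examples above suggest the subdivision separator `/` is sent as `#` in `sno`. A hypothetical helper to build the payload; the `#` mapping is inferred from the captures, not documented, and the default field values are the ones for this particular village.]

```python
def build_payload(survey_no, vid="273200030398260000",
                  dn="रत्नागिरी", tn="खेड", vn="वाळंजवाडी",
                  tc="3", dc="32", did="32", tid="3"):
    """Assemble the call712 parameters for one survey/gat number.

    Turns a number like '1/10' into the 'sno' form '1#10' observed
    in the captured requests.
    """
    return {
        "sno": survey_no.replace("/", "#"),
        "vid": vid, "dn": dn, "tn": tn, "vn": vn,
        "tc": tc, "dc": dc, "did": did, "tid": tid,
    }
```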
> >>>
> >>> I tried some wget and curl commands but no luck so far. Do let me know
> >>> if you can make some headway.
> >>>
> >>> Also, it would be great to learn how to extract the list of
> >>> districts, the talukas (subdistricts) in each district, and the
> >>> villages in each taluka.
> >>>
> >>> I'm dumping other info at the bottom in case it helps.
> >>>
> >>> Why do this:
> >>> At present it's just an exploration following on from our work on
> >>> village shapefiles.
> >>> The district > taluka > village mapping data from official Land
> >>> Records data could serve as a good source for triangulation.
> >>> Then, while I don't see myself going deeper into this right now, I am
> >>> aware that land records / ownership have major corruption,
> >>> entanglement and other issues precisely because of the lack of
> >>> transparency. The mahabhulekh website itself is a significant step
> >>> forward in making this sector a little more transparent, and a further
> >>> push in this direction would probably do more good IMHO. At some point
> >>> GIS/lat-long info might come in, and it would be good to bring the
> >>> data to a level that is ready for it.
> >>>
> >>>
> >>> Data dump:
> >>> When we press the button to fetch the 7/12 (saatbarah) record, the
> >>> console records a POST with these parameters:
> >>>
> >>> Copy as cURL:
> >>> curl 'https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx/call712' \
> >>>   -H 'Host: mahabhulekh.maharashtra.gov.in' \
> >>>   -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0' \
> >>>   -H 'Accept: application/json, text/plain, */*' \
> >>>   -H 'Accept-Language: en-US,en;q=0.5' \
> >>>   --compressed \
> >>>   -H 'Content-Type: application/json;charset=utf-8' \
> >>>   -H 'Referer: https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx' \
> >>>   -H 'Content-Length: 170' \
> >>>   -H 'Cookie: ASP.NET_SessionId=3ozsnwd3nhh4py4hmiqcjeoc' \
> >>>   -H 'Connection: keep-alive' \
> >>>   -H 'Pragma: no-cache' \
> >>>   -H 'Cache-Control: no-cache'
> >>>
> >>> Copy POST data:
> >>> {'sno':'1#1अ','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
> >>>
> >>> request headers:
> >>> POST /Konkan/Home.aspx/call712 HTTP/1.1
> >>> Host: mahabhulekh.maharashtra.gov.in
> >>> User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0
> >>> Accept: application/json, text/plain, */*
> >>> Accept-Language: en-US,en;q=0.5
> >>> Accept-Encoding: gzip, deflate
> >>> Content-Type: application/json;charset=utf-8
> >>> Referer: https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx
> >>> Content-Length: 170
> >>> Cookie: ASP.NET_SessionId=3ozsnwd3nhh4py4hmiqcjeoc
> >>> Connection: keep-alive
> >>> Pragma: no-cache
> >>> Cache-Control: no-cache
> >>>
> >>> response headers:
> >>> HTTP/1.1 200 OK
> >>> Cache-Control: private, max-age=0
> >>> Content-Type: application/json; charset=utf-8
> >>> Server: Microsoft-IIS/8.0
> >>> X-Powered-By: ASP.NET
> >>> Date: Mon, 24 Oct 2016 15:31:40 GMT
> >>> Content-Length: 10
> >>>
> >>> Copy Response:
> >>> {"d":null}
> >>>
> >>>
> >>> --
> >>> --
> >>> Cheers,
> >>> Nikhil
> >>> +91-966-583-1250
> >>> Pune, India
> >>> Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
> >>> Blog <http://nikhilsheth.blogspot.in> | Contribute
> >>> <https://www.payumoney.com/webfronts/#/index/NikhilVJ>
> >>>
> >>> --
> >>> Datameet is a community of Data Science enthusiasts in India. Know more
> >>> about us by visiting http://datameet.org
> >>> ---
> >>> You received this message because you are subscribed to the Google
> >>> Groups
> >>> "datameet" group.
> >>> To unsubscribe from this group and stop receiving emails from it, send
> >>> an
> >>> email to datameet+unsubscr...@googlegroups.com.
> >>> For more options, visit https://groups.google.com/d/optout.
> >>>
> >>
> >
> >
> >
>
>
> --
> --
> Cheers,
> Nikhil
> +91-966-583-1250
> Pune, India
> Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
> Blog <http://nikhilsheth.blogspot.in> | Contribute
> <https://www.payumoney.com/webfronts/#/index/NikhilVJ>
>
>

