On 14 Apr 2014, at 09:50, Dan Burzynski <[email protected]> wrote:
> Hi there. I'm just about to (hopefully) write an import script for MapIt to 
> pull in a load of data from the ONSPD file

A generic import script for importing data linking postcodes with areas that 
don't have a defined boundary would be good, thanks :-) We have generic import 
scripts for Areas and Postcodes themselves, but nothing generic that populates 
the Postcode to Area many-to-many table that exists; only specific scripts like 
the one you mention currently do that.

> (Local Education Authorities being an example of one).

Just to point out here that there is no need to import Local Education 
Authorities if the MapIt installation has already imported local councils, 
because you already have the data. If your point lookup returns a London 
borough, Metropolitan borough, Unitary authority, County council, or the Isles 
of Scilly, that is also the appropriate LEA. Northern Ireland is slightly 
different in that there are five Education and Library Boards for the 
(currently) 26 councils, but the mapping is just a list of councils for each 
board. You'd be better off doing a simple lookup; I wouldn't think it worth 
importing a whole giant array of data to add that.
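
As a rough sketch of the "simple lookup" idea, something like the following 
would do. The type codes (LBO, MTD, UTA, CTY, COI) are my assumption about how 
those council types are coded in your installation, the function names are 
made up, and the Northern Ireland table is deliberately incomplete; fill it in 
from an authoritative list of councils per board.

```python
# Assumed type codes for the council types that double as the LEA.
LEA_TYPES = {"LBO", "MTD", "UTA", "CTY", "COI"}

# Illustrative and incomplete: Northern Ireland council -> board.
COUNCIL_TO_BOARD = {
    "Belfast": "Belfast Education and Library Board",
    "Fermanagh": "Western Education and Library Board",
    "Armagh": "Southern Education and Library Board",
}


def lea_for_point(areas):
    """Pick the LEA out of the areas returned by a point lookup.

    `areas` is a list of dicts with at least "type" and "name" keys
    (a hypothetical shape, not MapIt's exact response format).
    """
    for area in areas:
        if area["type"] in LEA_TYPES:
            return area["name"]
    return None


def board_for_council(council_name):
    """Return the board for an NI council, or None if it is not listed."""
    return COUNCIL_TO_BOARD.get(council_name)
```

In a two-tier area the point lookup returns both a district and a county; 
only the county matches, so that is what you get back.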

(You could potentially use the mapit_import_area_unions script to create the 
Education and Library boards out of the Northern Ireland councils, as an 
alternative, though I've not used that script myself, and again I'm not sure 
it's worth it.)

> Is there any instructions/tutorials/examples on 'how to create a new area 
> type and an import script'?

To create a new area type, you can use the admin interface. (Or you could do it 
as part of an import script of course, but just to note that the admin 
interface lets you create new things directly also.) An area Type is a basic 
model with a code and a description, so in code you would create one with 
something like:
    Type.objects.get_or_create(code='CODE',
        defaults={'description': 'Description of this type'})

As for writing an import script: no, I'm afraid there isn't a tutorial. The 
existing scripts don't do anything non-standard, though, just normal 
Django/Python things to read in data and create objects in the database from 
that data. So the Django documentation is what I would point you at first.

> I've had a quick peek at mapit_UK_import_nspd_ni.py but I'm still none the 
> wiser (I'm not a Python programmer which doesn't help) ;-)

It also doesn't help that the script uses a manual CSV file to look up the 
right areas for that specific data :) The existing generic import scripts I 
mentioned above are mapit_import and mapit_import_postal_codes, which are 
documented at http://code.mapit.mysociety.org/import/boundaries/ and 
http://code.mapit.mysociety.org/import/postal-codes/ respectively. 
http://code.mapit.mysociety.org/how-data-is-stored/ mentions the many-to-many 
table at the bottom, though that could do with some expanding.

You may want to look at the mapit_UK_import_nspd_national_parks management 
script, which doesn't have the Area-specific stuff the NI script does and could 
be more easily generalised. Something that has pre_row/post_row hooks (like 
mapit_import_postal_codes does) so it can be subclassed, and has command line 
arguments to look up/possibly create the area, would be the most useful, I 
imagine. So you'd supply a CSV file of postcodes/identifiers, options for the 
column numbers, what type the area is, and whether it should be created or not; 
the mapit_UK_import_nspd_national_parks script could then hopefully be a small 
subclass of that.

The national parks script is doing the following:
* handle_label loops through the provided file, and for each row:
  * It ignores rows we don't care about
  * It looks up the Postcode
  * It gets the identifier from column 37, looks up its name, gets or creates 
    an Area, and then adds a link in the many-to-many table from the postcode 
    to that area.
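
In plain Python, with dictionaries and sets standing in for the database 
tables, that loop looks roughly like the sketch below. The function name and 
arguments are hypothetical, and "rows we don't care about" is assumed here to 
mean rows with no usable identifier or an unknown postcode; the real script 
uses Django models and its own filtering rules.

```python
import csv
import io


def link_postcodes(csv_text, known_postcodes, code_to_name,
                   code_column, postcode_column=0):
    """Link postcodes to areas identified by a code in one CSV column."""
    areas = {}     # identifier -> name, stand-in for get-or-create of an Area
    links = set()  # (postcode, identifier), stand-in for the many-to-many table
    for row in csv.reader(io.StringIO(csv_text)):
        code = row[code_column].strip()
        if not code or code not in code_to_name:
            continue  # assumption: ignore rows without a usable identifier
        postcode = row[postcode_column].replace(" ", "")
        if postcode not in known_postcodes:
            continue  # the postcode lookup failed
        areas.setdefault(code, code_to_name[code])  # "get or create"
        links.add((postcode, code))
    return areas, links
```

For the real file you would pass the column that holds the identifier 
(column 37 in the national parks script) as code_column.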

And for completeness, the NI script is doing:
* handle_label() first uses a manual CSV file to map NI area identifiers for 
wards, NI Assembly and UK Parliament constituencies to the existing Areas in 
the database for those areas, then calls process(). (Note that the areas 
already existed in the database.)
* process() opens the provided CSV file, loops through its rows, calling 
pre_row, handle_row, post_row on each one (the default import script does 
nothing in pre_row or post_row, and in handle_row checks to see if the postcode 
already exists and creates/updates as necessary)
* pre_row() ignores rows we don't care about, and gets the ONS and Parliament 
codes for the supplied postcode. It then creates self.areas, the six areas 
relevant to this postcode from those two codes.
* post_row() adds the six areas from pre_row() to the many-to-many table 
linking Postcodes and Areas.

So the NI script is actually importing the postcodes themselves, which is why 
it is a subclass and why it does slightly more.
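
To make the process/pre_row/handle_row/post_row flow concrete, here is a 
stripped-down sketch with no Django in it. The class names, method signatures 
and the use of a False return from pre_row() to skip a row are all my 
assumptions for illustration; check the real mapit_import_postal_codes command 
for how it actually does these things.

```python
import csv
import io


class PostcodeImporter:
    """Base importer: process() drives the per-row hooks."""

    def __init__(self):
        self.postcodes = {}  # stand-in for the Postcode table
        self.links = set()   # stand-in for the Postcode/Area link table

    def pre_row(self, row):
        return True  # the default does nothing and keeps every row

    def handle_row(self, row):
        # Create or update the postcode for this row.
        postcode = row[0].replace(" ", "")
        self.postcodes[postcode] = row
        return postcode

    def post_row(self, postcode, row):
        pass  # the default does nothing

    def process(self, csv_text):
        for row in csv.reader(io.StringIO(csv_text)):
            if not self.pre_row(row):
                continue
            postcode = self.handle_row(row)
            self.post_row(postcode, row)


class AreaLinkingImporter(PostcodeImporter):
    """Subclass that also links each postcode to the areas on its row."""

    def pre_row(self, row):
        # Work out the areas for this row; skip rows that have none.
        self.areas = [code for code in row[1:] if code]
        return bool(self.areas)

    def post_row(self, postcode, row):
        for code in self.areas:
            self.links.add((postcode, code))
```

The subclass only overrides the two hooks; the loop itself stays in the base 
class, which is the same shape as the NI script.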

> It seems that (and I'm kind of guessing here) that there's a function called 
> handle_label that is where you put the file parsing logic.

That script is a Django management command; details of how they're written can 
be found in the Django documentation:
    https://docs.djangoproject.com/en/1.6/howto/custom-management-commands/
In this particular case, that script is a subclass of the postcode import 
management command, which implements the main control flow 
(process/pre_row/handle_row/post_row); the subclass overrides those methods 
where necessary, as explained above.

> What's the best way to run the system in some kind of debug mode so I can at 
> least trial and error it without screwing up my database? ;-)


What would you want a debug mode to do? Some of the existing management scripts 
(not the one you mention, though, I'm afraid) run in a "dry run" state by 
default, not committing to the database unless --commit is provided. That 
script could be altered to do that, as could any new script you write. Or you 
could use a copy of your database for testing purposes.
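
The dry-run idea is simply "do all the work inside a transaction, and only 
commit if asked". A minimal sketch using Python's built-in sqlite3 module 
rather than Django (the table and function names are made up, and the existing 
MapIt scripts will go through Django's own transaction handling instead):

```python
import sqlite3


def import_postcodes(conn, postcodes, commit=False):
    """Insert postcodes, keeping them only if commit is True."""
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO postcode (postcode) VALUES (?)",
        [(p,) for p in postcodes],
    )
    if commit:
        conn.commit()    # make the changes permanent
    else:
        conn.rollback()  # dry run: throw the changes away


def count_postcodes(conn):
    return conn.execute("SELECT COUNT(*) FROM postcode").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postcode (postcode TEXT PRIMARY KEY)")
conn.commit()

import_postcodes(conn, ["AB12CD", "EF34GH"])               # dry run
dry_run_count = count_postcodes(conn)
import_postcodes(conn, ["AB12CD", "EF34GH"], commit=True)  # for real
committed_count = count_postcodes(conn)
```

A dry run still exercises all the parsing and lookups, so you find the errors 
without anything being saved.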

Hope that's helpful.

ATB,
Matthew
_______________________________________________
developers-public mailing list
[email protected]
https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public

