Parsing CSV files is one of the nastiest computing problems around. Very 
frequently, CSV files will have unparsable lines. Ogr2ogr is not going to solve 
this problem for you - it will only consume Correctly Formatted CSV files.

Before you can even get around to handling the problem of unparsable lines, 
oftentimes a character set conversion is required. There are, unfortunately, 
way too many folks who publish CSV files, shapefiles, and SQL dumps which 
contain UTF-8 multibyte encoding sequences saved in ISO-8859-1 file encoding. I 
need to run iconv on roughly 90% of the shapefiles I load with shp2pgsql - 
generally any shapefile (or CSV or SQL dump) which was produced by a North 
American or Western European person which contains international data. This 
class of folk seem to believe that since ISO 8859-1 or ISO 8859-15 works for 
their own character set, it works for the entire world. In 2013, there is 
absolutely no reason for anyone on this planet to be encoding in something 
other than UTF-8 - disk space and bandwidth are cheap enough now, and in the 
areas of the world where it's not yet cheap enough, UTF-8 is the only choice 
anyway.
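
For anyone who would rather do the conversion inside a Perl script than shell 
out to iconv, here is a minimal sketch using the core Encode module. It 
assumes the input bytes really are ISO-8859-1; the helper name latin1_to_utf8 
and the sample string are made up for illustration. A file that was 
mis-encoded in some other way needs a different fix.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode a string of ISO-8859-1 bytes as UTF-8 bytes.
# Roughly what "iconv -f ISO-8859-1 -t UTF-8" does for a whole file.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('ISO-8859-1', $bytes);   # bytes -> Perl characters
    return encode('UTF-8', $chars);             # characters -> UTF-8 bytes
}

# Sample input: "Sao Paulo" with a-tilde stored as the single Latin-1 byte 0xE3.
my $latin1 = "S\xE3o Paulo";
my $utf8   = latin1_to_utf8($latin1);

# The accented character grows from one byte to two (0xC3 0xA3) in UTF-8.
printf "%d bytes in, %d bytes out\n", length($latin1), length($utf8);
```

Plain ASCII passes through unchanged, so running this on a file that is 
already pure ASCII is harmless.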

What causes unparsable lines in CSV? Quotes where there aren't supposed to be, 
missing quotes, missing fields, ambiguously utilised and unescaped delimiter 
characters, etc. Manual correction is difficult when you are handling, for 
example, an 80 thousand line file.
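
On a file that size it helps to let a script point at the suspicious lines 
first. The following is only a rough sketch: it assumes a comma delimiter, 
double-quote quoting, and no newlines embedded in fields, and it makes just 
two crude checks (odd number of quotes, wrong field count). The name 
csv_line_problem and the sample rows are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a reason string if the line looks unparsable,
# or an empty string if it passes both checks.
sub csv_line_problem {
    my ($line, $expected_fields) = @_;
    $line =~ s/\r?\n\z//;

    # An odd number of double quotes means one is missing or stray.
    my $quotes = ($line =~ tr/"//);
    return 'unbalanced quotes' if $quotes % 2;

    # Blank out quoted text so commas inside quotes are not counted,
    # then count the remaining delimiters.
    (my $bare = $line) =~ s/"[^"]*"//g;
    my $fields = 1 + ($bare =~ tr/,//);
    return "expected $expected_fields fields, found $fields"
        if $fields != $expected_fields;

    return '';
}

my @sample = (
    qq{"1","Main St","Springfield"\n},
    qq{"2","Elm St,"Shelbyville"\n},    # missing closing quote
    qq{"3","Oak Ave"\n},                # missing field
);

for my $i (0 .. $#sample) {
    my $why = csv_line_problem($sample[$i], 3);
    print "line ", $i + 1, ": $why\n" if $why;
}
```

This only reports problems; deciding how to repair each flagged line is still 
up to you.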

Here is a tool I wrote to fix CSV files from one particularly nasty source. It 
changes a file delimited by commas into a file delimited by tabs, as well as 
correcting a whole host of other common problems. I have found that it works 
quite well, in general, for multiple sources of nastily encoded CSV files.


Use it like this:

fix-csv.pl nasty.csv > fixed.csv



#!/usr/bin/perl -w

while (<>)
{
  # 1. remove ^M
  $_ =~ s/\r//g;

  # 2. change commas at beginning of line to tabs
  $_ =~ s/^,/\t/;

  # 3. change "," to "\t" ("tab")
  $_ =~ s/","/"\t"/g;

  # 4. change ", to \t (tab)
  $_ =~ s/",/\t/g;

  # 5. change ," to \t (tab)
  $_ =~ s/,"/\t/g;

  # 6-9. change \t, to \t\t (double tab); repeat until none are left, since
  # each pass converts only the first comma in a run of empty fields
  1 while $_ =~ s/\t,/\t\t/g;

  # 10. remove quotes
  $_ =~ s/"//g;

  print $_;
}
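
To see what these substitutions do and do not handle, here is a small sketch 
that restates them as a function and runs it on two made-up lines. The name 
fix_csv_line is hypothetical, and the repeated \t, step is written as a loop 
here. Note that the rules key on quotes: a comma inside a quoted field 
survives as data, but a line with no quotes at all is left comma-delimited.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The script's substitutions, applied to a single line.
sub fix_csv_line {
    my ($line) = @_;
    $line =~ s/\r//g;                     # strip carriage returns
    $line =~ s/^,/\t/;                    # leading empty field
    $line =~ s/","/"\t"/g;                # quote-comma-quote
    $line =~ s/",/\t/g;                   # quote-comma
    $line =~ s/,"/\t/g;                   # comma-quote
    1 while $line =~ s/\t,/\t\t/g;        # runs of empty fields
    $line =~ s/"//g;                      # drop remaining quotes
    return $line;
}

for my $in (qq{"Smith, John",42,"NY"\r\n}, "a,b,c\n") {
    my $out = fix_csv_line($in);
    $out =~ s/\t/<TAB>/g;                 # make tabs visible when printing
    print $out;
}
```

The first line comes out with tabs between its three fields and the comma in 
the name intact; the second comes out unchanged because it has no quotes for 
the patterns to anchor on.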


-mike







On Apr 13, 2013, at 10:41 PM, Nathan Hemenway <[email protected]> wrote:

> As Richard Greenwood noted, ogr2ogr works great for importing CSV files into 
> Postgres tables.
> In fact, your CSV file does not necessarily even need to have any geometry 
> related columns for this to work.
> 
> It is all documented here very nicely:
> 
> http://www.gdal.org/ogr/drv_csv.html
> 
> 
> 
> On 4/13/2013 5:54 AM, Margie Roswell wrote:
>> I figured out that COPY is used to import a file into a table.
>> 
>> (Actually, even though I don't speak a word of Portuguese, a Portuguese 
>> video did a great job of showing copying first into a temp table: 
>> https://www.youtube.com/watch?v=CwsnPPub9v4 )
>> 
>> But the shp2pgsql thread yesterday got me thinking: to import a shapefile, 
>> they've created a utility so that we don't have to set up the structure of 
>> the table in advance
>> 
>> Is there something similar on the CSV side?
>> 
>> My guess is that http://www.safe.com/solutions/for-databases/postgis/
>> might have something, but I can't quite put my finger on it.
>> 
>> Details on that? 
>> 
>> Also, I'm sure there's a fee for that. Are there any other strategies for 
>> making the table creation more efficient, when importing a file to a table?
>> 
>> I suppose I could copy and paste the field names from the top row in the 
>> original Excel spreadsheet, and then manually reformat them into a CREATE 
>> NEW TABLE statement by adding all the field types. What strategies (like the 
>> shp2pgsql utility?) reduce the pain of importing a text file?
>> 
>> Margie
>> 
>> --
>> http://FarmBillPrimer.org
>> http://www.BaltimoreUrbanAg.org (Please send events; This site is hungry.)
>> http://www.ExcellentNutrition.org
>> http://www.packtpub.com/drupal-5-views-recipes/book
>> 
>> 
>> On Fri, Apr 12, 2013 at 6:14 PM, David Rush <[email protected]> wrote:
>> Total noob to PostgreSQL and PostGIS here.  Trying to follow examples from 
>> the Obe+Hsu book (1st Ed) in using shp2pgsql from the command line to import 
>> some tiger county data.
>> 
>> I ran this:
>> 
>> shp2pgsql -s 4269 -g geom_4269 -W LATIN1 
>> c:/users/david/downloads/tl_2012_us_county/tl_2012_us_county.shp 
>> public.us_counties psql -h localhost -U postgres -p 5432 -d mygisdb 
>> 
>> Thanks to an archive of this list that led me to add the "-W LATIN1" param 
>> (it was failing with an error w/out it).
>> 
>> Now the command runs for several minutes, spitting out mostly zillions of 
>> hex digits, with no overt errors.  Last line it spits out is "COMMIT;".
>> 
>> But when I go into psql, I can't find the public.us_counties table that I 
>> thought I just created:
>> 
>> mygisdb=# select * from public.us_counties;
>> ERROR:  relation "public.us_counties" does not exist
>> LINE 1: select * from public.us_counties;
>>                       ^
>> mygisdb=# select table_schema, table_name,table_type from 
>> information_schema.tables where
>> table_schema not in ('pg_catalog','information_schema');
>>  table_schema |    table_name     | table_type
>> --------------+-------------------+------------
>>  public       | geography_columns | VIEW
>>  public       | geometry_columns  | VIEW
>>  public       | spatial_ref_sys   | BASE TABLE
>>  ch01         | lu_franchises     | BASE TABLE
>>  ch01         | fastfoods         | BASE TABLE
>> (5 rows)
>> 
>> Poking around with pgAdmin III I can't find it anywhere, either.
>> 
>> Is the new table us_counties hiding somewhere?  Or did it quietly fail?  Or 
>> what?
>> 
>> David
>> 
>> _______________________________________________
>> postgis-users mailing list
>> [email protected]
>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> 
> 
> -- 
> .nathan.

