Re: [GENERAL] \COPY ... CSV with hex escapes
On 08/07/10 17:42, Alban Hertroys wrote: On 8 Jul 2010, at 4:21, Craig Ringer wrote: Yes, that's ancient. It is handled quite happily by \copy in csv mode, except that when csv mode is active, \xnn escapes do not seem to be processed. So I can have *either* \xnn escape processing *or* csv-style input processing. Anyone know of a way to get escape processing in csv mode? And what do those hex-escaped bytes mean? Are they in text strings? AFAIK CSV doesn't contain any information about what encoding was used to create it, so it could be about anything; UTF-8, Win1252, ISO-8859-something, or whatever Sybase was using. I'm just saying, be careful what you're parsing there ;) Thanks for that. In this case, the escapes are just bytes - what's important is that, after unescaping, the CSV data is interpreted as latin-1. OK, Windows-1252, but close enough. In the end Python's csv module did the trick. I just pulled in the CSV data, and spat out Postgresql-friendly COPY format so that I didn't need to use the COPY ... CSV modifier and Pg would interpret the escapes during input. In case anyone else needs to deal with this format, here's the program I used. -- Craig Ringer Tech-related writing: http://soapyfrogs.blogspot.com/ #!/usr/bin/env python import os import sys import csv class DialectSybase(csv.Dialect): delimiter = ',' doublequote = True escapechar = None quotechar = '\'' quoting = csv.QUOTE_MINIMAL lineterminator = '\n' class DialectPgCOPY(csv.Dialect): delimiter = '\t' doublequote = False escapechar = None quotechar = None quoting = csv.QUOTE_NONE lineterminator = '\n' #class DialectPgCOPY(csv.Dialect): #delimiter = '\t' #doublequote = True #escapechar = '\\' #quotechar = '\'' #quoting = csv.QUOTE_NONE #lineterminator = '\n' def unescape_item(item): ''' noop so far ''' #if item.find(\\X) = 0: #print repr(item) #return item return item.replace(\\X,\\x) def unescape_row(row): newrow = [] for item in row: newitem = item if type(item) == str: newitem = unescape_item(item) newrow.append(newitem) return newrow def main(infn, outfn): infile = open(infn,'r') outfile = open(outfn,'w') r = csv.reader( infile, dialect=DialectSybase ) w = csv.writer( outfile, dialect=DialectPgCOPY ) for row in r: w.writerow(unescape_row(row)) if __name__ == '__main__': print customers main('customer.txt', 'customer_unescaped.txt') print class main('class.txt', 'class_unescaped.txt') print orders main('orders.txt', 'orders_unescaped.txt') print items main('items.txt', 'items_unescaped.txt') -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] \COPY ... CSV with hex escapes
On 8 Jul 2010, at 4:21, Craig Ringer wrote: Yes, that's ancient. It is handled quite happily by \copy in csv mode, except that when csv mode is active, \xnn escapes do not seem to be processed. So I can have *either* \xnn escape processing *or* csv-style input processing. Anyone know of a way to get escape processing in csv mode? And what do those hex-escaped bytes mean? Are they in text strings? AFAIK CSV doesn't contain any information about what encoding was used to create it, so it could be about anything; UTF-8, Win1252, ISO-8859-something, or whatever Sybase was using. I'm just saying, be careful what you're parsing there ;) Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling. !DSPAM:737,4c359d9f286212106016419! -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] \COPY ... CSV with hex escapes
Hi folks I have an odd csv input format to deal with. I'm about to put some Python together to reprocess it, but I thought I'd check in and see if I'm missing something obvious in \copy's capabilities. The input is fairly conventional comma-delimeted text with quoted fields, as output by Sybase SQL Anywhere 5.5's isql 'OUTPUT TO' filter. Yes, that's ancient. It is handled quite happily by \copy in csv mode, except that when csv mode is active, \xnn escapes do not seem to be processed. So I can have *either* \xnn escape processing *or* csv-style input processing. Anyone know of a way to get escape processing in csv mode? -- Craig Ringer -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] \COPY ... CSV with hex escapes
On Wed, Jul 7, 2010 at 9:21 PM, Craig Ringer cr...@postnewspapers.com.au wrote: Hi folks I have an odd csv input format to deal with. I'm about to put some Python together to reprocess it, but I thought I'd check in and see if I'm missing something obvious in \copy's capabilities. The input is fairly conventional comma-delimeted text with quoted fields, as output by Sybase SQL Anywhere 5.5's isql 'OUTPUT TO' filter. Yes, that's ancient. It is handled quite happily by \copy in csv mode, except that when csv mode is active, \xnn escapes do not seem to be processed. So I can have *either* \xnn escape processing *or* csv-style input processing. Anyone know of a way to get escape processing in csv mode? Don't know if you can do it directly, but this seem like one of those cases where a ETL tool like that from Pentaho (Kettle / Spoon) might be in order? One step to handle the escape chars and one to load the actual CSV... -- Peter Hunsberger -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general