On 08/07/10 17:42, Alban Hertroys wrote:
> On 8 Jul 2010, at 4:21, Craig Ringer wrote:
>
>> Yes, that's ancient. It is handled quite happily by \copy in csv mode,
>> except that when csv mode is active, \xnn escapes do not seem to be
>> processed. So I can have *either* \xnn escape processing *or* csv-style
>> input processing.
>>
>> Anyone know of a way to get escape processing in csv mode?
>
>
> And what do those hex-escaped bytes mean? Are they in text strings? AFAIK CSV
> doesn't contain any information about what encoding was used to create it, so
> it could be about anything; UTF-8, Win1252, ISO-8859-something, or whatever
> Sybase was using.
>
> I'm just saying, be careful what you're parsing there ;)

Thanks for that. In this case, the escapes are just "bytes" - what's
important is that, after unescaping, the CSV data is interpreted as
latin-1. OK, strictly it's Windows-1252, but close enough.
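
To make the "just bytes" point concrete, here is a minimal Python 3 sketch of what unescaping followed by decoding looks like. The helper name `unescape_to_bytes` and the sample string are made up for illustration. In the workflow described here PostgreSQL does the unescaping itself, so this only shows why the choice between latin-1 and Windows-1252 matters for bytes in the 0x80-0x9f range.

```python
import re


def unescape_to_bytes(text):
    r"""Turn \xNN (or \XNN) escapes in an ASCII string into raw bytes."""
    raw = text.encode("latin-1")
    return re.sub(rb"\\[xX]([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  raw)


# Hypothetical sample: 0xe9 is e-acute in both encodings, 0x80 is not.
raw = unescape_to_bytes(r"caf\xe9 \x80")

as_latin1 = raw.decode("latin-1")   # 0x80 becomes the control char U+0080
as_cp1252 = raw.decode("cp1252")    # 0x80 becomes the euro sign U+20AC
```

The two decodings agree on 0xe9 and differ only on 0x80, which is why "close enough" holds as long as the data stays out of that range.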

In the end Python's csv module did the trick. I just pulled in the CSV
data and spat out PostgreSQL-friendly COPY text format, so that I didn't
need to use the COPY ... CSV modifier and Pg would interpret the escapes
during input.
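
As a rough illustration of that conversion, the sketch below runs one made-up input line through the csv module in memory (Python 3). The sample data is an assumption, not taken from the real export. It shows the single-quoted, doubled-quote input being read and a tab-separated, unquoted line being written, with `\X` rewritten to `\x`.

```python
import csv
import io

# Hypothetical input line: single-quoted fields, '' for a literal quote,
# and an upper-case \X hex escape.
sybase_csv = "1,'O''Brien, Pat','caf\\Xe9'\n"

reader = csv.reader(io.StringIO(sybase_csv), delimiter=",", quotechar="'",
                    doublequote=True)

out = io.StringIO()
writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_NONE,
                    quotechar=None, escapechar=None, lineterminator="\n")

for row in reader:
    # Backslashes pass through untouched; only the escape prefix changes.
    writer.writerow([field.replace("\\X", "\\x") for field in row])

copy_text = out.getvalue()
```

The resulting `copy_text` holds three tab-separated fields, the last still carrying the literal `\xe9` escape for COPY to interpret on input.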

In case anyone else needs to deal with this format, here's the program I
used.

--
Craig Ringer
Tech-related writing: http://soapyfrogs.blogspot.com/

#!/usr/bin/env python3
# Convert Sybase-style CSV exports into PostgreSQL COPY text format, so that
# COPY (without the CSV option) interprets the \xNN escapes on input.
# Ported to Python 3 syntax; the original ran under Python 2.
import csv


class DialectSybase(csv.Dialect):
    """Input: comma-separated, single-quoted, quotes doubled to escape."""
    delimiter = ','
    doublequote = True
    escapechar = None
    quotechar = '\''
    quoting = csv.QUOTE_MINIMAL
    lineterminator = '\n'


class DialectPgCOPY(csv.Dialect):
    """Output: tab-separated with no quoting and no escaping."""
    delimiter = '\t'
    doublequote = False
    escapechar = None
    quotechar = None
    quoting = csv.QUOTE_NONE
    lineterminator = '\n'

#class DialectPgCOPY(csv.Dialect):
#    delimiter = '\t'
#    doublequote = True
#    escapechar = '\\'
#    quotechar = '\''
#    quoting = csv.QUOTE_NONE
#    lineterminator = '\n'


def unescape_item(item):
    """Rewrite the upper-case \\X escape prefix as lower-case \\x."""
    return item.replace("\\X", "\\x")


def unescape_row(row):
    """Apply unescape_item to every string field in a row."""
    return [unescape_item(item) if isinstance(item, str) else item
            for item in row]


def main(infn, outfn):
    # latin-1 maps every byte to one code point and back, so the data passes
    # through byte-for-byte, as it did with Python 2's byte strings.
    with open(infn, 'r', encoding='latin-1', newline='') as infile, \
         open(outfn, 'w', encoding='latin-1', newline='') as outfile:
        r = csv.reader(infile, dialect=DialectSybase)
        w = csv.writer(outfile, dialect=DialectPgCOPY)
        for row in r:
            w.writerow(unescape_row(row))


if __name__ == '__main__':
    print("customers")
    main('customer.txt', 'customer_unescaped.txt')
    print("class")
    main('class.txt', 'class_unescaped.txt')
    print("orders")
    main('orders.txt', 'orders_unescaped.txt')
    print("items")
    main('items.txt', 'items_unescaped.txt')
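
One caveat for anyone reusing this on other data: with QUOTE_NONE and no escapechar, Python 3's csv.writer raises csv.Error if a field contains a raw tab or newline, since it has no way to escape them. The sketch below is one possible workaround, not part of the original program. The helper `escape_copy_field` is hypothetical; it rewrites those characters as the `\t`, `\n` and `\r` sequences that COPY's text format accepts, and deliberately leaves existing backslashes alone so `\x` escapes still reach PostgreSQL.

```python
import csv
import io


def escape_copy_field(field):
    """Replace raw tab, newline and carriage return with COPY text escapes.

    Backslashes already present are left untouched on purpose.
    """
    return (field.replace("\t", "\\t")
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def make_copy_writer(stream):
    """Build a writer equivalent to the tab-separated, unquoted dialect."""
    return csv.writer(stream, delimiter="\t", quoting=csv.QUOTE_NONE,
                      quotechar=None, escapechar=None, lineterminator="\n")


buf = io.StringIO()
writer = make_copy_writer(buf)

# Hypothetical row: the first field contains a raw tab character.
row = ["one\ttwo", "caf\\xe9"]
writer.writerow([escape_copy_field(item) for item in row])
copy_text = buf.getvalue()
```

Whether this is needed depends on the data; if no field ever contains a tab or line break, the program above works as it stands.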
--
Sent via pgsql-general mailing list