Re: [GENERAL] \COPY ... CSV with hex escapes

2010-07-12 Thread Craig Ringer
On 08/07/10 17:42, Alban Hertroys wrote:
 On 8 Jul 2010, at 4:21, Craig Ringer wrote:
 
 Yes, that's ancient. It is handled quite happily by \copy in csv mode,
 except that when csv mode is active, \xnn escapes do not seem to be
 processed. So I can have *either* \xnn escape processing *or* csv-style
 input processing.

 Anyone know of a way to get escape processing in csv mode?
 
 
 And what do those hex-escaped bytes mean? Are they in text strings? AFAIK CSV 
 doesn't contain any information about what encoding was used to create it, so 
 it could be about anything; UTF-8, Win1252, ISO-8859-something, or whatever 
 Sybase was using.
 
 I'm just saying, be careful what you're parsing there ;)

Thanks for that. In this case, the escapes are just bytes - what's
important is that, after unescaping, the CSV data is interpreted as
latin-1. OK, Windows-1252, but close enough.

In the end Python's csv module did the trick. I just pulled in the CSV
data, and spat out Postgresql-friendly COPY format so that I didn't need
to use the COPY ... CSV modifier and Pg would interpret the escapes
during input.

In case anyone else needs to deal with this format, here's the program I
used.

-- 
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/
#!/usr/bin/env python
import os
import sys
import csv

class DialectSybase(csv.Dialect):
delimiter = ','
doublequote = True
escapechar = None
quotechar = '\''
quoting = csv.QUOTE_MINIMAL
lineterminator = '\n'

class DialectPgCOPY(csv.Dialect):
delimiter = '\t'
doublequote = False
escapechar = None
quotechar = None
quoting = csv.QUOTE_NONE
lineterminator = '\n'

#class DialectPgCOPY(csv.Dialect):
#delimiter = '\t'
#doublequote = True
#escapechar = '\\'
#quotechar = '\''
#quoting = csv.QUOTE_NONE
#lineterminator = '\n'

def unescape_item(item):
''' noop so far '''
#if item.find(\\X) = 0:
#print repr(item)
#return item
return item.replace(\\X,\\x)

def unescape_row(row):
newrow = []
for item in row:
newitem = item
if type(item) == str:
newitem = unescape_item(item)
newrow.append(newitem)
return newrow

def main(infn, outfn):
infile = open(infn,'r')
outfile = open(outfn,'w')

r = csv.reader( infile, dialect=DialectSybase )
w = csv.writer( outfile, dialect=DialectPgCOPY )

for row in r:
w.writerow(unescape_row(row))

if __name__ == '__main__':
print customers
main('customer.txt', 'customer_unescaped.txt')
print class
main('class.txt', 'class_unescaped.txt')
print orders
main('orders.txt', 'orders_unescaped.txt')
print items
main('items.txt', 'items_unescaped.txt')

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] \COPY ... CSV with hex escapes

2010-07-08 Thread Alban Hertroys
On 8 Jul 2010, at 4:21, Craig Ringer wrote:

 Yes, that's ancient. It is handled quite happily by \copy in csv mode,
 except that when csv mode is active, \xnn escapes do not seem to be
 processed. So I can have *either* \xnn escape processing *or* csv-style
 input processing.
 
 Anyone know of a way to get escape processing in csv mode?


And what do those hex-escaped bytes mean? Are they in text strings? AFAIK CSV 
doesn't contain any information about what encoding was used to create it, so 
it could be about anything; UTF-8, Win1252, ISO-8859-something, or whatever 
Sybase was using.

I'm just saying, be careful what you're parsing there ;)

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4c359d9f286212106016419!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] \COPY ... CSV with hex escapes

2010-07-07 Thread Craig Ringer
Hi folks

I have an odd csv input format to deal with. I'm about to put some
Python together to reprocess it, but I thought I'd check in and see if
I'm missing something obvious in \copy's capabilities.

The input is fairly conventional comma-delimeted text with quoted
fields, as output by Sybase SQL Anywhere 5.5's isql 'OUTPUT TO' filter.
Yes, that's ancient. It is handled quite happily by \copy in csv mode,
except that when csv mode is active, \xnn escapes do not seem to be
processed. So I can have *either* \xnn escape processing *or* csv-style
input processing.

Anyone know of a way to get escape processing in csv mode?

--
Craig Ringer

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] \COPY ... CSV with hex escapes

2010-07-07 Thread Peter Hunsberger
On Wed, Jul 7, 2010 at 9:21 PM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 Hi folks

 I have an odd csv input format to deal with. I'm about to put some
 Python together to reprocess it, but I thought I'd check in and see if
 I'm missing something obvious in \copy's capabilities.

 The input is fairly conventional comma-delimeted text with quoted
 fields, as output by Sybase SQL Anywhere 5.5's isql 'OUTPUT TO' filter.
 Yes, that's ancient. It is handled quite happily by \copy in csv mode,
 except that when csv mode is active, \xnn escapes do not seem to be
 processed. So I can have *either* \xnn escape processing *or* csv-style
 input processing.

 Anyone know of a way to get escape processing in csv mode?

Don't know if you can do it directly, but this seem like one of those
cases where a ETL tool like that from Pentaho (Kettle / Spoon) might
be in order?  One step to handle the escape chars and one to load the
actual CSV...

-- 
Peter Hunsberger

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general