Hi Jean-Paul,

Thanks for some really good advice. I'll take a look at the project to see how this is handled. I was not aware of your wrapper project for SQLite, so this is something new to look at too. I have worked with SQLObject and also Django's db wrappers. In fact, this question came out of an SQLObject implementation in RDFlib: that is where I discovered the issue with how this backend behaves with SQLite3, and I have got it working now.

I am only starting to warm to the idea of unicode throughout. For example, the backend code that I am trying to work with contains the following. _tokey is a helper for bringing things into the relational database, and _fromkey is a helper for extracting data from the database. Commenting out the .decode("UTF-8") call and the line value = value.decode("UTF-8") allowed me to get this working, but I need to make this work with unicode. My unicode experience is limited and I am confused about writing unicode-compatible replacements for things like:

    return '<%s>' % ''.join(splituri(term.encode("UTF-8")))

    def splituri(uri):
        if uri.startswith('<') and uri.endswith('>'):
            uri = uri[1:-1]
        if uri.startswith('_'):
            uid = ''.join(uri.split('_'))
            return '_', uid
        if '#' in uri:
            ns, local = rsplit(uri, '#', 1)
            return ns + '#', local
        if '/' in uri:
            ns, local = rsplit(uri, '/', 1)
            return ns + '/', local
        return NO_URI, uri

    def _fromkey(key):
        if key.startswith("<") and key.endswith(">"):
            key = key[1:-1].decode("UTF-8")  ## Fails here when data extracted from database
            if key.startswith("_"):
                key = ''.join(splituri(key))
                return BNode(key)
            return URIRef(key)
        elif key.startswith("_"):
            return BNode(key)
        else:
            m = _literal.match(key)
            if m:
                d = m.groupdict()
                value = d["value"]
                value = unquote(value)
                value = value.decode("UTF-8")  ## Fails here when data extracted from database
                lang = d["lang"] or ''
                datatype = d["datatype"]
                return Literal(value, lang, datatype)
            else:
                msg = "Unknown Key Syntax: '%s'" % key
                raise Exception(msg)

    def _tokey(term):
        if isinstance(term, URIRef):
            term = term.encode("UTF-8")
            if not '#' in term and not '/' in term:
                term = '%s%s' % (NO_URI, term)
            return '<%s>' % term
        elif isinstance(term, BNode):
            return '<%s>' % ''.join(splituri(term.encode("UTF-8")))
        elif isinstance(term, Literal):
            language = term.language
            datatype = term.datatype
            value = quote(term.encode("UTF-8"))
            if language:
                language = language.encode("UTF-8")
                if datatype:
                    datatype = datatype.encode("UTF-8")
                    n3 = '"%s"@%s&<%s>' % (value, language, datatype)
                else:
                    n3 = '"%s"@%s' % (value, language)
            else:
                if datatype:
                    datatype = datatype.encode("UTF-8")
                    n3 = '"%s"&<%s>' % (value, datatype)
                else:
                    n3 = '"%s"' % value
            return n3
        else:
            msg = "Unknown term Type for: %s" % term
            raise Exception(msg)

In an unrelated question, it appears SQLite is also extremely flexible about what types of data it can contain. When writing SQL for Postgres I use the timestamp type, and I can use this in SQLite as well. In my work with Django, the same information is mapped to the datetime type.
Would you be inclined to recommend the use of one type over the other? If so, can you explain the rationale for this choice?

Many thanks.

Regards,
David

On Tuesday, November 8, 2005, at 04:49 PM, Jean-Paul Calderone wrote:

> On Tue, 08 Nov 2005 16:27:25 -0400, David Pratt
> <[EMAIL PROTECTED]> wrote:
>> Recently I have run into an issue with sqlite where I encode strings
>> going into sqlite3 as utf-8. I guess by default sqlite3 is converting
>> this to unicode since when I try to decode I get an attribute error
>> like this:
>>
>> AttributeError: 'unicode' object has no attribute 'decode'
>>
>> The code and data I am preparing is to work on postgres as well as
>> sqlite so there are a couple of things I could do. I could always
>> store any data as unicode to any db, or test the data to determine
>> whether it is a string or unicode type when it comes out of the
>> database so I can deal with this possibility without errors. I will
>> likely take the first option but I am looking for a simple test to
>> determine my object type.
>>
>> if I do:
>>
>> >>> type('maybe string or maybe unicode')
>>
>> I get this:
>>
>> >>> <type 'unicode'>
>>
>> I am looking for something that I can use in a comparison.
>>
>> How do I get the type as a string for comparison so I can do something
>> like
>>
>> if type(some_data) == 'unicode':
>>     do some stuff
>> else:
>>     do something else
>>
>
> You don't actually want the type as a string. What you seem to be
> leaning towards is the builtin function "isinstance":
>
>     if isinstance(some_data, unicode):
>         # some stuff
>     elif isinstance(some_data, str):
>         # other stuff
>     ...
>
> But I think what you actually want is to be slightly more careful
> about what you place into SQLite3. If you are storing text data,
> insert it as a Python unicode string (with no NUL bytes, unfortunately
> - this is a bug in SQLite3, or maybe the Python bindings, I forget
> which).
> If you are storing binary data, insert it as a Python buffer
> object (eg, buffer('1234')). When you take text data out of the
> database, you will get unicode objects. When you take bytes out, you
> will get buffer objects (which you can convert to str objects with
> str()).
>
> You may want to look at Axiom
> (<http://divmod.org/trac/wiki/DivmodAxiom>) to see how it handles each
> of these cases. In particular, the "text" and "bytes" types defined
> in the attributes module
> (<http://divmod.org/trac/browser/trunk/Axiom/axiom/attributes.py>).
>
> By only encoding and decoding at the border between your application
> and the outside world, and the border between your application and the
> data, you will eliminate the possibility for a class of bugs where
> encodings are forgotten, or encoded strings are accidentally combined
> with unicode strings.
>
> Hope this helps,
>
> Jean-Paul
> --
> http://mail.python.org/mailman/listinfo/python-list
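
The advice quoted above (store text as unicode, store binary data as a separate bytes type, and decode only at the boundary) can be sketched roughly as follows. This is only an illustration under stated assumptions: it targets a current Python 3 interpreter rather than the Python 2 discussed in this thread, so `str` stands in for `unicode` and `bytes` stands in for `buffer`. The names `to_text` and `roundtrip` are hypothetical helpers invented for this example; they are not part of RDFlib, Axiom, or the `sqlite3` module.

```python
import sqlite3


def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only when it is still bytes.

    Hypothetical helper: it uses the isinstance test suggested in the
    thread instead of comparing type names as strings.
    """
    if isinstance(value, str):
        return value  # already text, so there is nothing to decode
    if isinstance(value, (bytes, bytearray, memoryview)):
        return bytes(value).decode(encoding)
    raise TypeError("expected text or bytes, got %r" % type(value))


def roundtrip(text, blob):
    """Store one text value and one binary value, then read both back."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE demo (t TEXT, b BLOB)")
        conn.execute("INSERT INTO demo (t, b) VALUES (?, ?)", (text, blob))
        return conn.execute("SELECT t, b FROM demo").fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    t, b = roundtrip("caf\u00e9", b"\x00\x01")
    # Show which Python types come back for the text and blob columns.
    print(type(t).__name__, type(b).__name__)
    # to_text gives the same result whether or not the input was encoded.
    print(to_text(t) == to_text("caf\u00e9".encode("utf-8")))
```

With a helper along these lines, code such as _fromkey would not need to guess whether the database handed back encoded or decoded data; whether that fits the RDFlib backend as it stands is an open question.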