Steve,

> I can only think of one where it's common. Windows filenames.

Nearly all weblog data then.

> But if
> you're going to support arbitrary data in a load then whatever escape
> character you choose will appear sometimes.

If we allow an 8-bit character set in the "text" file, then yes, any
delimiter you choose has the potential to appear in your input data.  In
practice, with *mostly* 7-bit ASCII characters and even with international
8-bit text encodings, you can choose a delimiter and newline that work well.
Exceptions are handled by the forthcoming single-row error handling patch.
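
To make the "choose a delimiter that works well" point concrete, here is a rough Python sketch of scanning a sample of the input for candidate delimiter bytes that never occur in it. The function name `unused_bytes` and the candidate list are my own illustration, not part of any patch, and a sample only suggests (it cannot guarantee) that the rest of the file is clean.

```python
def unused_bytes(sample, candidates=b"|\t,;\x01\x1f"):
    """Return the candidate byte values that never occur in the sample."""
    present = set(sample)          # iterating bytes yields ints 0..255
    return [b for b in candidates if b not in present]

# A made-up weblog-style line containing '|' and Windows backslashes.
sample = b'10.0.0.1 - - "GET /a|b.html" C:\\logs\\x.log\n'
free = unused_bytes(sample)        # '|' is excluded, tab and comma remain
```

Rows that still contain the chosen delimiter are the exceptions mentioned above.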

> I strongly suspect that a patch to improve performance without changing
> behaviour would be accepted with no questions asked.

Understood, but I'm not yet sure that's the best thing for users.
Customers have reported a large number of issues with the unmodified
behavior.
  
> There are already two loader routines. One of them is text-based and is
> designed for easy generation of data load format using simple text
> manipulation tools by using delimiters. It also allows (unlike your
> suggestion) for loading of arbitrary data from a text file.

Not to distract, but try loading a binary null into a text field.  The
assumption of null terminated strings penetrates deep into the codebase.
The existing system does not allow for loading arbitrary data from a text
file.
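
A small Python illustration of the null-terminated string problem, using only the standard `ctypes` module to mimic how C code sees a buffer. This is a sketch of the general C-string behavior, not of the loader code itself.

```python
import ctypes

# A 7-byte payload with an embedded binary null in the middle.
buf = ctypes.create_string_buffer(b"abc\x00def")

as_c_string = buf.value   # read the way C string functions would: stops at the first NUL
full = buf.raw            # the bytes actually stored, including the trailing terminator
```

Anything that treats the field as a C string sees only `abc`; the `def` after the null is silently lost.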

Our suggestion allows for escapes, but requires the ability to specify
alternate characters or none.
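
As a rough sketch of what "alternate characters or none" means for a field splitter, assuming single-character delimiter and escape (`split_fields` and its parameters are hypothetical names for illustration, not the patch's interface):

```python
def split_fields(line, delim="|", escape="\\"):
    """Split one line into fields; escape=None disables escaping entirely."""
    fields, cur, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if escape is not None and ch == escape and i + 1 < len(line):
            cur.append(line[i + 1])   # take the next character literally
            i += 2
            continue
        if ch == delim:
            fields.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    fields.append("".join(cur))
    return fields

default_esc = split_fields("C:\\temp|x")                # backslash is consumed as an escape
no_esc = split_fields("C:\\temp|x", escape=None)        # Windows path survives intact
alt_esc = split_fields("a^|b|C:\\temp", escape="^")     # '^' escapes, backslash is plain data
```

With the default backslash escape the Windows path is mangled; with no escape, or an alternate one, it loads unchanged.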
 
> Because it allows for arbitrary data and uses delimiters to separate
> fields it has to use an escaping mechanism.
> 
> If you want to be able to load arbitrary data and not have to handle
> escape characters there are two obvious ways to do it.

Let's dispense with the notion that we're suggesting no escapes (see above).

Binary with a bookends format is a fine idea and would be my personal
preference if it were fast, which it isn't.  Customers in the web log
analysis and other data warehousing fields prefer "mostly 7-bit" ASCII text
input, which we're trying to support with this change.

- Luke


