Re: Derby and character set encodings
Ken Frank wrote:
> the one remaining question, for the folks at derby-user (and adding
> derby-dev) is the first one:
>
> 1. when one creates a new derby database, is the database created with
> a certain encoding that will be used ?

No. The database doesn't have an encoding.

Dan.
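Dan's answer matches the longer explanation in this thread: Derby works with Java Strings, and a character set encoding only matters where a tool such as ij or import turns bytes from the environment into Strings. The standalone sketch below (no Derby involved) shows that boundary. `decode` is a hypothetical helper name. UTF-8 and ISO-8859-1 are used because every Java platform is required to support them. EUC-JP, Ken's example, is usually present but is not guaranteed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BoundaryDecoding {
    // Turn raw bytes from the environment (a file, a terminal) into a Java String.
    // This decode step is where a character set encoding matters; once the value
    // is a String it can be passed on (e.g. via PreparedStatement.setString)
    // without the application choosing any encoding.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The Japanese word "nihongo" (U+65E5 U+672C U+8A9E), written with escapes
        // so this source file's own encoding does not matter.
        String original = "\u65E5\u672C\u8A9E";

        // Simulate input produced by an environment that writes UTF-8.
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(raw, StandardCharsets.UTF_8);
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false: 9 bytes became 9 Latin-1 chars
        System.out.println(wrong.length());         // 9
    }
}
```

If the wrong charset is chosen, the text is already damaged before any database sees it. This is why the question is about the tool's input settings and not about the database.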
Re: Derby and character set encodings
The one remaining question, for the folks at derby-user (and adding derby-dev), is the first one:

1. When one creates a new Derby database, is the database created with a certain encoding that will be used? Or is there an argument to the create command that can indicate the encoding to be used?

And if so, is that encoding the default encoding of the locale I am in when I run the create database command, or is it always UTF-8? (For example, for one of the Japanese locales of Solaris, the encoding is EUC-JP.)

Or could it be the encoding of the locale the actual dbase server is started in? (That might be Java's view of the user's locale/encoding, which I think would be the same as the OS locale the user is in.) That is, the user might start the db server in a different locale from the one they start NetBeans in.

Thanks - Ken
Re: Derby and character set encodings
Thanks David for sending this. Let me note a few questions:

1. When one creates a new database, is the database created with a certain encoding that will be used?

And if so, is that encoding that of the locale I am in when I run the create database commands, or is it always UTF-8? (For example, for one of the Japanese locales of Solaris, the encoding is EUC-JP.)

Or could it be the encoding of the locale the actual dbase server is started in? (That might be Java's view of the user's locale/encoding, which I think would be the same as the OS locale the user is in.)

I saw this in the Derby docs: "To support users in many different languages, Derby's SQL parser understands all Unicode characters and allows any Unicode character or number to be used in an identifier." But I don't know whether it means that there is no concept of an encoding for a database itself.

I think with Oracle, for example, there is an argument to create database that lets one specify its encoding.

2. The locale the user is in when starting the Derby server: what things are affected by it, i.e. encoding of the dbase, messages to the user (if translated), time, date, etc.? (As opposed to the user needing to set separate variables or properties.)

3. I think it's allowed for identifiers like database names and table and column names to have non-ASCII characters in them, if proper quoting is used when referring to them?

Thanks - Ken
Re: Derby and character set encodings
It's the correct Andrey; he works with me on i18n. But thanks for sending it to Andrei as well.

Ken
Re: Derby and character set encodings
I think this was actually meant to go to a different Andrei (sorry Andrey)
Re: Derby and character set encodings
I think I can actually answer some of these questions :)

On 9/6/07, Ken Frank <[EMAIL PROTECTED]> wrote:
> Thanks David for sending this.
>
> Let me note a few questions:
>
> 1. when one creates a new database,
> is the database created with a certain encoding that will be used ?
>
> And if so, is that encoding that of the locale I am in when I run
> the create database commands or is it utf-8 always ?
> (for example, for one of the Japanese locales of Solaris, the encoding of it
> is euc-jp)
>
> or could it be that of the encoding of the locale the actual dbase server
> is started in ? (which might be java's view of the users locale/encoding
> which would be I think the same as the OS locale user is in)
>
> I saw this from derby docs:
> "To support users in many different languages, Derby's SQL parser
> understands all Unicode characters and allows any Unicode character or
> number to be used in an identifier."
>
> but I don't know if it means that there is no concept of an encoding
> for a database itself or not.
>
> I think with Oracle for example, there is an argument to create database
> that lets one specify the encoding of it.
>

This question stumps me; I'll leave it to others...

> 2. The locale the user is in when starting derby server -
> what things are affected by that - ie encoding of dbase, messages to
> user (if translated), time, date, etc ?
> (vs user needing to set separate variables or properties)
>

I don't know what "encoding of the dbase" means, but the other display
stuff (exception messages, time, date and money formats, etc.) is all
controlled by locale.

> 3. I think its allowed for identifiers like database names,
> table and column names, to have non ascii in them, if proper
> quoting is used when referring to them ?
>

Yes, that's right.

> Thanks - Ken
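On question 3, JDBC `?` parameter markers bind values, not identifiers. A table or column name with non-ASCII characters therefore has to be spliced into the SQL text. The sketch below assumes the standard SQL rules for delimited identifiers: wrap the name in double quotes and double any embedded double quote. `quoteIdentifier` is a hypothetical helper, not a Derby API. A delimited identifier is also case-sensitive, whereas Derby folds ordinary identifiers to upper case. Per the Derby docs quoted above, ordinary identifiers may contain Unicode characters too.

```java
public class SqlIdentifiers {
    // Hypothetical helper (not a Derby API): format a name as a standard SQL
    // delimited identifier. Embedded double quotes are escaped by doubling them.
    static String quoteIdentifier(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("identifier must not be empty");
        }
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Hypothetical Japanese table/column names: U+9867 U+5BA2 ("customer")
        // and U+540D U+524D ("name"), written with escapes.
        String table = quoteIdentifier("\u9867\u5BA2");
        String column = quoteIdentifier("\u540D\u524D");

        // Identifiers cannot be bound with '?' markers, so they are spliced into
        // the SQL text; values still go through PreparedStatement parameters.
        String sql = "SELECT " + column + " FROM " + table + " WHERE id = ?";
        System.out.println(sql);

        System.out.println(quoteIdentifier("odd\"name")); // "odd""name"
    }
}
```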
Re: Derby and character set encodings
This is mixing a lot of things up, and I may also use the wrong terminology here.

Character set encodings really only come into play with tools like ij and import, which get strings from the environment into Derby. The more standard interaction is using JDBC to load a Java String into Derby, and at that level we don't do anything with encodings.

We happen to use a modified UTF-8 to store data on disk, and this is not configurable. But no user interface should depend on this encoding, and Derby could change this storage format in the future. Logically, all strings at runtime are converted to standard Java chars.

Before 10.3 we always used the standard Java String compare, which does a numerical comparison of the Unicode values of the chars to arrive at an ordering. That is still the default. In 10.3 an option was added to select territory-based collation when the database is created, so that comparison depends on the territory of the database. For this, the standard Java rule-based Collator interfaces are used. This is documented in the latest Derby release.

David Van Couvering wrote:
> Hi, all. I am getting some questions from Ken Frank NetBeans
> internationalization quality team about Java DB and character set
> encodings. Rather than try and play go-between, I'm including him
> here so he can directly ask any follow-on questions.
>
> Ken would like to understand how Derby makes use of character
> encodings, and how it is affected by various settings. How does
> Derby handle things if the encoding is set to something different from
> our default of UTF-8? Are we impacted, or do we rely on Java routines
> such as the Collator and Comparator class to handle this?
>
> Sorry if I'm talking out my ear, i18n is not one of my fortes.
>
> Thanks,
>
> David
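The two orderings described in the reply above can be approximated in plain Java, with no Derby needed. The first is the default comparison of char values, which Derby calls UCS_BASIC collation. The second is a territory-based ordering driven by a locale-sensitive `java.text.Collator`.

The Derby 10.3 documentation describes two create-time connection URL attributes for choosing the second ordering: `territory` and `collation=TERRITORY_BASED`. `createUrl` and the database name `demoDB` are hypothetical. Derby's internal use of Collator may differ in detail from this sketch.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationDemo {
    // Hypothetical helper: build a connection URL that creates a database with
    // territory-based collation (the 10.3 create-time option).
    static String createUrl(String dbName, String territory) {
        return "jdbc:derby:" + dbName + ";create=true;territory=" + territory
                + ";collation=TERRITORY_BASED";
    }

    // Default ordering: String.compareTo compares char values numerically.
    static List<String> sortByCharValues(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(String::compareTo);
        return copy;
    }

    // Territory-based ordering, approximated with a locale-sensitive Collator.
    static List<String> sortByCollator(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        // The second word is the French "ete" with two U+00E9 (e with acute accent).
        List<String> words = Arrays.asList("zebra", "\u00E9t\u00E9", "apple", "Banana");

        System.out.println(createUrl("demoDB", "en_US"));
        // Char values: 'B' (66) < 'a' (97) < 'z' (122) < U+00E9 (233),
        // giving Banana, apple, zebra, then the accented word.
        System.out.println(sortByCharValues(words));
        // The en_US Collator compares base letters first and treats accents and
        // case as lesser differences: apple, Banana, the accented word, zebra.
        System.out.println(sortByCollator(words, Locale.US));
    }
}
```

The sort order is the visible consequence of the collation choice: with the default, upper-case letters sort before all lower-case ones and accented letters sort after `z`, while the territory-based ordering follows the locale's alphabet.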