Re: Derby and character set encodings

2007-09-12 Thread Daniel John Debrunner

Ken Frank wrote:
The one remaining question for the folks at derby-user (and adding
derby-dev) is the first one:


1.  When one creates a new Derby database,
is the database created with a certain encoding that will be used?


No. The database doesn't have an encoding.

Dan.
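Dan's answer in concrete terms: when creating a database there is no encoding argument to pass. What Derby does accept at creation time is a territory and, from 10.3 on, territory-based collation, both given as attributes on the connection URL (Mike's message below describes the collation option). A minimal sketch, assuming derby.jar is on the classpath; `createUrl` and the database name `jpdb` are hypothetical, while `create`, `territory` and `collation=TERRITORY_BASED` are Derby connection URL attributes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class CreateDerbyDb {
    // Hypothetical helper: builds an embedded Derby URL that creates a database.
    // Note that there is no "encoding" attribute, only territory and collation.
    static String createUrl(String dbName, String territory, boolean territoryCollation) {
        StringBuilder url = new StringBuilder("jdbc:derby:").append(dbName).append(";create=true");
        if (territory != null) {
            // Territory uses the ll_CC form, e.g. ja_JP.
            url.append(";territory=").append(territory);
        }
        if (territoryCollation) {
            // Derby 10.3+: compare strings with the territory's Collator
            // instead of the default (UCS_BASIC) char-value ordering.
            url.append(";collation=TERRITORY_BASED");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String url = createUrl("jpdb", "ja_JP", true);
        System.out.println(url);
        // Requires derby.jar on the classpath; without it DriverManager
        // reports that no suitable driver was found and we print the error.
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to " + conn.getMetaData().getURL());
        } catch (SQLException e) {
            System.out.println("Could not connect: " + e.getMessage());
        }
    }
}
```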


Re: Derby and character set encodings

2007-09-12 Thread Ken Frank
The one remaining question for the folks at derby-user (and adding
derby-dev) is the first one:


1.  When one creates a new Derby database,
is the database created with a certain encoding that will be used?

Or is there an argument to the create command that can specify the encoding
to be used?

And if so, is that encoding the default encoding of the locale I am in when I
run the create database command, or is it always UTF-8?
(For example, in one of the Japanese locales of Solaris, the encoding
is EUC-JP.)

Or could it be the encoding of the locale the database server itself
is started in? (That might be Java's view of the user's locale/encoding,
which I think would be the same as the OS locale the user is in.)

That is, the user might start the database server in a different locale from
the one they start NetBeans in.

Thanks - Ken




Re: Derby and character set encodings

2007-09-06 Thread Ken Frank

Thanks, David, for sending this.

Let me note a few questions:

1.  When one creates a new database,
is the database created with a certain encoding that will be used?

And if so, is that encoding that of the locale I am in when I run
the create database commands, or is it always UTF-8?
(For example, in one of the Japanese locales of Solaris, the encoding
is EUC-JP.)

Or could it be the encoding of the locale the database server itself
is started in? (That might be Java's view of the user's locale/encoding,
which I think would be the same as the OS locale the user is in.)

I saw this in the Derby docs:
"To support users in many different languages, Derby's SQL parser
understands all Unicode characters and allows any Unicode character or
number to be used in an identifier."

But I don't know whether that means there is no concept of an encoding
for the database itself.

I think with Oracle, for example, there is an argument to CREATE DATABASE
that lets one specify its encoding.

2.  The locale the user is in when starting the Derby server:
what things are affected by that, i.e. encoding of dbase, messages to
the user (if translated), time, date, etc.?
(vs. the user needing to set separate variables or properties)

3.  I think it's allowed for identifiers like database names,
table and column names, to have non-ASCII characters in them, if proper
quoting is used when referring to them?


Thanks - Ken


David Van Couvering wrote:


Hi, all.  I am getting some questions from Ken Frank of the NetBeans
internationalization quality team about Java DB and character set
encodings.  Rather than try to play go-between, I'm including him
here so he can directly ask any follow-on questions.

Ken would like to understand how Derby makes use of character
encodings, and how it is affected by various settings.  How does
Derby handle things if the encoding is set to something different from
our default of UTF-8?  Are we impacted, or do we rely on Java routines
such as the Collator and Comparator classes to handle this?

Sorry if I'm talking out of my ear; i18n is not one of my fortes.

Thanks,

David
 



Re: Derby and character set encodings

2007-09-06 Thread Ken Frank

It's the correct Andrey; he works with me on i18n.
But thanks for sending it to Andrei as well.

Ken


David Van Couvering wrote:


I think this was actually meant to go to a different Andrei (sorry Andrey)




Re: Derby and character set encodings

2007-09-06 Thread David Van Couvering
I think this was actually meant to go to a different Andrei (sorry Andrey)



Re: Derby and character set encodings

2007-09-06 Thread David Van Couvering
I think I can actually answer some of these questions :)

On 9/6/07, Ken Frank <[EMAIL PROTECTED]> wrote:
> Thanks, David, for sending this.
>
> Let me note a few questions:
>
> 1.  When one creates a new database,
> is the database created with a certain encoding that will be used?
>
> And if so, is that encoding that of the locale I am in when I run
> the create database commands, or is it always UTF-8?
> (For example, in one of the Japanese locales of Solaris, the encoding
> is EUC-JP.)
>
> Or could it be the encoding of the locale the database server itself
> is started in? (That might be Java's view of the user's locale/encoding,
> which I think would be the same as the OS locale the user is in.)
>
> I saw this in the Derby docs:
> "To support users in many different languages, Derby's SQL parser
> understands all Unicode characters and allows any Unicode character or
> number to be used in an identifier."
>
> But I don't know whether that means there is no concept of an encoding
> for the database itself.
>
> I think with Oracle, for example, there is an argument to CREATE DATABASE
> that lets one specify its encoding.
>

This question stumps me; I'll leave it to others...

>
>
> 2.  The locale the user is in when starting the Derby server:
> what things are affected by that, i.e. encoding of dbase, messages to
> the user (if translated), time, date, etc.?
> (vs. the user needing to set separate variables or properties)
>

I don't know what "encoding of the dbase" means, but the other display
items (exception messages; time, date and money formats; etc.) are
all controlled by locale.
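This is generic Java behaviour: locale-sensitive formatting uses the JVM's default locale unless code passes one explicitly, and the JVM normally derives that default from the OS environment of the process that starts it (it can also be overridden with the `user.language` and `user.country` system properties). So a server started in a different locale from NetBeans can format differently. A JDK-only sketch using `java.text.NumberFormat`; it does not show how Derby itself picks the locale for its messages, which this thread does not cover, and `formatNumber` is a hypothetical helper.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleFormats {
    // Formats a number for the given locale; code that omits the locale
    // argument gets Locale.getDefault() instead.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println("JVM default locale: " + Locale.getDefault());
        System.out.println("US:      " + formatNumber(1234.5, Locale.US));
        System.out.println("Germany: " + formatNumber(1234.5, Locale.GERMANY));
        System.out.println("Default: " + formatNumber(1234.5, Locale.getDefault()));
    }
}
```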

> 3.  I think it's allowed for identifiers like database names,
> table and column names, to have non-ASCII characters in them, if proper
> quoting is used when referring to them?
>

Yes, that's right.
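For illustration only: a delimited identifier is the name wrapped in double quotes, with any embedded double quote doubled (standard SQL). Quoting also preserves case, since unquoted identifiers are stored in upper case. The `quoteIdentifier` helper and the table and column names below are hypothetical; the generated DDL would be executed through an ordinary JDBC `Statement`.

```java
public class QuotedIdentifiers {
    // Hypothetical helper: makes an SQL delimited identifier by wrapping the
    // name in double quotes and doubling any embedded double quote.
    static String quoteIdentifier(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only whatever javac's
        // default encoding is. They spell a Japanese table name (kokyaku,
        // "customer") and column name (namae, "name").
        String table = "\u9867\u5BA2";
        String column = "\u540D\u524D";
        String ddl = "CREATE TABLE " + quoteIdentifier(table)
                + " (" + quoteIdentifier(column) + " VARCHAR(40))";
        System.out.println(ddl);
    }
}
```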

>
> Thanks - Ken
>
>
> David Van Couvering wrote:
>
> >Hi, all.  I am getting some questions from Ken Frank of the NetBeans
> >internationalization quality team about Java DB and character set
> >encodings.  Rather than try to play go-between, I'm including him
> >here so he can directly ask any follow-on questions.
> >
> >Ken would like to understand how Derby makes use of character
> >encodings, and how it is affected by various settings.  How does
> >Derby handle things if the encoding is set to something different from
> >our default of UTF-8?  Are we impacted, or do we rely on Java routines
> >such as the Collator and Comparator classes to handle this?
> >
> >Sorry if I'm talking out of my ear; i18n is not one of my fortes.
> >
> >Thanks,
> >
> >David
> >
> >
>


Re: Derby and character set encodings

2007-09-06 Thread Mike Matrigali

This is mixing a lot of things up.  I also may use the wrong
terminology here.

Character set encodings really only come into play with tools like
ij and import, which get strings from the environment into Derby.  The more
standard interaction is using JDBC to load a Java string into Derby.
At that level we don't do anything with encodings.
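Mike's point can be illustrated with the JDK alone: the encoding matters only at the moment bytes from outside (a file being imported, a terminal) are turned into a Java `String`; a JDBC call such as `PreparedStatement.setString` then hands over that `String`, not bytes. A minimal sketch using `java.nio.charset`; the sample text and the `decode` helper are illustrative, and EUC-JP (Ken's Solaris example) is only exercised if the running JRE supports it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBoundary {
    // Turns raw bytes from an external source into a Java String, using the
    // charset the bytes were actually written in.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String original = "\u65E5\u672C\u8A9E"; // the word "Japanese" in kanji
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = original.getBytes(StandardCharsets.UTF_16BE);
        // Different byte sequences on the outside...
        System.out.println(utf8.length + " UTF-8 bytes, " + utf16.length + " UTF-16BE bytes");
        // ...but decoded with the right charset they give the same String.
        System.out.println(decode(utf8, StandardCharsets.UTF_8)
                .equals(decode(utf16, StandardCharsets.UTF_16BE)));
        // Decoding with the wrong charset silently yields different characters.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(original));
        if (Charset.isSupported("EUC-JP")) {
            Charset eucJp = Charset.forName("EUC-JP");
            System.out.println(decode(original.getBytes(eucJp), eucJp).equals(original));
        }
    }
}
```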

We happen to use a modified UTF-8 to store data on disk, and this is
not configurable.  But no user interface should depend on this encoding,
and Derby could change this storage in the future.
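To show what "modified UTF-8" means, the JDK's `DataOutputStream.writeUTF` uses that variant: U+0000 becomes two bytes, and a supplementary character is written as two 3-byte surrogates instead of one 4-byte sequence. This only illustrates the encoding; Derby's actual on-disk format is internal (and, as Mike says, may change), and this code neither reads nor writes Derby files. `modifiedUtf8` and `hex` are hypothetical helpers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8 {
    // Encodes s with DataOutputStream.writeUTF (Java's modified UTF-8) and
    // strips the 2-byte length prefix that writeUTF writes before the data.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // For an in-memory stream this only happens if the encoded form
            // exceeds 65535 bytes.
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated upper-case hex, e.g. "C0 80".
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // ASCII is identical in both forms; U+0000 and U+1F600 (a surrogate
        // pair in Java) are encoded differently.
        String[] samples = {"A", "\u0000", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println("standard: " + hex(s.getBytes(StandardCharsets.UTF_8))
                    + " | modified: " + hex(modifiedUtf8(s)));
        }
    }
}
```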


Logically, all strings at runtime are converted to standard Java chars.
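For readers wondering what a "standard Java char" covers: it is a 16-bit UTF-16 code unit, so most characters (including Japanese kana and common kanji, which are in the Basic Multilingual Plane) take one char, while characters outside the BMP take two. A small JDK-only illustration; `counts` is a hypothetical helper.

```java
public class JavaChars {
    // Returns {number of Java chars (UTF-16 code units), number of code points}.
    static int[] counts(String s) {
        return new int[] {s.length(), s.codePointCount(0, s.length())};
    }

    public static void main(String[] args) {
        // 'a', e-acute (U+00E9) and U+1F600, which lies outside the BMP
        String s = "a\u00E9\uD83D\uDE00";
        int[] c = counts(s);
        // The supplementary character occupies two chars (a surrogate pair).
        System.out.println("chars: " + c[0] + ", code points: " + c[1]);
    }
}
```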

Before 10.3 we always used the standard Java String compare, which does a
numerical comparison of the Unicode values of the chars to arrive at an
ordering.  That is still the default.  In 10.3 an option was added to
set territory-based collation when the database is created, so that
comparison depends on the territory of the database.  For this, the
standard Java rule-based Collator interfaces are used.  This is documented
for the latest Derby release.
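The difference Mike describes can be reproduced with plain JDK classes. This is a sketch of the two comparison styles, not Derby's code: `String.compareTo` stands in for the default char-value ordering and `java.text.Collator` for the territory-based one (for the JDK's built-in locales, `Collator.getInstance` typically returns a `RuleBasedCollator`). The helper names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationDemo {
    // Default style: String.compareTo compares UTF-16 char values numerically,
    // so every upper-case ASCII letter sorts before every lower-case one.
    static List<String> sortByCharValue(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(String::compareTo);
        return sorted;
    }

    // Territory-based style: the locale's Collator decides the order.
    static List<String> sortByCollator(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        System.out.println("char values: " + sortByCharValue(words));
        System.out.println("collator:    " + sortByCollator(words, Locale.ENGLISH));
    }
}
```

With the default ordering "Banana" comes first because 'B' (66) is numerically smaller than 'a' (97); the English collator puts the words in dictionary order instead.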

David Van Couvering wrote:

Hi, all.  I am getting some questions from Ken Frank of the NetBeans
internationalization quality team about Java DB and character set
encodings.  Rather than try to play go-between, I'm including him
here so he can directly ask any follow-on questions.

Ken would like to understand how Derby makes use of character
encodings, and how it is affected by various settings.  How does
Derby handle things if the encoding is set to something different from
our default of UTF-8?  Are we impacted, or do we rely on Java routines
such as the Collator and Comparator classes to handle this?

Sorry if I'm talking out of my ear; i18n is not one of my fortes.

Thanks,

David