Derby Char Column Type Versus VarChar

2010-04-05 Thread Mamatha Kodigehalli Venkatesh
Hello,

Please let us know what the performance impact will be when I use VARCHAR(4000)
instead of CHAR.

I have found that inserts to the table that contains this change are very
slow...

On Oracle, there is not much impact from this change.

Based on the business requirement, I need to use VARCHAR.

Any suggestions would be of great help.

Thanks,
Mamatha



Re: Derby Char Column Type Versus VarChar

2010-04-05 Thread Peter Ondruška
I think Oracle Database and Derby behavior will be the same. CHAR(size)
preallocates size characters in the database page/block, whereas
VARCHAR(size) does not. If you give us a hint about what you are
trying to do, we could help better. Peter
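For illustration, the difference is visible in the stored length: a CHAR column is blank-padded to its declared size, while a VARCHAR column keeps only the characters supplied. A quick check in ij (hypothetical table name):

```sql
CREATE TABLE pad_demo (c CHAR(10), v VARCHAR(10));
INSERT INTO pad_demo VALUES ('abc', 'abc');
-- The CHAR value is blank-padded to 10 characters, the VARCHAR is not,
-- so this typically returns 10 and 3:
SELECT LENGTH(c), LENGTH(v) FROM pad_demo;
```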

On Mon, Apr 5, 2010 at 3:41 PM, Mamatha Kodigehalli Venkatesh
mamatha.venkat...@ness.com wrote:
[...]



Re: Derby internally converts char for bit data to hex string

2010-04-05 Thread Fedd Kraft

On 31.03.2010 13:56, Knut Anders Hatlen wrote:

On 03/31/10 08:26 AM, Fedd Kraft wrote:
   

Hello,

When profiling an application I noticed that sometimes, when performing a
simple query on one table without joins, like

select kee from res where qid=? and kee=?

it looks as if Derby, while searching for a row, makes a lot of comparisons by
converting binary data to hex strings and then comparing them.

And (here I am not sure) it chooses to do a full scan. As a result, the whole
thing runs too slowly.

The table is defined like this.

create table res(
  qid char (16) for bit data not null,
  kee char (32) for bit data not null,
  dnk char (32) for bit data not null
);
create unique index res_unq on res (qid, kee, dnk);
create index res_idx on res (qid, dnk);
create index res_ix2 on res (qid, kee);

It also looks like it does this hex-string conversion when the table is
quite big. When the table is quite small, it uses the normal byte-array
comparison, there are no 'fullscan' classes in the stack trace, and it runs
very fast.

Sorry, no reproduction scripts are ready yet; please tell me if I am just
missing something. I thought that converting makes sense when we compare
values of different types, or values which may be null, but mine is not that
case(?)

 

Hi Fyodor,

Comparisons of binary data are supposed to be performed byte-wise, without
converting the data to strings first, so if you can provide scripts or
code to reproduce this, please file a bug report so that it can be
investigated: https://issues.apache.org/jira/browse/DERBY

You may first want to check that you're not using a debug build of
Derby, though, since it might be that the debug builds convert byte
arrays to hex strings to do some sanity checks. (If derby.jar contains a
class called SanityManager, it's a debug build.)
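Knut's check can be scripted; here is a minimal sketch (the fully qualified class name is an assumption based on the Derby 10.5 sources; verify against your derby.jar):

```java
public class DebugBuildCheck {
    // Package name as of Derby 10.5 (assumption; later releases may differ):
    static final String SANITY_CLASS =
            "org.apache.derby.iapi.services.sanity.SanityManager";

    // Guesses the build flavor by probing for the SanityManager class.
    static String classify() {
        try {
            Class.forName(SANITY_CLASS);
            return "debug build";
        } catch (ClassNotFoundException e) {
            return "release build or Derby not on classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify());
    }
}
```

Run it with derby.jar on the classpath; "debug build" means the sanity checks (and any extra hex-string work they do) are compiled in.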

   

Hi Knut,

unfortunately I cannot yet reproduce this outside my bigger project. The
version I use has no SanityManager, and if I wanted to see which method
Derby uses for the comparison, I would have to use the debug version...
Anyway, looking at the source (db-derby-10.5.3.0-src.zip), I see that when
org.apache.derby.iapi.types.SQLBinary.compare(int op,
DataValueDescriptor other, boolean orderedNulls, boolean unknownRV) is
called with the orderedNulls parameter set to false, it always creates hex
strings; see:


public final boolean compare(int op,
                             DataValueDescriptor other,
                             boolean orderedNulls,
                             boolean unknownRV)
    throws StandardException
{
    if (!orderedNulls) // nulls are unordered
    {
        // sanity manager stuff skipped
        String otherString = other.getString(); // line 574: HERE IS HEX STRING CREATION

        if (this.getString() == null || otherString == null) // === AND HERE
            return unknownRV;
    }
    return super.compare(op, other, orderedNulls, unknownRV);
}
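The effect of that code path can be shown with a small self-contained sketch (hypothetical class; `toHex` mimics what `getString()` must do for a binary value): the full hex string is materialized even though the caller only needs to know whether either side is null.

```java
public class HexCompareSketch {
    // Mimics the hex conversion done for a binary value:
    // two hex digits per byte, built through a string buffer.
    public static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] key = {0x0A, (byte) 0xFF};
        // A null test needs only the reference, no conversion at all:
        boolean isNull = (key == null);
        System.out.println(toHex(key)); // prints "0aff"
        System.out.println(isNull);     // prints "false"
    }
}
```

Done once per row comparison, the allocation and formatting work adds up quickly on a large scan.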

According to the profiler, the hex string is created through a StringBuffer
tens of thousands of times (and the debugger stopped on line 574 when I used
a debug build). Well, I was wrong that the hex strings are compared; they
are created only to determine whether either of the values is null.



There is also a query-plan problem with this table, again when the table
grows big:


create table res(
qid char (16) for bit data not null,
kee char (32) for bit data not null,
dnk char (32) for bit data not null
);
create unique index res_unq on res (qid, kee, dnk);
create index res_idx on res (qid, dnk);
create index res_ix2 on res (qid, kee);

The query is

select kee from res where dnk=? and qid=?

And here are two plans, one for a big res table and another for a small one.
The first plan shows that the wrong index was chosen, so we read too many
records to find the result. The table is populated right before it gets
queried; maybe some statistics work is needed, but I would like to avoid
running Derby-specific calls in my application...
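For reference, when the optimizer's estimates are stale, Derby can be told which index to use through an optimizer override in the FROM clause (a sketch using the index names above; see the Derby Tuning Guide for the exact rules, e.g. the property comment must end its line):

```sql
-- Force RES_IDX (qid, dnk), which matches both predicates:
SELECT kee FROM res --DERBY-PROPERTIES index=RES_IDX
WHERE dnk = ? AND qid = ?
```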


Thank you!


=


2010-04-05 18:27:29.062 GMT
Thread[mety.store-(zipcode,(city,(*VALUE*,(5022740b-a32a-4d4a-8a38-511811e32dc5,5,main]
(XID = 1364486), (SESSIONID = 5), select kee from res where dnk=? and qid=?
*** Project-Restrict ResultSet (2):

Number of opens = 34
Rows seen = 34
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 1488,65
optimizer estimated cost: 12706,02

Source result set:
Index Scan ResultSet for RES using index RES_UNQ at read committed
isolation level using instantaneous share row locking chosen by the optimizer
Number of opens = 34
Rows seen = 34
Rows filtered = 0
Fetch Size = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
next time in milliseconds/row = 0

scan information:
Bit set of columns fetched={0, 1, 2}
Number of columns fetched=3
Number of deleted rows visited=0
Number of pages 

RE: Derby Char Column Type Versus VarChar

2010-04-05 Thread Mamatha Kodigehalli Venkatesh
Hello,

 

Below is the original table structure in Derby:

 

CREATE TABLE TIDLRREP (
  REPOSITORY_KEY CHAR(10) NOT NULL, TRANSLATION_SOURCE CHAR(10) NOT NULL,
  SOURCE_TYPE CHAR(10) NOT NULL, SOURCE_SUB_TYPE CHAR(10) NOT NULL,
  PANEL_ID_7 CHAR(7) NOT NULL, TABLE_KEY CHAR(100) NOT NULL,
  OCCURRENCE CHAR(9) NOT NULL, BASE_LANGUAGE_TERM CHAR(79) NOT NULL,
  ELEM_NBR CHAR(8) NOT NULL, TRANSLATED_TERM_79 CHAR(79) NOT NULL,
  TRANS_TERM_SIZE INTEGER NOT NULL, MAX_TERM_SIZE INTEGER DEFAULT 0 NOT NULL,
  UPDATE_BY CHAR(8) NOT NULL, UPDATE_DATE CHAR(8) NOT NULL,
  VERIFICATION_DATE CHAR(8) NOT NULL, VERIFICATION_BY CHAR(8) NOT NULL,
  REFRESHED_DATE CHAR(8) NOT NULL, EXPORT_DATE CHAR(8) NOT NULL,
  REPOSITORY_STATUS CHAR(2) NOT NULL, STATUS_DATE CHAR(8) NOT NULL,
  STATUS_TIME CHAR(5) NOT NULL, LOCKED_REPOSITORY CHAR(1) NOT NULL,
  MULTIPLE_ROW_ENTRY CHAR(1) NOT NULL, REP_SYSTEM_DATA CHAR(1) NOT NULL,
  ALERT_GROUP_NAME CHAR(10) NOT NULL, FILE_NAME CHAR(50) NOT NULL,
  RESERVED_IND CHAR(1) NOT NULL, APPLICATION_VER CHAR(8) NOT NULL,
  MID_NUMBER CHAR(19) NOT NULL, LITERAL_ELEMENT_ID CHAR(8) NOT NULL,
  TIME_STAMP CHAR(26) NOT NULL
);

 

Now it has been changed to:

 

 

CREATE TABLE TIDLRREP (
  REPOSITORY_KEY CHAR(10) NOT NULL, TRANSLATION_SOURCE CHAR(10) NOT NULL,
  SOURCE_TYPE CHAR(10) NOT NULL, SOURCE_SUB_TYPE CHAR(10) NOT NULL,
  PANEL_ID_7 CHAR(7) NOT NULL, TABLE_KEY CHAR(100) NOT NULL,
  OCCURRENCE CHAR(9) NOT NULL, BASE_LANGUAGE_TERM VARCHAR(4000) NOT NULL,
  ELEM_NBR CHAR(8) NOT NULL, TRANSLATED_TERM_79 VARCHAR(4000) NOT NULL,
  TRANS_TERM_SIZE INTEGER NOT NULL, MAX_TERM_SIZE INTEGER DEFAULT 0 NOT NULL,
  UPDATE_BY CHAR(8) NOT NULL, UPDATE_DATE CHAR(8) NOT NULL,
  VERIFICATION_DATE CHAR(8) NOT NULL, VERIFICATION_BY CHAR(8) NOT NULL,
  REFRESHED_DATE CHAR(8) NOT NULL, EXPORT_DATE CHAR(8) NOT NULL,
  REPOSITORY_STATUS CHAR(2) NOT NULL, STATUS_DATE CHAR(8) NOT NULL,
  STATUS_TIME CHAR(5) NOT NULL, LOCKED_REPOSITORY CHAR(1) NOT NULL,
  MULTIPLE_ROW_ENTRY CHAR(1) NOT NULL, REP_SYSTEM_DATA CHAR(1) NOT NULL,
  ALERT_GROUP_NAME CHAR(10) NOT NULL, FILE_NAME CHAR(50) NOT NULL,
  RESERVED_IND CHAR(1) NOT NULL, APPLICATION_VER CHAR(8) NOT NULL,
  MID_NUMBER CHAR(19) NOT NULL, LITERAL_ELEMENT_ID CHAR(8) NOT NULL,
  TIME_STAMP CHAR(26) NOT NULL
);

 

 

Inserts into TIDLRREP used to be much faster (2 seconds for 50 records); with
the table changes the imports are very slow, taking 52 seconds for 50 records.

 

The same change on Oracle does not impact the insert time for the same set of
50 records.

 

TIDLRREP contains around 180,000 records.
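One possibility worth ruling out (an assumption, not a confirmed diagnosis): with two VARCHAR(4000) columns, a single row can exceed Derby's default 4 KB data page, so inserts spill into overflow pages. A larger page size can be requested before the table is created, e.g.:

```sql
-- Affects only tables and indexes created after the property is set:
CALL SYSCS_UTIL.SYSCS_SET_DATABASE_PROPERTY('derby.storage.pageSize', '32768');
-- then create (or recreate and reload) TIDLRREP so it picks up the new size
```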

 

Please let me know if you need any other details.

 

Thanks

Mamatha

 

 

 

-Original Message-
From: Peter Ondruška [mailto:peter.ondru...@gmail.com] 
Sent: Monday, April 05, 2010 7:37 PM
To: Derby Discussion
Subject: Re: Derby Char Column Type Versus VarChar

 

[...]