Derby Char Column Type Versus VarChar
Hello, please let us know what the performance impact will be when I use VARCHAR(4000) instead of CHAR. I have found that inserts to the table containing this change are very slow... but on Oracle there is not much impact from this. Based on the business requirement I need to use VARCHAR. Any suggestions will be of great help. Thanks, Mamatha
Re: Derby Char Column Type Versus VarChar
I think Oracle Database and Derby will behave the same here: CHAR(size) preallocates size characters in the database page/block, whereas VARCHAR(size) does not. If you give us a hint about what you are trying to do, we can help better. Peter
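The semantic difference Peter refers to can be sketched in isolation. This is a minimal standalone illustration (not Derby's actual storage code): CHAR(n) values are blank-padded to their declared length, so every row carries the full n characters, while VARCHAR(n) keeps only the actual data length.

```java
public class CharVsVarchar {
    // Pad a value to the fixed width, the way CHAR(n) semantics require.
    static String asChar(String value, int width) {
        StringBuilder sb = new StringBuilder(width);
        sb.append(value);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String v = "abc";
        // CHAR(10) always stores 10 characters, trailing blanks included;
        // VARCHAR(10) would store just the 3 characters given.
        System.out.println(asChar(v, 10).length()); // 10
        System.out.println(v.length());             // 3
    }
}
```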
Re: Derby internally converts char for bit data to hex string
On 31.03.2010 13:56, Knut Anders Hatlen wrote:

On 03/31/10 08:26 AM, Fedd Kraft wrote:

Hello, while profiling an application I noticed that sometimes, when performing a simple query against one table without joins, like

select kee from res where qid=? and kee=?

it looks like Derby, while searching for a row, makes a lot of comparisons by converting binary data to hex strings and then comparing them. And (here I am not sure) it chooses to do a full scan. As a result, the whole thing works too slowly. The table is defined like this:

create table res(
  qid char (16) for bit data not null,
  kee char (32) for bit data not null,
  dnk char (32) for bit data not null
);
create unique index res_unq on res (qid, kee, dnk);
create index res_idx on res (qid, dnk);
create index res_ix2 on res (qid, kee);

It also looks like it only does this hex-string comparison when the table is quite big. When it is quite small, it calls the normal byte-array comparison, there are no 'fullscan' classes in the stack trace, and it works ultra-fast. Sorry, there are no reproduction scripts ready yet; please tell me if I am just missing something. I thought that converting makes sense when we compare values of different types, or values which may be null, but mine is not that case (?)

Hi Fyodor, comparisons of binary data are supposed to be performed byte-wise, without converting the data to strings first, so if you can provide scripts or code to reproduce this, please file a bug report here so that it can be investigated: https://issues.apache.org/jira/browse/DERBY You may first want to check that you're not using a debug build of Derby, though, since it might be that the debug builds convert byte arrays to hex strings to do some sanity checks. (If derby.jar contains a class called SanityManager, it's a debug build.)

Hi Knut, unfortunately I cannot yet reproduce this without my bigger project. The version I use has no SanityManager, and if I wanted to see which method Derby uses for the comparison, I would have to use the debug version...
Anyway, looking at the code (db-derby-10.5.3.0-src.zip), I see that when the method org.apache.derby.iapi.types.SQLBinary.compare(int op, DataValueDescriptor other, boolean orderedNulls, boolean unknownRV) is called with the orderedNulls parameter set to false, it always creates hex strings, see:

public final boolean compare(int op, DataValueDescriptor other,
                             boolean orderedNulls, boolean unknownRV)
    throws StandardException
{
    if (!orderedNulls) // nulls are unordered
    {
        // sanity manager stuff skipped
        String otherString = other.getString(); // line 574: HERE IS HEX STRING CREATION
        if (this.getString() == null || otherString == null) // === AND HERE
            return unknownRV;
    }
    return super.compare(op, other, orderedNulls, unknownRV);
}

The hex string is created through a StringBuffer tens of thousands of times according to the profiler (and the debugger stopped on line 574 when I used a debug build). Well, I was wrong that hex strings are compared; they are only prepared in order to find out whether either of the values is null. There is also a query-plan problem with this table, again when the table grows big:

create table res(
  qid char (16) for bit data not null,
  kee char (32) for bit data not null,
  dnk char (32) for bit data not null
);
create unique index res_unq on res (qid, kee, dnk);
create index res_idx on res (qid, dnk);
create index res_ix2 on res (qid, kee);

The query is

select kee from res where dnk=? and qid=?

And here are two plans, one for a big res table and another for a small one. The first plan shows that the wrong index was chosen, and we read too many records to find the result. The table is populated right before it gets queried; maybe there is some statistics machinery involved, but I would like to avoid running Derby-specific calls in my application... Thank you!

= 2010-04-05 18:27:29.062 GMT Thread[mety.store-(zipcode,(city,(*VALUE*,(5022740b-a32a-4d4a-8a38-511811e32dc5,5,main] (XID = 1364486), (SESSIONID = 5), select kee from res where dnk=? and qid=?
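The overhead described above is easy to see in isolation. This is a standalone sketch (not Derby's code) contrasting a direct byte-wise comparison with a null check that first renders both values as hex strings, the pattern visible in SQLBinary.compare; the class and method names here are illustrative only.

```java
import java.util.Arrays;

public class HexNullCheck {
    // Render a byte array as a hex string, roughly what SQLBinary.getString()
    // does via a StringBuffer; returns null for a null value.
    static String toHex(byte[] value) {
        if (value == null) return null;
        StringBuilder sb = new StringBuilder(value.length * 2);
        for (byte b : value) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Wasteful null check: builds two full hex strings just to see
    // whether either value is null.
    static boolean eitherNullViaHex(byte[] a, byte[] b) {
        return toHex(a) == null || toHex(b) == null;
    }

    // What the comparison itself actually needs: a direct byte-wise check,
    // with no intermediate string allocation at all.
    static boolean bytesEqual(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```

Done per row visited in a scan, the string construction dominates the comparison itself, which matches the profiler observation that the StringBuffer work appears tens of thousands of times.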
*** Project-Restrict ResultSet (2):
Number of opens = 34
Rows seen = 34
Rows filtered = 0
restriction = false
projection = true
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 0
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
optimizer estimated row count: 1488,65
optimizer estimated cost: 12706,02
Source result set:
  Index Scan ResultSet for RES using index RES_UNQ at read committed isolation level using instantaneous share row locking chosen by the optimizer
  Number of opens = 34
  Rows seen = 34
  Rows filtered = 0
  Fetch Size = 16
  constructor time (milliseconds) = 0
  open time (milliseconds) = 0
  next time (milliseconds) = 0
  close time (milliseconds) = 0
  next time in milliseconds/row = 0
  scan information:
    Bit set of columns fetched={0, 1, 2}
    Number of columns fetched=3
    Number of deleted rows visited=0
    Number of pages
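Since the table is populated right before being queried, the optimizer may be choosing RES_UNQ over the better-fitting RES_IDX because its cardinality statistics are stale. Derby (from 10.5) exposes the system procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS to refresh them; this is a minimal JDBC sketch of calling it, assuming the table lives in the default APP schema (the Connection would come from the application's normal Derby setup).

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class RefreshStats {
    // Passing NULL as the third argument updates statistics for all
    // indexes on the table; a specific index name would restrict it.
    static final String UPDATE_STATS =
        "CALL SYSCS_UTIL.SYSCS_UPDATE_STATISTICS('APP', 'RES', NULL)";

    // Refresh index cardinality statistics for RES after bulk-loading it,
    // so the optimizer can reconsider its index choice for the query
    // "select kee from res where dnk=? and qid=?".
    static void refresh(Connection conn) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(UPDATE_STATS)) {
            cs.execute();
        }
    }
}
```

This is a Derby-specific call, which the poster wanted to avoid, but it is the standard remedy when a table's content changes drastically between load and query.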
RE: Derby Char Column Type Versus VarChar
Hello, below was my table structure in Derby:

CREATE TABLE TIDLRREP (
  REPOSITORY_KEY CHAR(10) NOT NULL,
  TRANSLATION_SOURCE CHAR(10) NOT NULL,
  SOURCE_TYPE CHAR(10) NOT NULL,
  SOURCE_SUB_TYPE CHAR(10) NOT NULL,
  PANEL_ID_7 CHAR(7) NOT NULL,
  TABLE_KEY CHAR(100) NOT NULL,
  OCCURRENCE CHAR(9) NOT NULL,
  BASE_LANGUAGE_TERM CHAR(79) NOT NULL,
  ELEM_NBR CHAR(8) NOT NULL,
  TRANSLATED_TERM_79 CHAR(79) NOT NULL,
  TRANS_TERM_SIZE INTEGER NOT NULL,
  MAX_TERM_SIZE INTEGER DEFAULT 0 NOT NULL,
  UPDATE_BY CHAR(8) NOT NULL,
  UPDATE_DATE CHAR(8) NOT NULL,
  VERIFICATION_DATE CHAR(8) NOT NULL,
  VERIFICATION_BY CHAR(8) NOT NULL,
  REFRESHED_DATE CHAR(8) NOT NULL,
  EXPORT_DATE CHAR(8) NOT NULL,
  REPOSITORY_STATUS CHAR(2) NOT NULL,
  STATUS_DATE CHAR(8) NOT NULL,
  STATUS_TIME CHAR(5) NOT NULL,
  LOCKED_REPOSITORY CHAR(1) NOT NULL,
  MULTIPLE_ROW_ENTRY CHAR(1) NOT NULL,
  REP_SYSTEM_DATA CHAR(1) NOT NULL,
  ALERT_GROUP_NAME CHAR(10) NOT NULL,
  FILE_NAME CHAR(50) NOT NULL,
  RESERVED_IND CHAR(1) NOT NULL,
  APPLICATION_VER CHAR(8) NOT NULL,
  MID_NUMBER CHAR(19) NOT NULL,
  LITERAL_ELEMENT_ID CHAR(8) NOT NULL,
  TIME_STAMP CHAR(26) NOT NULL);

It has now been changed so that BASE_LANGUAGE_TERM and TRANSLATED_TERM_79 are VARCHAR(4000) NOT NULL instead of CHAR(79) NOT NULL; all other columns are unchanged.

Inserts into TIDLRREP used to be much faster (2 seconds for 50 records); with the table change the imports are very slow, taking 52 seconds for 50 records. The same change on the Oracle DB does not impact the insert time for the same set of 50 records. TIDLRREP contains around 180,000 records. Please let me know if you need any other details. Thanks, Mamatha