Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
Tito Ciuro wrote: Due to application requirements, I must rely on LIKE and GLOB in order to match data, matching data that contains some value (sensitive or insensitive match). Now, it seems to me that using either LIKE or GLOB will force a row scan anyhow, since it can't use the index, correct? So your solution would force me to perform the row scan for 250K rows when matching values. That is 10 times more data to scan through. Tito, I'm glad to see you are looking at this. First, the attribute value table is storing the same string data in my proposal and your value table. In my case there is some overhead for the additional rowids (3 bytes for 250K rows), and the FileId column (2 bytes for 25K files), but in your case you have the overhead of the field separator commas (13 bytes in your example with 14 attributes). My table has 250K short rows, and yours has 25K long rows. There are 10 times as many rows in my table, but your rows are about 10 times a large. There may be a small difference in the total amount of data, but it is not going to be significant one way or the other. The real heart of the problem is your requirement to match data using LIKE due to "application requirements". Can you expand on that statement so that I can understand the details of your requirements? From your examples, what you are doing is using LIKE to do equality tests for a subfield in your large value column. With the values separated into separate rows there is no need to use LIKE to do that. WHERE CMValues LIKE '%2004-12-16 10:11:35 -0800%' would become WHERE Attribute.Value = '2004-12-16 10:11:35 -0800' If you need case insensitive searches you simply convert the value string to the same case as your test string in the equality test. Or if you are unsure of the case of the test string, you can convert them both. WHERE upper(Attribute.Value) = 'ABIWORD.PROFILE' or WHERE upper(Attribute.Value) = upper(:match_value) Now, these comparisons will scan all the values, since they don't specify which attribute you want to check. This may be what you want if you are searching for a date and you don't care if it matches the create date or the modification date, but generally that is not what you want to do (Why look at file size or album name data when looking for a date?). Usually you will want to check the value of a particular attribute as shown below. WHERE AttributeDefinition.Name = ' Modification Date' AND Attribute.Value = :mod_date This query will use the AttributeValue index to reduce the rows scanned to only those that hold values of this attribute. In this case each file has a mod date so it will search 25K rows. For a query that is looking for a attribute like 'Album Name' which only exists for your 1K music files, it will only scan those 1K rows checking for album name matches. In some cases you may want to look in a set of attributes, which you can do as shown below. WHERE AttributeDefinition.Name IN ( ' Modification Date', 'Create Date', 'Backup Date') AND Attribute.Value = :date The possibilities are, quite literally, endless. If you have some sample queries that you need to make please tell me what they are so we can talk about more concrete examples. HTH Dennis Cote
Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
Hi Denis, I've been reading your email carefully and I'd like to comment it. On 28/03/2006, at 14:24, Dennis Cote wrote: With these tables you will have 25K rows in the File table, one per file, and 250K rows in the Attribute table assuming an average of 10 attributes per file (your example had 14). An index on the attribute name and value pairs will greatly speedup lookups of a particular attribute. There is still some redundancy in this set of tables since the names of the attributes are being repeated over and over in the Attribute table. This may not be a big problem, but you really only have 100 attributes, so there is no sense storing 25K copies of the name of a common attribute that applies to all files. This information can be factored out into a third table as shown below. [...] This will give you three tables (which will not cause a noticeable difference in the database open time). One with 25K rows of file names. One with 100 rows of attribute definition data, which includes the attribute name. And one with 250K rows of attribute value data. Due to application requirements, I must rely on LIKE and GLOB in order to match data, matching data that contains some value (sensitive or insensitive match). Now, it seems to me that using either LIKE or GLOB will force a row scan anyhow, since it can't use the index, correct? So your solution would force me to perform the row scan for 250K rows when matching values. That is 10 times more data to scan through. I'm not sure if there is a way to speed up LIKE or GLOB queries such as: SELECT myvalue FROM files WHERE myvalue LIKE '%finit%'; Thanks a lot for the advice. Regards, -- Tito
Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
No problem. Your approach is a bit non-standard, but I'm glad you got it working. It would be interesting to try what the others on the list are suggesting (splitting your data into multiple columns, so that the db can index them rather than forcing a full string search for every query) to see if you got a speed up, but if "non-standard" is fast enough for, I guess it's fast enough. They're only tring to help. ;) Regards, Martin - Original Message - From: "Tito Ciuro" <[EMAIL PROTECTED]> To: Sent: Tuesday, March 28, 2006 4:32 PM Subject: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers? Hi Martin, I haven't been able to pick up pace my email until this morning. Sorry about that. As it turns out, the error was on my side (surprise). The code seemed to be working fine: early tests showed data being churned as expected, so I started cleaning it up and factoring up the code. To make the story short, I accidentally deleted the line that was incrementing the index keeping in sync with the array enumerator. Result: I was always storing element zero in the database. No wonder I had 4000+ entries looking exactly the same! The first version of the code was working fine, so I assumed that LIKE and GLOB were there culprits after I started noticing weird results. Oh!... just a quick message for the row-scan non-believers out there: SQLite flies, even under this scenario. I'm getting wonderful performance numbers by keeping everything within SQLite-land. The code is simple and I let SQLite do all the magic. What else can I ask for? :-) Thanks everybody for your help and comments. Regards, -- Tito On 27/03/2006, at 3:40, Martin Jenkins wrote: Tito, I knocked up a quick test with python and apsw and it worked as intended. My data isn't exactly the same as yours in that I don't have the variety in the keys, but you're not having problems with those. My test database contains your data with/without embedded carriage returns - as expected, it makes no difference. In the following, zip(..) is a quick hack to get all the results from the query. The spurious '[', ']' and other brackets surrounding the results are a result of the way that apsw returns data (as lists of python tuples). Apologies for the extreme width of the following lines. :( zip(csr.execute("select * from t")) [ (('file5809', '(0,NSFileTypeRegular,0,22537,0,staff, 234881026,294022,2004-12-16 10:11:00 -0800,tciuro,384,2006-03-26 08:01:55 -0800,502,20)'),), (('file0101581a', '(1,NSFileTypeRegular,1,22554,0,staff, 234881026,294022,2004-12-16 10:11:03 -0800,tciuro,384,2006-03-26 08:04:55 -0800,502,20)'),), (('file0202582b', '(2,NSFileTypeRegular,2,22571,0,staff, 234881026,294022,2004-12-16 10:11:06 -0800,tciuro,384,2006-03-26 08:07:55 -0800,502,20)'),), ... (('file595d', '(\n 0,\nNSFileTypeRegular,\n0,\n 22877,\n0,\nstaff,\n234881026,\n294022,\n 2004-12-16 10:11:00 -0800,\ntciuro,\n384,\n2006-03-26 08:01:55 -0800,\n 502,\n20\n)'),), (('file0101596e', '(\n 1,\nNSFileTypeRegular,\n1,\n 22894,\n0,\nstaff,\n234881026,\n294022,\n 2004-12-16 10:11:03 -0800,\ntciuro,\n384,\n2006-03-26 08:04:55 -0800,\n 502,\n20\n)'),), (('file0202597f', '(\n 2,\nNSFileTypeRegular,\n2,\n22911,\n0,\n staff,\n 234881026,\n294022,\n2004-12-16 10:11:06 -0800, \n tciuro,\n 384,\n2006-03-26 08:07:55 -0800,\n502,\n 20\n)'),), ... ] zip(csr.execute("SELECT * FROM t WHERE CMValues GLOB '*2004-12-16 10:11:45 -0800*'")) [ (('file15155908', '(15,NSFileTypeRegular,15,22792,0,staff, 234881026,294022,2004-12-16 10:11:45 -0800,tciuro,384,2006-03-26 08:46:55 -0800,502,20)'),), (('file15155a5c', '(\n 15,\nNSFileTypeRegular,\n15,\n 23132,\n0,\nstaff,\n234881026,\n294022,\n 2004-12-16 10:11:45 -0800,\ntciuro,\n384,\n2006-03-26 08:46:55 -0800,\n 502,\n20\n)'),) ] zip(csr.execute("SELECT * FROM t WHERE CMValues LIKE '%2004-12-16 10:11:45 -0800%'")) [ (('file15155908', '(15,NSFileTypeRegular,15,22792,0,staff, 234881026,294022,2004-12-16 10:11:45 -0800,tciuro,384,2006-03-26 08:46:55 -0800,502,20)'),), (('file15155a5c', '(\n 15,\nNSFileTypeRegular,\n15,\n 23132,\n0,\nstaff,\n234881026,\n294022,\n 2004-12-16 10:11:45 -0800,\ntciuro,\n384,\n2006-03-26 08:46:55 -0800,\n 502,\n 20\n)'),) ] Could you try reducing your search strings and see if there's a point at which they start working
Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
Tito, I am replying off list because I don't want to contribute to turning this thread into a flame war... You have shown wonderful patience and politeness on this list. You have my gratitude and respect for this. Regards, e Tuesday, March 28, 2006, 5:35:37 PM, Tito Ciuro wrote: > Hi Dennis, > On 28/03/2006, at 14:24, Dennis Cote wrote: >> If you give this a try, you might be surprised at how flexible it is. > That was a very clear explanation. I will sure give it a try when I > have a chance. >> Just because something works doesn't mean that it can't be improved. > I couldn't agree more with you. > Thanks for the help! > Regards, > -- Tito -- Doug Currie Londonderry, NH
RE: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
Very well thought out response. Now perhaps we will see if various people here attempting to help have been, as we say in east Texas, "Talking to a fence post." :-) Fred > -Original Message- > From: Dennis Cote [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 28, 2006 4:24 PM > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with > numbers? > > > Tito Ciuro wrote: > > > > > I have no idea why you're so angry. Anyway, there are so ... > While the tone of MGC's post may have been a little over the > top, many > of his points were valid. > > All he was really suggesting is that a set of normalized > tables would do > the same thing, and probably preform better, especially when > scaled to > larger tables. ...
Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
Hi Dennis, On 28/03/2006, at 14:24, Dennis Cote wrote: If you give this a try, you might be surprised at how flexible it is. That was a very clear explanation. I will sure give it a try when I have a chance. Just because something works doesn't mean that it can't be improved. I couldn't agree more with you. Thanks for the help! Regards, -- Tito
Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
Tito Ciuro wrote: I have no idea why you're so angry. Anyway, there are so many things I can think of saying, I'll just make it brief and to the point. This thing won't scale. I'd like to see it when you have the 4.5 million records my database contains, and that is still tiny for all intents-and-purposes. For the type of application I'm building, it doesn't need to scale. At most I'll have 25K records. Even with the 100K database I've used for testing, it simply flies. I'm not building anything for the outside world. Just something that serves me well. I'm storing variable-length data, with a very different set of attributes. Some may have 1 attribute, others tens of them. Perhaps even a hundred. Using a column per attribute is not a good idea. A few days ago I asked this question and Dr. Hipp mentioned: The more tables you have, the slower the first query will run and the more memory SQLite will use. For long-running applications where the startup time is not a significant factor, 100s or 1000s of tables is fine. For a CGI script that starts itself up anew several times per second, then you should try to keep the number of tables below a 100, I think. Less than that if you can. You should also try and keep down the number of tables in low-memory embedded applications, in order to save on memory usages. Each table takes a few hundred bytes of memory - depending on the number of columns and features. Having two columns (one for the key and the other one for the data itself) seems like a good balance between speed and ease of use. I don't care if it doesn't scale because the intended deployment is 25K at most, as I said earlier. Even with data sets 4x that size works fine. There is absolutely no reason this could not be properly designed and STILL fit any possible need for that ugly packed record. Tito, While the tone of MGC's post may have been a little over the top, many of his points were valid. All he was really suggesting is that a set of normalized tables would do the same thing, and probably preform better, especially when scaled to larger tables. In your application you have 25K files each with 1 to 100 attributes. You can still store your data in two normalized tables. The main idea is to use more than one row in the value table for each file. These values are all tied to the file name by a common file id value. This table schema is shown below. -- basic two table attribute setup create table File ( Id integer primary key; Nametext; -- name of file including path ); create table Attribute ( FileId integer references File(Id); -- foreign key Nametext; -- name of attribute Value text; -- value of attribute primary key (FileId, Name); ); create index AttributeValue on Attribute(Name, Value); With these tables you will have 25K rows in the File table, one per file, and 250K rows in the Attribute table assuming an average of 10 attributes per file (your example had 14). An index on the attribute name and value pairs will greatly speedup lookups of a particular attribute. There is still some redundancy in this set of tables since the names of the attributes are being repeated over and over in the Attribute table. This may not be a big problem, but you really only have 100 attributes, so there is no sense storing 25K copies of the name of a common attribute that applies to all files. This information can be factored out into a third table as shown below. -- Factor out the attribute definitions into a separate table create table File ( Id integer primary key; Nametext; -- name of file including path ); create table AttributeDefinition ( Id integer primary key; Nametext unique;-- name of the attribute Typetext; -- type of the attribute ); create table Attribute ( FileId integer references File(Id); AttId integer references AttributeDefinition(Id); Value text; -- value of attribute primary key (FileId, AttId); ); create index AttributeValue on Attribute(AttId, Value); This will give you three tables (which will not cause a noticeable difference in the database open time). One with 25K rows of file names. One with 100 rows of attribute definition data, which includes the attribute name. And one with 250K rows of attribute value data. The third table assigns each attribute an id number which will fit in a single byte, and that id is used to replace the 25K copies of the attribute name. This will be a sizable reduction in stored data for all but the shortest names. It also allows other definition data to be
Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
John, Did you read my replies at all? If not, please take 15 seconds to do so. I thanked *all of you* in all three emails. Just in case, if that wasn't enough, allow me to do that for the fourth time: I most sincerely appreciate the time and effort that you guys have taken to answer my emails. I really do appreciate it your help. Best regards, -- Tito On 28/03/2006, at 13:45, John Stanton wrote: Tito, Several people have bothered to spend the time and effort to give you some well considered help. That is what forums like this are for, professional guidance and development. Thank them.
Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
Tito, Several people have bothered to spend the time and effort to give you some well considered help. That is what forums like this are for, professional guidance and development. Thank them. Tito Ciuro wrote: MGC, I have no idea why you're so angry. Anyway, there are so many things I can think of saying, I'll just make it brief and to the point. 1) Regarding your statement: This thing won't scale. I'd like to see it when you have the 4.5 million records my database contains, and that is still tiny for all intents-and-purposes. For the type of application I'm building, it doesn't need to scale. At most I'll have 25K records. Even with the 100K database I've used for testing, it simply flies. I'm not building anything for the outside world. Just something that serves me well. 2) Regarding: Absolutely NO ONE suggested moving anything out of "SQLite-land". What!?!? You mentioned it two days ago!: Stuff it into a sorted flat file. that would be faster and simpler. 3) Regarding your statement: As to your 'real good reason' for doing it this way, I'd bet cash money it's crap and based on nothing more than 'Because that's the way I decided to do it, and I'm smart'. Talk about making things up... you're a funny guy :-) I'm storing variable-length data, with a very different set of attributes. Some may have 1 attribute, others tens of them. Perhaps even a hundred. Using a column per attribute is not a good idea. A few days ago I asked this question and Dr. Hipp mentioned: The more tables you have, the slower the first query will run and the more memory SQLite will use. For long-running applications where the startup time is not a significant factor, 100s or 1000s of tables is fine. For a CGI script that starts itself up anew several times per second, then you should try to keep the number of tables below a 100, I think. Less than that if you can. You should also try and keep down the number of tables in low-memory embedded applications, in order to save on memory usages. Each table takes a few hundred bytes of memory - depending on the number of columns and features. Having two columns (one for the key and the other one for the data itself) seems like a good balance between speed and ease of use. I don't care if it doesn't scale because the intended deployment is 25K at most, as I said earlier. Even with data sets 4x that size works fine. There is absolutely no reason this could not be properly designed and STILL fit any possible need for that ugly packed record. So you know it all, eh? And you call me arrogant? :-) I'm very happy for the solution: it's speedy and is simple. As for the original question I posted, I'm also glad to report that LIKE and GLOB works fine. Cheers, -- Tito
Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
MGC, I have no idea why you're so angry. Anyway, there are so many things I can think of saying, I'll just make it brief and to the point. 1) Regarding your statement: This thing won't scale. I'd like to see it when you have the 4.5 million records my database contains, and that is still tiny for all intents-and-purposes. For the type of application I'm building, it doesn't need to scale. At most I'll have 25K records. Even with the 100K database I've used for testing, it simply flies. I'm not building anything for the outside world. Just something that serves me well. 2) Regarding: Absolutely NO ONE suggested moving anything out of "SQLite-land". What!?!? You mentioned it two days ago!: Stuff it into a sorted flat file. that would be faster and simpler. 3) Regarding your statement: As to your 'real good reason' for doing it this way, I'd bet cash money it's crap and based on nothing more than 'Because that's the way I decided to do it, and I'm smart'. Talk about making things up... you're a funny guy :-) I'm storing variable-length data, with a very different set of attributes. Some may have 1 attribute, others tens of them. Perhaps even a hundred. Using a column per attribute is not a good idea. A few days ago I asked this question and Dr. Hipp mentioned: The more tables you have, the slower the first query will run and the more memory SQLite will use. For long-running applications where the startup time is not a significant factor, 100s or 1000s of tables is fine. For a CGI script that starts itself up anew several times per second, then you should try to keep the number of tables below a 100, I think. Less than that if you can. You should also try and keep down the number of tables in low-memory embedded applications, in order to save on memory usages. Each table takes a few hundred bytes of memory - depending on the number of columns and features. Having two columns (one for the key and the other one for the data itself) seems like a good balance between speed and ease of use. I don't care if it doesn't scale because the intended deployment is 25K at most, as I said earlier. Even with data sets 4x that size works fine. There is absolutely no reason this could not be properly designed and STILL fit any possible need for that ugly packed record. So you know it all, eh? And you call me arrogant? :-) I'm very happy for the solution: it's speedy and is simple. As for the original question I posted, I'm also glad to report that LIKE and GLOB works fine. Cheers, -- Tito
Re: [sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
Tito Ciuro wrote: Oh!... just a quick message for the row-scan non-believers out there: SQLite flies, even under this scenario. I'm getting wonderful performance numbers by keeping everything within SQLite-land. The code is simple and I let SQLite do all the magic. What else can I ask for? :-) If only you would listen enough to understand how clueless this statement is You are clearly too arrogant to even listen to the reasons you have built and will continue to build poor relational database designs. This thing won't scale. I'd like to see it when you have the 4.5 million records my database contains, and that is still tiny for all intents-and-purposes. Absolutely NO ONE suggested moving anything out of "SQLite-land". It was pointed out that: You have defeated the purpose even using a relational database, this thing is just a data dump. You have neglected the most basic rules of design. You will never have the advantage of using an index. You are forcing the database to read and parse every single record from that table every single time you select a set of records. If you refuse to understand why it is better to design a properly normalized database that will find and return a single record from a table with 4.5 million rows in 2 dozen disk reads, or fewer if properly cached, rather than to force it to read and parse every single one of those 4.5 million rows, you deserve what you get. As to your 'real good reason' for doing it this way, I'd bet cash money it's crap and based on nothing more than 'Because that's the way I decided to do it, and I'm smart'. There is absolutely no reason this could not be properly designed and STILL fit any possible need for that ugly packed record. MGC
[sqlite] [SOLVED] Re: [sqlite] LIKE and GLOB bug with numbers?
Hi Martin, I haven't been able to pick up pace my email until this morning. Sorry about that. As it turns out, the error was on my side (surprise). The code seemed to be working fine: early tests showed data being churned as expected, so I started cleaning it up and factoring up the code. To make the story short, I accidentally deleted the line that was incrementing the index keeping in sync with the array enumerator. Result: I was always storing element zero in the database. No wonder I had 4000+ entries looking exactly the same! The first version of the code was working fine, so I assumed that LIKE and GLOB were there culprits after I started noticing weird results. Oh!... just a quick message for the row-scan non-believers out there: SQLite flies, even under this scenario. I'm getting wonderful performance numbers by keeping everything within SQLite-land. The code is simple and I let SQLite do all the magic. What else can I ask for? :-) Thanks everybody for your help and comments. Regards, -- Tito On 27/03/2006, at 3:40, Martin Jenkins wrote: Tito, I knocked up a quick test with python and apsw and it worked as intended. My data isn't exactly the same as yours in that I don't have the variety in the keys, but you're not having problems with those. My test database contains your data with/without embedded carriage returns - as expected, it makes no difference. In the following, zip(..) is a quick hack to get all the results from the query. The spurious '[', ']' and other brackets surrounding the results are a result of the way that apsw returns data (as lists of python tuples). Apologies for the extreme width of the following lines. :( zip(csr.execute("select * from t")) [ (('file5809', '(0,NSFileTypeRegular,0,22537,0,staff, 234881026,294022,2004-12-16 10:11:00 -0800,tciuro,384,2006-03-26 08:01:55 -0800,502,20)'),), (('file0101581a', '(1,NSFileTypeRegular,1,22554,0,staff, 234881026,294022,2004-12-16 10:11:03 -0800,tciuro,384,2006-03-26 08:04:55 -0800,502,20)'),), (('file0202582b', '(2,NSFileTypeRegular,2,22571,0,staff, 234881026,294022,2004-12-16 10:11:06 -0800,tciuro,384,2006-03-26 08:07:55 -0800,502,20)'),), ... (('file595d', '(\n 0,\nNSFileTypeRegular,\n0,\n 22877,\n0,\nstaff,\n234881026,\n294022,\n 2004-12-16 10:11:00 -0800,\ntciuro,\n384,\n2006-03-26 08:01:55 -0800,\n 502,\n20\n)'),), (('file0101596e', '(\n 1,\nNSFileTypeRegular,\n1,\n 22894,\n0,\nstaff,\n234881026,\n294022,\n 2004-12-16 10:11:03 -0800,\ntciuro,\n384,\n2006-03-26 08:04:55 -0800,\n 502,\n20\n)'),), (('file0202597f', '(\n 2,\nNSFileTypeRegular,\n2,\n22911,\n0,\n staff,\n 234881026,\n294022,\n2004-12-16 10:11:06 -0800, \ntciuro,\n 384,\n2006-03-26 08:07:55 -0800,\n502,\n 20\n)'),), ... ] zip(csr.execute("SELECT * FROM t WHERE CMValues GLOB '*2004-12-16 10:11:45 -0800*'")) [ (('file15155908', '(15,NSFileTypeRegular,15,22792,0,staff, 234881026,294022,2004-12-16 10:11:45 -0800,tciuro,384,2006-03-26 08:46:55 -0800,502,20)'),), (('file15155a5c', '(\n 15,\nNSFileTypeRegular,\n15,\n 23132,\n0,\nstaff,\n234881026,\n294022,\n 2004-12-16 10:11:45 -0800,\ntciuro,\n384,\n2006-03-26 08:46:55 -0800,\n 502,\n20\n)'),) ] zip(csr.execute("SELECT * FROM t WHERE CMValues LIKE '%2004-12-16 10:11:45 -0800%'")) [ (('file15155908', '(15,NSFileTypeRegular,15,22792,0,staff, 234881026,294022,2004-12-16 10:11:45 -0800,tciuro,384,2006-03-26 08:46:55 -0800,502,20)'),), (('file15155a5c', '(\n 15,\nNSFileTypeRegular,\n15,\n 23132,\n0,\nstaff,\n234881026,\n294022,\n 2004-12-16 10:11:45 -0800,\ntciuro,\n384,\n2006-03-26 08:46:55 -0800,\n 502,\n20\n)'),) ] Could you try reducing your search strings and see if there's a point at which they start working? HTH, Martin Jenkins XQP Ltd Ascot, UK - Original Message - From: "Tito Ciuro" <[EMAIL PROTECTED]> To: "Forum SQLite" Sent: Sunday, March 26, 2006 6:50 PM Subject: [sqlite] LIKE and GLOB bug with numbers? Hello, I've populated a datafile with 40.176 records which contain file attributes and file paths. I have two columns, CMKey and CMValues. The column CMKey contains the path to the file and the column CMValues contains the attribute values. For example: CMKey: Application Support/AbiSuite/AbiWord.Profile CMValues: ( 0, NSFileTypeRegular, 1, 21508, 0, st
Re: [sqlite] LIKE and GLOB bug with numbers?
Tito, I knocked up a quick test with python and apsw and it worked as intended. My data isn't exactly the same as yours in that I don't have the variety in the keys, but you're not having problems with those. My test database contains your data with/without embedded carriage returns - as expected, it makes no difference. In the following, zip(..) is a quick hack to get all the results from the query. The spurious '[', ']' and other brackets surrounding the results are a result of the way that apsw returns data (as lists of python tuples). Apologies for the extreme width of the following lines. :( zip(csr.execute("select * from t")) [ (('file5809', '(0,NSFileTypeRegular,0,22537,0,staff,234881026,294022,2004-12-16 10:11:00 -0800,tciuro,384,2006-03-26 08:01:55 -0800,502,20)'),), (('file0101581a', '(1,NSFileTypeRegular,1,22554,0,staff,234881026,294022,2004-12-16 10:11:03 -0800,tciuro,384,2006-03-26 08:04:55 -0800,502,20)'),), (('file0202582b', '(2,NSFileTypeRegular,2,22571,0,staff,234881026,294022,2004-12-16 10:11:06 -0800,tciuro,384,2006-03-26 08:07:55 -0800,502,20)'),), ... (('file595d', '(\n 0,\nNSFileTypeRegular,\n0,\n 22877,\n0,\nstaff,\n234881026,\n294022,\n2004-12-16 10:11:00 -0800,\ntciuro,\n384,\n2006-03-26 08:01:55 -0800,\n 502,\n20\n)'),), (('file0101596e', '(\n 1,\nNSFileTypeRegular,\n1,\n 22894,\n0,\nstaff,\n234881026,\n294022,\n2004-12-16 10:11:03 -0800,\ntciuro,\n384,\n2006-03-26 08:04:55 -0800,\n 502,\n20\n)'),), (('file0202597f', '(\n 2,\nNSFileTypeRegular,\n2,\n22911,\n0,\nstaff,\n 234881026,\n294022,\n2004-12-16 10:11:06 -0800,\ntciuro,\n 384,\n2006-03-26 08:07:55 -0800,\n502,\n20\n)'),), ... ] zip(csr.execute("SELECT * FROM t WHERE CMValues GLOB '*2004-12-16 10:11:45 -0800*'")) [ (('file15155908', '(15,NSFileTypeRegular,15,22792,0,staff,234881026,294022,2004-12-16 10:11:45 -0800,tciuro,384,2006-03-26 08:46:55 -0800,502,20)'),), (('file15155a5c', '(\n 15,\nNSFileTypeRegular,\n15,\n 23132,\n0,\nstaff,\n234881026,\n294022,\n2004-12-16 10:11:45 -0800,\ntciuro,\n384,\n2006-03-26 08:46:55 -0800,\n 502,\n20\n)'),) ] zip(csr.execute("SELECT * FROM t WHERE CMValues LIKE '%2004-12-16 10:11:45 -0800%'")) [ (('file15155908', '(15,NSFileTypeRegular,15,22792,0,staff,234881026,294022,2004-12-16 10:11:45 -0800,tciuro,384,2006-03-26 08:46:55 -0800,502,20)'),), (('file15155a5c', '(\n 15,\nNSFileTypeRegular,\n15,\n 23132,\n0,\nstaff,\n234881026,\n294022,\n2004-12-16 10:11:45 -0800,\ntciuro,\n384,\n2006-03-26 08:46:55 -0800,\n 502,\n20\n)'),) ] Could you try reducing your search strings and see if there's a point at which they start working? HTH, Martin Jenkins XQP Ltd Ascot, UK - Original Message - From: "Tito Ciuro" <[EMAIL PROTECTED]> To: "Forum SQLite" Sent: Sunday, March 26, 2006 6:50 PM Subject: [sqlite] LIKE and GLOB bug with numbers? Hello, I've populated a datafile with 40.176 records which contain file attributes and file paths. I have two columns, CMKey and CMValues. The column CMKey contains the path to the file and the column CMValues contains the attribute values. For example: CMKey: Application Support/AbiSuite/AbiWord.Profile CMValues: ( 0, NSFileTypeRegular, 1, 21508, 0, staff, 234881026, 294022, 2004-12-16 10:11:35 -0800, tciuro, 384, 2006-03-26 08:35:55 -0800, 502, 20 ) Both columns are of type TEXT. This is what I've found: 1) SELECT * FROM FinderFiles WHERE CMKey GLOB '*AbiWord.Profile*' returns 1 match. This is correct. 2) SELECT * FROM FinderFiles WHERE CMKey LIKE '%ABIWORD.Profile%' returns 1 match. This is correct. 3) SELECT * FROM FinderFiles WHERE CMValues GLOB '*2004-12-16 10:11:35 -0800*' returns 40.176 matches. This is not correct. There is no way I created these 40.176 file at the *very same* time. Just to be sure, I looked at one random file (of the 40.176) and I've obtained the following creation date attribute: NSFileCreationDate = 2004-02-21 06:12:43 -0800; The same problem occurs if I perform the query: SELECT * FROM FinderFiles WHERE CMValues LIKE '%2004-12-16 0:11:35 -0800%' This problem seems to occur when trying to match something with numbers: - If I look for NSFilePosixPermissions 448 (which I know exists) I get zero matches - If I look for strings, such as in step #1 or #2, it works fine. Something is wrong, I just can't figure out why... Any ideas? Is this a bug? Thanks, -- Tito
Re: [sqlite] LIKE and GLOB bug with numbers?
Hello everybody, On 26/03/2006, at 10:08, John Stanton wrote: LIKE and GLOB do a row scan, and give you none of the advantages of an RDBMS. Why not use a flat file and grep and get simplicity and greater speed? I'm very well aware that LIKE and GLOB perform a row scan. I do appreciate your concerns about the presumed lack of suitability of my design, but that is not open for discussion. The original question was related about something very specific: whether there is a bug in LIKE and GLOB when dealing with numbers. Regardless of whether my design is suitable or not, I think it's important to clarify what's happening. Other users might be suffering from this issue without knowing it. Again, thanks for your responses. Regards, -- Tito
Re: [sqlite] LIKE and GLOB bug with numbers?
John Stanton wrote: Tito Ciuro wrote: On 26/03/2006, at 10:51, MGC wrote: Your design is fundamentaly wrong. I don't know what your intended use is for this data, but I am logging identical fstat file info along with an MD5 sums. Well... if you don't know what is the intended use for the data, how can you say that my design is fundamentally wrong? :-) It's not wrong. That's the way it has to be. Now, if I could match the data properly with LIKE and GLOB, that would be great. Thanks for your response though. Regards, -- Tito LIKE and GLOB do a row scan, and give you none of the advantages of an RDBMS. Why not use a flat file and grep and get simplicity and greater speed? Or even simply split your CSV attributes array into separate columns? Or at least those values which you need to use in queries if you're resisting doing that? Cheers! -- -Boris http://bpopov.wordpress.com
Re: [sqlite] LIKE and GLOB bug with numbers?
Tito Ciuro wrote: On 26/03/2006, at 10:51, MGC wrote: Your design is fundamentaly wrong. I don't know what your intended use is for this data, but I am logging identical fstat file info along with an MD5 sums. Well... if you don't know what is the intended use for the data, how can you say that my design is fundamentally wrong? :-) It's not wrong. That's the way it has to be. Now, if I could match the data properly with LIKE and GLOB, that would be great. Thanks for your response though. Regards, -- Tito LIKE and GLOB do a row scan, and give you none of the advantages of an RDBMS. Why not use a flat file and grep and get simplicity and greater speed?
Re: [sqlite] LIKE and GLOB bug with numbers?
On 26/03/2006, at 10:51, MGC wrote: Your design is fundamentaly wrong. I don't know what your intended use is for this data, but I am logging identical fstat file info along with an MD5 sums. Well... if you don't know what is the intended use for the data, how can you say that my design is fundamentally wrong? :-) It's not wrong. That's the way it has to be. Now, if I could match the data properly with LIKE and GLOB, that would be great. Thanks for your response though. Regards, -- Tito
Re: [sqlite] LIKE and GLOB bug with numbers?
There may be an issue, but. Your design is fundamentaly wrong. I don't know what your intended use is for this data, but I am logging identical fstat file info along with an MD5 sums. Each informational element needs to be stored in an individual column. Stuffing all those fields into a single string that needs to be parsed to find ownership or last mod times removes all the benefit from having it in a database in the first place. Stuff it into a sorted flat file. that would be faster and simpler. A database would be 'much better' IMHO but only if it it properly 'normalized'. Mgc ___ Sent with SnapperMail www.snappermail.com .. Original Message ... On Sun, 26 Mar 2006 09:50:31 -0800 "Tito Ciuro" <[EMAIL PROTECTED]> wrote: >Hello, > >I've populated a datafile with 40.176 records which contain file >attributes and file paths. I have two columns, CMKey and CMValues. >The column CMKey contains the path to the file and the column >CMValues contains the attribute values. For example: > >CMKey: Application Support/AbiSuite/AbiWord.Profile > >CMValues: >( > 0, > NSFileTypeRegular, > 1, > 21508, > 0, > staff, > 234881026, > 294022, > 2004-12-16 10:11:35 -0800, > tciuro, > 384, > 2006-03-26 08:35:55 -0800, > 502, > 20 >) > >Both columns are of type TEXT. > >This is what I've found: > >1) SELECT * FROM FinderFiles WHERE CMKey GLOB '*AbiWord.Profile*' >returns 1 match. This is correct. > >2) SELECT * FROM FinderFiles WHERE CMKey LIKE '%ABIWORD.Profile%' >returns 1 match. This is correct. > >3) SELECT * FROM FinderFiles WHERE CMValues GLOB '*2004-12-16 >10:11:35 -0800*' returns 40.176 matches. This is not correct. There >is no way I created these 40.176 file at the *very same* time. Just >to be sure, I looked at one random file (of the 40.176) and I've >obtained the following creation date attribute: > >NSFileCreationDate = 2004-02-21 06:12:43 -0800; > >The same problem occurs if I perform the query: > >SELECT * FROM FinderFiles WHERE CMValues LIKE '%2004-12-16 10:11:35 >-0800%' > >This problem seems to occur when trying to match something with numbers: > >- If I look for NSFilePosixPermissions 448 (which I know exists) I >get zero matches >- If I look for strings, such as in step #1 or #2, it works fine. > >Something is wrong, I just can't figure out why... > >Any ideas? Is this a bug? > >Thanks, > >-- Tito
[sqlite] LIKE and GLOB bug with numbers?
Hello, I've populated a datafile with 40.176 records which contain file attributes and file paths. I have two columns, CMKey and CMValues. The column CMKey contains the path to the file and the column CMValues contains the attribute values. For example: CMKey: Application Support/AbiSuite/AbiWord.Profile CMValues: ( 0, NSFileTypeRegular, 1, 21508, 0, staff, 234881026, 294022, 2004-12-16 10:11:35 -0800, tciuro, 384, 2006-03-26 08:35:55 -0800, 502, 20 ) Both columns are of type TEXT. This is what I've found: 1) SELECT * FROM FinderFiles WHERE CMKey GLOB '*AbiWord.Profile*' returns 1 match. This is correct. 2) SELECT * FROM FinderFiles WHERE CMKey LIKE '%ABIWORD.Profile%' returns 1 match. This is correct. 3) SELECT * FROM FinderFiles WHERE CMValues GLOB '*2004-12-16 10:11:35 -0800*' returns 40.176 matches. This is not correct. There is no way I created these 40.176 file at the *very same* time. Just to be sure, I looked at one random file (of the 40.176) and I've obtained the following creation date attribute: NSFileCreationDate = 2004-02-21 06:12:43 -0800; The same problem occurs if I perform the query: SELECT * FROM FinderFiles WHERE CMValues LIKE '%2004-12-16 10:11:35 -0800%' This problem seems to occur when trying to match something with numbers: - If I look for NSFilePosixPermissions 448 (which I know exists) I get zero matches - If I look for strings, such as in step #1 or #2, it works fine. Something is wrong, I just can't figure out why... Any ideas? Is this a bug? Thanks, -- Tito