Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
I am Sorry I did not see this mail.So I did not send the prototype link in proposal On Thu, Mar 24, 2016 at 10:41 PM, Sergei Golubchik wrote: > Hi, Sachin! > > As you might have noticed on the mailing list, there is another student > who wants to do this project. > > It would help us choose, if you'd put your prototype code on github and > put the link to the repository into your final proposal. > > Thanks! > > On Mar 15, Sachin Setia wrote: > > Dear Developers, > > > > I was doing prototype for this project as Mr Sergei Golubchik suggested > > > > Regards, > Sergei > Chief Architect MariaDB > and secur...@mariadb.org > ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hi, Sachin! On Mar 23, Sachin Setia wrote: > Hello Sergei > Today I made some progress related to project. > MyISAM/ARIA > Got clear understanding of how to implement unique index for query like > create table tbl(col1 int primary key , col2 blob ,col3 blob , > unique(col2,col3)) > InnoDB > Reading about it.Actually Sir, I want to do this project whether I will > select in gsoc or not(because InnoDB is amazing). :) > Proposal > Still Writing don't miss the deadline :) > Actually sir i have one doubt in table2myisam function definition > > recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF), > ^ ^ ^ > why we allocating these many number of recinfo because we only require > share->fields + 1 . good question :) I don't know. this line apparently goes back at least to 2001, there is no revision history beyond that date. It could be that it allocates twice as much columns as necessary nowadays, even if it made sense many years ago. > One more doubt in optimizing "select distinct coloumn_name(here it is > a blob coloumn) from table" query. In mi write which take one record > and write it we check for unique constraint. It takes O(n^2) time. I This isnt O(n^2), because hashes are stored in the index, in a b-tree. So, it's O(n*log(n)). > was thinking if we can optimize this by first fetching the whole table > record and calculating hash for each record.Instead of comparing one > hash with all other we can sort the hashes and ignore the duplicate > (we can make an array of 0 and 1 and if it 1 that means record is not > duplicate and for 0 it is duplicte). by doing this we can reduce the > time complexity to O(nlog(n)). As you see, we already have O(n*log(n)). But if we put these hashes into a hash table in memory (instead of just sorting them), the cost will go down to O(n). Sounds interesting :) Regards, Sergei Chief Architect MariaDB and secur...@mariadb.org ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hello Sergei Today I made some progress related to project. MyISAM/ARIA Got clear understanding of how to implement unique index for query like create table tbl(col1 int primary key , col2 blob ,col3 blob , unique(col2,col3)) InnoDB Reading about it.Actually Sir, I want to do this project whether I will select in gsoc or not(because InnoDB is amazing). Proposal Still Writing Actually sir i have one doubt in table2myisam function definition recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF), ^ ^ ^ why we allocating these many number of recinfo because we only require share->fields + 1 . One more doubt in optimizing "select distinct coloumn_name(here it is a blob coloumn) from table" query. In mi write which take one record and write it we check for unique constraint. It takes O(n^2) time. I was thinking if we can optimize this by first fetching the whole table record and calculating hash for each record.Instead of comparing one hash with all other we can sort the hashes and ignore the duplicate (we can make an array of 0 and 1 and if it 1 that means record is not duplicate and for 0 it is duplicte) .buy doing this we can reduce the time complexity to O(nlog(n)).This will work fast if we have enough buffer_storage in case of low buffer memory this will turn to tradeoff between cpu and i/o requests because in order to sort keys in low ram we need to use m way merge sort which ultimately result in more I/O because we have to send back records to hard disk which we can not store in ram and then once again fetch unique record for storing in tmp table.But we can get performance if records fit in ram .For caching the records we can do it over here sql/sql_select.cc 18313 error= info->read_record(info); Regards sachin On Wed, Mar 23, 2016 at 12:06 AM, Sergei Golubchik wrote: > Hi, Sachin! > > On Mar 22, Sachin Setia wrote: > > Hello Sergei > > Actually I was prototyping for blob and varchar for aria and myisam > > storage engine. > > My prototype worked for complex definations like > > craete table(abc int primary key, blob_col blob unique, varchar_col > > varchar(1000) unique) engine=myisam; > > Solved the issue of frm file incosistance. > > > > As you suggested for doing it for innodb i am current working on > it.Innodb > > *I* did not suggest that. But you're welcome to try, of course. > If you think that just MyISAM is too simple for a three month project. > (Aria doesn't count it's a couple of days after you done MyISAM). > > > does not natively support hash based index. > > when we run select distinct column from tbl; > > it use create_internal_tmp_table() which uses maria storage engine for > > creating tmp table. > > But query like this works > > MariaDB [sachin]> create table iu2(abc blob unique); > > Query OK, 0 rows affected (0.04 sec) > > > > MariaDB [sachin]> insert into iu2 values(1); > > Query OK, 1 row affected (0.03 sec) > > > > MariaDB [sachin]> insert into iu2 values(1); > > ERROR 1062 (23000): Duplicate entry '1' for key 'abc' > > this query does not use hash but it simply compares values > > Interesting. > > > Will write a proposal shortly. > > Okay. > > Regards, > Sergei > Chief Architect MariaDB > and secur...@mariadb.org > ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hi, Sachin! On Mar 22, Sachin Setia wrote: > Hello Sergei > Actually I was prototyping for blob and varchar for aria and myisam > storage engine. > My prototype worked for complex definations like > craete table(abc int primary key, blob_col blob unique, varchar_col > varchar(1000) unique) engine=myisam; > Solved the issue of frm file incosistance. > > As you suggested for doing it for innodb i am current working on it.Innodb *I* did not suggest that. But you're welcome to try, of course. If you think that just MyISAM is too simple for a three month project. (Aria doesn't count it's a couple of days after you done MyISAM). > does not natively support hash based index. > when we run select distinct column from tbl; > it use create_internal_tmp_table() which uses maria storage engine for > creating tmp table. > But query like this works > MariaDB [sachin]> create table iu2(abc blob unique); > Query OK, 0 rows affected (0.04 sec) > > MariaDB [sachin]> insert into iu2 values(1); > Query OK, 1 row affected (0.03 sec) > > MariaDB [sachin]> insert into iu2 values(1); > ERROR 1062 (23000): Duplicate entry '1' for key 'abc' > this query does not use hash but it simply compares values Interesting. > Will write a proposal shortly. Okay. Regards, Sergei Chief Architect MariaDB and secur...@mariadb.org ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hello Sergei Actually I was prototyping for blob and varchar for aria and myisam storage engine. My prototype worked for complex definations like craete table(abc int primary key, blob_col blob unique, varchar_col varchar(1000) unique) engine=myisam; Solved the issue of frm file incosistance. As you suggested for doing it for innodb i am current working on it.Innodb does not natively support hash based index. when we run select distinct column from tbl; it use create_internal_tmp_table() which uses maria storage engine for creating tmp table. But query like this works MariaDB [sachin]> create table iu2(abc blob unique); Query OK, 0 rows affected (0.04 sec) MariaDB [sachin]> insert into iu2 values(1); Query OK, 1 row affected (0.03 sec) MariaDB [sachin]> insert into iu2 values(1); ERROR 1062 (23000): Duplicate entry '1' for key 'abc' this query does not use hash but it simply compares values Will write a proposal shortly. On Tue, Mar 22, 2016 at 4:20 PM, Sachin Setia wrote: > Hello Sergi > Actually I was prototyping for blob and varchar for aria and myisam > storage engine. > My prototype worked for complex definations like > craete table(abc int primary key, blob_col blob unique, varchar_col > varchar(1000) unique) engine=myisam; > Solved the issue of frm file incosistance. > > As you suggested for doing it for innodb i am current working on it.Innodb > does not natively support hash based index. > when we run select distinct column from tbl; > it use create_internal_tmp_table() which uses maria storage engine for > creating tmp table. > But query like this works > MariaDB [sachin]> create table iu2(abc blob unique); > Query OK, 0 rows affected (0.04 sec) > > MariaDB [sachin]> insert into iu2 values(1); > Query OK, 1 row affected (0.03 sec) > > MariaDB [sachin]> insert into iu2 values(1); > ERROR 1062 (23000): Duplicate entry '1' for key 'abc' > this query does not use hash but it simply compares values > Will write a proposal shortly. > > > Regards > sachin > > On Sat, Mar 19, 2016 at 1:52 AM, Sergei Golubchik > wrote: > >> Hi, Sachin! >> >> On Mar 18, Sachin Setia wrote: >> > >> > ERROR 1030 (HY000): Got error 190 "Incompatible key or row definition >> > between the MariaDB .frm file and the information in the storage engine. >> > You have to dump an" from storage engine MyISAM >> > >> > We are getting this becuase in mi_create for each unique_def it creates >> one >> > keydef and writes it.And this creates two problem >> > 1. frm keydef algorithm is different from saved kefdef algorithm(always >> > zero) for the time I have solved this problem . >> > >> > 2.Mismatch between keydef's keysegs the reason for this is when >> mi_create >> > creates keyseg for unique_def it did not keeps the orignal uniquedef's >> > keyseg parameters in mind like language start length which creates >> problem >> > in check_definition function in ha_myisam.cc.I am currently working on >> it >> > Once again sorry for this foolish mistake. >> > Regars >> > sachin >> >> No problem, everyone makes mistakes :) >> >> It's a prototype, after all. It's much more important that you >> understand how the code works and why it doesn't work. >> >> Regards, >> Sergei >> Chief Architect MariaDB >> and secur...@mariadb.org >> > > ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hi, Sachin! On Mar 16, Sachin Setia wrote: > Thanks sir for your reply > I have done two thing > 1. First commenting some code to remove the error we get when we try to > create unique blob > file=sql/sql_table.cc > line no=3877 It'd be easier to read, if you'd sent a patch instead. > /* > * Gsoc 2016 > * I am implementing this so comment this stuff out > */ > // if (!column->length) > // { > //my_error(ER_BLOB_KEY_WITHOUT_LENGTH, MYF(0), column->field_name.str); > //DBUG_RETURN(TRUE); > // } > > 2. here I am assuming this thing for for just prototyping > Assumption My table will just contain two column first will be primary key > second will be unique blob > So basically what i am doing is create a custom unique key and passing it > to mi_create > Of course in real patching i will replace this code with logic like if i > have m unique blobs (i will find this using key_length ==0) > create m unique key array and pass it > > file=storage/myisam/ha_myisam.cc > line no=2067 > //some tweak in share for prototype > share->keys--; > share->uniques=1; > MI_UNIQUEDEF uniquedef; > MI_KEYDEF keydef_blob=*(keydef+1); > bzero((char*) &uniquedef,sizeof(uniquedef)); > uniquedef.keysegs=1; > uniquedef.seg=keydef_blob.seg; > uniquedef.null_are_equal=1; > > /* TODO: Check that the following fn_format is really needed */ > error= mi_create(fn_format(buff, name, "", "", > MY_UNPACK_FILENAME|MY_APPEND_EXT), >share->keys, keydef, >record_count, recinfo, >1, &uniquedef, >&create_info, create_flags); > > If i am doing it wrongly please let me know Looks fine for a prototype. Very good! Regards, Sergei Chief Architect MariaDB and secur...@mariadb.org ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hi, Sachin! On Mar 18, Sachin Setia wrote: > > ERROR 1030 (HY000): Got error 190 "Incompatible key or row definition > between the MariaDB .frm file and the information in the storage engine. > You have to dump an" from storage engine MyISAM > > We are getting this becuase in mi_create for each unique_def it creates one > keydef and writes it.And this creates two problem > 1. frm keydef algorithm is different from saved kefdef algorithm(always > zero) for the time I have solved this problem . > > 2.Mismatch between keydef's keysegs the reason for this is when mi_create > creates keyseg for unique_def it did not keeps the orignal uniquedef's > keyseg parameters in mind like language start length which creates problem > in check_definition function in ha_myisam.cc.I am currently working on it > Once again sorry for this foolish mistake. > Regars > sachin No problem, everyone makes mistakes :) It's a prototype, after all. It's much more important that you understand how the code works and why it doesn't work. Regards, Sergei Chief Architect MariaDB and secur...@mariadb.org ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Thanks sir for your reply I have done two thing 1. First commenting some code to remove the error we get when we try to create unique blob file=sql/sql_table.cc line no=3877 /* * Gsoc 2016 * I am implementing this so comment this stuff out */ // if (!column->length) // { //my_error(ER_BLOB_KEY_WITHOUT_LENGTH, MYF(0), column->field_name.str); //DBUG_RETURN(TRUE); // } 2. here I am assuming this thing for for just prototyping Assumption My table will just contain two column first will be primary key second will be unique blob So basically what i am doing is create a custom unique key and passing it to mi_create Of course in real patching i will replace this code with logic like if i have m unique blobs (i will find this using key_length ==0) create m unique key array and pass it file=storage/myisam/ha_myisam.cc line no=2067 //some tweak in share for prototype share->keys--; share->uniques=1; MI_UNIQUEDEF uniquedef; MI_KEYDEF keydef_blob=*(keydef+1); bzero((char*) &uniquedef,sizeof(uniquedef)); uniquedef.keysegs=1; uniquedef.seg=keydef_blob.seg; uniquedef.null_are_equal=1; /* TODO: Check that the following fn_format is really needed */ error= mi_create(fn_format(buff, name, "", "", MY_UNPACK_FILENAME|MY_APPEND_EXT), share->keys, keydef, record_count, recinfo, 1, &uniquedef, &create_info, create_flags); If i am doing it wrongly please let me know Regards sachin On Wed, Mar 16, 2016 at 12:37 AM, Sergei Golubchik wrote: > Hi, Sachin! > > On Mar 15, Sachin Setia wrote: > > > > I was doing prototype for this project as Mr Sergei Golubchik suggested > > > > "This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF > > "keys" for long unique constraints." > > > > Here is my output > > > > MariaDB [sachin]> create table tbl(int_key int primary key , blob_key > > blob unique); > > Query OK, 0 rows affected (0.03 sec) > > > > MariaDB [sachin]> insert into tbl values(1,1); > > Query OK, 1 row affected (0.01 sec) > > > > MariaDB [sachin]> insert into tbl values(2,1); > > ERROR 1062 (23000): Duplicate entry '1' for key 'blob_key' > > Very cool! > > > Should i mail you the source code.Please let me know > > Regards > > sachin > > Sure! Please, do. > > Regards, > Sergei > Chief Architect MariaDB > and secur...@mariadb.org > ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hi, Sachin! On Mar 15, Sachin Setia wrote: > > I was doing prototype for this project as Mr Sergei Golubchik suggested > > "This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF > "keys" for long unique constraints." > > Here is my output > > MariaDB [sachin]> create table tbl(int_key int primary key , blob_key > blob unique); > Query OK, 0 rows affected (0.03 sec) > > MariaDB [sachin]> insert into tbl values(1,1); > Query OK, 1 row affected (0.01 sec) > > MariaDB [sachin]> insert into tbl values(2,1); > ERROR 1062 (23000): Duplicate entry '1' for key 'blob_key' Very cool! > Should i mail you the source code.Please let me know > Regards > sachin Sure! Please, do. Regards, Sergei Chief Architect MariaDB and secur...@mariadb.org ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Sorry one correction @-- I am assuming that unique blob will have 0 zero key @++ I am assuming that unique blob will have 0 zero key length Regards sachin On Tue, Mar 15, 2016 at 11:02 PM, Sachin Setia wrote: > Dear Developers, > > I was doing prototype for this project as Mr Sergei Golubchik suggested > > "This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF > "keys" for long unique constraints." > > Here is my output > > MariaDB [sachin]> create table tbl(int_key int primary key , blob_key blob > unique); > Query OK, 0 rows affected (0.03 sec) > > MariaDB [sachin]> insert into tbl values(1,1); > Query OK, 1 row affected (0.01 sec) > > MariaDB [sachin]> insert into tbl values(2,1); > ERROR 1062 (23000): Duplicate entry '1' for key 'blob_key' > MariaDB [sachin]> insert into tbl values(2,2); > Query OK, 1 row affected (0.00 sec) > > MariaDB [sachin]> insert into tbl values(3,3); > Query OK, 1 row affected (0.00 sec) > > MariaDB [sachin]> select * from tbl; > +-+--+ > | int_key | blob_key | > +-+--+ > | 1 | 1| > | 2 | 2| > | 3 | 3| > +-+--+ > 3 rows in set (0.00 sec) > > > I am assuming that unique blob will have 0 zero key . > Should i mail you the source code.Please let me know > Regards > sachin > ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
[Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Dear Developers, I was doing prototype for this project as Mr Sergei Golubchik suggested "This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF "keys" for long unique constraints." Here is my output MariaDB [sachin]> create table tbl(int_key int primary key , blob_key blob unique); Query OK, 0 rows affected (0.03 sec) MariaDB [sachin]> insert into tbl values(1,1); Query OK, 1 row affected (0.01 sec) MariaDB [sachin]> insert into tbl values(2,1); ERROR 1062 (23000): Duplicate entry '1' for key 'blob_key' MariaDB [sachin]> insert into tbl values(2,2); Query OK, 1 row affected (0.00 sec) MariaDB [sachin]> insert into tbl values(3,3); Query OK, 1 row affected (0.00 sec) MariaDB [sachin]> select * from tbl; +-+--+ | int_key | blob_key | +-+--+ | 1 | 1| | 2 | 2| | 3 | 3| +-+--+ 3 rows in set (0.00 sec) I am assuming that unique blob will have 0 zero key . Should i mail you the source code.Please let me know Regards sachin ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hi, Sachin! First: please send questions like this to maria-developers@lists.launchpad.net it's a public mailing list dedicated to MySQL and MariaDB internals, source code, and related things. I am subscribed, so I'll see you mail there, and you may be sure I will, because it won't be accidentally catched by my spam filter, or sorted out in some obscure folder. Furthermore other subscribers will see your question and could reply if I will be not available (e.g. I could be travelling). Thank you. On Mar 14, Sachin Setia wrote: > Thank you sir for your reply but my question is what this pack flag will do > and basically what is pack record will you please suggest source file in > which i can look into to get more understanding of pack length ,pack flag > and pack record There isn't much I can answer - your question is too generic. "pack_flag" is a set of flags that is stored per field in the frm file and it describes some aspects of the field. Search for "pack_flag" in the source tree. If you ask a specific question, I'll be able to provide a more detailed answer. Regards, Sergei Chief Architect MariaDB and secur...@mariadb.org ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hi, Sachin! On Mar 14, Sachin Setia wrote: > Hello Developers >My name is sachin , a third year undergrad student in India.I am > interested in gsoc 2016 > I was debugging the maria db for following query > > create table c4(a1 int primary key,b1 int unique,c1 blob) engine=myisam; > > In mysql_prepare_create_table there is a function > name prepare_create_field which sets the sql_field pack flag . for a1 field > it sets the pack flag =27 . do not quiet get what it means. If you check prepare_create_field, you'll see how pack_flag is set. Regards, Sergei Chief Architect MariaDB and secur...@mariadb.org ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
[Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob
Hello Developers My name is sachin , a third year undergrad student in India.I am interested in gsoc 2016 I was debugging the maria db for following query create table c4(a1 int primary key,b1 int unique,c1 blob) engine=myisam; In mysql_prepare_create_table there is a function name prepare_create_field which sets the sql_field pack flag . for a1 field it sets the pack flag =27 . do not quiet get what it means. ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp