Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-28 Thread Sachin Setia
I am Sorry I did not see this mail.So I did not send the prototype link in
proposal

On Thu, Mar 24, 2016 at 10:41 PM, Sergei Golubchik  wrote:

> Hi, Sachin!
>
> As you might have noticed on the mailing list, there is another student
> who wants to do this project.
>
> It would help us choose, if you'd put your prototype code on github and
> put the link to the repository into your final proposal.
>
> Thanks!
>
> On Mar 15, Sachin Setia wrote:
> > Dear Developers,
> >
> > I was doing prototype for this project as Mr Sergei Golubchik suggested
> >
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and secur...@mariadb.org
>
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-23 Thread Sergei Golubchik
Hi, Sachin!

On Mar 23, Sachin Setia wrote:
> Hello Sergei
> Today I made some progress related to project.
> MyISAM/ARIA
>   Got clear understanding of how to implement unique index for query like
>   create table tbl(col1 int primary key , col2 blob ,col3 blob , 
> unique(col2,col3))
> InnoDB
>   Reading about it.Actually Sir, I want to do this project whether I will
>   select in gsoc or not(because InnoDB is amazing).

:)

> Proposal
>   Still Writing

don't miss the deadline :)

> Actually sir i have one doubt in table2myisam function definition
> 
> recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF),
> ^ ^   ^
> why we allocating these many number of recinfo because we only require
> share->fields + 1 .

good question :) I don't know. this line apparently goes back at least
to 2001, there is no revision history beyond that date.

It could be that it allocates twice as much columns as necessary
nowadays, even if it made sense many years ago.

> One more doubt in optimizing "select distinct coloumn_name(here it is
> a blob coloumn)  from table" query. In mi write which take one record
> and write it we check for unique constraint. It takes O(n^2) time. I

This isnt O(n^2), because hashes are stored in the index, in a b-tree.
So, it's O(n*log(n)).

> was thinking if we can optimize this by first fetching the whole table
> record and calculating hash for each record.Instead of comparing one
> hash with all other we can sort the hashes and ignore the duplicate
> (we can make an array of 0 and 1 and if it 1 that means record is not
> duplicate and for 0 it is duplicte). by doing this we can reduce the
> time complexity to O(nlog(n)).

As you see, we already have O(n*log(n)). But if we put these hashes into
a hash table in memory (instead of just sorting them), the cost will go
down to O(n). Sounds interesting :)

Regards,
Sergei
Chief Architect MariaDB
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-23 Thread Sachin Setia
Hello Sergei
Today I made some progress related to project.
MyISAM/ARIA
Got clear understanding of how to implement unique index for query like
create table tbl(col1 int primary key , col2 blob ,col3 blob ,
unique(col2,col3))
InnoDB
Reading about it.Actually Sir, I want to do this project whether I will
select in
gsoc or not(because InnoDB is amazing).
Proposal
Still Writing

Actually sir i have one doubt in table2myisam function definition

recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF),
^ ^   ^
why we allocating these many number of recinfo because we only require
share->fields + 1 .

One more doubt in optimizing "select distinct coloumn_name(here it is a
blob coloumn)  from table"
query. In mi write which take one record and write it we check for unique
constraint. It takes O(n^2)
time. I was thinking if we can optimize this by first fetching the whole
table record and calculating hash for
each record.Instead of comparing one hash with all other we can sort the
hashes and ignore the duplicate
(we can make an array of 0 and 1 and if it 1 that means record is not
duplicate and for 0 it is duplicte)
.buy doing this we can reduce the time complexity to O(nlog(n)).This will
work fast if we have enough buffer_storage
in case of low buffer memory this will turn to tradeoff between cpu and i/o
requests because in order to sort keys
in low ram we need to use m way merge sort which ultimately result in more
I/O because we have to send back records to
hard disk which we can not store in ram and then once again fetch unique
record for storing in tmp table.But we can get
performance if records fit in ram .For caching the records  we can do it
over here
sql/sql_select.cc
  18313 error= info->read_record(info);
 Regards
 sachin

On Wed, Mar 23, 2016 at 12:06 AM, Sergei Golubchik  wrote:

> Hi, Sachin!
>
> On Mar 22, Sachin Setia wrote:
> > Hello Sergei
> > Actually I was prototyping for blob and varchar   for aria and myisam
> > storage engine.
> > My prototype worked for complex definations like
> > craete table(abc int primary key, blob_col blob unique, varchar_col
> > varchar(1000) unique) engine=myisam;
> > Solved the issue of frm file incosistance.
> >
> > As you suggested for doing it for innodb i am current working on
> it.Innodb
>
> *I* did not suggest that. But you're welcome to try, of course.
> If you think that just MyISAM is too simple for a three month project.
> (Aria doesn't count it's a couple of days after you done MyISAM).
>
> > does not natively support hash based index.
> > when we run select distinct column from tbl;
> > it use create_internal_tmp_table() which uses maria storage engine for
> > creating tmp table.
> > But query like this works
> > MariaDB [sachin]> create table iu2(abc blob unique);
> > Query OK, 0 rows affected (0.04 sec)
> >
> > MariaDB [sachin]> insert into iu2 values(1);
> > Query OK, 1 row affected (0.03 sec)
> >
> > MariaDB [sachin]> insert into iu2 values(1);
> > ERROR 1062 (23000): Duplicate entry '1' for key 'abc'
> > this query does not use hash but it simply compares values
>
> Interesting.
>
> > Will write a proposal shortly.
>
> Okay.
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and secur...@mariadb.org
>
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-22 Thread Sergei Golubchik
Hi, Sachin!

On Mar 22, Sachin Setia wrote:
> Hello Sergei
> Actually I was prototyping for blob and varchar   for aria and myisam
> storage engine.
> My prototype worked for complex definations like
> craete table(abc int primary key, blob_col blob unique, varchar_col
> varchar(1000) unique) engine=myisam;
> Solved the issue of frm file incosistance.
> 
> As you suggested for doing it for innodb i am current working on it.Innodb

*I* did not suggest that. But you're welcome to try, of course.
If you think that just MyISAM is too simple for a three month project.
(Aria doesn't count it's a couple of days after you done MyISAM).

> does not natively support hash based index.
> when we run select distinct column from tbl;
> it use create_internal_tmp_table() which uses maria storage engine for
> creating tmp table.
> But query like this works
> MariaDB [sachin]> create table iu2(abc blob unique);
> Query OK, 0 rows affected (0.04 sec)
> 
> MariaDB [sachin]> insert into iu2 values(1);
> Query OK, 1 row affected (0.03 sec)
> 
> MariaDB [sachin]> insert into iu2 values(1);
> ERROR 1062 (23000): Duplicate entry '1' for key 'abc'
> this query does not use hash but it simply compares values

Interesting.

> Will write a proposal shortly.

Okay.

Regards,
Sergei
Chief Architect MariaDB
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-22 Thread Sachin Setia
Hello Sergei
Actually I was prototyping for blob and varchar   for aria and myisam
storage engine.
My prototype worked for complex definations like
craete table(abc int primary key, blob_col blob unique, varchar_col
varchar(1000) unique) engine=myisam;
Solved the issue of frm file incosistance.

As you suggested for doing it for innodb i am current working on it.Innodb
does not natively support hash based index.
when we run select distinct column from tbl;
it use create_internal_tmp_table() which uses maria storage engine for
creating tmp table.
But query like this works
MariaDB [sachin]> create table iu2(abc blob unique);
Query OK, 0 rows affected (0.04 sec)

MariaDB [sachin]> insert into iu2 values(1);
Query OK, 1 row affected (0.03 sec)

MariaDB [sachin]> insert into iu2 values(1);
ERROR 1062 (23000): Duplicate entry '1' for key 'abc'
this query does not use hash but it simply compares values
Will write a proposal shortly.

On Tue, Mar 22, 2016 at 4:20 PM, Sachin Setia 
wrote:

> Hello Sergi
> Actually I was prototyping for blob and varchar   for aria and myisam
> storage engine.
> My prototype worked for complex definations like
> craete table(abc int primary key, blob_col blob unique, varchar_col
> varchar(1000) unique) engine=myisam;
> Solved the issue of frm file incosistance.
>
> As you suggested for doing it for innodb i am current working on it.Innodb
> does not natively support hash based index.
> when we run select distinct column from tbl;
> it use create_internal_tmp_table() which uses maria storage engine for
> creating tmp table.
> But query like this works
> MariaDB [sachin]> create table iu2(abc blob unique);
> Query OK, 0 rows affected (0.04 sec)
>
> MariaDB [sachin]> insert into iu2 values(1);
> Query OK, 1 row affected (0.03 sec)
>
> MariaDB [sachin]> insert into iu2 values(1);
> ERROR 1062 (23000): Duplicate entry '1' for key 'abc'
> this query does not use hash but it simply compares values
> Will write a proposal shortly.
>
>
> Regards
> sachin
>
> On Sat, Mar 19, 2016 at 1:52 AM, Sergei Golubchik 
> wrote:
>
>> Hi, Sachin!
>>
>> On Mar 18, Sachin Setia wrote:
>> >
>> > ERROR 1030 (HY000): Got error 190 "Incompatible key or row definition
>> > between the MariaDB .frm file and the information in the storage engine.
>> > You have to dump an" from storage engine MyISAM
>> >
>> > We are getting this becuase in mi_create for each unique_def it creates
>> one
>> > keydef and writes it.And this creates two  problem
>> > 1. frm keydef algorithm is different from saved kefdef algorithm(always
>> > zero) for the time I have solved this problem .
>> >
>> > 2.Mismatch between keydef's keysegs the reason for this is when
>> mi_create
>> > creates keyseg for unique_def it did not keeps the orignal uniquedef's
>> > keyseg parameters in mind like language start length which creates
>> problem
>> > in check_definition function in ha_myisam.cc.I am currently working on
>> it
>> > Once again sorry for this foolish mistake.
>> > Regars
>> > sachin
>>
>> No problem, everyone makes mistakes :)
>>
>> It's a prototype, after all. It's much more important that you
>> understand how the code works and why it doesn't work.
>>
>> Regards,
>> Sergei
>> Chief Architect MariaDB
>> and secur...@mariadb.org
>>
>
>
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-19 Thread Sergei Golubchik
Hi, Sachin!

On Mar 16, Sachin Setia wrote:
> Thanks sir for your reply
> I have done two thing
> 1. First commenting some code to remove the error we get when we try to
> create unique blob
> file=sql/sql_table.cc
> line no=3877

It'd be easier to read, if you'd sent a patch instead.

> /*
> * Gsoc 2016
> * I am implementing this so comment this stuff out
>  */
> //  if (!column->length)
> //  {
> //my_error(ER_BLOB_KEY_WITHOUT_LENGTH, MYF(0), column->field_name.str);
> //DBUG_RETURN(TRUE);
> //  }
> 
> 2. here I am assuming this thing for for just prototyping
> Assumption My table will just contain two column first will be primary key
> second will be unique blob
> So basically what i am doing is create a custom unique key and passing it
> to mi_create
> Of course in real patching i will replace this code with logic like if i
> have m unique blobs (i will find this using key_length ==0)
> create m unique key array and pass it
> 
> file=storage/myisam/ha_myisam.cc
> line no=2067
> //some tweak in share for prototype
> share->keys--;
> share->uniques=1;
> MI_UNIQUEDEF uniquedef;
> MI_KEYDEF keydef_blob=*(keydef+1);
> bzero((char*) ,sizeof(uniquedef));
> uniquedef.keysegs=1;
> uniquedef.seg=keydef_blob.seg;
> uniquedef.null_are_equal=1;
> 
>   /* TODO: Check that the following fn_format is really needed */
>   error= mi_create(fn_format(buff, name, "", "",
>  MY_UNPACK_FILENAME|MY_APPEND_EXT),
>share->keys, keydef,
>record_count, recinfo,
>1, ,
>_info, create_flags);
> 
> If i am doing it wrongly please let me know

Looks fine for a prototype.
Very good!

Regards,
Sergei
Chief Architect MariaDB
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-19 Thread Sergei Golubchik
Hi, Sachin!

On Mar 18, Sachin Setia wrote:
> 
> ERROR 1030 (HY000): Got error 190 "Incompatible key or row definition
> between the MariaDB .frm file and the information in the storage engine.
> You have to dump an" from storage engine MyISAM
> 
> We are getting this becuase in mi_create for each unique_def it creates one
> keydef and writes it.And this creates two  problem
> 1. frm keydef algorithm is different from saved kefdef algorithm(always
> zero) for the time I have solved this problem .
> 
> 2.Mismatch between keydef's keysegs the reason for this is when mi_create
> creates keyseg for unique_def it did not keeps the orignal uniquedef's
> keyseg parameters in mind like language start length which creates problem
> in check_definition function in ha_myisam.cc.I am currently working on it
> Once again sorry for this foolish mistake.
> Regars
> sachin

No problem, everyone makes mistakes :)

It's a prototype, after all. It's much more important that you
understand how the code works and why it doesn't work.

Regards,
Sergei
Chief Architect MariaDB
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-15 Thread Sachin Setia
Thanks sir for your reply
I have done two thing
1. First commenting some code to remove the error we get when we try to
create unique blob
file=sql/sql_table.cc
line no=3877
/*
* Gsoc 2016
* I am implementing this so comment this stuff out
 */
//  if (!column->length)
//  {
//my_error(ER_BLOB_KEY_WITHOUT_LENGTH, MYF(0), column->field_name.str);
//DBUG_RETURN(TRUE);
//  }


2. here I am assuming this thing for for just prototyping
Assumption My table will just contain two column first will be primary key
second will be unique blob
So basically what i am doing is create a custom unique key and passing it
to mi_create
Of course in real patching i will replace this code with logic like if i
have m unique blobs (i will find this using key_length ==0)
create m unique key array and pass it

file=storage/myisam/ha_myisam.cc
line no=2067
//some tweak in share for prototype
share->keys--;
share->uniques=1;
MI_UNIQUEDEF uniquedef;
MI_KEYDEF keydef_blob=*(keydef+1);
bzero((char*) ,sizeof(uniquedef));
uniquedef.keysegs=1;
uniquedef.seg=keydef_blob.seg;
uniquedef.null_are_equal=1;

  /* TODO: Check that the following fn_format is really needed */
  error= mi_create(fn_format(buff, name, "", "",
 MY_UNPACK_FILENAME|MY_APPEND_EXT),
   share->keys, keydef,
   record_count, recinfo,
   1, ,
   _info, create_flags);

If i am doing it wrongly please let me know
Regards
sachin

On Wed, Mar 16, 2016 at 12:37 AM, Sergei Golubchik  wrote:

> Hi, Sachin!
>
> On Mar 15, Sachin Setia wrote:
> >
> > I was doing prototype for this project as Mr Sergei Golubchik suggested
> >
> > "This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF
> > "keys" for long unique constraints."
> >
> > Here is my output
> >
> > MariaDB [sachin]> create table tbl(int_key int primary key , blob_key
> > blob unique);
> > Query OK, 0 rows affected (0.03 sec)
> >
> > MariaDB [sachin]> insert into tbl values(1,1);
> > Query OK, 1 row affected (0.01 sec)
> >
> > MariaDB [sachin]> insert into tbl values(2,1);
> > ERROR 1062 (23000): Duplicate entry '1' for key 'blob_key'
>
> Very cool!
>
> > Should i mail you the source code.Please let me know
> > Regards
> > sachin
>
> Sure! Please, do.
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and secur...@mariadb.org
>
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-15 Thread Sergei Golubchik
Hi, Sachin!

On Mar 15, Sachin Setia wrote:
> 
> I was doing prototype for this project as Mr Sergei Golubchik suggested
> 
> "This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF
> "keys" for long unique constraints."
> 
> Here is my output
> 
> MariaDB [sachin]> create table tbl(int_key int primary key , blob_key
> blob unique);
> Query OK, 0 rows affected (0.03 sec)
> 
> MariaDB [sachin]> insert into tbl values(1,1);
> Query OK, 1 row affected (0.01 sec)
> 
> MariaDB [sachin]> insert into tbl values(2,1);
> ERROR 1062 (23000): Duplicate entry '1' for key 'blob_key'

Very cool!

> Should i mail you the source code.Please let me know
> Regards
> sachin

Sure! Please, do.

Regards,
Sergei
Chief Architect MariaDB
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-15 Thread Sachin Setia
Sorry one correction
@-- I am assuming that unique blob will have 0 zero key
@++ I am assuming that unique blob will have 0 zero key length
Regards
sachin

On Tue, Mar 15, 2016 at 11:02 PM, Sachin Setia 
wrote:

> Dear Developers,
>
> I was doing prototype for this project as Mr Sergei Golubchik suggested
>
> "This task is about creating MI_UNIQUEDEF "uniques" instead of MI_KEYDEF
> "keys" for long unique constraints."
>
> Here is my output
>
> MariaDB [sachin]> create table tbl(int_key int primary key , blob_key blob 
> unique);
> Query OK, 0 rows affected (0.03 sec)
>
> MariaDB [sachin]> insert into tbl values(1,1);
> Query OK, 1 row affected (0.01 sec)
>
> MariaDB [sachin]> insert into tbl values(2,1);
> ERROR 1062 (23000): Duplicate entry '1' for key 'blob_key'
> MariaDB [sachin]> insert into tbl values(2,2);
> Query OK, 1 row affected (0.00 sec)
>
> MariaDB [sachin]> insert into tbl values(3,3);
> Query OK, 1 row affected (0.00 sec)
>
> MariaDB [sachin]> select * from tbl;
> +-+--+
> | int_key | blob_key |
> +-+--+
> |   1 | 1|
> |   2 | 2|
> |   3 | 3|
> +-+--+
> 3 rows in set (0.00 sec)
>
>
> I am assuming that unique blob will have 0 zero key .
> Should i mail you the source code.Please let me know
> Regards
> sachin
>
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-15 Thread Sergei Golubchik
Hi, Sachin!

First: please send questions like this to maria-developers@lists.launchpad.net
it's a public mailing list dedicated to MySQL and MariaDB internals, source
code, and related things. I am subscribed, so I'll see you mail there, and you
may be sure I will, because it won't be accidentally catched by my spam filter,
or sorted out in some obscure folder. Furthermore other subscribers will see
your question and could reply if I will be not available (e.g. I could be
travelling).
Thank you.

On Mar 14, Sachin Setia wrote:
> Thank you sir for your reply but my question is what this pack flag will do
> and basically what is pack record will you please suggest source file in
> which i can look into to get more understanding of pack length ,pack flag
> and pack record

There isn't much I can answer - your question is too generic.
"pack_flag" is a set of flags that is stored per field in the frm file
and it describes some aspects of the field.

Search for "pack_flag" in the source tree. If you ask a specific
question, I'll be able to provide a more detailed answer.

Regards,
Sergei
Chief Architect MariaDB
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-14 Thread Sergei Golubchik
Hi, Sachin!

On Mar 14, Sachin Setia wrote:
> Hello Developers
>My name is sachin , a third year undergrad student in India.I am
> interested in gsoc 2016
> I was debugging the maria db for following query
> 
> create table c4(a1 int primary key,b1 int unique,c1 blob) engine=myisam;
> 
> In mysql_prepare_create_table  there is a function
> name prepare_create_field which sets the sql_field pack flag . for a1 field
> it sets the pack flag =27  . do not quiet get what it means.

If you check prepare_create_field, you'll see how pack_flag is set.

Regards,
Sergei
Chief Architect MariaDB
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp