Re: [Maria-developers] Window functions: why some functions are not in select_lex->window_funcs?

2016-03-23 Thread Sergey Petrunia
This is because of this in sql_yacc.yy:


window_func_expr:
  window_func OVER_SYM window_name
  {
    $$= new (thd->mem_root) Item_window_func(thd, (Item_sum *) $1, $3);
    if ($$ == NULL)
      MYSQL_YYABORT;
  }
|
  window_func OVER_SYM window_spec
  {
    LEX *lex= Lex;
    if (Select->add_window_spec(thd, lex->win_ref,
                                Select->group_list,
                                Select->order_list,
                                lex->win_frame))
      MYSQL_YYABORT;
    $$= new (thd->mem_root) Item_window_func(thd, (Item_sum *) $1,
                                             thd->lex->win_spec);
    if ($$ == NULL)
      MYSQL_YYABORT;
    if (Select->add_window_func((Item_window_func *) $$))
      MYSQL_YYABORT;
    ^^^ note that the add_window_func call is present in this branch
        but not in the other.
  }
;

Should I fix this?
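If so, a minimal change would mirror the second branch and register the item in the window_name branch as well (a sketch against the excerpt above, untested; whether this is the right place for the call is exactly the open question):

```
window_func_expr:
  window_func OVER_SYM window_name
  {
    $$= new (thd->mem_root) Item_window_func(thd, (Item_sum *) $1, $3);
    if ($$ == NULL)
      MYSQL_YYABORT;
    /* register the item so select_lex->window_funcs sees it, too */
    if (Select->add_window_func((Item_window_func *) $$))
      MYSQL_YYABORT;
  }
```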

On Thu, Mar 24, 2016 at 01:35:38AM +0300, Sergey Petrunia wrote:
> Hello Igor,
> 
> I was trying to move JOIN::process_window_functions() from using
> join->fields_list to using join->select_lex->window_funcs.
> 
> And I discovered that join->select_lex->window_funcs does not contain
> window function items that use a named window definition.  Is this intentional?
> 
> Example:
> 
> 
> create table t0 (a int);
> insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
> create table t1 (pk int, c int);
> insert into t1 select a+1,1 from t0;
> update t1 set c=2 where pk not in (1,2,3,4);
> 
> select  pk, c, 
>   count(*) over 
>   (partition by c order by pk rows between 2 preceding and 2 following) as CNT
> from t1 
> 
> Here:
> 
>   #1  0x55af5db6 in AGGR_OP::end_send (this=0x7fff4c0086c8)
> (gdb) p join->select_lex->window_funcs
>   $69 = {<base_list> = {<Sql_alloc> = {<No data fields>}, first =
> 0x7fff4c005d18, last = 0x7fff4c005d18, elements = 1}, <No data fields>}
>  
> 
> But:
> 
> select  pk, c, 
>   count(*) over w1 as CNT 
> from t1 
> window w1 as (partition by c order by pk rows between 2 preceding and 2 following);
> 
> 
> (gdb) p join->select_lex->window_funcs
>   $74 = {<base_list> = {<Sql_alloc> = {<No data fields>}, first =
> 0x56d48820, last = 0x5ab10b70, elements = 0}, <No data fields>}
> 
> BR
>  Sergei
> -- 
> Sergei Petrunia, Software Developer
> MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog
> 
> 

-- 
BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog



___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp




Re: [Maria-developers] GSoC 2016:Unique indexes for blobs

2016-03-23 Thread Sergei Golubchik
Hi, Jan!

On Mar 23, Jan Lindström wrote:
> Hi,
> 
> All below is correct, and naturally you may re-implement duplicate search on
> a different index type. I think there is more code that needs to change, as this
> new index type would contain a column (the hash value of the blob field) that
> is not stored in the base table, right? Indexed columns are also stored
> in the InnoDB data dictionary persistently, so that part would also need
> changes (dict/dict0*.cc files). Actually, you could also store the blob hash
> in the table; it could make things easier. Secondly, remember that unique keys
> can be used in InnoDB as foreign keys; this is again a design question:
> do you allow blobs to be foreign keys or not? Finally, a unique key with NOT
> NULL can be used as a primary key, i.e. the clustered key in InnoDB; using a
> blob hash for that might be out of reach on this timetable.
> 
> R: Jan Lindström
> Principal Engineer
> InnoDB

Agreed, blobs (even UNIQUE NOT NULL) cannot be used as primary keys.
For simplicity I would not support foreign keys either; this can be
added later.

Regards,
Sergei
Chief Architect MariaDB
and secur...@mariadb.org



Re: [Maria-developers] GSoC 2016:Unique indexes for blobs

2016-03-23 Thread Jan Lindström
Hi,

All below is correct, and naturally you may re-implement duplicate search on
a different index type. I think there is more code that needs to change, as this
new index type would contain a column (the hash value of the blob field) that
is not stored in the base table, right? Indexed columns are also stored
in the InnoDB data dictionary persistently, so that part would also need
changes (dict/dict0*.cc files). Actually, you could also store the blob hash
in the table; it could make things easier. Secondly, remember that unique keys
can be used in InnoDB as foreign keys; this is again a design question:
do you allow blobs to be foreign keys or not? Finally, a unique key with NOT
NULL can be used as a primary key, i.e. the clustered key in InnoDB; using a
blob hash for that might be out of reach on this timetable.

R: Jan Lindström
Principal Engineer
InnoDB

On Wed, Mar 23, 2016 at 6:38 PM, Shubham Barai 
wrote:

> Hello Sergei,
>
>  I have gone through some of the source files in InnoDB.
>
>   handler/ha_innodb.cc
>   row/row0mysql.cc
>   row/row0ins.cc
>   dict/dict0mem.cc  etc.
>
> In MyISAM,
> the MI_KEYDEF and MI_UNIQUEDEF structures have mostly the same members,
> except that MI_KEYDEF has some extra ones like keylength, minlength and
> maxlength. As we are calculating a hash of the column values for long UNIQUE
> constraints, MI_UNIQUEDEF doesn't need them.
>
> mi_write (which inserts a row into a MyISAM table) and mi_update (which
> updates a row) call mi_unique_check to verify all unique constraints before
> inserting or updating any row.
>
> In InnoDB,
> dict_index_t is the data structure used for an index. I think we can use
> the dict_index_t structure for our hash based index.
> It has a type member which represents the type of an index (DICT_CLUSTERED,
> DICT_UNIQUE, DICT_UNIVERSAL, etc.)
>
> For our new index, we can define a new type (let's say
> DICT_NON_UNIQUE_HASH).
>
> Insertion of a row in InnoDB has a function call graph like this:
>
> ha_innobase::write_row(uchar* record)
>   -> row_insert_for_mysql(byte* mysql_rec, row_prebuilt_t* prebuilt)
>   -> row_ins_step(que_thr_t* thr)
>   -> row_ins(ins_node_t* node, que_thr_t* thr)
>   -> row_ins_index_entry_step(ins_node_t* node, que_thr_t* thr)
>   -> row_ins_index_entry_set_vals(dict_index_t* index, dtuple_t* entry,
>      const dtuple_t* row), then row_ins_index_entry(dict_index_t* index,
>      dtuple_t* entry, que_thr_t* thr)
>   -> row_ins_clust_index_entry / row_ins_sec_index_entry
>   -> row_ins_sec_index_entry_low
>   -> row_ins_scan_sec_index_for_duplicate
>
>
>
> 1. row_ins (which inserts a row into a table) calls row_ins_index_entry_step
> for each index defined on the table.
>
> 2. row_ins_index_entry_step first calls row_ins_index_entry_set_vals
> and then row_ins_index_entry.
>
> row_ins_index_entry_set_vals is a function which sets the values of the
> dtuple fields in entry from the appropriate columns in row.
> entry is the index tuple which we are going to insert into the index.
>
> 3. In our hash based index, instead of storing the actual key value in
> the index, we have to store only a hash of the column values of the key.
>
> So for our new DICT_NON_UNIQUE_HASH index, we can define a new function
> row_ins_index_entry_set_vals_hash which will calculate a hash of all the
> column values in the key and create an index entry to insert into the index.
>
> 4. row_ins_sec_index_entry_low is the function which tries to insert an
> entry into a secondary index. If the type of the index is DICT_UNIQUE,
> it calls row_ins_scan_sec_index_for_duplicate, which scans a unique
> non-clustered index to check unique constraints.
>
> In our case (if the type is DICT_NON_UNIQUE_HASH), we can define a new
> function row_ins_scan_sec_index_for_duplicate_hash which will scan the
> index and, if there are identical hashes in the index, retrieve the
> rows and compare actual values.
>
>
> This is my initial approach for implementing long unique constraints in
> InnoDB.
>
> Do you think this approach will work?
>
>
> I would really appreciate your suggestions.
>
>
> Thanks,
>
> Shubham
>
> On 22 March 2016 at 20:58, Shubham Barai  wrote:
>
>> Hello Sergei,
>>
>>  I understood most of the internals of long unique constraints in
>> MyISAM. I am still going through the code in InnoDB. I will soon reply to
>> you.
>>
>> Thanks,
>> Shubham
>>
>> On 21 March 2016 at 16:37, Sergei Golubchik  wrote:
>>
>>> Hi, Shubham!
>>>
>>> On Mar 21, Shubham Barai wrote:
>>> >
>>> > I am currently

Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-23 Thread Sergei Golubchik
Hi, Sachin!

On Mar 23, Sachin Setia wrote:
> Hello Sergei
> Today I made some progress related to the project.
> MyISAM/ARIA
>   Got a clear understanding of how to implement a unique index for a query like
>   create table tbl(col1 int primary key, col2 blob, col3 blob,
>   unique(col2,col3))
> InnoDB
>   Reading about it. Actually, sir, I want to do this project whether I am
>   selected for GSoC or not (because InnoDB is amazing).

:)

> Proposal
>   Still Writing

don't miss the deadline :)

> Actually, sir, I have one doubt in the table2myisam function definition:
> 
> recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF),
> 
> Why are we allocating this many recinfo entries, when we only require
> share->fields + 1?

Good question :) I don't know. This line apparently goes back at least
to 2001; there is no revision history beyond that date.

It could be that it allocates twice as many columns as necessary
nowadays, even if it made sense many years ago.

> One more doubt, about optimizing a "select distinct column_name from
> table" query (where the column is a blob column). In mi_write, which takes
> one record and writes it, we check the unique constraint. It takes O(n^2) time. I

This isn't O(n^2), because the hashes are stored in the index, in a B-tree.
So it's O(n*log(n)).

> was thinking we can optimize this by first fetching all the table
> records and calculating a hash for each record. Instead of comparing one
> hash with all the others, we can sort the hashes and ignore the duplicates
> (we can make an array of 0 and 1, where 1 means the record is not a
> duplicate and 0 means it is). By doing this we can reduce the
> time complexity to O(n*log(n)).

As you see, we already have O(n*log(n)). But if we put these hashes into
a hash table in memory (instead of just sorting them), the cost will go
down to O(n). Sounds interesting :)
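The in-memory hash table variant suggested above can be sketched like this (a standalone C++ toy, not server code; the function name and the plain `uint64_t` hashes are illustrative):

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Keep the first occurrence of each per-record hash; membership checks in
// the unordered_set are O(1) on average, so the whole pass is O(n).
// On a real hash collision the engine would still have to compare the
// actual blob values, since equal hashes do not imply equal records.
std::vector<std::uint64_t>
distinct_by_hash(const std::vector<std::uint64_t> &record_hashes)
{
  std::unordered_set<std::uint64_t> seen;
  std::vector<std::uint64_t> kept;
  for (std::uint64_t h : record_hashes)
    if (seen.insert(h).second)      // insert() reports whether h was new
      kept.push_back(h);
  return kept;
}
```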

Regards,
Sergei
Chief Architect MariaDB
and secur...@mariadb.org



Re: [Maria-developers] GSoC 2016:Unique indexes for blobs

2016-03-23 Thread Shubham Barai
Hello Sergei,

 I have gone through some of the source files in InnoDB.

  handler/ha_innodb.cc
  row/row0mysql.cc
  row/row0ins.cc
  dict/dict0mem.cc  etc.

In MyISAM,
the MI_KEYDEF and MI_UNIQUEDEF structures have mostly the same members,
except that MI_KEYDEF has some extra ones like keylength, minlength and
maxlength. As we are calculating a hash of the column values for long UNIQUE
constraints, MI_UNIQUEDEF doesn't need them.

mi_write (which inserts a row into a MyISAM table) and mi_update (which
updates a row) call mi_unique_check to verify all unique constraints before
inserting or updating any row.

In InnoDB,
dict_index_t is the data structure used for an index. I think we can use
the dict_index_t structure for our hash based index.
It has a type member which represents the type of an index (DICT_CLUSTERED,
DICT_UNIQUE, DICT_UNIVERSAL, etc.)

For our new index, we can define a new type (let's say
DICT_NON_UNIQUE_HASH).

Insertion of a row in InnoDB has a function call graph like this:

ha_innobase::write_row(uchar* record)
  -> row_insert_for_mysql(byte* mysql_rec, row_prebuilt_t* prebuilt)
  -> row_ins_step(que_thr_t* thr)
  -> row_ins(ins_node_t* node, que_thr_t* thr)
  -> row_ins_index_entry_step(ins_node_t* node, que_thr_t* thr)
  -> row_ins_index_entry_set_vals(dict_index_t* index, dtuple_t* entry,
     const dtuple_t* row), then row_ins_index_entry(dict_index_t* index,
     dtuple_t* entry, que_thr_t* thr)
  -> row_ins_clust_index_entry / row_ins_sec_index_entry
  -> row_ins_sec_index_entry_low
  -> row_ins_scan_sec_index_for_duplicate



1. row_ins (which inserts a row into a table) calls row_ins_index_entry_step
for each index defined on the table.

2. row_ins_index_entry_step first calls row_ins_index_entry_set_vals
and then row_ins_index_entry.

row_ins_index_entry_set_vals is a function which sets the values of the
dtuple fields in entry from the appropriate columns in row.
entry is the index tuple which we are going to insert into the index.

3. In our hash based index, instead of storing the actual key value in
the index, we have to store only a hash of the column values of the key.

So for our new DICT_NON_UNIQUE_HASH index, we can define a new function
row_ins_index_entry_set_vals_hash which will calculate a hash of all the
column values in the key and create an index entry to insert into the index.

4. row_ins_sec_index_entry_low is the function which tries to insert an
entry into a secondary index. If the type of the index is DICT_UNIQUE,
it calls row_ins_scan_sec_index_for_duplicate, which scans a unique
non-clustered index to check unique constraints.

In our case (if the type is DICT_NON_UNIQUE_HASH), we can define a new
function row_ins_scan_sec_index_for_duplicate_hash which will scan the
index and, if there are identical hashes in the index, retrieve the
rows and compare actual values.
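As a toy model of that duplicate check (standalone C++; HashIndex and insert_unique are illustrative names, not real InnoDB symbols): the index maps hash to stored key value, and on insert we scan the entries with the same hash and compare full values, since equal hashes may be a collision rather than a real duplicate.

```cpp
#include <cstdint>
#include <map>
#include <string>

struct HashIndex
{
  // hash -> stored key value; a multimap allows colliding entries
  std::multimap<std::uint64_t, std::string> entries;

  // Returns true if an identical value already exists (genuine duplicate),
  // false if the entry was inserted.
  bool insert_unique(std::uint64_t hash, const std::string &value)
  {
    auto range= entries.equal_range(hash);
    for (auto it= range.first; it != range.second; ++it)
      if (it->second == value)     // collision check: compare actual values
        return true;
    entries.emplace(hash, value);
    return false;
  }
};
```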


This is my initial approach for implementing long unique constraints in
InnoDB.

Do you think this approach will work?


I would really appreciate your suggestions.


Thanks,

Shubham

On 22 March 2016 at 20:58, Shubham Barai  wrote:

> Hello Sergei,
>
>  I understood most of the internals of long unique constraints in
> MyISAM. I am still going through the code in InnoDB. I will soon reply to
> you.
>
> Thanks,
> Shubham
>
> On 21 March 2016 at 16:37, Sergei Golubchik  wrote:
>
>> Hi, Shubham!
>>
>> On Mar 21, Shubham Barai wrote:
>> >
>> > I am currently looking into InnoDB codebase to see if it is possible
>> > for me to extend this feature to InnoDB storage engine. As  InnoDB
>> > doesn't support this feature internally, it would require more time
>> > than MyISAM.  Any suggestions regarding this would be helpful.
>>
>> Heh, that's very good (and ambitious) :)
>>
>> Do you already know how MyISAM supports arbitrary long UNIQUE
>> constraints internally? It stores only the hash of the value (of the
>> blob, for example) in the non-unique index, and on INSERT it checks if
>> there are identical hashes in the index. If there are (hash collision)
>> it will retrieve the rows and compare actual blob values.
>>
>> It seems that InnoDB should use the same approach as MyISAM. It'll need
>> some care for a case when two concurrent transactions insert conflicting
>> rows - as transaction changes are not visible until commit, you won't
>> see the conflict until it's too late. But gap locks [1] should be able
>> to prevent that.
>>
>> Regards,
>> Sergei
>> Chief Architect MariaDB
>> and secur...@mariadb.org
>>
>> [1]
>> https://dev.mysql.com/doc/refman/5.7/en/innodb

Re: [Maria-developers] Gsoc 2016 Mdev 371 Unique index for blob

2016-03-23 Thread Sachin Setia
Hello Sergei
Today I made some progress related to the project.
MyISAM/ARIA
Got a clear understanding of how to implement a unique index for a query like
create table tbl(col1 int primary key, col2 blob, col3 blob,
unique(col2,col3))
InnoDB
Reading about it. Actually, sir, I want to do this project whether I am
selected for GSoC or not (because InnoDB is amazing).
Proposal
Still writing

Actually, sir, I have one doubt in the table2myisam function definition:

recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF),

Why are we allocating this many recinfo entries, when we only require
share->fields + 1?

One more doubt, about optimizing a "select distinct column_name from table"
query (where the column is a blob column). In mi_write, which takes one
record and writes it, we check the unique constraint. It takes O(n^2) time.
I was thinking we can optimize this by first fetching all the table records
and calculating a hash for each record. Instead of comparing one hash with
all the others, we can sort the hashes and ignore the duplicates (we can
make an array of 0 and 1, where 1 means the record is not a duplicate and
0 means it is). By doing this we can reduce the time complexity to
O(n*log(n)). This will work fast if we have enough buffer storage; with low
buffer memory it turns into a tradeoff between CPU and I/O requests, because
in order to sort the keys in low RAM we need to use an m-way merge sort,
which ultimately results in more I/O: we have to send the records that do
not fit in RAM back to hard disk and then fetch the unique records again for
storing in the tmp table. But we do gain performance if the records fit in
RAM. For caching the records we can do it over here:
sql/sql_select.cc
  18313 error= info->read_record(info);
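The sort-then-skip-duplicates step described above can be sketched as (standalone C++, not engine code; the plain hash values are illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sort a copy of the per-record hashes, then drop adjacent equal values in
// a single pass: O(n*log(n)) from the sort. In the real engine, records
// whose hashes collide would still need an actual value comparison.
std::vector<std::uint64_t> unique_hashes_sorted(std::vector<std::uint64_t> hashes)
{
  std::sort(hashes.begin(), hashes.end());
  hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
  return hashes;
}
```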
 Regards
 sachin

On Wed, Mar 23, 2016 at 12:06 AM, Sergei Golubchik  wrote:

> Hi, Sachin!
>
> On Mar 22, Sachin Setia wrote:
> > Hello Sergei
> > Actually I was prototyping blob and varchar support for the Aria and MyISAM
> > storage engines.
> > My prototype worked for complex definitions like
> > create table(abc int primary key, blob_col blob unique, varchar_col
> > varchar(1000) unique) engine=myisam;
> > Solved the issue of the frm file inconsistency.
> >
> > As you suggested doing it for InnoDB, I am currently working on it. InnoDB
>
> *I* did not suggest that. But you're welcome to try, of course,
> if you think that just MyISAM is too simple for a three-month project.
> (Aria doesn't count; it's a couple of days of work after you've done MyISAM.)
>
> > does not natively support a hash based index.
> > When we run "select distinct column from tbl;",
> > it uses create_internal_tmp_table(), which uses the Maria storage engine
> > for creating the tmp table.
> > But a query like this works:
> > MariaDB [sachin]> create table iu2(abc blob unique);
> > Query OK, 0 rows affected (0.04 sec)
> >
> > MariaDB [sachin]> insert into iu2 values(1);
> > Query OK, 1 row affected (0.03 sec)
> >
> > MariaDB [sachin]> insert into iu2 values(1);
> > ERROR 1062 (23000): Duplicate entry '1' for key 'abc'
> > this query does not use a hash; it simply compares the values
>
> Interesting.
>
> > Will write a proposal shortly.
>
> Okay.
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and secur...@mariadb.org
>


Re: [Maria-developers] More memory allocation for key seg

2016-03-23 Thread Sachin Setia
Thanks Sergei, this solved the mystery.
Regards
sachin

On Wed, Mar 23, 2016 at 12:03 AM, Sergei Golubchik  wrote:

> Hi, Sachin!
>
> On Mar 22, Sachin Setia wrote:
> > Hello Developers
> > I was debugging the source code of MariaDB. My query was
> > MariaDB [sachin]> create table pro2(abc int primary key,xyz int ,
> > ass int ,unique(xyz,ass))engine=myisam;
> >
> > In the table2myisam function of the file ha_myisam.cc I found that
> > we are allocating more memory for key segments:
> >
> > if (!(my_multi_malloc(MYF(MY_WME),
> >   recinfo_out, (share->fields * 2 + 2) * sizeof(MI_COLUMNDEF),
> >   keydef_out, share->keys * sizeof(MI_KEYDEF),
> >   &keyseg,
> >   (share->key_parts + share->keys) * sizeof(HA_KEYSEG),
> >^ here why this
> >   NullS)))
> >
> > And we do not use these extra key segments, because the succeeding
> > for loop uses only share->key_parts key segments. The same happens in
> > the ha_maria file. Is this a bug, or am I missing something?
> > Regards
> > sachin
>
> The last key segment of every key is the row id. For dynamic row format
> it's the offset in the MYD file. For static row format (all rows have
> the same length as in your example above) it's the row number.
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and secur...@mariadb.org
>
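The arithmetic behind that allocation can be illustrated (standalone C++; the helper is illustrative, not server code, and the numbers match the pro2 example above):

```cpp
// Every key stores its declared parts plus one trailing HA_KEYSEG for the
// row id (the MYD offset for dynamic rows, the row number for static rows),
// hence the (share->key_parts + share->keys) sizing in my_multi_malloc.
int total_keysegs(int key_parts, int keys)
{
  return key_parts + keys;  // one hidden row-id segment per key
}
```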