Re: [Maria-developers] Patch for parallel replication DBUG_ASSERT check on DROP TABLE et al

2016-08-18 Thread Kristian Nielsen
Michael Widenius  writes:

> Do you happen to know of any failures caused by the patch?

As I mentioned on IRC, the assertion triggers during rpl_mdev6020, for
example here:

  
https://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/9199/steps/test/logs/stdio

As we discussed, one possible reason is a race because did_mark_start_commit
is set only after waking up the next thread.

 - Kristian.

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] Patch for parallel replication DBUG_ASSERT check on DROP TABLE et al

2016-08-18 Thread Michael Widenius
Hi!

On Thu, Aug 18, 2016 at 12:24 PM, Kristian Nielsen
 wrote:
> Monty,
>
> Apparently you pushed this patch into 10.0, even though I explained that it
> is incorrect, and why. That's not cool, and you can even see it failing in
> Buildbot now.

I pushed the patch because I didn't see (probably missed) a review from
you for more than a day.  I was also going away for a few days, and I
wanted Elena to have my code in 10.0 while she was testing things that
could trigger the assert.

As this was a DBUG_ASSERT and could not cause a problem for anyone in
production, I didn't think it was critical to wait for the review
before pushing.

> Can you please fix it ASAP?

Of course. Still, I don't know of any case in Buildbot where the patch
has caused issues.  I checked with Elena, and she couldn't find
anything either that she could attribute to the patch.

Do you happen to know of any failures caused by the patch?

Regards,
Monty



Re: [Maria-developers] Sachin weekly report

2016-08-18 Thread Sachin Setia
Hello Sergei!
I am stuck on one problem.
Consider a table t1 (a blob, b blob, unique(a,b)).
A query like
select * from t1 where a=12 and b=23
works, but
select * from t1 where (a=12 or a=45) and (b=23 or b=45)
does not. Update and delete through the long unique index do not work
either; even a simple
delete/update t1 where a=1 and b=3
fails.

The reason is that these queries use the test_quick_select function,
which does not recognize the hash index: in get_mm_parts it always
returns false, because it compares db_row_hash_1 against a and b. I
solved that problem, but several problems remain:

1. How do I make sure that both columns a and b appear in the WHERE
clause? Currently I crawl through tree->keys[i]->next_key_part until
next_key_part is null. This works for a simple case like a=1 and b=2,
but I am not sure how to handle conditions like
((a= 1 or a=34) and (b=34 or b=33)) or (b=34)
^^^
Before the final or it is okay to use the hash, but for the trailing
b=34 we must not use the hash index, and I do not know how to detect
that.

2. When is the SEL_TREE evaluated?

3. Where should I compute the hash? I can do that, but the main problem
is that I need to preserve the original values (like a=1 and b=3),
because a hash match does not guarantee that the values are the same.

4. There is one more problem in test_quick_select. There is this code:

if ((range_trp= get_key_scans_params(&param, tree, FALSE, TRUE,
                                     best_read_time)))

In the case of a hash key, we must compute the hash before reaching
this function; after that it will work. But I think the tree has not
been evaluated at that point, and I do not know what to do.

Please review the branch
https://github.com/SachinSetiya/server/tree/unique_index_where_up_de and
tell me what I should do. Please also review the original branch
unique_index_sachin; I added more test cases, and reinsert also works.

On Sun, Aug 14, 2016 at 1:46 PM, Sergei Golubchik  wrote:
>
> Hi, Sachin!
>
> I'm reviewing.
> Here I only comment on your reply to the previous review.
>
> > >> +longlong  Item_func_hash::val_int()
> > >> +{
> > >> +  unsigned_flag= true;
> > >> +  ulong nr1= 1,nr2= 4;
> > >> +  CHARSET_INFO *cs;
> > >> +  for(uint i= 0;i < arg_count;i++)
> > >> +  {
> > >> +String * str = args[i]->val_str();
> > >> +if(args[i]->null_value)
> > >> +{
> > >> +  null_value= 1;
> > >> +  return 0;
> > >> +}
> > >> +cs= str->charset();
> > >> +uchar l[4];
> > >> +int4store(l,str->length());
> > >> +cs->coll->hash_sort(cs,l,sizeof(l), &nr1, &nr2);
> > > looks good, but use my_charset_binary for the length.
> > did not get it :(
>
> You use
>
>   cs->coll->hash_sort(cs,l,sizeof(l), &nr1, &nr2);
>
> to sort the length, the byte value of str->length().
> This is binary data, you should have
>
>   cs= my_charset_binary;
I could not find any variable named my_charset_binary, so I used

my_charset_utf8_bin. Is that OK?

>
> but you have
>
>   cs= str->charset();
>
> for example, it can be latin1_general_ci, and then 'a' and 'A' will be
> considered equal. But 'a' is the string length 97, while 'A' is string
> length 65. String lengths are binary data, they do not consist of
> "letters" or "characters", so you need to use binary charset for
> lengths, but str->charset() for actual string data.
>
> > >> +cs->coll->hash_sort(cs, (uchar *)str->ptr(), str->length(), &nr1, &nr2);
>
> ^^^ here cs= str->charset() is correct.
>
> > >> +  }
> > >> +  return   (longlong)nr1;
> > >> +}
> > >> +
> > >> +
> > >> diff --git a/sql/sql_yacc.yy b/sql/sql_yacc.yy
> > >> index e614692..4872d20 100644
> > >> --- a/sql/sql_yacc.yy
> > >> +++ b/sql/sql_yacc.yy
> > >>   | COMMENT_SYM TEXT_STRING_sys { Lex->last_field->comment= $2; }
> > >> +| HIDDEN
> > >> +  {
> > >> +  LEX *lex =Lex;
> > > no need to do that ^^^ you can simply write
> > > Lex->last_field->field_visibility=...
> > changed, but this type of code is used everywhere in sql_yacc.yy
>
> Yes :) Many years ago Lex was defined to be current_thd->lex and on some
> platforms current_thd (which is pthread_getspecific) is expensive, so we
> were trying to avoid using it when possible. That's why Lex was saved in
> a local variable.
>
> But now (also, for many years) Lex is YYTHD->Lex, where YYTHD is the
> argument of the MYSQLyyparse function, not pthread_getspecific.
> So there is no need to avoid Lex anymore. Nobody bothered to fix the old
> code, though.
>
> Regards,
> Sergei
> Chief Architect MariaDB
> and secur...@mariadb.org


Re: [Maria-developers] Patch for parallel replication DBUG_ASSERT check on DROP TABLE et al

2016-08-18 Thread Kristian Nielsen
Monty,

Apparently you pushed this patch into 10.0, even though I explained that it
is incorrect, and why. That's not cool, and you can even see it failing in
Buildbot now.

Can you please fix it ASAP?

 - Kristian.

Kristian Nielsen  writes:

> Monty writes:
>
>> Can you please check if this looks correct?
>
>> diff --git a/sql/mdl.cc b/sql/mdl.cc
>> index 28d2006..3b1f9f2 100644
>> --- a/sql/mdl.cc
>> +++ b/sql/mdl.cc
>> @@ -17,6 +17,7 @@
>>  #include "sql_class.h"
>>  #include "debug_sync.h"
>>  #include "sql_array.h"
>> +#include "rpl_rli.h"
>>  #include 
>>  #include 
>>  #include 
>> @@ -442,6 +443,7 @@ class MDL_lock
>>virtual void notify_conflicting_locks(MDL_context *ctx) = 0;
>>  
>>virtual bitmap_t hog_lock_types_bitmap() const = 0;
>> +  bool check_if_conflicting_replication_locks(MDL_context *ctx);
>
> This should probably be #ifdef NDEBUG, since you're only using it in debug
> build?
>
>>  
>>/** List of granted tickets for this lock. */
>>Ticket_list m_granted;
>> @@ -2290,6 +2292,44 @@ void MDL_scoped_lock::notify_conflicting_locks(MDL_context *ctx)
>>}
>>  }
>>  
>> +/**
>> +  Check if there is any conflicting lock that could cause this thread
>> +  to wait for another thread which is not ready to commit.
>> +  This is always an error, as the upper level of parallel replication
>> +  should not allow scheduling of a conflicting DDL until all earlier
>> +  transactions have committed.
>> +
>> +  This function is only called for a slave using parallel replication
>> +  and trying to get an exclusive lock for the table.
>> +*/
>> +
>> +bool MDL_lock::check_if_conflicting_replication_locks(MDL_context *ctx)
>> +{
>> +  Ticket_iterator it(m_granted);
>> +  MDL_ticket *conflicting_ticket;
>> +
>> +  while ((conflicting_ticket= it++))
>> +  {
>> +if (conflicting_ticket->get_ctx() != ctx)
>> +{
>> +  MDL_context *conflicting_ctx= conflicting_ticket->get_ctx();
>> +
>> +  /*
>> +If the conflicting thread is another parallel replication
>> +thread for the same master and it's not in commit stage, then
>> +the current transaction has started too early and something is
>> +seriously wrong.
>> +  */
>> +  if (conflicting_ctx->get_thd()->rgi_slave &&
>> +  conflicting_ctx->get_thd()->rgi_slave->rli ==
>> +  ctx->get_thd()->rgi_slave->rli &&
>> +  !conflicting_ctx->get_thd()->rgi_slave->did_mark_start_commit)
>> +return 1;   // Fatal error
>
> No, this is not correct. For example, the threads might be in different
> domains, or using different multi-source connections. You need a check like
> in thd_report_wait_for(). _If_ T1 and T2 are in the same parallel
> replication domain, _and_ T1 goes before T2, _and_ there is a DDL lock
> conflict but T1 has not started to commit, then it indicates a potential
> problem.
>
> Also, the comments are too vague, "Something is seriously wrong" is not a
> good description. What is wrong? Please describe the issue being checked
> more precisely. What is the role of each of the two threads involved? What
> is the situation being checked? Why is it wrong?
>
> Also, use some temporaries for conflicting_ctx->get_thd()->rgi_slave etc, so
> the condition becomes more readable.
>
>> +}
>> +  }
>> +  return 0;
>> +}
>> +
>>  
>>  /**
>>Acquire one lock with waiting for conflicting locks to go away if needed.
>> @@ -2355,6 +2395,11 @@ MDL_context::acquire_lock(MDL_request *mdl_request, ulong lock_wait_timeout)
>>if (lock->needs_notification(ticket) && lock_wait_timeout)
>>  lock->notify_conflicting_locks(this);
>>  
>> +  DBUG_ASSERT(mdl_request->type != MDL_INTENTION_EXCLUSIVE ||
>> +  !(get_thd()->rgi_slave &&
>> +get_thd()->rgi_slave->is_parallel_exec &&
>> +lock->check_if_conflicting_replication_locks(this)));
>> +
>>mysql_prlock_unlock(>m_rwlock);
>>  
>>will_wait_for(ticket);
>> diff --git a/sql/mdl.h b/sql/mdl.h
>> index c4d792a..13de602 100644
>> --- a/sql/mdl.h
>> +++ b/sql/mdl.h
>> @@ -910,7 +910,6 @@ class MDL_context
>> */
>>MDL_wait_for_subgraph *m_waiting_for;
>>  private:
>> -  THD *get_thd() const { return m_owner->get_thd(); }
>>MDL_ticket *find_ticket(MDL_request *mdl_req,
>>enum_mdl_duration *duration);
>>void release_locks_stored_before(enum_mdl_duration duration, MDL_ticket *sentinel);
>> @@ -919,6 +918,7 @@ class MDL_context
>>   MDL_ticket **out_ticket);
>>  
>>  public:
>> +  THD *get_thd() const { return m_owner->get_thd(); }
>>void find_deadlock();
>>  
>>ulong get_thread_id() const { return thd_get_thread_id(get_thd()); }
>
>  - Kristian.
>

Re: [Maria-developers] [Maria-discuss] Known limitation with TokuDB in Read Free Replication & parallel replication ?

2016-08-18 Thread Kristian Nielsen
Rich Prohaska  writes:

> I have tokudb kill query working with parallel replication.  Needed to
> attach the mysql thd pointer to each lock request, and use the thd as a key
> into the pending lock requests to find the one being used by a particular
> thd.  Once found, this lock request is immediately completed with a lock
> timeout result.  Also needed to rearrange the lock wait for code so that
> the callback is called when not holding any tokudb locks.

Ok, sounds great!

> This solves 1 or 10 parallel replication stalls with tokudb.  The remainder
> require the tokudb lock timer to pop.
>
> The problem is that the parallel replicator attempts to kill the query for
> a mysql slave thread that is waiting for a prior txn to complete.  This
> does not have any effect on tokudb.  The workaround for now is to use an
> extremely small tokudb lock timeout so that the stalls are short.

Hm, that sounds like a problem with the parallel replication code. It should
be normal that a thread needs to be killed in "waiting for prior txn", and
it should work. I will look into this.

Is your latest kill query code on github?

Thanks,

 - Kristian.
