Re: UDF behaves non-deterministic

2012-11-07 Thread Stefan Kuhn
Hi everybody,
it looks like the reason for the problem was me not handling string arguments 
properly (I did not use the provided lengths, but relied on the strings being 
null-terminated; it's in the doc, but ...). It seems this became a problem 
specifically in the parallel situation, misleading me into believing it had 
to do with that. Thanks to everybody for help! Even though the solution was 
not directly provided, the comments made me think about my code and so were 
still helpful.
Stefan
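A minimal sketch of the kind of fix Stefan describes, under stated assumptions: demo_udf_args is a hypothetical stand-in for the args and lengths members of MySQL's UDF_ARGS (so the example compiles without mysql.h), and copy_udf_string is a hypothetical helper, not Stefan's code. A string argument arrives as a pointer plus a length and need not be NUL-terminated, so the helper copies exactly lengths[i] bytes and terminates the copy itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the two UDF_ARGS members used here; a real UDF
 * would include mysql.h and use UDF_ARGS directly. */
typedef struct {
    char **args;             /* argument data; strings are not guaranteed to be NUL-terminated */
    unsigned long *lengths;  /* byte length of each argument */
} demo_udf_args;

/* Copy string argument i into a freshly allocated, NUL-terminated buffer.
 * Returns NULL for a SQL NULL argument or if allocation fails.
 * The caller must free() the result. */
char *copy_udf_string(const demo_udf_args *a, unsigned int i)
{
    if (a->args[i] == NULL)
        return NULL;
    unsigned long len = a->lengths[i];
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, a->args[i], len);  /* use the supplied length, never strlen() */
    buf[len] = '\0';
    return buf;
}
```

Allocating per call keeps the buffer private to the current invocation. If a scratch buffer should be reused across rows instead, the MySQL UDF documentation describes allocating it in the xxx_init() function, keeping it in initid->ptr, and freeing it in xxx_deinit().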




-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: UDF behaves non-deterministic

2012-11-05 Thread Stefan Kuhn
On Monday 05 November 2012 18:02:28 h...@tbbs.net wrote:
>  2012/11/04 22:23 +, Stefan Kuhn 
>
> select * from table order by udf(column, 'input_value') desc;
> For my understanding, this should give the same result always.
> 
> But if, for your data, function "udf" returns the same value for several
> arguments, that is not enough to fix the order. In that case I have found
> that other accidental things affect the order, things that one would not
> suspect: how much storage is used and needed for the ordering, ... a further
> reason for showing what the function returns. If the order varies, although
> the function returns the same in all cases, well, 
True, but I am missing records from the top ten which should definitely be in, 
so this should not be the problem here. I am investigating things further...
Stefan






Re: UDF behaves non-deterministic

2012-11-05 Thread hsv
 2012/11/04 22:23 +, Stefan Kuhn 
> select * from table order by udf(column, 'input_value') desc;
> For my understanding, this should give the same result always.

But if, for your data, function "udf" returns the same value for several arguments, 
that is not enough to fix the order. In that case I have found that other accidental 
things affect the order, things that one would not suspect: how much storage is 
used and needed for the ordering, ... a further reason for showing what the 
function returns. If the order varies, although the function returns the same 
in all cases, well, 
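hsv's point about ties can be illustrated in C: qsort(), like an ORDER BY on a non-unique key, makes no promise about the relative order of rows whose keys compare equal. A common remedy is a secondary key that is unique; in SQL that would mean adding a unique column after the UDF in the ORDER BY list. The sketch below assumes a hypothetical row type with a unique id; it is an illustration, not code from the thread.

```c
#include <stdlib.h>

/* Hypothetical row: 'id' is unique, 'score' may repeat across rows. */
typedef struct {
    int id;
    double score;
} scored_row;

/* Descending by score; equal scores fall back to ascending id, so the
 * result no longer depends on how the sort happens to process ties. */
static int cmp_score_desc_then_id(const void *pa, const void *pb)
{
    const scored_row *a = pa, *b = pb;
    if (a->score != b->score)
        return a->score < b->score ? 1 : -1;
    return (a->id > b->id) - (a->id < b->id);
}

void sort_rows(scored_row *rows, size_t n)
{
    qsort(rows, n, sizeof *rows, cmp_score_desc_then_id);
}
```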





Re: UDF behaves non-deterministic

2012-11-05 Thread walter harms

Can you reduce the UDF to just "return 1;"?
That should give you a clue what is going on. Random
values usually point to two suspects:
1. mixing 32-bit and 64-bit
2. using void instead of int

re,
 wh





Re: Re: UDF behaves non-deterministic

2012-11-05 Thread Michael Dykman
C is not an inherently thread-safe language.  Several of the standard
library functions use static data, which gets stepped on during concurrent
operation.  Many of those do have thread-safe equivalents on many platforms
such as strtok/strtok_r (the latter being the safe one).
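To illustrate the strtok/strtok_r point: strtok() remembers its position in hidden static state, so two threads tokenizing at the same time can clobber each other, whereas POSIX strtok_r() keeps that position in a caller-supplied pointer. count_tokens is a hypothetical example, not code from this thread.

```c
/* Some C libraries only declare strtok_r() when POSIX features are requested. */
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Count whitespace-separated tokens in a writable string.
 * strtok_r() keeps its position in the local 'saveptr' rather than in
 * static state, so concurrent calls from different threads do not
 * interfere. Like strtok(), it modifies 's' in place. */
int count_tokens(char *s)
{
    char *saveptr = NULL;
    int n = 0;
    for (char *tok = strtok_r(s, " \t\n", &saveptr);
         tok != NULL;
         tok = strtok_r(NULL, " \t\n", &saveptr))
        n++;
    return n;
}
```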

If you are confident you are not using statics or globals in your code
directly, you will need to identify each function you do call.  Start by
reading the man page for that function (if it's in the C stdlib, there is a
man page for it) which should tell you if it is safe or not; for those
which are not, the man page will likely suggest a threadsafe alternative if
one is available.  If none are available, you might have to consider a
mutex.
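When no reentrant variant is available, the mutex approach might look like the sketch below. localtime() is used only as a familiar example of a libc function that returns a pointer to shared static storage (POSIX also offers localtime_r(), which would be the better choice for it); safe_localtime and localtime_lock are hypothetical names.

```c
#include <pthread.h>
#include <time.h>

/* One process-wide lock guarding every call to the non-reentrant function. */
static pthread_mutex_t localtime_lock = PTHREAD_MUTEX_INITIALIZER;

/* localtime() returns a pointer to static storage shared by all threads.
 * Holding the lock across the call AND the copy ensures no other caller
 * using this lock overwrites that storage before we have a private copy.
 * Returns 1 on success, 0 if localtime() failed. */
int safe_localtime(time_t t, struct tm *out)
{
    int ok = 0;
    pthread_mutex_lock(&localtime_lock);
    struct tm *shared = localtime(&t);
    if (shared != NULL) {
        *out = *shared;
        ok = 1;
    }
    pthread_mutex_unlock(&localtime_lock);
    return ok;
}
```

The lock only serializes callers that take the same lock: it protects the UDF's own calls from each other, not from other code that calls the same function directly. Depending on the platform, building the UDF may require linking with -pthread.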

 - michael dykman




-- 
 - michael dykman
 - mdyk...@gmail.com

 May the Source be with you.


Re: Re: UDF behaves non-deterministic

2012-11-05 Thread Stefan Kuhn
Hi Dan,

thanks for your answer. The UDF only contains functions (the one called in SQL 
plus two functions called in it). There are no variables outside them and 
nothing is declared static. All variables inside the functions are declared 
just like "double x=0;" etc. I am not an expert on C, but my understanding is 
that these values are separate for each call of the function and don't 
influence each other. Do you have a suggestion what I should look for in my C 
code? Or do I need to make the code thread-safe in the sense that concurrent 
executions are prevented by monitors or semaphores or so (no idea what this 
is called in C)?
Stefan








Re: UDF behaves non-deterministic

2012-11-04 Thread Dan Nelson
The first thing I would do is examine your UDF and ensure that it is
thread-safe.  No global variables, no static variables within functions,
etc.  Also make sure that any libc functions you call that are documented as
non-threadsafe are wrapped by a mutex or otherwise protected against
multiple simultaneous access.
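Stefan's understanding of ordinary local variables is correct: an automatic variable such as "double x=0;" gets fresh storage on every call, so concurrent calls do not share it. The risk Dan describes comes from storage that outlives a call. The two hypothetical functions below (not from the thread) contrast a static result buffer, which every thread shares, with a caller-supplied buffer.

```c
#include <stdio.h>

/* UNSAFE: the result lives in one static buffer shared by every thread.
 * If two queries call this at the same time, one can overwrite the other's
 * result before it has been read. */
const char *format_score_unsafe(double score)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "%.3f", score);
    return buf;
}

/* SAFE: the caller supplies the buffer (typically a local array), so each
 * call, and therefore each thread, works on its own memory. */
const char *format_score(double score, char *buf, size_t len)
{
    snprintf(buf, len, "%.3f", score);
    return buf;
}
```

Even in a single thread the static version shows the problem: a second call overwrites the text returned by the first.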

http://dev.mysql.com/doc/refman/5.5/en/adding-udf.html

As for debugging, you should be able to write things to stderr which will
show up in the mysql logfile, or you could open your own logfile and write
to that.
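A small logging helper along the lines Dan suggests could look like this. udf_debug and the "[my_udf]" prefix are hypothetical; where stderr output ends up depends on how the server was started, typically the MySQL error log. Since several queries may log at once, it can help to include something that identifies the call (for example the input value) in each line.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug helper for a UDF: writes one prefixed line to stderr.
 * Returns the number of characters written, or a negative value on error. */
int udf_debug(const char *fmt, ...)
{
    va_list ap;
    int n1 = fprintf(stderr, "[my_udf] ");
    if (n1 < 0)
        return n1;
    va_start(ap, fmt);
    int n2 = vfprintf(stderr, fmt, ap);
    va_end(ap);
    if (n2 < 0)
        return n2;
    fputc('\n', stderr);
    fflush(stderr);  /* stderr is usually unbuffered, but be explicit */
    return n1 + n2 + 1;
}
```

A call such as udf_debug("input=%s score=%f", input, score) inside the scoring function would then leave one line per evaluated row.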

-- 
Dan Nelson
dnel...@allantgroup.com




Re: UDF behaves non-deterministic

2012-11-04 Thread Stefan Kuhn
On Sunday 04 November 2012 22:34:22 Michael Dykman wrote:
> A couple of questions present.
>
> You mention that selecting from the whole table takes 5-10s so I assume you
> have a lot of records.
Yes, and the calculation of the score is fairly complicated. Plus, the test 
server is slow (a Pentium III machine, old, but working).
>   is the data not in flux? are you sure?
Yes, I am. I have a test server, where nothing happens.
>   these conflict queries are all on the same server?
Yes, one mysql instance on one server
>
> i would have structured the query like so:
>   select *, udf(column,'value') AS u from table order by u;
I tried this and whilst it gives a speedup (around 25%, I would say), it does 
not solve the problem (but thanks for the hint, I didn't think this would make 
a difference).
>
> I suspect it might reduce the number of udf invocations..  the order by
> clause is frequently referred to in the process of sorting.. keeping that
> static instead of dynamic might sanitize your issue.






Re: UDF behaves non-deterministic

2012-11-04 Thread Michael Dykman
A couple of questions present themselves.

You mention that selecting from the whole table takes 5-10 s, so I assume you
have a lot of records.
  Is the data not in flux? Are you sure?
  Are these conflicting queries all on the same server?

I would have structured the query like so:
  select *, udf(column,'value') AS u from table order by u;

I suspect it might reduce the number of UDF invocations: the ORDER BY
clause is referred to frequently in the process of sorting, so keeping that
value static instead of dynamic might sanitize your issue.
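Michael's idea, computing each row's score once and sorting on the stored value, is the general "decorate, sort, undecorate" technique. The C sketch below shows that technique on an in-memory array; it does not show how MySQL itself evaluates ORDER BY expressions. score() is a hypothetical stand-in for the expensive UDF and counts its own calls so the one-call-per-row property can be checked.

```c
#include <stdlib.h>

typedef struct {
    int row;      /* original row value */
    double key;   /* score computed once, up front */
} keyed_row;

/* Hypothetical stand-in for the expensive UDF; *calls counts invocations. */
static double score(int row, unsigned long *calls)
{
    ++*calls;
    return (double)(row % 7);
}

/* Descending by the precomputed key; no scoring happens during the sort. */
static int cmp_key_desc(const void *pa, const void *pb)
{
    const keyed_row *a = pa, *b = pb;
    if (a->key == b->key)
        return 0;
    return a->key < b->key ? 1 : -1;
}

/* Sort rows by descending score, evaluating score() exactly once per row.
 * Returns the number of score() calls (0 if allocation fails). */
unsigned long sort_by_score(int *rows, size_t n)
{
    unsigned long calls = 0;
    keyed_row *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return 0;
    for (size_t i = 0; i < n; i++) {
        tmp[i].row = rows[i];
        tmp[i].key = score(rows[i], &calls);
    }
    qsort(tmp, n, sizeof *tmp, cmp_key_desc);
    for (size_t i = 0; i < n; i++)
        rows[i] = tmp[i].row;
    free(tmp);
    return calls;
}
```

Sorting eight rows this way calls score() exactly eight times, whereas a comparator that called score() itself would run it on every comparison. Rows with equal keys come out in an unspecified order, since qsort() is not stable.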



UDF behaves non-deterministic

2012-11-04 Thread Stefan Kuhn
Hi all,
I have a weird (for me at least) problem with a user-defined function, 
written in C. The function seems to return different results in different 
runs (the code of the function does not contain random elements). Basically, 
the function calculates a score based on a column in a table and an input 
value. So I do something like this:
select * from table order by udf(column, 'input_value') desc;
As I understand it, this should always give the same result. But if I run 
many statements so that they overlap (execution is from a Java program and I 
can run them in parallel threads; the UDF on a large table takes 5-10 s on a 
slow machine), the results of some queries are different. If I leave enough 
time between statements, it seems to work, i.e. the result is always the 
same. I would have thought the statements are independent, even if executed 
on different JDBC connections in parallel.
Does somebody have an idea?
Or could somebody give an idea on debugging? Normally I would try to debug the 
code to see what goes on, but how can I do this in a UDF? Can I log from the 
UDF?
Thanks for any hints,
Stefan
