Re: UDF behaves non-deterministic
Hi everybody,

it looks like the reason for the problem was that I did not handle string arguments properly (I did not use the provided lengths, but relied on the strings being null-terminated; it's in the docs, but ...). It seems this became a problem specifically in the parallel situation, which misled me into believing it had to do with concurrency. Thanks to everybody for the help! Even though the solution was not provided directly, the comments made me think about my code and so were still helpful.

Stefan

On Monday 05 November 2012 15:08:51 Michael Dykman wrote:
> C is not an inherently thread-safe language. Several of the standard library functions use static data, which gets stepped on during concurrent operation. Many of those do have thread-safe equivalents on many platforms, such as strtok/strtok_r (the latter being the safe one).
>
> If you are confident you are not using statics or globals in your code directly, you will need to identify each function you do call. Start by reading the man page for that function (if it's in the C stdlib, there is a man page for it), which should tell you whether it is safe or not; for those which are not, the man page will likely suggest a thread-safe alternative if one is available. If none is available, you might have to consider a mutex.
>
> - michael dykman
>
> On Mon, Nov 5, 2012 at 9:28 AM, Stefan Kuhn wrote:
> > Hi Dan,
> >
> > thanks for your answer. The UDF only contains functions (the one called in SQL plus two functions called in it). There are no variables outside them and nothing is declared static. All variables inside the functions are declared just like "double x=0;" etc. I am not an expert on C, but my understanding is that these values are separate for each call of the function and don't influence each other. Do you have a suggestion what I should look for in my C code? Or do I need to make the code thread-safe in the sense that concurrent executions are prevented by monitors or semaphores or so (no idea what this is called in C)?
> > Stefan
> >
> > > The first thing I would do is examine your UDF and ensure that it is thread-safe. No global variables, no static variables within functions, etc. Also make sure that any libc functions you call that are documented as non-thread-safe are wrapped by a mutex or otherwise protected against multiple simultaneous access.
> > >
> > > http://dev.mysql.com/doc/refman/5.5/en/adding-udf.html
> > >
> > > As for debugging, you should be able to write things to stderr, which will show up in the MySQL log file, or you could open your own log file and write to that.
> >
> > --
> > Dan Nelson
> > dnel...@allantgroup.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql
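The fix Stefan describes can be sketched roughly as follows. In the MySQL UDF interface a string argument arrives as a pointer in `args->args[i]` with its byte length in `args->lengths[i]`, and the UDF documentation advises against assuming the data is null-terminated. This is a minimal sketch under stated assumptions: to compile without `mysql.h` it declares a hypothetical stand-in struct `udf_args_demo` carrying only the `arg_count`, `args` and `lengths` fields, and a hypothetical helper `copy_string_arg()`; a real UDF would use `UDF_ARGS` directly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the relevant fields of MySQL's UDF_ARGS
   (real code would include mysql.h and use UDF_ARGS instead). */
typedef struct {
    unsigned int arg_count;
    char **args;            /* argument data, NOT necessarily '\0'-terminated */
    unsigned long *lengths; /* byte length of each argument */
} udf_args_demo;

/* Copy string argument i into a newly allocated, '\0'-terminated buffer.
   It uses args->lengths[i] instead of strlen(), so it never reads past
   the end of the argument data. Returns NULL for an out-of-range index,
   a NULL argument, or an allocation failure; the caller must free(). */
static char *copy_string_arg(const udf_args_demo *args, unsigned int i)
{
    assert(args != NULL);
    if (i >= args->arg_count || args->args[i] == NULL)
        return NULL;
    unsigned long len = args->lengths[i];
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, args->args[i], len);
    copy[len] = '\0';
    return copy;
}
```

Calling `strlen(args->args[i])` instead would scan until it happens to meet a zero byte, so the result can depend on whatever memory follows the argument. That plausibly explains why the symptoms only showed up under concurrent load.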
Re: UDF behaves non-deterministic
On Monday 05 November 2012 18:02:28 h...@tbbs.net wrote:
> 2012/11/04 22:23 +, Stefan Kuhn
> > select * from table order by udf(column, 'input_value') desc;
> > To my understanding, this should give the same result always.
>
> But if for your data the function "udf" returns the same value for several arguments, that is not enough to fix the order. In that case I have found that other incidental things affect the order, things that one would not suspect: how much storage is used and needed for the sorting, ... a further reason for showing what the function returns. If the order varies, although the function returns the same in all cases, well,

True, but I am missing records from the top ten which should definitely be in, so this should not be the problem here. I am investigating things further...

Stefan
Re: UDF behaves non-deterministic
2012/11/04 22:23 +, Stefan Kuhn
> select * from table order by udf(column, 'input_value') desc;
> To my understanding, this should give the same result always.

But if for your data the function "udf" returns the same value for several arguments, that is not enough to fix the order. In that case I have found that other incidental things affect the order, things that one would not suspect: how much storage is used and needed for the sorting, ... a further reason for showing what the function returns. If the order varies, although the function returns the same in all cases, well,
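The point about ties can be illustrated outside MySQL. If several rows share a score, a sort that compares only the score may return them in any order, and that order can change between runs. A common remedy is to add a unique secondary key. This is a general technique, not something proposed in the thread. In SQL it would look something like `ORDER BY udf(column, 'input_value') DESC, id`, assuming the table has a unique `id` column. The C sketch below shows the same idea with `qsort()` and a hypothetical `scored_row` struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical row type: a unique id plus the score the UDF computed. */
typedef struct {
    long id;
    double score;
} scored_row;

/* Comparator: score descending, then id ascending as a tie-breaker.
   Because ids are unique, no two rows compare equal, so the final order
   is fully determined even when scores tie. */
static int cmp_score_desc_id_asc(const void *pa, const void *pb)
{
    const scored_row *a = pa;
    const scored_row *b = pb;
    if (a->score > b->score) return -1;
    if (a->score < b->score) return 1;
    if (a->id < b->id) return -1;
    if (a->id > b->id) return 1;
    return 0;
}

static void sort_rows(scored_row *rows, size_t n)
{
    assert(rows != NULL || n == 0);
    if (n > 1)
        qsort(rows, n, sizeof rows[0], cmp_score_desc_id_asc);
}
```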
Re: UDF behaves non-deterministic
Can you reduce the UDF to just "return 1;"? That should give you a clue what is going on.
Random values usually point to two suspects:
1. mixing 32-bit and 64-bit
2. using void instead of int

re,
wh

On 04.11.2012 23:23, Stefan Kuhn wrote:
> Hi all,
> I have a weird (for me at least) problem with a user-defined function, written in C. The function seems to return different results in different runs (the code of the function does not contain random elements). Basically, the function calculates a score based on a column in a table and an input value. So I do something like this:
> select * from table order by udf(column, 'input_value') desc;
> To my understanding, this should give the same result always. But if I run many statements (execution is from a Java program and I can do it in parallel threads) so that they overlap (the udf on a large table takes 5-10 s on a slow machine), the results of some queries are different. If I have enough time between statements, it seems to work, i.e. the result is always the same. I would have thought the statements are independent, even if executed on different JDBC connections in parallel.
> Does somebody have an idea?
> Or could somebody give an idea on debugging? Normally I would try to debug the code to see what goes on, but how can I do this in a udf? Can I log in the udf?
> Thanks for any hints,
> Stefan
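The "return 1" suggestion is a bisection step. If a stub that ignores its input still produces varying results, the cause lies outside the function body. If the variation disappears, it lies inside. The minimal sketch below assumes the documented shape of a REAL-returning UDF, `double xxx(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error)`. So that it compiles without `mysql.h`, the two MySQL types are replaced by hypothetical opaque stand-ins, and the name `score_stub` is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opaque stand-ins for MySQL's UDF_INIT and UDF_ARGS;
   a real UDF would include mysql.h instead. */
typedef struct udf_init_demo udf_init_demo;
typedef struct udf_args_demo udf_args_demo;

/* Stub with the parameter list of a REAL-returning UDF. It ignores its
   arguments and always returns 1.0, so any remaining variation in the
   query results cannot come from the function body. */
double score_stub(udf_init_demo *initid, udf_args_demo *args,
                  char *is_null, char *error)
{
    (void)initid;
    (void)args;
    assert(is_null != NULL && error != NULL);
    *is_null = 0; /* result is not NULL */
    *error = 0;   /* no error */
    return 1.0;
}
```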
Re: Re: UDF behaves non-deterministic
C is not an inherently thread-safe language. Several of the standard library functions use static data, which gets stepped on during concurrent operation. Many of those do have thread-safe equivalents on many platforms, such as strtok/strtok_r (the latter being the safe one).

If you are confident you are not using statics or globals in your code directly, you will need to identify each function you do call. Start by reading the man page for that function (if it's in the C stdlib, there is a man page for it), which should tell you whether it is safe or not; for those which are not, the man page will likely suggest a thread-safe alternative if one is available. If none is available, you might have to consider a mutex.

- michael dykman

On Mon, Nov 5, 2012 at 9:28 AM, Stefan Kuhn wrote:
> Hi Dan,
>
> thanks for your answer. The UDF only contains functions (the one called in SQL plus two functions called in it). There are no variables outside them and nothing is declared static. All variables inside the functions are declared just like "double x=0;" etc. I am not an expert on C, but my understanding is that these values are separate for each call of the function and don't influence each other. Do you have a suggestion what I should look for in my C code? Or do I need to make the code thread-safe in the sense that concurrent executions are prevented by monitors or semaphores or so (no idea what this is called in C)?
> Stefan
>
> > The first thing I would do is examine your UDF and ensure that it is thread-safe. No global variables, no static variables within functions, etc. Also make sure that any libc functions you call that are documented as non-thread-safe are wrapped by a mutex or otherwise protected against multiple simultaneous access.
> >
> > http://dev.mysql.com/doc/refman/5.5/en/adding-udf.html
> >
> > As for debugging, you should be able to write things to stderr, which will show up in the MySQL log file, or you could open your own log file and write to that.
>
> --
> Dan Nelson
> dnel...@allantgroup.com

--
- michael dykman
- mdyk...@gmail.com

May the Source be with you.
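Here is a sketch of the strtok/strtok_r contrast Michael mentions. `strtok()` keeps its position in hidden static state, so two threads tokenizing at the same time can corrupt each other's progress. The POSIX function `strtok_r()` instead keeps that position in a pointer the caller supplies. The helper name `count_tokens` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count space/tab-separated tokens in buf (buf is modified in place).
   strtok_r keeps its position in the local variable `save`, so
   concurrent calls from different threads share no state. */
static size_t count_tokens(char *buf)
{
    assert(buf != NULL);
    char *save = NULL;
    size_t n = 0;
    for (char *tok = strtok_r(buf, " \t", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t", &save))
        n++;
    return n;
}
```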
Re: Re: UDF behaves non-deterministic
Hi Dan,

thanks for your answer. The UDF only contains functions (the one called in SQL plus two functions called in it). There are no variables outside them and nothing is declared static. All variables inside the functions are declared just like "double x=0;" etc. I am not an expert on C, but my understanding is that these values are separate for each call of the function and don't influence each other. Do you have a suggestion what I should look for in my C code? Or do I need to make the code thread-safe in the sense that concurrent executions are prevented by monitors or semaphores or so (no idea what this is called in C)?

Stefan

> The first thing I would do is examine your UDF and ensure that it is thread-safe. No global variables, no static variables within functions, etc. Also make sure that any libc functions you call that are documented as non-thread-safe are wrapped by a mutex or otherwise protected against multiple simultaneous access.
>
> http://dev.mysql.com/doc/refman/5.5/en/adding-udf.html
>
> As for debugging, you should be able to write things to stderr, which will show up in the MySQL log file, or you could open your own log file and write to that.
>
> --
> Dan Nelson
> dnel...@allantgroup.com
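Stefan's understanding of local variables is correct. A few lines of C show the distinction behind Dan's "no static variables" advice. An automatic variable such as `double x = 0;` is created afresh on every call, and each thread has its own stack. A `static` local, by contrast, is a single object shared by all calls and all threads. Both functions below are hypothetical illustrations, not code from the thread. The second one is deliberately the anti-pattern.

```c
#include <assert.h>

/* Automatic local: a fresh `total` exists for every call (and every
   thread), so the result depends only on the arguments. */
static double sum_auto(const double *v, int n)
{
    assert(n >= 0);
    double total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Anti-pattern: the static local `total` is one object shared by every
   call and every thread, and it keeps its value between calls. Results
   depend on call history, and concurrent calls race on it. */
static double sum_static(const double *v, int n)
{
    static double total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}
```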
Re: UDF behaves non-deterministic
In the last episode (Nov 04), Stefan Kuhn said:
> I have a weird (for me at least) problem with a user-defined function, written in C. The function seems to return different results in different runs (the code of the function does not contain random elements). Basically, the function calculates a score based on a column in a table and an input value. So I do something like this:
>
> select * from table order by udf(column, 'input_value') desc;
>
> To my understanding, this should give the same result always. But if I run many statements (execution is from a Java program and I can do it in parallel threads) so that they overlap (the udf on a large table takes 5-10 s on a slow machine), the results of some queries are different. If I have enough time between statements, it seems to work, i.e. the result is always the same. I would have thought the statements are independent, even if executed on different JDBC connections in parallel.
>
> Does somebody have an idea? Or could somebody give an idea on debugging? Normally I would try to debug the code to see what goes on, but how can I do this in a udf? Can I log in the udf?

The first thing I would do is examine your UDF and ensure that it is thread-safe. No global variables, no static variables within functions, etc. Also make sure that any libc functions you call that are documented as non-thread-safe are wrapped by a mutex or otherwise protected against multiple simultaneous access.

http://dev.mysql.com/doc/refman/5.5/en/adding-udf.html

As for debugging, you should be able to write things to stderr, which will show up in the MySQL log file, or you could open your own log file and write to that.

--
Dan Nelson
dnel...@allantgroup.com
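Dan's suggestion of wrapping a non-thread-safe library call in a mutex might look like the sketch below, assuming POSIX threads. `legacy_format()` is a hypothetical stand-in for any function that returns a pointer to an internal static buffer. The wrapper, `safe_format()`, is also hypothetical. It holds a mutex across the call and copies the result into a caller-owned buffer before releasing the lock, so no other thread can overwrite the static data in between. Where a reentrant variant exists, such as `strtok_r` for `strtok`, using it is usually simpler than locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical non-thread-safe function: returns a pointer to a static
   buffer that the next call overwrites. */
static const char *legacy_format(double x)
{
    static char buf[64];
    snprintf(buf, sizeof buf, "%.3f", x);
    return buf;
}

static pthread_mutex_t legacy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe wrapper: serializes access to legacy_format() and copies
   its result into out (out_size bytes) while the lock is still held. */
static void safe_format(double x, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    pthread_mutex_lock(&legacy_lock);
    const char *s = legacy_format(x);
    snprintf(out, out_size, "%s", s);
    pthread_mutex_unlock(&legacy_lock);
}
```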
Re: UDF behaves non-deterministic
On Sunday 04 November 2012 22:34:22 Michael Dykman wrote:
> A couple of questions present.
>
> You mention that selecting from the whole table takes 5-10s, so I assume you have a lot of records.

Yes, and the calculation of the score is fairly complicated. Plus the test server is slow (a Pentium III machine, old, but working).

> Is the data not in flux? Are you sure?

Yes, I am. I have a test server where nothing happens.

> Are these conflicting queries all on the same server?

Yes, one MySQL instance on one server.

> I would have structured the query like so:
> select *, udf(column,'value') AS u from table order by u;

I tried this, and whilst it gives a speedup (around 25%, I would say), it does not solve the problem (but thanks for the hint, I didn't think this would make a difference).

> I suspect it might reduce the number of udf invocations. The ORDER BY clause is frequently referred to in the process of sorting; keeping that static instead of dynamic might sanitize your issue.
>
> On 2012-11-04 4:24 PM, "Stefan Kuhn" wrote:
>
> Hi all,
> I have a weird (for me at least) problem with a user-defined function, written in C. The function seems to return different results in different runs (the code of the function does not contain random elements). Basically, the function calculates a score based on a column in a table and an input value. So I do something like this:
> select * from table order by udf(column, 'input_value') desc;
> To my understanding, this should give the same result always. But if I run many statements (execution is from a Java program and I can do it in parallel threads) so that they overlap (the udf on a large table takes 5-10 s on a slow machine), the results of some queries are different. If I have enough time between statements, it seems to work, i.e. the result is always the same. I would have thought the statements are independent, even if executed on different JDBC connections in parallel.
> Does somebody have an idea?
> Or could somebody give an idea on debugging? Normally I would try to debug the code to see what goes on, but how can I do this in a udf? Can I log in the udf?
> Thanks for any hints,
> Stefan
Re: UDF behaves non-deterministic
A couple of questions present.

You mention that selecting from the whole table takes 5-10s, so I assume you have a lot of records. Is the data not in flux? Are you sure? Are these conflicting queries all on the same server?

I would have structured the query like so:

select *, udf(column,'value') AS u from table order by u;

I suspect it might reduce the number of udf invocations. The ORDER BY clause is frequently referred to in the process of sorting; keeping that static instead of dynamic might sanitize your issue.

On 2012-11-04 4:24 PM, "Stefan Kuhn" wrote:
> Hi all,
> I have a weird (for me at least) problem with a user-defined function, written in C. The function seems to return different results in different runs (the code of the function does not contain random elements). Basically, the function calculates a score based on a column in a table and an input value. So I do something like this:
> select * from table order by udf(column, 'input_value') desc;
> To my understanding, this should give the same result always. But if I run many statements (execution is from a Java program and I can do it in parallel threads) so that they overlap (the udf on a large table takes 5-10 s on a slow machine), the results of some queries are different. If I have enough time between statements, it seems to work, i.e. the result is always the same. I would have thought the statements are independent, even if executed on different JDBC connections in parallel.
> Does somebody have an idea? Or could somebody give an idea on debugging? Normally I would try to debug the code to see what goes on, but how can I do this in a udf? Can I log in the udf?
> Thanks for any hints,
> Stefan
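Michael's alias suggestion is about how often the scoring function runs during a sort. Whether MySQL actually re-evaluates the ORDER BY expression depends on the server's sort implementation, so treat the following only as an analogy, not a description of MySQL internals. In C the analogous choice is between two approaches. One calls an expensive scoring function inside the comparator, which means many calls per element. The other computes each score once up front and sorts the stored values, which means exactly one call per element. The names `expensive_score`, `score_calls` and `sort_by_precomputed_score` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Invocation counter; a global is acceptable only because this demo is
   single-threaded (in a UDF it would be exactly the kind of shared
   state discussed in this thread). */
static unsigned long score_calls = 0;

/* Hypothetical stand-in for a costly scoring computation. */
static double expensive_score(int value)
{
    score_calls++;
    return (double)(value % 7);
}

typedef struct {
    int value;
    double score; /* computed once, before sorting */
} keyed_item;

/* Score descending, then value ascending as a tie-breaker. */
static int cmp_keyed_desc(const void *pa, const void *pb)
{
    const keyed_item *a = pa;
    const keyed_item *b = pb;
    if (a->score != b->score)
        return a->score > b->score ? -1 : 1;
    return (a->value > b->value) - (a->value < b->value);
}

/* Score every value exactly once, sort by the stored scores, and write
   the values back in sorted order. Returns -1 on allocation failure. */
static int sort_by_precomputed_score(int *values, size_t n)
{
    assert(values != NULL || n == 0);
    if (n == 0)
        return 0;
    keyed_item *items = malloc(n * sizeof *items);
    if (items == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        items[i].value = values[i];
        items[i].score = expensive_score(values[i]);
    }
    qsort(items, n, sizeof *items, cmp_keyed_desc);
    for (size_t i = 0; i < n; i++)
        values[i] = items[i].value;
    free(items);
    return 0;
}
```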