[ https://issues.apache.org/jira/browse/CASSANDRA-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181515#comment-16181515 ]
xin jin commented on CASSANDRA-13904:
-------------------------------------

Simple experiments:
{code}
// Test function:
createTable("CREATE TABLE %s (a int primary key, b int)");

// Insert rows (1, 1) through (9999, 9999).
for (int i = 1, m = 10000; i < m; i++)
{
    String queryString = "INSERT INTO %s (a, b) " + String.format("VALUES (%d, %d)", i, i);
    execute(queryString);
}

// State function: adds b to the running state; a null state counts as 0.
String fState = createFunction(KEYSPACE,
                               "int, int",
                               "CREATE FUNCTION %s(a int, b int) " +
                               "CALLED ON NULL INPUT " +
                               "RETURNS int " +
                               "LANGUAGE java " +
                               "AS 'return Integer.valueOf((a!=null?a.intValue():0) + b.intValue());'");

String a = createAggregate(KEYSPACE,
                           "int, int",
                           "CREATE AGGREGATE %s(int) " +
                           "SFUNC " + shortFunctionName(fState) + " " +
                           "STYPE int");

// 1 + 2 + ... + 9999 = 49995000
assertRows(execute("SELECT " + a + "(b) FROM %s"), row(49995000));
{code}

results:

1. enable_user_defined_functions_threads: false
TRACE: UDAggregate.java:198 - Executed UDA cql_test_keyspace.aggregate_2: 9999 call(s) to state function cql_test_keyspace.function_1 in 37259μs, 17297μs, 26131μs

2. enable_user_defined_functions_threads: true
UDAggregate.java:198 - Executed UDA cql_test_keyspace.aggregate_2: 9999 call(s) to state function cql_test_keyspace.function_1 in 555004μs, 457931μs, 475664μs

> Performance improvement of Cassandra UDF/UDA
> --------------------------------------------
>
>                 Key: CASSANDRA-13904
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13904
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: xin jin
>            Priority: Critical
>              Labels: performance
>             Fix For: 3.11.x
>
>
> Hi All,
> We have run a few experiments and found that running a query with direct UDF
> execution is ten times faster than with async UDF execution.
> The in-line comment "Using async UDF execution is expensive (adds about 100us overhead
> per invocation on a Core-i7 MBPr)"
> https://insight.io/github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java?line=293
> shows that this is known behavior. My questions are as follows:
> 1. What are the main pros and cons of these two methods? Can I find any
> documents that discuss this?
> 2. Are there any plans to improve the performance of async UDF execution? A
> simple approach that comes to mind is some sort of batching, e.g., replacing
> the current row-by-row method with one that processes several rows at a time.
> Are there any concerns about this?
> 3. How do people solve this performance issue in general? It seems that this
> issue is not considered urgent or important, because it is known and is
> still there. Therefore people must have some sort of good solution for it.
> I really appreciate your comments in advance.
> Best regards,
> Xin

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
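
The kind of per-invocation cost discussed in this issue, and the batching idea from question 2, can be illustrated with a standalone sketch. This is not Cassandra's implementation: the class name UdfDispatchSketch, its methods, and the batch size of 100 are all hypothetical, and the sketch only models handing each call to another thread and waiting for the result, not anything else the async path may do. It makes no claim about the timings it will print on a given machine.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: direct calls vs. per-row and batched hand-off to an executor.
class UdfDispatchSketch
{
    // Stand-in for the state function in the test above: state + value.
    static int stateFunction(int state, int value)
    {
        return state + value;
    }

    // Direct execution: every call runs on the calling thread.
    static int sumDirect(int rows)
    {
        int state = 0;
        for (int i = 1; i < rows; i++)
            state = stateFunction(state, i);
        return state;
    }

    // Per-row hand-off: one task submission and one blocking wait per row.
    static int sumPerRowAsync(ExecutorService executor, int rows)
    {
        int state = 0;
        for (int i = 1; i < rows; i++)
        {
            final int s = state, v = i;
            Callable<Integer> task = () -> stateFunction(s, v);
            state = await(executor.submit(task));
        }
        return state;
    }

    // Batched hand-off: one submission and one wait per batchSize rows.
    static int sumBatchedAsync(ExecutorService executor, int rows, int batchSize)
    {
        int state = 0;
        for (int start = 1; start < rows; start += batchSize)
        {
            final int s = state, from = start, to = Math.min(start + batchSize, rows);
            Callable<Integer> task = () -> {
                int acc = s;
                for (int i = from; i < to; i++)
                    acc = stateFunction(acc, i);
                return acc;
            };
            state = await(executor.submit(task));
        }
        return state;
    }

    // Blocks until the task finishes; wraps checked exceptions for brevity.
    static int await(Future<Integer> future)
    {
        try
        {
            return future.get();
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args)
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try
        {
            int rows = 10000; // same row range as the test: 1..9999
            long t0 = System.nanoTime();
            int direct = sumDirect(rows);
            long t1 = System.nanoTime();
            int perRow = sumPerRowAsync(executor, rows);
            long t2 = System.nanoTime();
            int batched = sumBatchedAsync(executor, rows, 100);
            long t3 = System.nanoTime();
            System.out.printf("direct=%d in %dus%n", direct, (t1 - t0) / 1000);
            System.out.printf("per-row async=%d in %dus%n", perRow, (t2 - t1) / 1000);
            System.out.printf("batched async=%d in %dus%n", batched, (t3 - t2) / 1000);
        }
        finally
        {
            executor.shutdown();
        }
    }
}
```

All three variants compute the same aggregate; they differ only in how many times the caller hands work to another thread and blocks on the result, which is the cost that batching would amortize. Whether batching is compatible with the reasons Cassandra runs UDFs asynchronously is exactly the open question raised above.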