[jira] [Commented] (CASSANDRA-7888) Decide the best way to define user-define functions

2014-09-13 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132595#comment-14132595
 ] 

Robert Stupp commented on CASSANDRA-7888:
-

bq. I propose we just remove the UDF-as-classes
It can simply be optimized to use indy. I'm generally neutral on whether to 
keep or remove 'class' UDFs as far as the implementation itself is concerned. 
But not having class UDFs can avoid potential deployment issues.

bq. Still think we should be using invokedynamic
It's not necessary, because with CASSANDRA-7924 the code is refactored to 
generate java-UDF classes that extend/implement {{UDFunction}}, so there's no 
longer any need to use either reflection or indy during invocation.

 Decide the best way to define user-define functions
 ---

 Key: CASSANDRA-7888
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7888
 Project: Cassandra
  Issue Type: Improvement
Reporter: Benjamin Lerer
  Labels: cql
 Fix For: 3.0


 The goal of this ticket is to decide the best way, from both an ease-of-use 
 and a performance point of view, to define User Defined Scalar Functions and 
 User Defined Aggregate Functions.
 I would like to clarify this point before we add support for User Defined 
 Aggregate Functions as part of CASSANDRA-4914.
 The current version of UDF supports only the addition of scalar functions, 
 and does so by allowing a user to provide classes containing static methods 
 that can then be loaded as functions within Cassandra.
 The problem with the static method approach is that it forces us internally 
 to perform a method call via reflection for each call of the function. So if 
 a request loads 10,000 rows, the static method will be called 10,000 times 
 via reflection.
 As the Method object is cached, the HotSpot compiler will optimize the method 
 call after a certain number of iterations. Nevertheless, from a performance 
 point of view it is definitely not an optimal situation.
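A minimal sketch of the reflective path described above, assuming a hypothetical static UDF class {{AbsUdf}}: the {{Method}} is looked up once and cached, but every row still goes through {{Method.invoke}}, which wraps the argument in an {{Object[]}} (boxing it) and returns a boxed {{Double}}.

```java
import java.lang.reflect.Method;

// Hypothetical 'class' UDF: a user-supplied static method.
class AbsUdf {
    public static double abs(double d) {
        return Math.abs(d);
    }
}

class ReflectiveUdfCall {
    // Looked up once when the function is loaded, then cached.
    private static final Method ABS;

    static {
        try {
            ABS = AbsUdf.class.getMethod("abs", double.class);
        } catch (NoSuchMethodException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Query time: one reflective call per row. Each call boxes the double argument
    // into an Object[] and returns a boxed Double that is unboxed here.
    static double sumOfAbs(double[] rows) {
        double total = 0;
        for (double row : rows) {
            try {
                total += (Double) ABS.invoke(null, row);
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[] rows = new double[10_000];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = -i;
        }
        System.out.println(sumOfAbs(rows));
    }
}
```

HotSpot may inline through the cached {{Method}} once the call is hot, but the per-row boxing and argument array remain part of the call shape.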
 Ideally, a proper solution from the performance point of view would limit the 
 cost to function loading time (when the function is first added or at 
 startup) and not incur it at query time.
 The first solution to this problem would be to require the designer of a 
 new function to implement a specific interface like:
 {code}
 public interface UserDefinedScalarFunction
 {
     Object execute(Object... args);
 }
 {code}
 or, for aggregate functions:
 {code}
 public interface UserDefinedAggregateFunction
 {
     UserDefinedAggregate newAggregate();

     public interface UserDefinedAggregate
     {
         void add(Object... args);
         Object getResult();
         void reset();
     }
 }
 {code}
 This would allow us to create one object instance via reflection and then 
 reuse that object every time the function is called.
 The problem with that approach is that we lose the type safety of the 
 arguments and of the return type, and consequently we will only be able to 
 detect a problem at run time.
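A minimal sketch of the interface-based option using the proposed {{UserDefinedScalarFunction}} interface ({{AbsScalarFunction}} and {{InterfaceUdfDemo}} are hypothetical names): the class is instantiated reflectively once, the instance is then called directly for each row, and a wrongly typed argument compiles fine but fails with a {{ClassCastException}} only when the function runs.

```java
// The interface proposed in the ticket description.
interface UserDefinedScalarFunction {
    Object execute(Object... args);
}

// Hypothetical user implementation: arguments and result are plain Objects.
class AbsScalarFunction implements UserDefinedScalarFunction {
    @Override
    public Object execute(Object... args) {
        // This cast is only checked when the function actually runs.
        return Math.abs((Double) args[0]);
    }
}

class InterfaceUdfDemo {
    // Load time: a single reflective instantiation.
    static UserDefinedScalarFunction load(String className) {
        try {
            return (UserDefinedScalarFunction) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        UserDefinedScalarFunction abs = load(AbsScalarFunction.class.getName());
        // Query time: the same instance is reused for every row, with no reflection.
        System.out.println(abs.execute(-2.5));
        try {
            abs.execute("not a number"); // compiles fine...
        } catch (ClassCastException e) {
            // ...but the type mismatch is only detected at run time.
            System.out.println("rejected at run time: " + e.getClass().getSimpleName());
        }
    }
}
```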
 The second solution would be to require the designer of a new function to 
 create a class in which the method to execute is marked with an annotation.
 {code}
 public class AbsFunction
 {
     @Execute
     public double abs(double d)
     {
         return Math.abs(d);
     }
 }
 {code}
 The same approach for aggregate functions would give:
 {code}
 public class AvgFunction
 {
     private double sum;
     private int count;

     @Add
     public void addValue(double d)
     {
         sum += d;
         count++;
     }

     @Get
     public double getAvg()
     {
         if (count == 0)
             return 0;
         return sum / count;
     }

     @Reset
     public void clear()
     {
         sum = 0;
         count = 0;
     }
 }
 {code}
 For this approach to work, we need to use code generation at loading time to 
 extend the provided class with the methods needed to adapt it to our 
 framework.
 Its disadvantage is that we will need to add a new library, such as 
 Javassist, to the libraries used by C*.
 Its advantage is that it will allow us to detect type mismatches at creation 
 time.
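The creation-time type check can be illustrated without Javassist. The sketch below swaps the proposed code generation for {{java.lang.invoke}} method handles (a different technique): at load time it finds the {{@Execute}} method, compares its signature with the one the function was declared with, and binds it to a single instance. {{AnnotatedUdfLoader}}, the RUNTIME-retained {{@Execute}} definition and the restriction to instance methods are assumptions for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// The proposed marker annotation; RUNTIME retention so the loader can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Execute {}

class AbsFunction {
    @Execute
    public double abs(double d) {
        return Math.abs(d);
    }
}

class AnnotatedUdfLoader {
    // Creation time: find the @Execute (instance) method, check it against the
    // declared signature, and bind it to a single instance.
    static MethodHandle load(Class<?> udfClass, MethodType declared) {
        for (Method m : udfClass.getMethods()) {
            if (!m.isAnnotationPresent(Execute.class)) {
                continue;
            }
            try {
                MethodHandle handle = MethodHandles.lookup().unreflect(m);
                // Drop the receiver parameter to get the user-visible signature.
                MethodType actual = handle.type().dropParameterTypes(0, 1);
                if (!actual.equals(declared)) {
                    throw new IllegalArgumentException(
                            "type mismatch: declared " + declared + ", found " + actual);
                }
                Object instance = udfClass.getDeclaredConstructor().newInstance();
                return handle.bindTo(instance);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot load " + udfClass.getName(), e);
            }
        }
        throw new IllegalArgumentException("no @Execute method in " + udfClass.getName());
    }

    // Query time: an exact-typed call, with no Object[] and no boxing.
    static double call(MethodHandle udf, double arg) {
        try {
            return (double) udf.invokeExact(arg);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Declaring the function as {{(long)long}} against this class fails in {{load}}, before any query runs.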



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7888) Decide the best way to define user-define functions

2014-09-12 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132250#comment-14132250
 ] 

Sylvain Lebresne commented on CASSANDRA-7888:
-

bq. The problem with the static method approach is that it forces us internally 
to perform a method call via reflection for each call of the function.

It's a fair point. But maybe the simpler solution is just to remove the option 
of having UDFs as 'classes'. After all, it doesn't really provide any benefit 
over CASSANDRA-7562, and simpler is better. So I propose we just remove the 
UDF-as-classes option.

Regarding user-defined aggregates, I think we should just copy the PostgreSQL 
approach, which is described 
[here|http://www.postgresql.org/docs/9.1/static/sql-createaggregate.html]. In 
particular, this solves the problem of how to support scripting languages for 
aggregates.
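PostgreSQL's {{CREATE AGGREGATE}} builds an aggregate from a state type ({{STYPE}}), a state transition function ({{SFUNC}}) called once per row, an optional final function ({{FINALFUNC}}) and an initial condition ({{INITCOND}}). A minimal Java sketch of that evaluation model, with hypothetical names ({{Aggregate}}, {{AvgExample}}), using the average from the ticket description:

```java
import java.util.Arrays;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical model of a PostgreSQL-style aggregate: initial state (INITCOND),
// state transition function (SFUNC) and final function (FINALFUNC).
final class Aggregate<S, T, R> {
    private final S initialState;
    private final BiFunction<S, T, S> stateFunction;
    private final Function<S, R> finalFunction;

    Aggregate(S initialState, BiFunction<S, T, S> stateFunction, Function<S, R> finalFunction) {
        this.initialState = initialState;
        this.stateFunction = stateFunction;
        this.finalFunction = finalFunction;
    }

    // Folds the rows through the state function, then applies the final function once.
    R evaluate(Iterable<T> rows) {
        S state = initialState;
        for (T row : rows) {
            state = stateFunction.apply(state, row);
        }
        return finalFunction.apply(state);
    }
}

class AvgExample {
    // Immutable state (the STYPE): running sum and row count.
    static final class SumCount {
        final double sum;
        final long count;

        SumCount(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // avg(double): returns 0 for no rows, like the AvgFunction example in the description.
    static final Aggregate<SumCount, Double, Double> AVG = new Aggregate<>(
            new SumCount(0, 0),
            (state, value) -> new SumCount(state.sum + value, state.count + 1),
            state -> state.count == 0 ? 0.0 : state.sum / state.count);

    public static void main(String[] args) {
        System.out.println(AVG.evaluate(Arrays.asList(1.0, 2.0, 6.0)));
    }
}
```

Because each piece is an ordinary scalar function, any language that can define a scalar UDF could in principle supply the state and final functions.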



[jira] [Commented] (CASSANDRA-7888) Decide the best way to define user-define functions

2014-09-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132277#comment-14132277
 ] 

Jonathan Ellis commented on CASSANDRA-7888:
---

bq. I propose we just remove the UDF-as-classes option.

How does this remove the need for reflection in the UDF-as-java-source case?



[jira] [Commented] (CASSANDRA-7888) Decide the best way to define user-define functions

2014-09-12 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132279#comment-14132279
 ] 

Sylvain Lebresne commented on CASSANDRA-7888:
-

bq. the need for reflection in the UDF-as-java-source case?

We use Javassist to generate bytecode directly; there's no reflection there.



[jira] [Commented] (CASSANDRA-7888) Decide the best way to define user-define functions

2014-09-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132284#comment-14132284
 ] 

Jonathan Ellis commented on CASSANDRA-7888:
---

We use Javassist to create a class implementing a method with the right 
signature, but we still use Method.invoke to call that generated method.



[jira] [Commented] (CASSANDRA-7888) Decide the best way to define user-define functions

2014-09-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132286#comment-14132286
 ] 

Jonathan Ellis commented on CASSANDRA-7888:
---

Still think we should be using invokedynamic. 
http://rick-hightower.blogspot.com/2013/10/java-invoke-dynamic-examples-java-7.html
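A minimal sketch of the {{java.lang.invoke}} API that invokedynamic is built on, assuming a hypothetical static UDF in {{StaticUdfs}}. It calls a {{MethodHandle}} directly rather than emitting an {{invokedynamic}} instruction: the handle is resolved once and held in a {{static final}} field, which HotSpot can typically treat as a constant, and {{invokeExact}} avoids the argument array and boxing of {{Method.invoke}}.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical 'class' UDF: a user-supplied static method.
class StaticUdfs {
    public static double abs(double d) {
        return Math.abs(d);
    }
}

class MethodHandleUdfCall {
    // Resolved once at load time and kept in a static final field.
    private static final MethodHandle ABS;

    static {
        try {
            ABS = MethodHandles.lookup().findStatic(
                    StaticUdfs.class, "abs", MethodType.methodType(double.class, double.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Query time: invokeExact with a (double)double call site, so no Object[] and no boxing.
    static double abs(double arg) {
        try {
            return (double) ABS.invokeExact(arg);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    static double sumOfAbs(double[] rows) {
        double total = 0;
        for (double row : rows) {
            total += abs(row);
        }
        return total;
    }
}
```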



[jira] [Commented] (CASSANDRA-7888) Decide the best way to define user-define functions

2014-09-05 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14123003#comment-14123003
 ] 

Robert Stupp commented on CASSANDRA-7888:
-

A Java interface would work fine with 'class' UDFs (CASSANDRA-7395).
'java' UDFs (CASSANDRA-7562) and JSR 223 (script) UDFs from CASSANDRA-7526 
might get complicated (although the 'java' code generation could be changed).

I thought of an alternative approach: pass a result set context object into 
UDFs as the first parameter for aggregate functions.
That means each SELECT execution generates one result set context object for 
each aggregate function used.
The drawback, of course, is that such a generic result set context could not 
use primitive types ({{int}}, {{long}}, {{double}}, etc.) but only the wrapper 
types ({{Integer}}, {{Long}}, {{Double}}, etc.).
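A minimal sketch of the context-object idea, with hypothetical names ({{AggregateContext}}, {{ContextSumUdf}}, {{ContextDemo}}): one context per aggregate function per SELECT is passed as the first UDF parameter, and because the context is generic, its state must be a wrapper type such as {{Double}} rather than {{double}}.

```java
// Hypothetical generic per-query context; a type parameter cannot be a primitive,
// so the state has to be a wrapper type such as Double.
class AggregateContext<T> {
    T state;
}

// Hypothetical aggregate UDF: the context is passed as the first parameter.
class ContextSumUdf {
    static void sum(AggregateContext<Double> ctx, double value) {
        // The running total is re-boxed into a Double on every row.
        ctx.state = (ctx.state == null) ? value : ctx.state + value;
    }
}

class ContextDemo {
    // Engine side: one context object per aggregate function per SELECT execution.
    static Double runSelect(double... rows) {
        AggregateContext<Double> ctx = new AggregateContext<>();
        for (double row : rows) {
            ContextSumUdf.sum(ctx, row);
        }
        return ctx.state;
    }
}
```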

Or we could let the UDF implementation return an implementation of some 
aggregate interface, which gets called for each row and for the final result. 
For example, for 'class' UDFs:
{noformat}
class MyAggregateFunctionContext implements AggregateFunctionResultSetDouble {
    private Double resultValue;

    public void forEachRow(SomeRowOrCell data) {
        // ... per-row magic code
    }

    public Double getResult() {
        return resultValue;
    }
}
{noformat}

for 'java' UDFs:
{noformat}
CREATE FUNCTION aggregateMagic ( input double ) RETURNS double LANGUAGE java AS 
'
return new AggregateFunctionResultSetDouble() {
    private Double resultValue;

    public void forEachRow(SomeRowOrCell data) {
        // ... per-row magic code
    }

    public Double getResult() {
        return resultValue;
    }
};
';
{noformat}

Maybe it's necessary to add some {{CREATE AGGREGATE FUNCTION ...}} syntax to 
distinguish between scalar and aggregation functions.
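A minimal sketch of how the engine might drive such a returned aggregate: obtain one instance per SELECT, feed it every row, then ask for the result. The interface name comes from the comment; simplifying {{SomeRowOrCell}} to a {{double}}, and the {{AvgAggregateUdf}} and {{AggregateDriver}} classes, are assumptions.

```java
import java.util.function.Supplier;

// Interface name taken from the comment; the per-row input is simplified to a double
// instead of 'SomeRowOrCell' (assumption).
interface AggregateFunctionResultSetDouble {
    void forEachRow(double value);

    Double getResult();
}

class AvgAggregateUdf {
    // What a 'java' UDF body might return: a fresh accumulator for one query.
    static AggregateFunctionResultSetDouble create() {
        return new AggregateFunctionResultSetDouble() {
            private double sum;
            private long count;

            @Override
            public void forEachRow(double value) {
                sum += value;
                count++;
            }

            @Override
            public Double getResult() {
                return count == 0 ? null : sum / count; // null when no rows were seen
            }
        };
    }
}

class AggregateDriver {
    // Engine side: one accumulator per SELECT, fed every row, then asked for the result.
    static Double run(Supplier<AggregateFunctionResultSetDouble> udf, double... rows) {
        AggregateFunctionResultSetDouble aggregate = udf.get();
        for (double row : rows) {
            aggregate.forEachRow(row);
        }
        return aggregate.getResult();
    }
}
```

Unlike the context-object variant, the accumulator's fields can stay primitive, because the state lives inside the UDF's own object.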

BTW: javassist has been added as part of CASSANDRA-7562.

[jira] [Commented] (CASSANDRA-7888) Decide the best way to define user-define functions

2014-09-05 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14123394#comment-14123394
 ] 

Robert Stupp commented on CASSANDRA-7888:
-

I just looked at which scripting languages support implementing Java interfaces:
* Groovy: http://groovy.codehaus.org/Groovy+way+to+implement+interfaces
* Jython: 
http://www.jython.org/jythonbook/en/1.0/JythonAndJavaIntegration.html#applying-the-design-to-different-object-types
* JRuby: 
https://github.com/jruby/jruby/wiki/CallingJavaFromJRuby#implementing-java-interfaces-in-jruby
* Scala: http://www.scala-lang.org/old/node/812.html
All of these languages seem to have difficulties with generics.
That means the approach in your v2 patch of CASSANDRA-4914 should be possible 
with scripting languages, too.
