[
https://issues.apache.org/jira/browse/BEAM-5852?focusedWorklogId=161398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-161398
]
ASF GitHub Bot logged work on BEAM-5852:
----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Nov/18 00:10
Start Date: 01/Nov/18 00:10
Worklog Time Spent: 10m
Work Description: akedin commented on a change in pull request #6898:
[BEAM-5852] [BEAM-5892] BeamSQL functions
URL: https://github.com/apache/beam/pull/6898#discussion_r229906798
##########
File path:
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java
##########
@@ -75,6 +77,14 @@ public static BeamSqlEnv inMemory(TableProvider...
tableProviders) {
return withTableProvider(inMemoryMetaStore);
}
+ public void registerBuiltinUdf(Map<String, List<Method>> methods) {
+ for (Map.Entry<String, List<Method>> entry : methods.entrySet()) {
+ for (Method method : entry.getValue()) {
+ defaultSchema.add(entry.getKey(), ScalarFunctionImpl.create(method));
Review comment:
I don't think it's just the matter of benefits in this case. It is a wrong
abstraction. `Method` represents a method on a java class, it has properties
that are java-specific and doesn't have properties that we need. What we need
is something that represents a Beam SQL UDF. Java method is not a SQL UDF.
Maybe even Calcite's interfaces are wrong to use for this but they fit much
better than java concepts and we can try using them. Calcite's
`ScalarFunctionImpl` is a utility class to adapt java method signature to
Calcite UDF concept in a specific manner that is not enough for our use case.
What we need is an abstraction (or multiple) that allows us to:
- capture arguments types and return types:
- capture function name;
- adapt to Calcite's `Function`;
- link to the actual implementation at run-time;
I don't think that we need anything else to capture the information about
UDF. And I think that all of the above [belongs to the same
concept](https://en.wikipedia.org/wiki/Cohesion_(computer_science)) and we can
house it in the same place (modulo non-public utility classes or extra
abstractions). `Method` doesn't capture all of it plus has a ton of extra stuff
that we don't need that serve [different purposes, and it's not
extensible](https://en.wikipedia.org/wiki/SOLID).
If we have our own UDF, the logic can look like (pseudocode):
```
// loading UDFs
ServiceLoader
.load(UDFProvider.class)
.stream()
.flatMap(provider -> provider.getClass().getDeclaredMethods())
.filter(UDFReflectionUtils::annotatedAsUDFs)
.map(BeamReflectionUDF::create)
.collect(toList());
// creating the udf
class ReflectionUDF implements SQLUDF {
static create(Method javaMethod) {
this.name = UDFRefletionUtils.getUDFName(javaMethod);
this.arguments = UDFReflectionUtils.getArguments(javaMethod);
this.returnType =
UDFReflectionUtils.convertToRelDataType(javaMethod.getReturnType());
}
...
// we may optionally store the link to the original java method
// but ultimately this too should be abstracted away
@Override
public Method getImplementation() { return this.method; }
}
```
In this case:
- function class encapsulates all the needed information;
- e.g. now you don't need to pass around the map of functions and just
know what the keys and values mean in the map;
- you can populate all the information once at instance construction time;
- it only has relevant information;
- only the places that need to know about reflection know about it, for
other places it's just a bunch of business-specific interfaces;
- opens the implementation for extensibility:
- we control the java type mapping;
- we can implement code-generation instead of reflection at run-time;
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 161398)
Time Spent: 2h 20m (was: 2h 10m)
> Function extension in BeamSQL
> -----------------------------
>
> Key: BEAM-5852
> URL: https://issues.apache.org/jira/browse/BEAM-5852
> Project: Beam
> Issue Type: New Feature
> Components: dsl-sql
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> We could add more functions to BeamSQL (as UDFs) to provide rich
> functionalities than standard/Calcite functions.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)