[ 
https://issues.apache.org/jira/browse/BEAM-5852?focusedWorklogId=161398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-161398
 ]

ASF GitHub Bot logged work on BEAM-5852:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Nov/18 00:10
            Start Date: 01/Nov/18 00:10
    Worklog Time Spent: 10m 
      Work Description: akedin commented on a change in pull request #6898: 
[BEAM-5852] [BEAM-5892] BeamSQL functions
URL: https://github.com/apache/beam/pull/6898#discussion_r229906798
 
 

 ##########
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/BeamSqlEnv.java
 ##########
 @@ -75,6 +77,14 @@ public static BeamSqlEnv inMemory(TableProvider... 
tableProviders) {
     return withTableProvider(inMemoryMetaStore);
   }
 
+  public void registerBuiltinUdf(Map<String, List<Method>> methods) {
+    for (Map.Entry<String, List<Method>> entry : methods.entrySet()) {
+      for (Method method : entry.getValue()) {
+        defaultSchema.add(entry.getKey(), ScalarFunctionImpl.create(method));
 
 Review comment:
   I don't think it's just the matter of benefits in this case. It is a wrong 
abstraction. `Method` represents a method on a java class, it has properties 
that are java-specific and doesn't have properties that we need. What we need 
is something that represents a Beam SQL UDF. Java method is not a SQL UDF. 
Maybe even Calcite's interfaces are wrong to use for this but they fit much 
better than java concepts and we can try using them. Calcite's 
`ScalarFunctionImpl` is a utility class to adapt java method signature to 
Calcite UDF concept in a specific manner that is not enough for our use case.
   
   What we need is an abstraction (or multiple) that allows us to:
    - capture arguments types and return types:
    - capture function name;
    - adapt to Calcite's `Function`;
    - link to the actual implementation at run-time;
   
   I don't think that we need anything else to capture the information about 
UDF. And I think that all of the above [belongs to the same 
concept](https://en.wikipedia.org/wiki/Cohesion_(computer_science)) and we can 
house it in the same place (modulo non-public utility classes or extra 
abstractions). `Method` doesn't capture all of it plus has a ton of extra stuff 
that we don't need that serve [different purposes, and it's not 
extensible](https://en.wikipedia.org/wiki/SOLID). 
   
   If we have our own UDF, the logic can look like (pseudocode):
   
   
   ```
   // loading UDFs
   
   ServiceLoader
     .load(UDFProvider.class)
     .stream()
     .flatMap(provider -> provider.getClass().getDeclaredMethods())
     .filter(UDFReflectionUtils::annotatedAsUDFs)
     .map(BeamReflectionUDF::create)
     .collect(toList());
   
   // creating the udf
   class ReflectionUDF implements SQLUDF {
   
       static create(Method javaMethod) {
          this.name = UDFRefletionUtils.getUDFName(javaMethod);
          this.arguments = UDFReflectionUtils.getArguments(javaMethod);
          this.returnType = 
UDFReflectionUtils.convertToRelDataType(javaMethod.getReturnType());
       }
   
       ...
   
       // we may optionally store the link to the original java method
       // but ultimately this too should be abstracted away
       @Override
       public Method getImplementation() { return this.method; }
   }
   ```
   
   In this case:
    - function class encapsulates all the needed information;
        - e.g. now you don't need to pass around the map of functions and just 
know what the keys and values mean in the map;
    - you can populate all the information once at instance construction time;
    - it only has relevant information;
    - only the places that need to know about reflection know about it, for 
other places it's just a bunch of business-specific interfaces;
    - opens the implementation for extensibility:
        - we control the java type mapping;
        - we can implement code-generation instead of reflection at run-time;
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 161398)
    Time Spent: 2h 20m  (was: 2h 10m)

> Function extension in BeamSQL
> -----------------------------
>
>                 Key: BEAM-5852
>                 URL: https://issues.apache.org/jira/browse/BEAM-5852
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> We could add more functions to BeamSQL (as UDFs) to provide rich 
> functionalities than standard/Calcite functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to