HyukjinKwon commented on a change in pull request #27406: 
[SPARK-30681][PYSPARK][SQL] Add higher order functions API to PySpark
URL: https://github.com/apache/spark/pull/27406#discussion_r374068808
 
 

 ##########
 File path: python/pyspark/sql/column.py
 ##########
 @@ -129,6 +129,103 @@ def _(self, other):
     return _
 
 
+def _unresolved_named_lambda_variable(*name_parts):
+    """
+    Create o.a.s.sql.expressions.UnresolvedNamedLambdaVariable and
+    convert it to o.s.sql.Column
+
+    :param name_parts: str
+    """
+    sc = SparkContext._active_spark_context
+    name_parts_seq = _to_seq(sc, name_parts)
+    expressions = sc._jvm.org.apache.spark.sql.catalyst.expressions
+    return Column(
+        sc._jvm.Column(
+            expressions.UnresolvedNamedLambdaVariable(name_parts_seq)
+        )
+    )
+
+
+def _get_lambda_parameters(f):
+    import inspect
+
+    signature = inspect.signature(f)
+    parameters = signature.parameters.values()
+
+    # We should exclude functions that use
+    # variable args and keyword argnames
+    # as well as keyword only args
+    supported_parmeter_types = {
+        inspect.Parameter.POSITIONAL_OR_KEYWORD,
+        inspect.Parameter.POSITIONAL_ONLY,
+    }
+
+    # Validate that
+    # function arity is between 1 and 3
+    if not (1 <= len(parameters) <= 3):
 
 Review comment:
   I don't really like this approach (e.g.,
`_LambdaSpec(func=f, expected_nargs={1, 2})`); it looks a bit confusing. Also,
to be complete, we would then have to check the inputs and outputs of `f` after
it executes on the Python side as well.
   
   My point is that there may be other alternatives, for instance introducing
type hints into PySpark, or somehow improving the error messages at
[utils.py#L76-L92](https://github.com/apache/spark/blob/0a95eb08003a115f59495b30aacaaa832940e977/python/pyspark/sql/utils.py#L76-L92),
as pointed out above. That's why I currently think it would be better to
handle this separately.
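
   For reference, the signature-based check under discussion can be sketched outside Spark with only the standard library. This is a minimal illustration, not the code in this PR: the name `get_lambda_parameters`, the `min_args`/`max_args` parameters, and the error messages are assumptions made for the example.

```python
import inspect


def get_lambda_parameters(f, min_args=1, max_args=3):
    """Hypothetical helper: return the parameters of `f` after validating them."""
    parameters = list(inspect.signature(f).parameters.values())

    # Only plain positional parameters are accepted; *args, **kwargs and
    # keyword-only parameters are rejected by the kind check below.
    supported_kinds = {
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.POSITIONAL_ONLY,
    }

    # Validate that the function arity is within the accepted range.
    if not (min_args <= len(parameters) <= max_args):
        raise ValueError(
            "f should take between {} and {} arguments, but it takes {}".format(
                min_args, max_args, len(parameters)
            )
        )

    # Validate that every parameter is of a supported kind.
    if not all(p.kind in supported_kinds for p in parameters):
        raise ValueError(
            "f should use only POSITIONAL or POSITIONAL_OR_KEYWORD arguments"
        )

    return parameters
```

   Note that a check like this only looks at the signature. It says nothing about what `f` receives or returns, which is the gap mentioned above.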
