vibhatha commented on PR #13687:
URL: https://github.com/apache/arrow/pull/13687#issuecomment-1240575947

   > > Plus, we should probably address the `ds.field('')._call` issue before we worry too much about extensive documentation.
   
   I understand your point. Should we hold this PR until we resolve this issue?
   
   > 
   > The reason that this `_call` is currently private with a leading 
underscore, is because for the built-in compute functions, you can actually use 
the compute function itself and pass it a field expression instead of actual 
array:
   > 
   > ```
   > >>> import pyarrow.compute as pc
   > 
   > # you can do
   > >>> pc.add(pc.field("a"), pc.field("b"))
   > <pyarrow.compute.Expression add(a, b)>
   > # instead of
   > >>> pc.Expression._call("add", [pc.field("a"), pc.field("b")])
   > <pyarrow.compute.Expression add(a, b)>
   > ```
   > 
   > which was sufficient for the initial examples for dataset projections. 
Now, this might have some limitations. It already seems this is currently 
limited to only expressions as arguments, so you can't mix with a scalar right 
now (as the current example would do):
   > 
   > ```
   > >>> pc.add(pc.field('a'), 1)
   > ...
   > TypeError: only other expressions allowed as arguments
   > ```
   > 
   > Now, that might be something we can fix (didn't again look into it at the 
moment, I suppose I added this limitation in the initial PR for simplicity)
   > 
   > For UDFs, there is of course the additional limitation that this isn't 
available as a `pc.` function. For this use case, we should maybe allow 
`pc.call_function` to accept expressions as well? So that you can do 
`pc.call_function("my_udf", [pc.field("a")])` instead of 
`pc.Expression._call("my_udf", [pc.field("a")])`?
   
   
   
   > > I think the more interesting case for UDFs is when we want to use some 
other library that does efficient compute and is capable of working with Arrow 
data. For example, numpy. Here is an example that exposes numpy's gcd function 
(greatest common divisor) as an Arrow function
   > 
   > I think this would indeed be a more compelling example.
   > 
   > Another example could be a specific python functionality (eg something 
from `ipaddress`, to check or extract some information from strings that are 
supposed to be ipaddresses), although this will typically only work on scalars, 
and thus will be slow (but it's still an example how you can use this within 
arrow). Or another example could be a custom function implemented in numba.
   
   This would be a better fit for the cookbook. Or should we add something like 
that here too?
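
For reference, here is a minimal sketch of what the `ipaddress` idea could look like. It is kept to the standard library so it runs without pyarrow. The name `is_valid_ip` and the `ctx` placeholder are hypothetical, chosen only to mirror the shape of a scalar UDF. A real UDF would presumably receive an Arrow array (converted with something like `to_pylist()`) and return an Arrow array, and that wiring is not shown here.

```python
import ipaddress


def is_valid_ip(ctx, values):
    """Hypothetical UDF body: flag which strings parse as IP addresses.

    `ctx` stands in for the context argument a scalar UDF would receive
    and is unused here. `values` is any iterable of str or None.
    """
    results = []
    for value in values:
        if value is None:
            # Propagate missing values instead of guessing.
            results.append(None)
            continue
        try:
            # Accepts both IPv4 and IPv6 textual forms.
            ipaddress.ip_address(value)
            results.append(True)
        except ValueError:
            results.append(False)
    return results


flags = is_valid_ip(None, ["192.168.0.1", "::1", "not-an-ip", None])
print(flags)  # [True, True, False, None]
```

The loop makes the cost mentioned in the quoted comment visible: every element goes through a Python-level call, so this works on scalars and will be slow on large inputs.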


