[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-03-03 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500828#comment-17500828
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

Sure. Pasting the code into the code block was a pain; it didn't capture the 
newlines. 
I will follow this method next time. Thanks [~apitrou] :) 

> [Python] Extracting Type information from Python Objects
> 
>
> Key: ARROW-15765
> URL: https://issues.apache.org/jira/browse/ARROW-15765
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Vibhatha Lakmal Abeykoon
>Assignee: Vibhatha Lakmal Abeykoon
>Priority: Major
>
> When creating user-defined functions, or in similar exercises where we want to 
> extract the Arrow data types from the type hints, the existing Python API 
> has some limitations. 
> An example case is as follows:
> {code:python}
> def function(array1: pa.Int64Array, array2: pa.Int64Array) -> pa.Int64Array:
>     return pc.call_function("add", [array1, array2])
> {code}
> We want to extract the fact that array1 is a `pa.Array` of type `pa.int64()`. 
> At the moment there is no straightforward way to do this, 
> so the idea is to expose this feature to Python. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-03-03 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500821#comment-17500821
 ] 

Antoine Pitrou commented on ARROW-15765:


[~vibhatha] You can post your code on e.g. https://gist.github.com/ instead of 
editing the same comment several times :-) This will produce fewer notifications 
for people who watch this issue.



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-03-03 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17500816#comment-17500816
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

[~apitrou] [~jorisvandenbossche] [~westonpace] 

This is a very naive attempt at extracting types from a class that extends 
{{typing.Generic}}. The following example shows how the types can be extracted 
from such a class.
{code:python}
import inspect
from typing import Any, Generic, List, TypeVar

T = TypeVar('T')


class Array(object):

    def __init__(self, data: List[Any]):
        self._data = data

    @property
    def data(self):
        return self._data

    def __repr__(self):
        str1 = ""
        for datum in self.data:
            str1 += str(datum) + ", "
        return str1


class ArrayLike(Array, Generic[T]):

    def __init__(self, data: List[Any]):
        # super(data) raises a TypeError; delegate to Array.__init__ instead
        super().__init__(data)


class DataType(object):

    def __init__(self):
        pass


class Int32Type(DataType):

    def __init__(self):
        self._type_id = "int32"

    @property
    def id(self):
        return self._type_id


# test
a: Array = None
data: List[int] = [10, 20, 30]
b: ArrayLike[Int32Type] = ArrayLike(data)
print(b)


# define a function with the generics
def sample_udf(array: ArrayLike[Int32Type]) -> ArrayLike[Int32Type]:
    return array


sig = inspect.signature(sample_udf)
input_types = sig.parameters.values()
annotations = [val.annotation for val in input_types]
annotation = annotations[0]

# the subscripted annotation exposes its parameter and its origin class
inner_type = annotation.__args__[0]
outer_type = annotation.__origin__

expr_arg = inner_type == Int32Type
assert expr_arg

expr_outer = outer_type == ArrayLike
assert expr_outer
{code}
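
For what it's worth, on Python 3.8+ the same information can probably be read with the public helpers {{typing.get_origin}} and {{typing.get_args}} instead of the {{__origin__}} and {{__args__}} attributes used above. A minimal sketch, where {{ArrayLike}}, {{Int32Type}} and the {{describe}} helper are stand-ins for illustration and not pyarrow names:

```python
import inspect
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")


class ArrayLike(Generic[T]):
    """Stand-in container class; not a pyarrow type."""


class Int32Type:
    """Stand-in data type class; not a pyarrow type."""


def sample_udf(array: ArrayLike[Int32Type]) -> ArrayLike[Int32Type]:
    return array


def describe(annotation):
    """Return (container class, value type) for a subscripted generic."""
    args = get_args(annotation)
    return get_origin(annotation), (args[0] if args else None)


sig = inspect.signature(sample_udf)
param_types = [describe(p.annotation) for p in sig.parameters.values()]
return_type = describe(sig.return_annotation)
```

Both helpers return an empty result for a plain, unsubscripted class, so a caller would still need a fallback for hints such as a bare {{ArrayLike}}.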



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-25 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498231#comment-17498231
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

Sure, I will give it a try and post what I find out. 



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-25 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17498093#comment-17498093
 ] 

Antoine Pitrou commented on ARROW-15765:


Someone could experiment with the typing generic approach indeed and see if it 
works.



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-24 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497856#comment-17497856
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

[~jorisvandenbossche] the new typing generics look interesting. Is it practical 
to adopt them now, given the Python versions we currently support? Would it be 
wise to use them in the UDF integration instead of what I am suggesting in 
this Jira? 

[~apitrou] the Numba jit approach is nice and looks like an advanced feature for 
UDFs someday. I will keep this in mind. 

As [~westonpace] suggested, some of our main motivations are to support the 
user and to provide user-friendly options when we write TPCx-BB queries and 
similar applications. If the suggestion from [~jorisvandenbossche] to use 
advanced typing is feasible and solves our underlying problem, is it wise to 
use that instead of making this change? 



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-24 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497838#comment-17497838
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

I want to clarify a point, in case I did not clearly state earlier in the 
thread why the typing information is needed. If I am not mistaken, the main 
issue here is not what the UDF does internally with the data. We just need to 
register it in the function registry without asking the user for the input 
and output types explicitly. It is just a nice-to-have feature that could 
look great in terms of presentability and usability with newer Python 
versions. 



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497733#comment-17497733
 ] 

Antoine Pitrou commented on ARROW-15765:


> Another dimension to consider is whether a UDF would care if an array were 
>dictionary encoded or not? We probably want a way to express that too.

If you want a UDF to have different implementations based on the parameter 
types, you can't do that using type annotations.

What you could do is use a two-step approach like in Numba's {{generated_jit}}:
https://numba.pydata.org/numba-doc/dev/user/generated-jit.html




[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-24 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497720#comment-17497720
 ] 

Weston Pace commented on ARROW-15765:
-

For a concrete use case consider a user that wants to integrate some kind of 
Arrow native geojson library.  They would have extension types for geojson data 
types and custom functions that can do things like normalize coordinates to 
some kind of different reference or format coordinates in a particular way.  In 
this case the UDFs would be taking in extension arrays for custom data types 
which I think would have its own typings-based considerations.

Another possible example that comes from the TPCx-BB benchmark is doing 
sentiment analysis on strings (is this user comment a positive comment or a 
negative comment?)  If we had an arrow-native natural language processing 
library we could hook in an extract_sentiment operation which took in strings 
and returns ? (maybe doubles?).

As far as I know the type information itself is only used for validation and 
casting purposes.

Another dimension to consider is whether a UDF would care if an array were 
dictionary encoded or not?  We probably want a way to express that too.



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-24 Thread Joris Van den Bossche (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497696#comment-17497696
 ] 

Joris Van den Bossche commented on ARROW-15765:
---

In the context of a full query plan, I think it is important to know the output 
types given the input types, to be able to resolve the types in your full query?

I am wondering if we could make use of some of the newer typing features, which 
would allow us to do something like

{code:python}
def simple_function(arrow_array: pa.Array[pa.int32()]) -> pa.Array[pa.int32()]: 
    return call_function("add", [arrow_array, 1])  
{code}

I think such an object with which you can use [] is called a "generic" in 
typing terminology (https://docs.python.org/3.11/library/typing.html#generics), 
and it would make it easier to get the type of the values in the container. 
On the other hand, it creates a somewhat separate typing syntax ({{pa.Array}} is 
not actually a useful class by itself; in practice you always get subclasses).
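
One way such a subscript syntax could be prototyped is with {{__class_getitem__}}, which lets a class accept an arbitrary object (here a data type instance) inside []. This is only a sketch under assumptions: {{Array}}, {{DataType}}, {{int32}} and {{ArrayHint}} below are stand-ins defined for the example, not the pyarrow classes.

```python
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    """Stand-in for an Arrow data type instance such as pa.int32()."""
    name: str


def int32() -> DataType:
    return DataType("int32")


@dataclass(frozen=True)
class ArrayHint:
    """What Array[...] evaluates to: the container plus its value type."""
    container: type
    value_type: DataType


class Array:
    """Stand-in for pa.Array that accepts a data type instance in []."""

    def __class_getitem__(cls, value_type: DataType) -> ArrayHint:
        return ArrayHint(cls, value_type)


def simple_function(arrow_array: Array[int32()]) -> Array[int32()]:
    return arrow_array


sig = inspect.signature(simple_function)
in_hint = sig.parameters["arrow_array"].annotation
out_hint = sig.return_annotation
```

The annotation is an ordinary object at runtime, so the value type can be read from {{in_hint.value_type}}; static type checkers would likely not understand this form without extra support.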



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497567#comment-17497567
 ] 

Antoine Pitrou commented on ARROW-15765:


Of course, another question is: do you need to know the types at all? Without 
some concrete use cases it's hard to tell.



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-24 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497566#comment-17497566
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

Should we design this feature, or, as [~jorisvandenbossche] and [~westonpace] 
suggested, use the inverse mapping to get the type from the Array type 
without exposing this to the user? At the moment this issue mainly focuses 
on the UDF usability piece rather than on improving core functionality of the 
Arrow Python API. It could be useful, but beyond the scope of this use case it 
is not very clear to me how useful it would be to the user. 

What do you think? 



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-24 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497545#comment-17497545
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

[~apitrou] I see your point. There are pitfalls and limitations to this 
approach. This is mainly a usability piece. I also wonder whether it is worth 
investing time in it if its applications turn out to be niche. But it 
feels like a nice-to-have feature that would at least support some widely used 
UDF function signatures.



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-24 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497523#comment-17497523
 ] 

Antoine Pitrou commented on ARROW-15765:


Note that this approach limits the expressivity of the type annotations. For 
example, if you write:
{code:python}
def compute_func(a: pa.ListArray) -> pa.ListArray:
    ...
{code}
... you are not able to tell what the value type of the list type is. Similarly 
with parametrized types such as timestamps or decimals.
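
As a possible workaround for parametrized types, {{typing.Annotated}} (Python 3.9+) can attach a data type instance to the array class in the hint. The sketch below only shows the mechanism; {{ListArray}} and {{ListType}} are stand-ins defined for the example, not the pyarrow classes.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_type_hints


@dataclass(frozen=True)
class ListType:
    """Stand-in for a parametrized type such as pa.list_(pa.int32())."""
    value_type: str


class ListArray:
    """Stand-in for pa.ListArray."""


def compute_func(
    a: Annotated[ListArray, ListType("int32")],
) -> Annotated[ListArray, ListType("int32")]:
    return a


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(compute_func, include_extras=True)
array_class, data_type = get_args(hints["a"])
```

The cost is a more verbose signature, since the value type has to be spelled out next to the array class in every hint.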



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497094#comment-17497094
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

There can be limitations in places where the user just wants to pass a lambda to 
a groupby in which we expose UDFs. That needs to be handled internally for 
group-by ops. 



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497082#comment-17497082
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

As [~westonpace] explained, we are working on a UDF PoC. At the moment, a 
function can be registered as follows:

 
 
{code:python}
import pyarrow as pa
from pyarrow import compute as pc
# register_pyfunction, Arity and InputType are part of the UDF PoC
from pyarrow.compute import call_function, register_pyfunction
from pyarrow.compute import Arity, InputType

func_doc = {}
func_doc["summary"] = "summary"
func_doc["description"] = "desc"
func_doc["arg_names"] = ["number"]
func_doc["options_class"] = "SomeOptions"
func_doc["options_required"] = False
arity = Arity.unary()
func_name = "python_udf"
# The original has pa.x() here; int64 is assumed because it matches the call below
in_types = [InputType.array(pa.int64())]
out_type = pa.int64()


def py_function(arrow_array):
    p_new_array = call_function("add", [arrow_array, 1])
    return p_new_array


callback = py_function
register_pyfunction(func_name, arity, func_doc,
                    in_types, out_type, callback)

func1 = pc.get_function(func_name)

a1 = pc.call_function(func_name, [pa.array([20])])
{code}
 
When registering the function, the user has to state explicitly the arity 
and the input and output types of the UDF. We can ease this by taking 
all of that information from the type hints themselves. This is only to improve 
usability.
 
Spark already provides that support. If we go this route, we will extract 
all the information from the UDF signature. At the moment I am using the inspect 
API to extract that information.
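
A minimal sketch of that extraction step, assuming every parameter and the return value carry a type hint. {{Int64Array}} is a stand-in class and {{signature_info}} is a hypothetical helper name; neither is taken from the PoC.

```python
import inspect


class Int64Array:
    """Stand-in for pa.Int64Array; not the real class."""


def add_arrays(array1: Int64Array, array2: Int64Array) -> Int64Array:
    return array1


def signature_info(func):
    """Return (arity, input annotations, output annotation) for a UDF."""
    sig = inspect.signature(func)
    in_hints = [p.annotation for p in sig.parameters.values()]
    if any(h is inspect.Parameter.empty for h in in_hints):
        raise TypeError(f"{func.__name__}: every parameter needs a type hint")
    if sig.return_annotation is inspect.Signature.empty:
        raise TypeError(f"{func.__name__}: the return type needs a type hint")
    return len(in_hints), in_hints, sig.return_annotation


arity, in_hints, out_hint = signature_info(add_arrays)
```

A lambda has no annotations, so this helper rejects it with a {{TypeError}}; such callers would still have to pass the types explicitly.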
 
The next step is to extract from the type hint `pa.Int32Array` that this is a 
`pa.Array` of type `pa.int32()`. This is the objective of this exercise.
 
[~apitrou] does this clear things up? Do you need more information to understand 
why we need this feature?  
 



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497011#comment-17497011
 ] 

Weston Pace commented on ARROW-15765:
-

This is indeed about user-defined functions.  Vibhatha has been working on an 
implementation.  You can see the current progress here: 
https://github.com/apache/arrow/compare/master...vibhatha:test-udf-vibhatha

I suspect the need has to do with registering a function like:

{code}
def function(array1: pa.Int64Array, array2: pa.Int64Array) -> pa.Int64Array:
    return pc.call_function("add", [array1, array2])
{code}

with our function registry (which will want to know the arity and types of each 
argument).  Vibhatha can probably give a more complete answer.



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496945#comment-17496945
 ] 

Antoine Pitrou commented on ARROW-15765:


Let's step back a bit. Is this about user-defined functions? If so, perhaps 
someone should actually start working on these and explain what the actual need 
is?



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496942#comment-17496942
 ] 

Weston Pace commented on ARROW-15765:
-

In that case extending classes is not going to help.  There isn't really 
anything that makes sense to extend from C++ as the object {{Int32Array}} has 
no runtime equivalent in C++ (i.e. there is no such thing as reflection in C++).

I think Joris' suggestion from Zulip is simplest.  Let's invert 
{{_array_classes}} and {{_scalar_classes}}.
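
A minimal sketch of what inverting such a table could look like, assuming {{_array_classes}} maps a type id to an array class. The classes, the string type ids, and the {{type_id_for}} helper below are hypothetical stand-ins so the snippet runs without pyarrow; they are not the real internal table.

```python
class Int32Array:
    """Hypothetical stand-in for pa.Int32Array."""


class Int64Array:
    """Hypothetical stand-in for pa.Int64Array."""


# Hypothetical stand-in for the internal table; assumed shape: type id -> array class.
_array_classes = {
    "int32": Int32Array,
    "int64": Int64Array,
}

# Inverted table: array class -> type id. This is only safe while every
# array class appears exactly once in the original table.
_type_id_by_array_class = {cls: type_id for type_id, cls in _array_classes.items()}


def type_id_for(array_class):
    """Look up the type id for an array class without creating an array."""
    try:
        return _type_id_by_array_class[array_class]
    except KeyError:
        raise TypeError(f"no type registered for {array_class!r}") from None


print(type_id_for(Int32Array))
```

The lookup works on the class itself, so it can be applied to a type hint before any data exists.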



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496881#comment-17496881
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

[~westonpace] 
Exactly, it is a reflection task. We need to extract the types before any data 
gets in here.



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496879#comment-17496879
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

Getting the type from the data is totally correct, [~westonpace]; I agree. 

The thing is we need the type information extracted from the function 
signature, not from the data. So can we use this approach? Did I get it wrong? 



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496877#comment-17496877
 ] 

Weston Pace commented on ARROW-15765:
-

Ah, from Zulip I see the goal is to go from Int32Array to DataType(int32) 
without creating an instance.  So is this more of a metaprogramming / 
reflection task?



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496876#comment-17496876
 ] 

Weston Pace commented on ARROW-15765:
-

Perhaps I am missing some key piece but don't all arrays (Int32Array, 
Int64Array, etc.) extend Array which already has a type field?

{noformat}
cdef class Array(_PandasConvertible):
    cdef:
        shared_ptr[CArray] sp_array
        CArray* ap

    cdef readonly:
        DataType type
{noformat}

So couldn't you do `array1.type`?

{noformat}
>>> x = pa.array([1, 2, 3])
>>> x.type
DataType(int64)
{noformat}



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496806#comment-17496806
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

Yes, the id would be kind of hard for the user to grasp. One intention is 
actually to expose this to the user, so that it could help in their 
development activities for advanced use cases similar to UDFs. I cannot say 
exactly what such cases are, but it could be useful. 



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496801#comment-17496801
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

You're right, we can use that. Just tried to show the expected outcome more 
clearly. 



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Alessandro Molina (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496797#comment-17496797
 ] 

Alessandro Molina commented on ARROW-15765:
---

[~vibhatha] I'm not sure why you want `CDataType` itself; `Array.type` already 
provides the `DataType`, which is the Python equivalent of `CDataType`.



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Alessandro Molina (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496792#comment-17496792
 ] 

Alessandro Molina commented on ARROW-15765:
---

Probably exposing `name` in `CDataType` would be a starting point. For nested 
types like lists, structs, etc., we will have to dive into the sub_fields, but 
for basic types the `name` might already make it easy to identify the type. (We 
could already use `id`, but I guess that's less immediately understandable.)



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496789#comment-17496789
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

I am referring to the classes here: 
https://github.com/apache/arrow/blob/0363df1b44274707228af7274102bbe50cdb68be/python/pyarrow/lib.pxd#L317



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496784#comment-17496784
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

Ah yes, you're right. I mixed it up with the C++ naming. 

To expose this one, I guess we have to extend the classes
{code:python}
cdef class Int32Array(IntegerArray):
    pass
{code}
to something like
{code:python}
cdef class Int32Array(IntegerArray):
    cdef shared_ptr[CDataType] get_type(self)
{code}
and expose it as a property to Python?

Or is there a better approach for this?



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496760#comment-17496760
 ] 

Antoine Pitrou commented on ARROW-15765:


Well, {{pa.Int32Type}} doesn't exist:
{code:python}
>>> ty = pa.int32()
>>> ty.__class__
pyarrow.lib.DataType
{code}



[jira] [Commented] (ARROW-15765) [Python] Extracting Type information from Python Objects

2022-02-23 Thread Vibhatha Lakmal Abeykoon (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17496755#comment-17496755
 ] 

Vibhatha Lakmal Abeykoon commented on ARROW-15765:
--

 [~apitrou] [~westonpace] [~jorisvandenbossche] [~amol-]

Your thoughts on this? 
