I made a PR for this issue at https://github.com/apache/arrow/pull/5835. I'd appreciate more detail on what the original issue intended, and on whether there is a better way to implement it.
On Tue, Nov 12, 2019 at 11:25 AM Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:

> Sorry for the delayed response. I would suggest that you open a PR (or point to a branch with those changes); that will make it easier to discuss specific implementation options (rather than trying to explain and understand them in words) and to give advice.
>
> On Wed, 6 Nov 2019 at 20:29, Justin Polchlopek <jpolchlo...@azavea.com> wrote:
>
> > Hi. I'm looking into this issue and I have some questions as someone new to the project. The comment from Joris earlier in the thread suggests that the solution here is to create an Array subclass for each extension type that wants to use one. This will give a nice symmetry w.r.t. the Java interface, but in the Python case, this seems to require traveling some fairly byzantine code paths (rather quickly, we end up in C++ code, where I lose the thread of what's happening, specifically as regards `pyarrow_wrap_array`, as suggested in ARROW-6176).
> >
>
> The goal here is that, for the end user, it is possible to do this without involving C++ code, and I *think* implementing it should be possible from Cython. How did you end up in C++?
>
> >
> > I came up with a quick-and-dirty method wherein the ExtensionType subclass simply provides a method to translate from the storage type to the output type, and ExtensionArray has a __getitem__ implementation that passes the element from storage through the translation function. This doesn't feel outside the realm of what is often acceptable in the Python world, but it isn't nearly as strongly typed as Arrow tends to be. Plus, this feels very far from what was intended in the issue, and I believe that I'm not understanding the underlying design principles.
> >
> > Can I get a bit of advice on this?
> >
> > Thanks.
> > -J
> >
> > On Tue, Oct 29, 2019 at 12:26 PM Justin Polchlopek <jpolchlo...@azavea.com> wrote:
> >
> > > That sounds about right. We're doing some work here that might require this feature sooner rather than later, and if we decide to go the route that needs this improved support, I'd be happy to make this PR. Thanks for pointing out that issue. I'll be sure to tag any contribution with that ticket number.
> > >
> > > On Tue, Oct 29, 2019 at 9:01 AM Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:
> > >
> > >> On Mon, 28 Oct 2019 at 22:41, Wes McKinney <wesmck...@gmail.com> wrote:
> > >>
> > >>> Adding dev@
> > >>>
> > >>> I don't believe we have APIs yet for plugging in user-defined Array subtypes. I assume you've read
> > >>>
> > >>> http://arrow.apache.org/docs/python/extending_types.html#defining-extension-types-user-defined-types
> > >>>
> > >>> There may be some JIRA issues already about this (defining subclasses of pa.Array with custom behavior) -- since Joris has been working on this, I'm interested in more comments.
> > >>
> > >> Yes, there is https://issues.apache.org/jira/browse/ARROW-6176 for exactly this issue.
> > >> What I proposed there is to allow one to subclass pyarrow.ExtensionArray and to attach this subclass to an attribute on the custom ExtensionType (e.g. __arrow_ext_array_class__, in line with the other __arrow_ext_.. methods). I think that should make it possible to achieve functionality similar to what is available in Java.
> > >>
> > >> If that seems like a good way to do this, we would certainly welcome a PR for it (otherwise, I can also look into it before 1.0).
> > >>
> > >> Joris
> > >>
> > >>>
> > >>> On Mon, Oct 28, 2019 at 3:56 PM Justin Polchlopek <jpolchlo...@azavea.com> wrote:
> > >>> >
> > >>> > Hi!
> > >>> >
> > >>> > I've been working through understanding extension types in Arrow. It's a great feature, and I've had no problems getting things working in Java/Scala; however, Python has been a different story. It's not that I am unable to create and register extension types in Python, but rather that I can't seem to recreate the functionality provided by the Java API's ExtensionTypeVector class.
> > >>> >
> > >>> > In Java, ExtensionType::getNewVector() provides a clear pathway from the registered type to an output vector of something other than the underlying vector type, and I am at a loss for how to get this same functionality in Python. Am I missing something?
> > >>> >
> > >>> > Thanks for any hints.
> > >>> > -Justin