[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-640250180 Closing the PR again. Please open JIRA issues or review other PRs to provide more feedback. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-636402690 I'll leave this PR open until next week sometime to allow more time for comments This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-633240348 The PR has been merged, but the code review feature is still available. Please continue to leave comments and I will address them This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-633231974 +1 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-633229487 @jorisvandenbossche I fixed the segfault you hit and now have changed the implementation of `Array.filter` to be ``` pc = _pc() options = pc.FilterOptions(null_selection_behavior) return pc.call_function('filter', [self, mask], options) ``` I also rebased -- I am going to merge this when the build is passing so we do not continue to stack changes that need to be reviewed on this PR. I welcome further code reviews on this PR or follow up JIRAs This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-633218315 @jorisvandenbossche I'll take a look at that segfault > The goal is to eventually have them all go through _pc.call_function()? Yes. `arrow::compute::CallFunction` is general enough to deal with non-value (aka not Array or Scalar) arguments so we should be able to define table-valued functions that dispatch array kernels to a table or record batch's columns > pc.call_function() doesn't yet have a way to pass options. That's also something to be added in the future? Yes, I'll open a JIRA about this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-633154141 I'd like to work on a couple follow-up PRs tomorrow, so in order to unblock these PRs so that they can be reviewed, this PR will have to be merged first. So I would like to do that tomorrow absent problems that need to be fixed. Upon merging this PR, I would immediately reopen it so that others can continue to use the GitHub code review feature to leave comments in mass, and I will certainly respond to those comments and address them or open follow up JIRA issues so they are tracked. Please also see the "build out" umbrella JIRA where I will track the follow on work building out kernels functionality in the project https://issues.apache.org/jira/browse/ARROW-8894 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-633051554 The build is passing (the Travis thing is a flake on ARM architecture), thanks @kou for your help with that! I think we should merge this today and I will continue to address comments in this PR in follow up patches. I'll wait until later today but it would be good if others could chime in to make sure this plan is agreeable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632975883 I see the problem, I am fixing This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632975274 Ok I will take a look This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632947370 Note that `kHashSeed` is declared as `uint64_t` so that must be changed also ``` static constexpr uint64_t kHashSeed = 0; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632947188 @kou that's fine, please go ahead and apply it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632945792 Perfect thank you :pray:. I'm done hacking on this for now, I'll keep investigating test failures This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632945052 I'm just fixing a failing unit test now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632944653 @xhochy @pitrou @kszucs @jorisvandenbossche I just added Python bindings for this new functionality including generic argument packing and function dispatching. The new implementation of `pyarrow.compute.sum` for example is now ``` def sum(array): """ Sum the values in a numerical (chunked) array. Parameters -- array : pyarrow.Array or pyarrow.ChunkedArray Returns --- sum : pyarrow.Scalar """ return call_function('sum', [array]) ``` So we're going to be able to go around the Python codebase and delete a _ton_ of code. I haven't written unit tests for this part (let's do this later), but I added wrappers for the registry, functions, and kernels: ``` In [1]: import pyarrow.compute as pc In [2]: reg = pc.function_registry() In [3]: reg.list_functions() Out[3]: ['add', 'and', 'and_kleene', 'count', 'dict_encode', 'equals', 'filter', 'greater', 'greater_equal', 'invert', 'isin', 'less', 'less_equal', 'match', 'mean', 'minmax', 'not_equals', 'or', 'or_kleene', 'partition_indices', 'sort_indices', 'sum', 'take', 'unique', 'value_counts', 'xor'] In [4]: reg.get_function('equals') Out[4]: arrow.compute.Function kind: scalar num_kernels: 21 In [5]: reg.get_function('equals').list_kernels() Out[5]: [ScalarKernel<(any[bool], any[bool]) -> bool>, ScalarKernel<(any[uint8], any[uint8]) -> bool>, ScalarKernel<(any[uint16], any[uint16]) -> bool>, ScalarKernel<(any[uint32], any[uint32]) -> bool>, ScalarKernel<(any[uint64], any[uint64]) -> bool>, ScalarKernel<(any[int8], any[int8]) -> bool>, ScalarKernel<(any[int16], any[int16]) -> bool>, ScalarKernel<(any[int32], any[int32]) -> bool>, ScalarKernel<(any[int64], any[int64]) -> bool>, ScalarKernel<(any[float], any[float]) -> bool>, ScalarKernel<(any[double], any[double]) -> bool>, ScalarKernel<(any[binary], any[binary]) -> bool>, ScalarKernel<(any[string], any[string]) -> bool>, ScalarKernel<(any[large_binary], any[large_binary]) -> bool>, ScalarKernel<(any[large_string], any[large_string]) -> bool>, ScalarKernel<(any[date32[day]], any[date32[day]]) -> bool>, ScalarKernel<(any[date64[ms]], any[date64[ms]]) -> bool>, ScalarKernel<(any[timestamp(s)], any[timestamp(s)]) -> bool>, ScalarKernel<(any[timestamp(ms)], any[timestamp(ms)]) -> bool>, ScalarKernel<(any[timestamp(us)], any[timestamp(us)]) -> bool>, ScalarKernel<(any[timestamp(ns)], any[timestamp(ns)]) -> bool>] ``` Needless to say very excited about all of this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632937311 Done. It looks like GitHub is having some problems today but will keep an eye on the builds This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632936766 OK, I have it fixed locally. I'm finishing some Python stuff and then I'll push This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632936071 Thank you, I will fix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632929465 @kou done, thank you for catching this, I think it's better to retain this functionality. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632927540 The IsIn/Match changes don't look too hard, should be able to push them in the next half hour or less This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632926484 I just pushed a fix for an issue that was showing up in some Windows builds and some other cleaning This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632925674 I'm taking a look now, I'll report back in a half hour or so This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632925336 @kou yes, that was an API change. I can try to restore this functionality though? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632919447 Awesome, thanks @kou! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632918704 The R test suite is fixed -- this also turned up some missing test coverage for things that R was depending on https://issues.apache.org/jira/browse/ARROW-8895. I'm going to grind through the pending comments and add the simple Python bindings I talked about above. I think beyond that (and fixing remaining CI issues) the GLib bindings are the main things left. @kou please let me know if I can be of assistance This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632910238 This problem of kernel dispatch with timestamps that may or may not have time zones actually brought out a limitation with input type checking. I'm introducing a simple interface for user-defined type-checking rules. I think this should give us sufficient flexibility for the time being, and if not we can always improve the interfaces. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632824231 I just rebased and pushed MSVC build fixes (that works for me at least locally on VS 2017). I'm going to fix the R failure next and then address the other accumulated comments above. As far as timeline for this, I would ideally like to have this merged on Sunday at the latest. I was planning to take next week mostly off of work so this would unblock PRs into arrow/compute I also would like in the next couple days to write a small follow up patch or two to provide a template for scalar string kernels to help unblock ARROW-555. I will also start creating an umbrella JIRA issue for a more general kernels library buildout (a plan for going from our ~20 kernels to 100+) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632460179 I got sucked into some other things today, I'll focus on getting the CI passing tomorrow so we can hopefully go into the weekend with a green build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632452421 @emkornfield @brills no Python APIs are changed (not sure how much of the C++ API TFX uses) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632430558 I rebased without pushing any new changes This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632368218 @kou yes please go ahead This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632167817 @nealrichardson yeah, I will fix This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-632131314 Aside from fixing the builds, I'm going to do a few things to help with this: * Add as many comments as possible to the new header files * Add some high level explanation to compute/README.md about how kernels work and how to begin to develop a new one * Add simple Python bindings for `FunctionRegistry`, `Function`, and `Kernel` so that we can inspect the contents of the registry and the available kernels (and their type signatures) for each function This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7240: ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
wesm commented on pull request #7240: URL: https://github.com/apache/arrow/pull/7240#issuecomment-631878670 For code reviewers: * The code diff is a bit deceiving because many kernel files were renamed to make their taxonomy more clear * I suggest focusing your energies on the files in the arrow/compute top level directory. This is almost entirely new code, what was there before is almost all gone I am aware that there are some obviously messy things that will need to be cleaned up a bit, but we have to start somewhere. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org