Hi all,

This is another digest of open backlog issues, following the initial post last month. It looks like we were able to close 3 of the 5 mentioned, so thanks to those who picked up the issues and drove them to completion. The new list of issues is as follows:

- Enable coverage CI jobs: https://github.com/apache/datafusion/issues/3678
  - I don't have much to add here as I haven't looked at it closely, but since it's separate from the main codebase (CI related), it might be a good one to pick up for those not looking to go deep on the main codebase
- Some aggregates silently ignore IGNORE NULLS and ORDER BY on arguments: https://github.com/apache/datafusion/issues/9924
  - This is mainly a review task: check our aggregate functions to ensure that if they support IGNORE NULLS they handle it appropriately, and if they can be affected by ORDER BY they also handle that appropriately
- Support complete distinct usage for aggregate expressions: https://github.com/apache/datafusion/issues/2406
  - This is similar to the above in needing a comprehensive review of our aggregate functions to see where fixes are needed. We'd need to look at which aggregate functions already check for distinct and explicitly error out, to see if we can implement handling for distinct, but we'd also need to look at functions that DON'T check for distinct and incorrectly compute their result without erroring out. I believe nth_value is of the latter kind
- Binary string (BYTEA, Binary) concatenation: https://github.com/apache/datafusion/issues/12709
  - More of a straightforward implementation (I hope!); I've left a comment on the issue with some pointers
- DataFrame write api should accept Overwrite option when the file already exist: https://github.com/apache/datafusion/issues/4986
  - This is a bit interesting because the existing behaviour for the API is not really intuitive; I've left a comment with more details on the issue

I'll repost the two remaining issues from last time:

- Support ANY operator: https://github.com/apache/datafusion/issues/2548
  - There was an attempt made but the PR went stale; I've closed the PR for now to allow someone else to volunteer to pick it up, but it's worth taking a look at the previous PR to see if we could build on top of it
- Support ALL operator: https://github.com/apache/datafusion/issues/2547
  - I think this will have a dependency on the ANY operator, as it'll be easier to implement this in terms of that operator (see discussion on issue), so this can wait

As before, I'll be happy to help review PRs and provide as much clarification as I can for these issues, so feel free to tag me on them.

Cheers,
Jeffrey
