[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568620#comment-16568620 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-410344149 @ilooner, you are the man! Thanks much! The next PR will be available next week sometime. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Labels: ready-to-commit > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568586#comment-16568586 ] ASF GitHub Bot commented on DRILL-6373: --- ilooner commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-410338390 @paul-rogers All tests passed. Your ordeal is over, I've merged the PR. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Labels: ready-to-commit > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567763#comment-16567763 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-410142512 @ilooner, the PR in question was a different one; one that tried a more general fix for the concurrent vector update issue. Thanks for rerunning this one so we can see if the workaround works. I did just rebase the code, so hopefully you will find no issues when you do it again. Thanks much for your help on this. Once this goes in, I can add the final vector types, then move on to the text and JSON readers. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567756#comment-16567756 ] ASF GitHub Bot commented on DRILL-6373: --- ilooner commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-410141154 @paul-rogers My apologies, I don't recall discussing or disagreeing to any fix to the value vectors. I am in favor of whatever it takes to get this change in asap. I will try rebasing this PR ontop of the latest master. If the tests pass I will merge it. If the tests don't pass I will try to help debug it or provide a unit test that reproduces the issue. Will keep you posted. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566325#comment-16566325 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-409801351 @vrozov, thanks for the explanation. This is exactly what we've discussed multiple times over the last several months. In fact, I offered a fix for that problem, but it seemed that @ilooner disagreed with the approach and I don't have time to pursue it further. Hopefully your `MaterializedField` refactoring will fix it. The whole reason this PR has been stalled is due to a bug that showed up only in the pre-commit tests when @Ben-Zvi tried to commit it. I've been stuck ever since. I'm pretty much at a dead end on this PR. If the bug is pre-existing and only occurs in the private MapR pre-commit tests, then there is not much I can do to fix it. This PR has been revised to provide a workaround for the only vector change that this PR introduced. It would be very useful to please re-run the pre-commit tests to see if the workaround corrected the issue that showed up only with this PR in the pre-commit tests. The other, pre-existing issue would be great to fix, just not in this PR. As I've noted, I'm getting rusty on the code and have other commitments, and so it would be hard for me to fix bugs unrelated to this work. Is there a reason to hold up another trial run of this PR while waiting for a fix for the map problem? As this is stalled, others are slowly recreating the work already completed here. Seems a terrible waste of resources for a small team that has quite a bit of work ahead of it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551787#comment-16551787 ] ASF GitHub Bot commented on DRILL-6373: --- vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-406811356 @paul-rogers The change to `NullableVarCharVector` that was part of your PR exposed a bug that is part of the existing code. Please see `AbstractMapVector.java:48`. The problem is that a newly created map vector references a newly created deep copy(clone) of the passed `MaterializedField` while newly created child vectors of the map vector references children of **another** instance of the `MaterializeField`. If that **other** instance is completely immutable it would not cause any problem (except that now there is no reason to create deep copy), but with your changes there was an attempt to mutate from different threads the **other** instance as that **other** instance is used to create multiple outgoing vectors. I hope that this explains what I mean when referring to **existing** bug. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551293#comment-16551293 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-406725968 @vrozov, if the bug was not introduced by this PR, then why have I been trying for months to fix it without access to the test environment? Why did the pre-commit tests pass before this PR, but fail with this PR? Just double-checked. This PR did not touch `AbstractMapVector` and so could not have introduced new issues in that vector. This PR did change the nullable vector. That change has been backed out and an alternative fix provided. Could the team Kindly fix the pre-existing bug so that the tests run clean before we attempt to retest this PR? Or, can the team re-run the pre-commit tests so that we know if the work around solves the issue, or if there is some deeper issue to fix? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551014#comment-16551014 ] ASF GitHub Bot commented on DRILL-6373: --- vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-406668760 @paul-rogers I disagree that the bug was introduced in the prior changes to this PR. The bug is already part of `AbstractMapVector` that clones MaterializedField but modifies children of the original `MaterializedField` and not the clone. In combination with changes to `NullableVarCharVector` it triggered functional test failures. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550937#comment-16550937 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-406650835 @vrozov, perhaps a bit more explanation would help. The bug that you described in the above comments was due (it seems) to a change I made in nullable vector to make the `value` vector carry the correct metadata. This caused problems when we "cloned" that vector. My first fix attempted to fix that secondary issue (the cloning issue). After strong push-back from @sohami, I decided we can't afford to fix the metadata inconsistency (at least not by someone such as myself that is getting rusty on the code.) So, instead, I backed out the metadata change, which should fix the vector clone issue. Instead, in the result set loader code, I "parse" the vector type and handle the metadata inconsistency. As a result, the tests for this PR pass. I do need someone, however, to rerun the pre-commit tests to determine if the clone issue has gone away; we have no unit tests that test that particular case. If we can run those tests, and they pass, then perhaps we can finally commit this PR so I can add the final bit of code change in this series that we've been trying to get in for six months now... This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549692#comment-16549692 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-406380183 @vrozov, not needed. The latest commit provides a different solution. The ResultSetLoader unit tests pass. Next step is for someone at MapR to try the pre-commit tests. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549635#comment-16549635 ] ASF GitHub Bot commented on DRILL-6373: --- vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-406364011 @paul-rogers Did you try to pull changes from PR #1383? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549619#comment-16549619 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-406359589 @vrozov or @sohami: Can you take a look at the latest commit with an alternative solution for inconsistent metadata? If this fix is OK, then I'll mark this as ready to commit, since it previously received a +1. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546103#comment-16546103 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-405479402 Revised with an alternative fix to the "lying nullable vector values" metadata problem. Rebased on latest master. Should be ready for another attempt at merging to master; another pre-commit test run. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543911#comment-16543911 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-404987892 @sohami, the cost of my proposed fix has exceeded its benefit -- we're just not converging. I've closed that PR and will look for another solution. Rather than use the `MaterializedField` to get the type, I'll ad code that does a switch statement on the vector type to learn the "real" type, leaving the `MaterializedField` to hold the pretend type for the `values` field of a `Nullable` vector. This should not be hard: we already have generated code that parses the vector type. I can use this to manufacture a `MajorType` that matches the vector class; bypassing the need for correct `MaterializedField` data, and thus eliminating the need to change the vector code. Revision to be done soon. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543809#comment-16543809 ] ASF GitHub Bot commented on DRILL-6373: --- sohami commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-404968576 @paul-rogers - I have posted couple of questions for this change on other PR (https://github.com/apache/drill/pull/1367) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543642#comment-16543642 ] Karthikeyan Manivannan commented on DRILL-6373: --- [~paul-rogers] The functional test failed with some plan verification failures but I doubt it is because of your change. The log is attachedĀ [^6373_Functional_Fail_07_13_1300.txt] > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: 6373_Functional_Fail_07_13_1300.txt, > drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542223#comment-16542223 ] ASF GitHub Bot commented on DRILL-6373: --- bitblender commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-404660067 @vrozov @paul-rogers I have started a test run. Will update the PR once it is done. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540830#comment-16540830 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-404337181 Revised the vector-clone fix. Found a simpler solution that requires just one line of code change. @vrozov or @bitblender, please run a pre-commit test on this PR. If test pass, then PR #1367 can perhaps be included in the next batch commit, this PR in the following one (since it has already obtained an approval). Then, in two weeks, I can provide the final PR for the result set loader work. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537899#comment-16537899 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-403678211 @bitblender, thanks for running the pre-commit tests. As it turned out, I changed the wrong file originally: I changed a "Partition something" that turns out to not be used, rather than the `PartitionSenderTemplate` which I should have changed. Sigh... Anyway, cherry-picked the revised change. This should (fingers crossed) fix the partition sender issue. Please run the test again so we can see if any other code creates a new vector from a "recycled" materialized field. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537679#comment-16537679 ] Karthikeyan Manivannan commented on DRILL-6373: --- The functional test failed. The logs are attachedĀ [^drill-6373-with-6585-fix-functional-failure.txt] > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Attachments: drill-6373-with-6585-fix-functional-failure.txt > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537201#comment-16537201 ] ASF GitHub Bot commented on DRILL-6373: --- bitblender commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-403540185 @ppadma @vrozov @paul-rogers I meant I am starting pre-commit tests on this branch with the cherry-pick from 6585 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537186#comment-16537186 ] ASF GitHub Bot commented on DRILL-6373: --- bitblender commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-403537844 @ppadma @paul-rogers @vrozov I just triggered a pre-commit test run using paul-rogers:DRILL-6585. Will post the results once it is done. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536535#comment-16536535 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers edited a comment on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-403358757 Rebased on latest master. Cherry-picked the fix from DRILL-6585 to this branch. @vrozov or @ppadma - can you kick off a pre-commit build to see if the DRILL-6586 fix resolves the failure we saw earlier? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536534#comment-16536534 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-403358757 Rebased on latest paster. Cherry-picked the fix from DRILL-6585 to this branch. @vrozov or @ppadma - can you kick off a pre-commit build to see if the DRILL-6586 fix resolves the failure we saw earlier? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511539#comment-16511539 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-397045424 @vrozov, I'll make the required changes as a separate PR once my schedule allows. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510511#comment-16510511 ] ASF GitHub Bot commented on DRILL-6373: --- vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396787878 I would prefer the second approach but instead of introducing `newVector()` method, will it be possible to use a different signature for cloning and not cloning `getNewVector()`? I don't care much whether the method name that does not clone is `newVector()` or `getNewVector()`, but I would prefer that method that does not clone does not accept `MaterializedField` as a parameter at all. It should construct it instead similar to what `getNewVector(String name, ...)` already does. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510489#comment-16510489 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396781670 The specific fix proposal must reflect that the code doing the copying is generated: ``` at org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1261) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.test.generated.PartitionerGen33655$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:381) ~[na:na] ``` So, one choice is to create a new method, `duplicateVector` that 1) clones the `MaterializedField` and 2) calls `getNewVector`. Then we change the generated code in the partitioner to call the new method. The other choice is to create a `newVector()` method that does what `getNewVector()` does now, and to revise `getNewVector()` to do the clone. We could then review all uses of `getNewVector()` to determine if we should use the non-cloning `newVector()` instead. This forces us to think about which is needed in each case, but forces generated code to use the "most safe" path. Thoughts @vrozov? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509907#comment-16509907 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396669212 @vrozov, I agree that there is a bug somewhere. We are faced with a dilemma. On the one hand, we have the set of constraints described earlier. On the other hand, we have a preference about not mutating `MaterializedField` and that `PartitionSender` should not have to clone `MaterializedField`s when making copies. How do you propose we reconcile the technical constraint with the preference? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509212#comment-16509212 ] ASF GitHub Bot commented on DRILL-6373: --- vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396471425 @paul-rogers I did not check the proposed solution, there may be bugs in it, but the main point is that when `PartitionSender` creates new vectors it should not mutate `MaterializedField` of the `incoming` batch vectors. It should be possible to construct a new vector based on an existing vector including two, three or more levels deep maps or other not flat vectors. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509171#comment-16509171 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396462788 @vrozov, by the way thanks for taking the time to work on this -- a huge help since I can't run the functional tests myself. If all this seems overly complex, I agree that it is. I spent many hours working out all these twists and turns. The original design is broken and was never cleanly revised once maps were added. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509164#comment-16509164 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396462584 @vrozov, your proposed solution works for the single-level map case. It does *not* work for the two-level map case as the `MaterializedField` used inside the inner map is not the one passed from the outer map. Hacky code would be needed to pass in the `MaterializedField`, then fetch it to add it to the outer map's `MaterializedField`. And, even the above hack does not work for three-level maps. So, what else can we do? We can note one additional problem with maps. If we simply take a map field `m` from batch B1 and use it to create a map in batch B2, we have a dilemma. The `m` `MaterializedField` already contains "child" entries for `a` and `b`, say. A similar issue exists for any vector with structure: VarChar (offsets and data), Nullable (bits and data), etc. So, what to do? The new code, of which this PR is a part, faced similar difficulties with the additional complexity of the `ResultSetLoader` which must clone a vector to produce the "overflow" vector. (This is the reason for fixing the Nullable vectors so that the data vector has its type changed to Required, so that the vector clone works correctly.) So, we get our solution. Clone the `MaterializedField` in the `PartitionSender` *before* calling `BasicTypeHelper.getNewVector`. We might thing that `getNewVector()` can do the clone. But, it is used by `ResultSetLoader` assuming that the new vector will use the `MaterializedField` provided. And, in the vast number of cases, we don't need a clone. It is only when using one vector to create another that we need the clone, so it must be done in the caller. In this case, that is `PartitionSender`. Does this make sense? Maybe the easiest thing is to just make the change and do a test run. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509163#comment-16509163 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396462584 @vrozov, your proposed solution works for the single-level map case. It does *not* work for the two-level map case as the `MaterializedField` used inside the inner map is not the one passed from the outer map. Hacky code would be needed to pass in the `MaterializedField`, then fetch it to add it to the outer map's `MaterializedField`. And, even the above hack does not work for three-level maps. So, what else can we do? We can note one additional problem with maps. If we simply take a map field `m` from batch B1 and use it to create a map in batch B2, we have a dilemma. The `m` `MaterializedField` already contains "child" entries for `a` and `b`, say. A similar issue exists for any vector with structure: VarChar (offsets and data), Nullable (bits and data), etc. So, what to do? The new code, of which this PR is a part, faced similar difficulties with the additional complexity of the `ResultSetLoader` which must clone a vector to produce the "overflow" vector. (This is the reason for fixing the Nullable vectors so that the data vector has its type changed to Required, so that the vector clone works correctly.) (If all this seems overly complex, I agree that it is. I spent many hours working out all these twists and turns. The original design is broken and was never cleanly revised once maps were added.) So, we get our solution. Clone the `MaterializedField` in the `PartitionSender` *before* calling `BasicTypeHelper.getNewVector`. We might thing that `getNewVector()` can do the clone. But, it is used by `ResultSetLoader` assuming that the new vector will use the `MaterializedField` provided. And, in the vast number of cases, we don't need a clone. It is only when using one vector to create another that we need the clone, so it must be done in the caller. In this case, that is `PartitionSender`. Does this make sense? Maybe the easiest thing is to just make the change and do a test run. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509153#comment-16509153 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396460011 @vrozov, this is what I meant by the code only working well for a flat vector. Maps have a `MaterializedField` for their own metadata. That metadata contains a list of the child metadata. So: ``` m:Map, {a:Int, b:Varchar} . a:INT . b:Varchar ``` This means, as we add `a`, then, `b` to map `m`, we must mutate the `MaterializedField` for `m` to add the child fields. This completely breaks the idea that a `MaterializedField` is always immutable. This is a place where a design goal (immutable `MateraializedField`) collides with implementation (readers add to maps as they find new fields). So. We could make a clone on each modification and problem solved, right? As in many places in Drill, the solution is not so simple. The above is the easy case. What about a nested map: ``` m1:Map, {m2:Map {a:Int, b:Varchar}} . m2:Map, {a:Int, b:Varchar} . . a:INT . . b:Varchar ``` Now when we add `a`, we have to update both the `m1` and `m2` `MaterializedField`s. If we clone the MaterializedField for `m2`, then the old version in `m1` will get out of sync. The result will be: ``` m1:Map, {m2:Map {}} . m2:Map, {a:Int, b:Varchar} . . a:INT . . b:Varchar ``` Code that depends on accurate schema information then breaks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509147#comment-16509147 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396460011 @vrozov, this is what I meant by the code only working well for a flat vector. Maps have a `MaterializedField` for their own metadata. That metadata contains a list of the child metadata. So: ``` m:Map, {a:Int, b:Varchar} . a:INT . b:Varchar ``` This means, as we add `a`, then, `b` to map `m`, we must mutate the `MaterializedField` for `m` to add the child fields. This completely breaks the idea that a `MaterializedField` is always immutable. This is a place where a design goal (immutable `MateraializedField`) collides with implementation (readers add to maps as they find new fields). So. We could make a clone on each modification and problem solved, right? As in many places in Drill, the solution is not so simple. The above is the easy case. What about a nested map: ``` m1:Map, {m2:Map {a:Int, b:Varchar}} . m2:Map, {a:Int, b:Varchar} . . a:INT . . b:Varchar ``` Now when we add `a`, we have to update both the `m1` and `m2` `MaterializedField`s. If we clone the MaterializedField for `m2`, then the old version in `m1` will get out of sync. The result will be: ``` m1:Map, {m2:Map {}} . m2:Map, {a:Int, b:Varchar} . . a:INT . . b:Varchar ``` Code that depends on accurate schema information then breaks. So, we're left with a choice: clone and have a corrupt schema, or make `MaterializedField` mutable. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508224#comment-16508224 ] ASF GitHub Bot commented on DRILL-6373: --- vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396287379 @paul-rogers Please track the stack trace for the modification part. There is an attempt to add a child to the children of a vector that is part of the `incoming` batch. Please see `PartitionerTemplate.java:381`: ``` ValueVector outgoingVector = TypeHelper.getNewVector(v.getField(), allocator); ``` The existing assumption is that `v.getField()` is read-only (immutable), but it is passed to `AbstractMapVector` that iterates over passed `field` children instead of using cloned version. Then, `child` is passed to `NullableVarCharVector` that again modifies passed parameter. Please consider changing constructor of vectors to clone passed in field and using cloned version to add child: ``` BaseValueVector.java (add clone()): protected BaseValueVector(MaterializedField field, BufferAllocator allocator) { this.field = Preconditions.checkNotNull(field, "field cannot be null").*clone*(); this.allocator = Preconditions.checkNotNull(allocator, "allocator cannot be null"); } NullableVarCharVector.java (use cloned version): public NullableVarCharVector(MaterializedField field, BufferAllocator allocator) { super(field, allocator); // replace field with it's clone *field = getField();* // The values vector has the same type and attributes // as the nullable vector, but with a mode of required. This ensures that // things like scale and precision are preserved in the values vector. // For backward compatibility, the values vector must have the same // name as the enclosing vector. // Setting the child to REQUIRED is a change, prior it was OPTIONAL. But // if we then use the field to create a vector, we get the wrong type // (we get an optional child vector when we want required.) values = new VarCharVector( MaterializedField.create(field.getName(), field.getType().toBuilder() .setMode(DataMode.REQUIRED) .build()), allocator); field.addChild(bits.getField()); field.addChild(values.getField()); accessor = new Accessor(); } AbstractMapVector.java (use cloned version): protected AbstractMapVector(MaterializedField field, BufferAllocator allocator, CallBack callBack) { super(field.clone(), allocator, callBack); // replace field with it's clone *field = getField();* // create the hierarchy of the child vectors based on the materialized field for (MaterializedField child : field.getChildren()) { if (child.getName().equals(BaseRepeatedValueVector.OFFSETS_FIELD.getName())) { continue; } final String fieldName = child.getName(); final ValueVector v = BasicTypeHelper.getNewVector(child, allocator, callBack); putVector(fieldName, v); } } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507697#comment-16507697 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396126712 Thanks much @vrozov for the analysis. I must say I'm a bit stumped. Value vectors are clearly not designed for concurrent modification. That is not simply a code bug, it is a fundamental design decision. Somewhere in code or documentation I recall a statement that says that value vectors are meant to be created once (by a single thread), then be immutable thereafter. It should be perfectly fine for any number of readers, in separate threads, to access the vector once it has entered its immutable phase. But, nothing about vectors allows concurrent access while mutable. What is going on in this use case to cause concurrent modification. Is that a "bug" or a "feature"? In the stack trace you provided, both threads are creating a new vector, which should not cause a conflict. If, however, they are modifying the same record batch, then we are violating a design assumption that, like vectors, batches are immutable once created, and that each batch is mutated by a single thread. The one other possibility is that a bit of code has a bug that is modifying the immutable schema when it should be modifying the mutable one (if working with two vectors), but I'm not sure how that could happen since code that adds fields is not aware of other vectors. Also, AFAIK, while I did change some code to keep metadata in sync (the design of `MaterializedField` really works only for simple vectors; it is a muddle for complex vectors such as maps), the changes only apply to the mutable stage of a vector's lifecycle. Thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507485#comment-16507485 ] ASF GitHub Bot commented on DRILL-6373: --- vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396077437 Another issue that I noticed and that is likely related to the same changes to `NullableValueVectors.java` is multiple attempts to add the same element to `children` collection on different threads. Due to `Set` nature of `children` collection only one attempt is supposed to succeed, but it is not guaranteed as `children` is not a concurrent collection and access is not synchornized accross threads. ``` 2018-06-10 10:57:30,024 [Partitioner-24e29976-0190-774c-c213-3e59bd47533b:frag:2:1-3635] ERROR o.a.d.exec.record.MaterializedField - Added 16923214 to 21b92297. children fee2fd9, size 2, mod count 2 2018-06-10 10:57:30,024 [Partitioner-24e29976-0190-774c-c213-3e59bd47533b:frag:2:1-4243] ERROR o.a.d.exec.record.MaterializedField - Added 16923214 to 21b92297. children fee2fd9, size 2, mod count 2 2018-06-10 10:57:30,024 [Partitioner-24e29976-0190-774c-c213-3e59bd47533b:frag:2:1-656] ERROR o.a.d.exec.record.MaterializedField - Added 16923214 to 21b92297. children fee2fd9, size 3, mod count 3 ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507481#comment-16507481 ] ASF GitHub Bot commented on DRILL-6373: --- vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-396076908 Changes in partition sender inroduced back in April are not related to the `ConcurrentModificationException` . The regression is introduced in `NullableValueVectors.java:110` as it tries to modify `children` collection while the same collection is iterated over in a different thread. Please take a look at the stack traces for the competing threads: ``` at org.apache.drill.exec.record.MaterializedField.addChild(MaterializedField.java:132) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.vector.NullableVarBinaryVector.(NullableVarBinaryVector.java:132) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1490) [vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.vector.complex.AbstractMapVector.(AbstractMapVector.java:56) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.vector.complex.MapVector.(MapVector.java:65) [vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1275) [vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1261) [vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.test.generated.PartitionerGen33655$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:381) [na:na] at org.apache.drill.exec.test.generated.PartitionerGen33655.flushOutgoingBatches(PartitionerTemplate.java:173) [na:na] at org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$FlushBatchesHandlingClass.execute(PartitionerDecorator.java:285) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$PartitionerTask.run(PartitionerDecorator.java:340) [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161] ``` and ``` at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) ~[na:1.8.0_161] at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742) ~[na:1.8.0_161] at org.apache.drill.exec.record.MaterializedField.withPathAndType(MaterializedField.java:206) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.MaterializedField.clone(MaterializedField.java:185) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.MaterializedField.withPathAndType(MaterializedField.java:207) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.record.MaterializedField.clone(MaterializedField.java:185) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.vector.complex.AbstractMapVector.(AbstractMapVector.java:49) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.vector.complex.MapVector.(MapVector.java:65) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1275) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1261) ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.test.generated.PartitionerGen33655$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:381) ~[na:na] at org.apache.drill.exec.test.generated.PartitionerGen33655.flushOutgoingBatches(PartitionerTemplate.java:173) ~[na:na] at org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$FlushBatchesHandlingClass.execute(PartitionerDecorator.java:285) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] at org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$PartitionerTask.run(PartitionerDecorator.java:340) ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT] ... 3 common frames omitted ``` This is an automated message from the Apache Git
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494285#comment-16494285 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-392947880 Thanks @ppadma! If you can explain how to reproduce the issue without using the full functional tests, I can take a look. In case it leads to anything... Earlier changes around `MaterializedField` did try to keep the various views of data in sync. The map's `MaterializedField` is supposed to contain, as children, all the map members. This means that changes to any map field requires a matching change to the children. This didn't always happen, leading to bugs that earlier PRs attempted to fix. Not sure if that has an impact here. The other question is whether the partition sender is the only code that calls the `clone()` method. Do we have unit tests for that method? Is there some interaction in cloning that reaches back into the original child list? A quick scan of the code didn't suggest any, which is why I wonder if the change occurs in another thread; perhaps triggered as a result of recent fixes in the partitioned sender. Please let us know what you find, then I'd be happy to fix it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492906#comment-16492906 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-392594956 @Ben-Zvi, the stack trace indicates an error in `MaterializedField`, but that file is not changed by this commit. Nor are any of the other classes in the call stack. The test is in an HBase test, which is a bit hard to set up. As you know, it is not possible to run functional tests outside of MapR -- which is why only MapR folks can do the actual commits. The failure occurs inside the partitioned sender, generated code, when attempting to clone a `MaterializedField` for a Map. Looking at the code, the only way the `MaterializedField.clone()` method can fail is if another thread is adding/removing members to the map while the thread that failed is attempting to clone the map schema. I'm not familiar enough with the partioned sender to track down the issue. Looking at history, it appears we modified the partition sender back in April. I wonder if that triggered this failure? Can someone who knows more about that class check if there is some way that the same `MaterializedField` is being used across threads, and mutated in one of them? Otherwise, this may be a race condition; try running the test again and the problem may go away. That may show that the problem is not in this particular PR, but rather a pre-existing problem that requires a separate research effort. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492155#comment-16492155 ] ASF GitHub Bot commented on DRILL-6373: --- Ben-Zvi commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-392364222 There was a failure (after rebasing on the latest master), in the functional test emptyHbaseLeftJoin.sql : ``` select t.data.partkey, t1.data.ps_partkey from schema_change_empty_batch_empty as t LEFT JOIN schema_change_empty_batch_partsupp as t1 ON t.data.partkey = t1.data.ps_partkey Exception: java.sql.SQLException: SYSTEM ERROR: ConcurrentModificationException Fragment 2:0 [Error Id: bfb0598f-e598-453b-93f2-699bd87a0061 on drill194:31010] (java.util.ConcurrentModificationException) null java.util.LinkedHashMap$LinkedHashIterator.nextNode():719 java.util.LinkedHashMap$LinkedKeyIterator.next():742 org.apache.drill.exec.record.MaterializedField.withPathAndType():144 org.apache.drill.exec.record.MaterializedField.clone():125 org.apache.drill.exec.record.MaterializedField.withPathAndType():145 org.apache.drill.exec.record.MaterializedField.clone():125 org.apache.drill.exec.vector.complex.AbstractMapVector.():49 org.apache.drill.exec.vector.complex.MapVector.():65 org.apache.drill.exec.expr.BasicTypeHelper.getNewVector():1275 org.apache.drill.exec.expr.BasicTypeHelper.getNewVector():1261 org.apache.drill.exec.test.generated.PartitionerGen84661$OutgoingRecordBatch.initializeBatch():381 org.apache.drill.exec.test.generated.PartitionerGen84661.flushOutgoingBatches():173 org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$FlushBatchesHandlingClass.execute():285 org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$PartitionerTask.run():340 java.util.concurrent.ThreadPoolExecutor.runWorker():1149 java.util.concurrent.ThreadPoolExecutor$Worker.run():624 java.lang.Thread.run():748 at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528) at org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600) at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1904) at org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:64) at oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:630) at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109) at org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1120) at oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:638) at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200) at oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:149) at oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218) at org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:110) at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:210) at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: ConcurrentModificationException Fragment 2:0 [Error Id: bfb0598f-e598-453b-93f2-699bd87a0061 on drill194:31010] (java.util.ConcurrentModificationException) null java.util.LinkedHashMap$LinkedHashIterator.nextNode():719 java.util.LinkedHashMap$LinkedKeyIterator.next():742 org.apache.drill.exec.record.MaterializedField.withPathAndType():144 org.apache.drill.exec.record.MaterializedField.clone():125 org.apache.drill.exec.record.MaterializedField.withPathAndType():145 org.apache.drill.exec.record.MaterializedField.clone():125 org.apache.drill.exec.vector.complex.AbstractMapVector.():49 org.apache.drill.exec.vector.complex.MapVector.():65 org.apache.drill.exec.expr.BasicTypeHelper.getNewVector():1275 org.apache.drill.exec.expr.BasicTypeHelper.getNewVector():1261
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488534#comment-16488534 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-391610102 Rebased onto latest master. Fixed check style issues due to new rules. This one is ready to commit; can someone merge it and run the tests? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480637#comment-16480637 ] ASF GitHub Bot commented on DRILL-6373: --- arina-ielchiieva commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-390205193 @paul-rogers there are some check styles failures, please take a look. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475122#comment-16475122 ] ASF GitHub Bot commented on DRILL-6373: --- paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-389006322 Thanks @ppadma! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474985#comment-16474985 ] ASF GitHub Bot commented on DRILL-6373: --- ppadma commented on issue #1244: DRILL-6373: Refactor Result Set Loader for Union, List support URL: https://github.com/apache/drill/pull/1244#issuecomment-388986616 @paul-rogers LGTM. +1. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459404#comment-16459404 ] ASF GitHub Bot commented on DRILL-6373: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1244 @ppadma, here is the next step in our long saga to check in the batch sizing work. Please give it a review at your convenience. Thanks! > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the "batch sizing" enhancements, refactor the > {{ResultSetLoader}} and related classes to prepare for Union and List > support. This fix follows the refactoring of the column accessors for the > same purpose. Actual Union and List support is to follow in a separate PR. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support
[ https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459402#comment-16459402 ] ASF GitHub Bot commented on DRILL-6373: --- GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/1244 DRILL-6373: Refactor Result Set Loader for Union, List support This PR builds on the previous refactoring of the column accessors to prepare for Union, (non-repeated) List and Repeated List support. The PR includes four closely related changes divided across four commits: ### Correct the Type of the Data Vector in a Nullable Vector The nullable vectors contain a "bits" vector and a "data" vector. The data vector has historically been created using the same `MaterializedField` as the nullable vector, meaning that the data vector is labeled as "nullable" even though it has no bits vector. This PR creates a clone `MaterializedField` with the same name as the outer nullable vector, but with a Required type. This change ensures that the overflow logic works correctly as it uses the vector metadata (in the `MaterializedField`) to know what kind of vector to create for the "lookahead" vector. ### Result Set Loader Refactor The second commit pretty much just rearranges the deck chairs in a way that we an slot in the new types in the next PR. The need for the changes can be seen in the full code set (the union and list support was pulled out for this PR.) A union is a container, like a map, so the tuple state was refactored to create a common parent container state. List and unions are very complex to build, so the code to build the internal workings of each vector was pulled out into a separate builder class. ### Projection Handling and the Vector Cache Previous versions of the result set loader handled projection and a cache for vectors reused across readers in the same Scan operator. Once we introduce nested maps, projection within maps, unions and lists, projection gets much more complex, as does vector caching. This PR adds logic to support projection and vector caching to any arbitrary level of maps. It turns out that handling projection of an entire map, and projection of fields within maps, is far more complex than you'd think, requiring quite a bit of internal state to keep everything straight. The result is that we can now handle a map `m` with three fields `{a, b, c}` and project just one of them, `m.a`, say. Further, Drill allows projection of non-existent columns. So, we might ask for field `m.d` which does not exist in the above map. The projection mechanism handles this case as well, creating the right kind of null column. ### Unit Tests New tests are added to exercise the projection and cache mechanisms. Existing tests were updated for the changes made in the refactoring. ### Reference Design All of this work is done in support of the overall "batch sizing" project explained [here](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades). You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-rogers/drill DRILL-6373 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1244.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1244 commit 7df4280f3011862b84b43240c8a07e0bf019745d Author: Paul RogersDate: 2018-05-01T02:10:53Z DRILL-6373: Fix nullable vector data vector type Fixes the type of the data vector within a nullable vector. The data vector is Required (has no bits vector.) Accurate metadata is required for proper overflow handling in the result set loader. commit 9496ef681f19f03ccd735f2c9b18f6d914eae3e2 Author: Paul Rogers Date: 2018-05-01T02:15:33Z DRILL-6373: Refactor result set loader commit 74675436ae1efdf66deafeaa27b281d169e274ad Author: Paul Rogers Date: 2018-05-01T02:16:48Z DRILL-6373: Revised projection & vector cache commit 04598c0dbdbbff5ecbc2f89d02b14f66982f86bd Author: Paul Rogers Date: 2018-05-01T02:17:22Z DRILL-6373: Revised & added unit tests > Refactor the Result Set Loader to prepare for Union, List support > - > > Key: DRILL-6373 > URL: https://issues.apache.org/jira/browse/DRILL-6373 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.13.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > Fix For: 1.14.0 > > > As the next step in merging the