[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568620#comment-16568620
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-410344149
 
 
   @ilooner, you are the man! Thanks much! The next PR will be available next 
week sometime. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-08-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568586#comment-16568586
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

ilooner commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-410338390
 
 
   @paul-rogers All tests passed. Your ordeal is over, I've merged the PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567763#comment-16567763
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-410142512
 
 
   @ilooner, the PR in question was a different one; one that tried a more 
general fix for the concurrent vector update issue. Thanks for rerunning this 
one so we can see if the workaround works.
   
   I did just rebase the code, so hopefully you will find no issues when you do 
it again.
   
   Thanks much for your help on this. Once this goes in, I can add the final 
vector types, then move on to the text and JSON readers. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-08-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567756#comment-16567756
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

ilooner commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-410141154
 
 
   @paul-rogers My apologies, I don't recall discussing or disagreeing to any 
fix to the value vectors. I am in favor of whatever it takes to get this change 
in asap. I will try rebasing this PR ontop of the latest master. If the tests 
pass I will merge it. If the tests don't pass I will try to help debug it or 
provide a unit test that reproduces the issue. Will keep you posted.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-08-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566325#comment-16566325
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-409801351
 
 
   @vrozov, thanks for the explanation. This is exactly what we've discussed 
multiple times over the last several months. In fact, I offered a fix for that 
problem, but it seemed that @ilooner disagreed with the approach and I don't 
have time to pursue it further. Hopefully your `MaterializedField` refactoring 
will fix it.
   
   The whole reason this PR has been stalled is due to a bug that showed up 
only in the pre-commit tests when @Ben-Zvi tried to commit it. I've been stuck 
ever since.
   
   I'm pretty much at a dead end on this PR. If the bug is pre-existing and 
only occurs in the private MapR pre-commit tests, then there is not much I can 
do to fix it.
   
   This PR has been revised to provide a workaround for the only vector change 
that this PR introduced. It would be very useful to please re-run the 
pre-commit tests to see if the workaround corrected the issue that showed up 
only with this PR in the pre-commit tests.
   
   The other, pre-existing issue would be great to fix, just not in this PR. As 
I've noted, I'm getting rusty on the code and have other commitments, and so it 
would be hard for me to fix bugs unrelated to this work. 
   
   Is there a reason to hold up another trial run of this PR while waiting for 
a fix for the map problem?
   
   As this is stalled, others are slowly recreating the work already completed 
here. Seems a terrible waste of resources for a small team that has quite a bit 
of work ahead of it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551787#comment-16551787
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-406811356
 
 
   @paul-rogers The change to `NullableVarCharVector` that was part of your PR 
exposed a bug that is part of the existing code. Please see 
`AbstractMapVector.java:48`. The problem is that a newly created map vector 
references a newly created deep copy(clone) of the passed `MaterializedField` 
while newly created child vectors of the map vector references children of 
**another** instance of the `MaterializeField`. If that **other** instance is 
completely immutable it would not cause any problem (except that now there is 
no reason to create deep copy), but with your changes there was an attempt to 
mutate from different threads the **other** instance as that **other** instance 
is used to create multiple outgoing vectors. I hope that this explains what I 
mean when referring to **existing** bug.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551293#comment-16551293
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-406725968
 
 
   @vrozov, if the bug was not introduced by this PR, then why have I been 
trying for months to fix it without access to the test environment? Why did the 
pre-commit tests pass before this PR, but fail with this PR? Just 
double-checked. This PR did not touch `AbstractMapVector` and so could not have 
introduced new issues in that vector.
   
   This PR did change the nullable vector. That change has been backed out and 
an alternative fix provided.
   
   Could the team Kindly fix the pre-existing bug so that the tests run clean 
before we attempt to retest this PR? 
   
   Or, can the team re-run the pre-commit tests so that we know if the work 
around solves the issue, or if there is some deeper issue to fix?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551014#comment-16551014
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-406668760
 
 
   @paul-rogers I disagree that the bug was introduced in the prior changes to 
this PR. The bug is already part of `AbstractMapVector` that clones 
MaterializedField but modifies children of the original `MaterializedField` and 
not the clone. In combination with changes to `NullableVarCharVector` it 
triggered functional test failures.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-20 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550937#comment-16550937
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-406650835
 
 
   @vrozov, perhaps a bit more explanation would help. The bug that you 
described in the above comments was due (it seems) to a change I made in 
nullable vector to make the `value` vector carry the correct metadata. This 
caused problems when we "cloned" that vector. My first fix attempted to fix 
that secondary issue (the cloning issue). After strong push-back from @sohami, 
I decided we can't afford to fix the metadata inconsistency (at least not by 
someone such as myself that is getting rusty on the code.)
   
   So, instead, I backed out the metadata change, which should fix the vector 
clone issue. Instead, in the result set loader code, I "parse" the vector type 
and handle the metadata inconsistency.
   
   As a result, the tests for this PR pass. I do need someone, however, to 
rerun the pre-commit tests to determine if the clone issue has gone away; we 
have no unit tests that test that particular case.
   
   If we can run those tests, and they pass, then perhaps we can finally commit 
this PR so I can add the final bit of code change in this series that we've 
been trying to get in for six months now...  


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549692#comment-16549692
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-406380183
 
 
   @vrozov, not needed. The latest commit provides a different solution. The 
ResultSetLoader unit tests pass. Next step is for someone at MapR to try the 
pre-commit tests. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549635#comment-16549635
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-406364011
 
 
   @paul-rogers Did you try to pull changes from PR #1383?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-19 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549619#comment-16549619
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-406359589
 
 
   @vrozov or @sohami: Can you take a look at the latest commit with an 
alternative solution for inconsistent metadata? If this fix is OK, then I'll 
mark this as ready to commit, since it previously received a +1. Thanks! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-17 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546103#comment-16546103
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-405479402
 
 
   Revised with an alternative fix to the "lying nullable vector values" 
metadata problem.
   
   Rebased on latest master.
   
   Should be ready for another attempt at merging to master; another pre-commit 
test run.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543911#comment-16543911
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-404987892
 
 
   @sohami, the cost of my proposed fix has exceeded its benefit -- we're just 
not converging. I've closed that PR and will look for another solution. Rather 
than use the `MaterializedField` to get the type, I'll ad code that does a 
switch statement on the vector type to learn the "real" type, leaving the 
`MaterializedField` to hold the pretend type for the `values` field of a 
`Nullable` vector.
   
   This should not be hard: we already have generated code that parses the 
vector type. I can use this to manufacture a `MajorType` that matches the 
vector class; bypassing the need for correct `MaterializedField` data, and thus 
eliminating the need to change the vector code.
   
   Revision to be done soon.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543809#comment-16543809
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

sohami commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-404968576
 
 
   @paul-rogers - I have posted couple of questions for this change on other PR 
(https://github.com/apache/drill/pull/1367)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-13 Thread Karthikeyan Manivannan (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543642#comment-16543642
 ] 

Karthikeyan Manivannan commented on DRILL-6373:
---

[~paul-rogers] The functional test failed with some plan verification failures 
but I doubt it is because of your change. The log is attachedĀ 
[^6373_Functional_Fail_07_13_1300.txt]

> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: 6373_Functional_Fail_07_13_1300.txt, 
> drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542223#comment-16542223
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

bitblender commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-404660067
 
 
   @vrozov @paul-rogers I have started a test run. Will update the PR once it 
is done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540830#comment-16540830
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-404337181
 
 
   Revised the vector-clone fix. Found a simpler solution that requires just 
one line of code change. @vrozov or @bitblender, please run a pre-commit test 
on this PR. If test pass, then PR #1367 can perhaps be included in the next 
batch commit, this PR in the following one (since it has already obtained an 
approval). Then, in two weeks, I can provide the final PR for the result set 
loader work. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537899#comment-16537899
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-403678211
 
 
   @bitblender, thanks for running the pre-commit tests. As it turned out, I 
changed the wrong file originally: I changed a "Partition something" that turns 
out to not be used, rather than the `PartitionSenderTemplate` which I should 
have changed. Sigh...
   
   Anyway, cherry-picked the revised change. This should (fingers crossed) fix 
the partition sender issue. Please run the test again so we can see if any 
other code creates a new vector from a "recycled" materialized field.
   
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-09 Thread Karthikeyan Manivannan (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537679#comment-16537679
 ] 

Karthikeyan Manivannan commented on DRILL-6373:
---

The functional test failed. The logs are attachedĀ 
[^drill-6373-with-6585-fix-functional-failure.txt]

> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Attachments: drill-6373-with-6585-fix-functional-failure.txt
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537201#comment-16537201
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

bitblender commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-403540185
 
 
   @ppadma  @vrozov  @paul-rogers  I meant I am starting pre-commit tests on 
this branch with the cherry-pick from 6585


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537186#comment-16537186
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

bitblender commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-403537844
 
 
   @ppadma  @paul-rogers  @vrozov  I just triggered a pre-commit test run using 
paul-rogers:DRILL-6585. Will post the results once it is done. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536535#comment-16536535
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers edited a comment on issue #1244: DRILL-6373: Refactor Result Set 
Loader for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-403358757
 
 
   Rebased on latest master. Cherry-picked the fix from DRILL-6585 to this 
branch.
   
   @vrozov or @ppadma - can you kick off a pre-commit build to see if the 
DRILL-6586 fix resolves the failure we saw earlier? Thanks! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-07-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536534#comment-16536534
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-403358757
 
 
   Rebased on latest paster. Cherry-picked the fix from DRILL-6585 to this 
branch.
   
   @vrozov or @ppadma - can you kick off a pre-commit build to see if the 
DRILL-6586 fix resolves the failure we saw earlier? Thanks! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511539#comment-16511539
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-397045424
 
 
   @vrozov, I'll make the required changes as a separate PR once my schedule 
allows. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510511#comment-16510511
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396787878
 
 
   I would prefer the second approach but instead of introducing `newVector()` 
method, will it be possible to use a different signature for cloning and not 
cloning `getNewVector()`? I don't care much whether the method name that does 
not clone is `newVector()` or `getNewVector()`, but I would prefer that method 
that does not clone does not accept `MaterializedField` as a parameter at all. 
It should construct it instead similar to what `getNewVector(String name, ...)` 
already does. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16510489#comment-16510489
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396781670
 
 
   The specific fix proposal must reflect that the code doing the copying is 
generated:
   
   ```
   at 
org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1261)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.test.generated.PartitionerGen33655$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:381)
 ~[na:na]
   ```
   
   So, one choice is to create a new method, `duplicateVector` that 1) clones 
the `MaterializedField` and 2) calls `getNewVector`. Then we change the 
generated code in the partitioner to call the new method.
   
   The other choice is to create a `newVector()` method that does what 
`getNewVector()` does now, and to revise `getNewVector()` to do the clone. We 
could then review all uses of `getNewVector()` to determine if we should use 
the non-cloning `newVector()` instead. This forces us to think about which is 
needed in each case, but forces generated code to use the "most safe" path.
   
   Thoughts @vrozov? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-12 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509907#comment-16509907
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396669212
 
 
   @vrozov, I agree that there is a bug somewhere. We are faced with a dilemma. 
On the one hand, we have the set of constraints described earlier. On the other 
hand, we have a preference about not mutating `MaterializedField` and that 
`PartitionSender` should not have to clone `MaterializedField`s when making 
copies.
   
   How do you propose we reconcile the technical constraint with the 
preference? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509212#comment-16509212
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396471425
 
 
   @paul-rogers I did not check the proposed solution, there may be bugs in it, 
but the main point is that when `PartitionSender` creates new vectors it should 
not mutate `MaterializedField` of the `incoming` batch vectors. It should be 
possible to construct a new vector based on an existing vector including two, 
three or more levels deep maps or other not flat vectors. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509171#comment-16509171
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396462788
 
 
   @vrozov, by the way thanks for taking the time to work on this -- a huge 
help since I can't run the functional tests myself.
   
   If all this seems overly complex, I agree that it is. I spent many hours 
working out all these twists and turns. The original design is broken and was 
never cleanly revised once maps were added.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509164#comment-16509164
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396462584
 
 
   @vrozov, your proposed solution works for the single-level map case. It does 
*not* work for the two-level map case as the `MaterializedField` used inside 
the inner map is not the one passed from the outer map. Hacky code would be 
needed to pass in the `MaterializedField`, then fetch it to add it to the outer 
map's `MaterializedField`.
   
   And, even the above hack does not work for three-level maps.
   
   So, what else can we do? We can note one additional problem with maps. If we 
simply take a map field `m` from batch B1 and use it to create a map in batch 
B2, we have a dilemma. The `m` `MaterializedField` already contains "child" 
entries for `a` and `b`, say.
   
   A similar issue exists for any vector with structure: VarChar (offsets and 
data), Nullable (bits and data), etc.
   
   So, what to do? The new code, of which this PR is a part, faced similar 
difficulties with the additional complexity of the `ResultSetLoader` which must 
clone a vector to produce the "overflow" vector. (This is the reason for fixing 
the Nullable vectors so that the data vector has its type changed to Required, 
so that the vector clone works correctly.)
   
   So, we get our solution. Clone the `MaterializedField` in the 
`PartitionSender` *before* calling `BasicTypeHelper.getNewVector`.
   
   We might thing that `getNewVector()` can do the clone. But, it is used by 
`ResultSetLoader` assuming that the new vector will use the `MaterializedField` 
provided. And, in the vast number of cases, we don't need a clone.
   
   It is only when using one vector to create another that we need the clone, 
so it must be done in the caller. In this case, that is `PartitionSender`.
   
   Does this make sense? Maybe the easiest thing is to just make the change and 
do a test run.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509163#comment-16509163
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396462584
 
 
   @vrozov, your proposed solution works for the single-level map case. It does 
*not* work for the two-level map case as the `MaterializedField` used inside 
the inner map is not the one passed from the outer map. Hacky code would be 
needed to pass in the `MaterializedField`, then fetch it to add it to the outer 
map's `MaterializedField`.
   
   And, even the above hack does not work for three-level maps.
   
   So, what else can we do? We can note one additional problem with maps. If we 
simply take a map field `m` from batch B1 and use it to create a map in batch 
B2, we have a dilemma. The `m` `MaterializedField` already contains "child" 
entries for `a` and `b`, say.
   
   A similar issue exists for any vector with structure: VarChar (offsets and 
data), Nullable (bits and data), etc.
   
   So, what to do? The new code, of which this PR is a part, faced similar 
difficulties with the additional complexity of the `ResultSetLoader` which must 
clone a vector to produce the "overflow" vector. (This is the reason for fixing 
the Nullable vectors so that the data vector has its type changed to Required, 
so that the vector clone works correctly.)
   
   (If all this seems overly complex, I agree that it is. I spent many hours 
working out all these twists and turns. The original design is broken and was 
never cleanly revised once maps were added.)
   
   So, we get our solution. Clone the `MaterializedField` in the 
`PartitionSender` *before* calling `BasicTypeHelper.getNewVector`.
   
   We might thing that `getNewVector()` can do the clone. But, it is used by 
`ResultSetLoader` assuming that the new vector will use the `MaterializedField` 
provided. And, in the vast number of cases, we don't need a clone.
   
   It is only when using one vector to create another that we need the clone, 
so it must be done in the caller. In this case, that is `PartitionSender`.
   
   Does this make sense? Maybe the easiest thing is to just make the change and 
do a test run.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509153#comment-16509153
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396460011
 
 
   @vrozov, this is what I meant by the code only working well for a flat 
vector. Maps have a `MaterializedField` for their own metadata. That metadata 
contains a list of the child metadata. So:
   
   ```
   m:Map, {a:Int, b:Varchar}
   . a:INT
   . b:Varchar
   ```
   
   This means, as we add `a`, then, `b` to map `m`, we must mutate the 
`MaterializedField` for `m` to add the child fields. This completely breaks the 
idea that a `MaterializedField` is always immutable. This is a place where a 
design goal (immutable `MateraializedField`) collides with implementation 
(readers add to maps as they find new fields).
   
   So. We could make a clone on each modification and problem solved, right? As 
in many places in Drill, the solution is not so simple. The above is the easy 
case. What about a nested map:
   
   ```
   m1:Map, {m2:Map {a:Int, b:Varchar}}
   . m2:Map, {a:Int, b:Varchar}
   . . a:INT
   . . b:Varchar
   ```
   
   Now when we add `a`, we have to update both the `m1` and `m2` 
`MaterializedField`s. If we clone the MaterializedField for `m2`, then the old 
version in `m1` will get out of sync. The result will be:
   
   ```
   m1:Map, {m2:Map {}}
   . m2:Map, {a:Int, b:Varchar}
   . . a:INT
   . . b:Varchar
   ```
   
   Code that depends on accurate schema information then breaks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509147#comment-16509147
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396460011
 
 
   @vrozov, this is what I meant by the code only working well for a flat 
vector. Maps have a `MaterializedField` for their own metadata. That metadata 
contains a list of the child metadata. So:
   
   ```
   m:Map, {a:Int, b:Varchar}
   . a:INT
   . b:Varchar
   ```
   
   This means, as we add `a`, then, `b` to map `m`, we must mutate the 
`MaterializedField` for `m` to add the child fields. This completely breaks the 
idea that a `MaterializedField` is always immutable. This is a place where a 
design goal (immutable `MateraializedField`) collides with implementation 
(readers add to maps as they find new fields).
   
   So. We could make a clone on each modification and problem solved, right? As 
in many places in Drill, the solution is not so simple. The above is the easy 
case. What about a nested map:
   
   ```
   m1:Map, {m2:Map {a:Int, b:Varchar}}
   . m2:Map, {a:Int, b:Varchar}
   . . a:INT
   . . b:Varchar
   ```
   
   Now when we add `a`, we have to update both the `m1` and `m2` 
`MaterializedField`s. If we clone the MaterializedField for `m2`, then the old 
version in `m1` will get out of sync. The result will be:
   
   ```
   m1:Map, {m2:Map {}}
   . m2:Map, {a:Int, b:Varchar}
   . . a:INT
   . . b:Varchar
   ```
   
   Code that depends on accurate schema information then breaks.
   
   So, we're left with a choice: clone and have a corrupt schema, or make 
`MaterializedField` mutable.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16508224#comment-16508224
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396287379
 
 
   @paul-rogers Please track the stack trace for the modification part. There 
is an attempt to add a child to the children of a vector that is part of the 
`incoming` batch. Please see `PartitionerTemplate.java:381`:
   ```
   ValueVector outgoingVector = TypeHelper.getNewVector(v.getField(), 
allocator);
   ```
   The existing assumption is that `v.getField()` is read-only (immutable), but 
it is passed to `AbstractMapVector` that iterates over passed `field` children 
instead of using cloned version. Then, 
   `child` is passed to `NullableVarCharVector` that again modifies passed 
parameter.
   
   Please consider changing constructor of vectors to clone passed in field and 
using cloned version to add child:
   ```
   BaseValueVector.java (add clone()):
 protected BaseValueVector(MaterializedField field, BufferAllocator 
allocator) {
   this.field = Preconditions.checkNotNull(field, "field cannot be 
null").*clone*();
   this.allocator = Preconditions.checkNotNull(allocator, "allocator cannot 
be null");
 }
   
   NullableVarCharVector.java (use cloned version):
 public NullableVarCharVector(MaterializedField field, BufferAllocator 
allocator) {
   super(field, allocator);
   // replace field with it's clone
   *field = getField();*
   
   // The values vector has the same type and attributes
   // as the nullable vector, but with a mode of required. This ensures that
   // things like scale and precision are preserved in the values vector.
   // For backward compatibility, the values vector must have the same
   // name as the enclosing vector.
   // Setting the child to REQUIRED is a change, prior it was OPTIONAL. But
   // if we then use the field to create a vector, we get the wrong type
   // (we get an optional child vector when we want required.)
   
   values = new VarCharVector(
   MaterializedField.create(field.getName(),
   field.getType().toBuilder()
 .setMode(DataMode.REQUIRED)
 .build()),
   allocator);
   
   field.addChild(bits.getField());
   field.addChild(values.getField());
   accessor = new Accessor();
 }
   
   AbstractMapVector.java (use cloned version):
 protected AbstractMapVector(MaterializedField field, BufferAllocator 
allocator, CallBack callBack) {
   super(field.clone(), allocator, callBack);
   // replace field with it's clone
   *field = getField();*
   // create the hierarchy of the child vectors based on the materialized 
field
   for (MaterializedField child : field.getChildren()) {
 if 
(child.getName().equals(BaseRepeatedValueVector.OFFSETS_FIELD.getName())) {
   continue;
 }
 final String fieldName = child.getName();
 final ValueVector v = BasicTypeHelper.getNewVector(child, allocator, 
callBack);
 putVector(fieldName, v);
   }
 }
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507697#comment-16507697
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396126712
 
 
   Thanks much @vrozov for the analysis. I must say I'm a bit stumped. Value 
vectors are clearly not designed for concurrent modification. That is not 
simply a code bug, it is a fundamental design decision. Somewhere in code or 
documentation I recall a statement that says that value vectors are meant to be 
created once (by a single thread), then be immutable thereafter.
   
   It should be perfectly fine for any number of readers, in separate threads, 
to access the vector once it has entered its immutable phase. But, nothing 
about vectors allows concurrent access while mutable.
   
   What is going on in this use case to cause concurrent modification. Is that 
a "bug" or a "feature"? In the stack trace you provided, both threads are 
creating a new vector, which should not cause a conflict. If, however, they are 
modifying the same record batch, then we are violating a design assumption 
that, like vectors, batches are immutable once created, and that each batch is 
mutated by a single thread.
   
   The one other possibility is that a bit of code has a bug that is modifying 
the immutable schema when it should be modifying the mutable one (if working 
with two vectors), but I'm not sure how that could happen since code that adds 
fields is not aware of other vectors. Also, AFAIK, while I did change some code 
to keep metadata in sync (the design of `MaterializedField` really works only 
for simple vectors; it is a muddle for complex vectors such as maps), the 
changes only apply to the mutable stage of a vector's lifecycle.
   
   Thoughts?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507485#comment-16507485
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396077437
 
 
   Another issue that I noticed and that is likely related to the same changes 
to `NullableValueVectors.java` is multiple attempts to add the same element to 
`children` collection on different threads. Due to `Set` nature of `children` 
collection only one attempt is supposed to succeed, but it is not guaranteed as 
`children` is not a concurrent collection and access is not synchornized 
accross threads.
   
   ```
   2018-06-10 10:57:30,024 
[Partitioner-24e29976-0190-774c-c213-3e59bd47533b:frag:2:1-3635] ERROR 
o.a.d.exec.record.MaterializedField - Added 16923214 to 21b92297. children 
fee2fd9, size 2,  mod
count 2
   2018-06-10 10:57:30,024 
[Partitioner-24e29976-0190-774c-c213-3e59bd47533b:frag:2:1-4243] ERROR 
o.a.d.exec.record.MaterializedField - Added 16923214 to 21b92297. children 
fee2fd9, size 2,  mod
count 2
   2018-06-10 10:57:30,024 
[Partitioner-24e29976-0190-774c-c213-3e59bd47533b:frag:2:1-656] ERROR 
o.a.d.exec.record.MaterializedField - Added 16923214 to 21b92297. children 
fee2fd9, size 3,  mod
   count 3
   ```
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-06-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507481#comment-16507481
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

vrozov commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-396076908
 
 
   Changes in partition sender inroduced back in April are not related to the 
`ConcurrentModificationException` . The regression is introduced in 
`NullableValueVectors.java:110` as it tries to modify `children` collection 
while the same collection is iterated over in a different thread. Please take a 
look at the stack traces for the competing threads:
   
   ```
   at 
org.apache.drill.exec.record.MaterializedField.addChild(MaterializedField.java:132)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.vector.NullableVarBinaryVector.(NullableVarBinaryVector.java:132)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1490)
 [vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.vector.complex.AbstractMapVector.(AbstractMapVector.java:56)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.vector.complex.MapVector.(MapVector.java:65) 
[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1275)
 [vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1261)
 [vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.test.generated.PartitionerGen33655$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:381)
 [na:na]
   at 
org.apache.drill.exec.test.generated.PartitionerGen33655.flushOutgoingBatches(PartitionerTemplate.java:173)
 [na:na]
   at 
org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$FlushBatchesHandlingClass.execute(PartitionerDecorator.java:285)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$PartitionerTask.run(PartitionerDecorator.java:340)
 [drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_161]
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
   ```
   and 
   ```
   at 
java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719) 
~[na:1.8.0_161]
   at 
java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742) 
~[na:1.8.0_161]
   at 
org.apache.drill.exec.record.MaterializedField.withPathAndType(MaterializedField.java:206)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.record.MaterializedField.clone(MaterializedField.java:185)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.record.MaterializedField.withPathAndType(MaterializedField.java:207)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.record.MaterializedField.clone(MaterializedField.java:185)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.vector.complex.AbstractMapVector.(AbstractMapVector.java:49)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.vector.complex.MapVector.(MapVector.java:65) 
~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1275)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.expr.BasicTypeHelper.getNewVector(BasicTypeHelper.java:1261)
 ~[vector-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.test.generated.PartitionerGen33655$OutgoingRecordBatch.initializeBatch(PartitionerTemplate.java:381)
 ~[na:na]
   at 
org.apache.drill.exec.test.generated.PartitionerGen33655.flushOutgoingBatches(PartitionerTemplate.java:173)
 ~[na:na]
   at 
org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$FlushBatchesHandlingClass.execute(PartitionerDecorator.java:285)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   at 
org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$PartitionerTask.run(PartitionerDecorator.java:340)
 ~[drill-java-exec-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
   ... 3 common frames omitted
   ```
   
   


This is an automated message from the Apache Git 

[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-05-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494285#comment-16494285
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-392947880
 
 
   Thanks @ppadma! If you can explain how to reproduce the issue without using 
the full functional tests, I can take a look.
   
   In case it leads to anything... Earlier changes around `MaterializedField` 
did try to keep the various views of data in sync. The map's 
`MaterializedField` is supposed to contain, as children, all the map members. 
This means that changes to any map field requires a matching change to the 
children. This didn't always happen, leading to bugs that earlier PRs attempted 
to fix. Not sure if that has an impact here.
   
   The other question is whether the partition sender is the only code that 
calls the `clone()` method. Do we have unit tests for that method?
   
   Is there some interaction in cloning that reaches back into the original 
child list? A quick scan of the code didn't suggest any, which is why I wonder 
if the change occurs in another thread; perhaps triggered as a result of recent 
fixes in the partitioned sender.
   
   Please let us know what you find, then I'd be happy to fix it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-05-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492906#comment-16492906
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-392594956
 
 
   @Ben-Zvi, the stack trace indicates an error in `MaterializedField`, but 
that file is not changed by this commit. Nor are any of the other classes in 
the call stack.
   
   The test is in an HBase test, which is a bit hard to set up. As you know, it 
is not possible to run functional tests outside of MapR -- which is why only 
MapR folks can do the actual commits.
   
   The failure occurs inside the partitioned sender, generated code, when 
attempting to clone a `MaterializedField` for a Map.
   
   Looking at the code, the only way the `MaterializedField.clone()` method can 
fail is if another thread is adding/removing members to the map while the 
thread that failed is attempting to clone the map schema. I'm not familiar 
enough with the partioned sender to track down the issue.
   
   Looking at history, it appears we modified the partition sender back in 
April. I wonder if that triggered this failure? Can someone who knows more 
about that class check if there is some way that the same `MaterializedField` 
is being used across threads, and mutated in one of them?
   
   Otherwise, this may be a race condition; try running the test again and the 
problem may go away. That may show that the problem is not in this particular 
PR, but rather a pre-existing problem that requires a separate research effort.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-05-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16492155#comment-16492155
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

Ben-Zvi commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-392364222
 
 
   There was a failure (after rebasing on the latest master), in the functional 
test emptyHbaseLeftJoin.sql :
   ```
   select t.data.partkey, t1.data.ps_partkey 
 from schema_change_empty_batch_empty as t 
 LEFT JOIN schema_change_empty_batch_partsupp as t1 
 ON t.data.partkey = t1.data.ps_partkey
   
   Exception:
   
   java.sql.SQLException: SYSTEM ERROR: ConcurrentModificationException
   
   Fragment 2:0
   
   [Error Id: bfb0598f-e598-453b-93f2-699bd87a0061 on drill194:31010]
   
 (java.util.ConcurrentModificationException) null
   java.util.LinkedHashMap$LinkedHashIterator.nextNode():719
   java.util.LinkedHashMap$LinkedKeyIterator.next():742
   org.apache.drill.exec.record.MaterializedField.withPathAndType():144
   org.apache.drill.exec.record.MaterializedField.clone():125
   org.apache.drill.exec.record.MaterializedField.withPathAndType():145
   org.apache.drill.exec.record.MaterializedField.clone():125
   org.apache.drill.exec.vector.complex.AbstractMapVector.():49
   org.apache.drill.exec.vector.complex.MapVector.():65
   org.apache.drill.exec.expr.BasicTypeHelper.getNewVector():1275
   org.apache.drill.exec.expr.BasicTypeHelper.getNewVector():1261
   
org.apache.drill.exec.test.generated.PartitionerGen84661$OutgoingRecordBatch.initializeBatch():381
   
org.apache.drill.exec.test.generated.PartitionerGen84661.flushOutgoingBatches():173
   
org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$FlushBatchesHandlingClass.execute():285
   
org.apache.drill.exec.physical.impl.partitionsender.PartitionerDecorator$PartitionerTask.run():340
   java.util.concurrent.ThreadPoolExecutor.runWorker():1149
   java.util.concurrent.ThreadPoolExecutor$Worker.run():624
   java.lang.Thread.run():748
   
at 
org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528)
at 
org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:600)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1904)
at 
org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:64)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection$1.execute(AvaticaConnection.java:630)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1109)
at 
org.apache.drill.jdbc.impl.DrillMetaImpl.prepareAndExecute(DrillMetaImpl.java:1120)
at 
oadd.org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:638)
at 
org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareAndExecuteInternal(DrillConnectionImpl.java:200)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:149)
at 
oadd.org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:218)
at 
org.apache.drill.jdbc.impl.DrillStatementImpl.executeQuery(DrillStatementImpl.java:110)
at 
org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:210)
at 
org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:115)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
   Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: ConcurrentModificationException
   
   Fragment 2:0
   
   [Error Id: bfb0598f-e598-453b-93f2-699bd87a0061 on drill194:31010]
   
 (java.util.ConcurrentModificationException) null
   java.util.LinkedHashMap$LinkedHashIterator.nextNode():719
   java.util.LinkedHashMap$LinkedKeyIterator.next():742
   org.apache.drill.exec.record.MaterializedField.withPathAndType():144
   org.apache.drill.exec.record.MaterializedField.clone():125
   org.apache.drill.exec.record.MaterializedField.withPathAndType():145
   org.apache.drill.exec.record.MaterializedField.clone():125
   org.apache.drill.exec.vector.complex.AbstractMapVector.():49
   org.apache.drill.exec.vector.complex.MapVector.():65
   org.apache.drill.exec.expr.BasicTypeHelper.getNewVector():1275
   org.apache.drill.exec.expr.BasicTypeHelper.getNewVector():1261
   

[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488534#comment-16488534
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-391610102
 
 
   Rebased onto latest master. Fixed check style issues due to new rules.
   
   This one is ready to commit; can someone merge it and run the tests?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480637#comment-16480637
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

arina-ielchiieva commented on issue #1244: DRILL-6373: Refactor Result Set 
Loader for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-390205193
 
 
   @paul-rogers there are some check styles failures, please take a look.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16475122#comment-16475122
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

paul-rogers commented on issue #1244: DRILL-6373: Refactor Result Set Loader 
for Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-389006322
 
 
   Thanks @ppadma! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474985#comment-16474985
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

ppadma commented on issue #1244: DRILL-6373: Refactor Result Set Loader for 
Union, List support
URL: https://github.com/apache/drill/pull/1244#issuecomment-388986616
 
 
   @paul-rogers  LGTM. +1.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-04-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459404#comment-16459404
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1244
  
@ppadma, here is the next step in our long saga to check in the batch 
sizing work. Please give it a review at your convenience. Thanks! 


> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the "batch sizing" enhancements, refactor the 
> {{ResultSetLoader}} and related classes to prepare for Union and List 
> support. This fix follows the refactoring of the column accessors for the 
> same purpose. Actual Union and List support is to follow in a separate PR.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6373) Refactor the Result Set Loader to prepare for Union, List support

2018-04-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-6373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459402#comment-16459402
 ] 

ASF GitHub Bot commented on DRILL-6373:
---

GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/1244

DRILL-6373: Refactor Result Set Loader for Union, List support

This PR builds on the previous refactoring of the column accessors to 
prepare for Union, (non-repeated) List and Repeated List support. The PR 
includes four closely related changes divided across four commits:

### Correct the Type of the Data Vector in a Nullable Vector

The nullable vectors contain a "bits" vector and a "data" vector. The data 
vector has historically been created using the same `MaterializedField` as the 
nullable vector, meaning that the data vector is labeled as "nullable" even 
though it has no bits vector.

This PR creates a clone `MaterializedField` with the same name as the outer 
nullable vector, but with a Required type.

This change ensures that the overflow logic works correctly as it uses the 
vector metadata (in the `MaterializedField`) to know what kind of vector to 
create for the "lookahead" vector.

### Result Set Loader Refactor

The second commit pretty much just rearranges the deck chairs in a way that 
we an slot in the new types in the next PR. The need for the changes can be 
seen in the full code set (the union and list support was pulled out for this 
PR.)

A union is a container, like a map, so the tuple state was refactored to 
create a common parent container state.

List and unions are very complex to build, so the code to build the 
internal workings of each vector was pulled out into a separate builder class.

### Projection Handling and the Vector Cache

Previous versions of the result set loader handled projection and a cache 
for vectors reused across readers in the same Scan operator. Once we introduce 
nested maps, projection within maps, unions and lists, projection gets much 
more complex, as does vector caching.

This PR adds logic to support projection and vector caching to any 
arbitrary level of maps. It turns out that handling projection of an entire 
map, and projection of fields within maps, is far more complex than you'd 
think, requiring quite a bit of internal state to keep everything straight. The 
result is that we can now handle a map `m` with three fields `{a, b, c}` and 
project just one of them, `m.a`, say.

Further, Drill allows projection of non-existent columns. So, we might ask 
for field `m.d` which does not exist in the above map. The projection mechanism 
handles this case as well, creating the right kind of null column.

### Unit Tests

New tests are added to exercise the projection and cache mechanisms. 
Existing tests were updated for the changes made in the refactoring.

### Reference Design

All of this work is done in support of the overall "batch sizing" project 
explained 
[here](https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-6373

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1244


commit 7df4280f3011862b84b43240c8a07e0bf019745d
Author: Paul Rogers 
Date:   2018-05-01T02:10:53Z

DRILL-6373: Fix nullable vector data vector type

Fixes the type of the data vector within a nullable vector. The data vector 
is Required (has no bits vector.) Accurate metadata is required for proper 
overflow handling in the result set loader.

commit 9496ef681f19f03ccd735f2c9b18f6d914eae3e2
Author: Paul Rogers 
Date:   2018-05-01T02:15:33Z

DRILL-6373: Refactor result set loader

commit 74675436ae1efdf66deafeaa27b281d169e274ad
Author: Paul Rogers 
Date:   2018-05-01T02:16:48Z

DRILL-6373: Revised projection & vector cache

commit 04598c0dbdbbff5ecbc2f89d02b14f66982f86bd
Author: Paul Rogers 
Date:   2018-05-01T02:17:22Z

DRILL-6373: Revised & added unit tests




> Refactor the Result Set Loader to prepare for Union, List support
> -
>
> Key: DRILL-6373
> URL: https://issues.apache.org/jira/browse/DRILL-6373
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.13.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.14.0
>
>
> As the next step in merging the