[jira] [Commented] (DRILL-7503) Refactor project operator

2019-12-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005939#comment-17005939
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944
 
 
   Breaks the big "setup" function into its own class, and
   separates out physical vector setup from logical projection
   planning. No functional change; just rearranging existing
   code.
   
   Testing: reran all unit tests.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007048#comment-17007048
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362507186
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007035#comment-17007035
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362491665
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
 
 Review comment:
   ```suggestion
 private enum OutputColumnType {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007047#comment-17007047
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362494309
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
 
 Review comment:
   please remove the field and enum as suggested previously.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007050#comment-17007050
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362495815
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
 Review comment:
   ```suggestion
   OutputWidthExpression getOutputExpression() { return outputExpression; }
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For qu

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007052#comment-17007052
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362499191
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007043#comment-17007043
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362495075
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
 
 Review comment:
   Actually the field and getter aren't used, but if you think this could be 
useful for debugging then it can be left as is.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines th

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007051#comment-17007051
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362493899
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
 
 Review comment:
   The ```width``` is always ```-1``` and getter is unused, so I suggest 
removing the field and related code.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions,

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007045#comment-17007045
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362530889
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007034#comment-17007034
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362529595
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007049#comment-17007049
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362496036
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007037#comment-17007037
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362495939
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007038#comment-17007038
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362493497
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
 
 Review comment:
   ```suggestion
 private static class VariableWidthColumnInfo {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specifi

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007031#comment-17007031
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362500067
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007040#comment-17007040
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362496478
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007030#comment-17007030
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362497826
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007044#comment-17007044
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362534274
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007046#comment-17007046
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362531195
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007041#comment-17007041
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362496119
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007036#comment-17007036
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362529251
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007025#comment-17007025
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362452176
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
 
 Review comment:
   ```suggestion
* registers the field with PMM. If the field is a variable-width field, PMM
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007029#comment-17007029
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362506622
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007033#comment-17007033
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362452329
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
 
 Review comment:
   ```suggestion
* fixed-width fields are just accumulated into a single total. Note: The 
PMM,
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007026#comment-17007026
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362454960
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
 
 Review comment:
   ```suggestion
 private static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007028#comment-17007028
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362499071
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007024#comment-17007024
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362452251
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
 
 Review comment:
   ```suggestion
* records the expression that produces the variable-width field. The 
expression
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007042#comment-17007042
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362530135
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007054#comment-17007054
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362534494
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007039#comment-17007039
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362534333
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007053#comment-17007053
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362501279
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007027#comment-17007027
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362497334
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColumnTy

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007032#comment-17007032
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362486603
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
 
 Review comment:
   After a close look at the code, I believe this enum is unnecessary. All 
usages of the ```ColumnWidthInfo``` constructor accept ```WidthType.VARIABLE``` 
and there is a block inside the update method : 
   ```java
 if (columnWidthInfo.isFixedWidth()) {
   // fixed width columns are accumulated in totalFixedWidthColumnWidth
   ShouldNotReachHere();
 } else {...}
   ```
   Please remove the enum and related code. 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> 

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007229#comment-17007229
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362705846
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColum

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007228#comment-17007228
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362704446
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
+private final WidthType widthType;
+private final OutputColumnType outputColumnType;
+private final ValueVector outputVV; // for transfers, this is the transfer 
src
+
+
+ColumnWidthInfo(OutputWidthExpression outputWidthExpression,
+OutputColumnType outputColumnType,
+WidthType widthType,
+int fieldWidth, ValueVector outputVV) {
+  this.outputExpression = outputWidthExpression;
+  this.width = fieldWidth;
+  this.outputColumnType = outputColumnType;
+  this.widthType = widthType;
+  this.outputVV = outputVV;
 }
 
-RecordBatch incomingBatch = null;
-ProjectRecordBatch outgoingBatch = null;
+public OutputWidthExpression getOutputExpression() { return 
outputExpression; }
 
-int rowWidth = 0;
-Map outputColumnSizes;
-// Number of variable width columns in the batch
-int variableWidthColumnCount = 0;
-// Number of fixed width columns in the batch
-int fixedWidthColumnCount = 0;
-// Number of complex columns in the batch
-int complexColumnsCount = 0;
+public OutputColumnType getOutputColum

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007226#comment-17007226
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362702109
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
 
 Review comment:
   Done. I tried to limit the "blast radius" by gritting my teeth and leaving 
this class as it was. This seems to be some offshoot of my "temporary" 
`BatchSizer`. Just for background, some time ago we realized that not all Drill 
rows are of the same width. (Who'd have guessed?) So, we added "temporary" code 
to work out sizes in each operator. This was supposed to be temporary until 
batches carried their own size info. (Once we figure it out for a vector, we 
don't have to throw it away and figure it out again for each operator.)
   
   In fact, I deliberately chose a goofy name "BatchSizer" to remind folks that 
this was a temporary hack to so I could focus on the "real work" of fixing the 
external sort. Sigh...
   
   But, the temporary solution seems to have become semi-permanent and has 
grown odd variations such as this.
   
   The "master plan" was to not try to predict batch sizes as we are doing 
here. Instead, the `ResultSetLoader` is intended to just let the operator write 
to the batch until we hit the desired limit. All the calcs are already done for 
the reader case. The goal was to use the same mechanism in other places were we 
don't know widths ahead of time. Project is the classic case: for all we know, 
the user is doing some silly function like repeating a big `VARCHAR` 100 times. 
So,

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007227#comment-17007227
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r362702397
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectMemoryManager.java
 ##
 @@ -42,307 +44,310 @@
 import java.util.Map;
 
 /**
- *
- * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by 
ProjectRecordBatch.
- * The PMM works as follows:
- *
- * Setup phase: As and when ProjectRecordBatch creates or transfers a field, 
it registers the field with PMM.
- * If the field is a variable width field, PMM records the expression that 
produces the variable
- * width field. The expression is a tree of LogicalExpressions. The PMM walks 
this tree of LogicalExpressions
- * to produce a tree of OutputWidthExpressions. The widths of Fixed width 
fields are just accumulated into a single
- * total. Note: The PMM, currently, cannot handle new complex fields, it just 
uses a hard-coded estimate for such fields.
- *
- *
- * Execution phase: Just before a batch is processed by Project, the PMM walks 
the tree of OutputWidthExpressions
- * and converts them to FixedWidthExpressions. It uses the RecordBatchSizer 
and the function annotations to do this conversion.
- * See OutputWidthVisitor for details.
+ * ProjectMemoryManager(PMM) is used to estimate the size of rows produced by
+ * ProjectRecordBatch. The PMM works as follows:
+ * 
+ * Setup phase: As and when ProjectRecordBatch creates or transfers a field, it
+ * registers the field with PMM. If the field is a variable width field, PMM
+ * records the expression that produces the variable width field. The 
expression
+ * is a tree of LogicalExpressions. The PMM walks this tree of
+ * LogicalExpressions to produce a tree of OutputWidthExpressions. The widths 
of
+ * Fixed width fields are just accumulated into a single total. Note: The PMM,
+ * currently, cannot handle new complex fields, it just uses a hard-coded
+ * estimate for such fields.
+ * 
+ * Execution phase: Just before a batch is processed by Project, the PMM walks
+ * the tree of OutputWidthExpressions and converts them to
+ * FixedWidthExpressions. It uses the RecordBatchSizer and the function
+ * annotations to do this conversion. See OutputWidthVisitor for details.
  */
 public class ProjectMemoryManager extends RecordBatchMemoryManager {
 
-static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(ProjectMemoryManager.class);
-
-public RecordBatch getIncomingBatch() {
-return incomingBatch;
+  static final Logger logger = 
LoggerFactory.getLogger(ProjectMemoryManager.class);
+
+  private RecordBatch incomingBatch;
+  private ProjectRecordBatch outgoingBatch;
+
+  private int rowWidth;
+  private final Map outputColumnSizes;
+  // Number of variable width columns in the batch
+  private int variableWidthColumnCount;
+  // Number of fixed width columns in the batch
+  private int fixedWidthColumnCount;
+  // Number of complex columns in the batch
+  private int complexColumnsCount;
+
+  // Holds sum of all fixed width column widths
+  private int totalFixedWidthColumnWidth;
+  // Holds sum of all complex column widths
+  // Currently, this is just a guess
+  private int totalComplexColumnWidth;
+
+  private enum WidthType {
+  FIXED,
+  VARIABLE
+  }
+
+  public enum OutputColumnType {
+  TRANSFER,
+  NEW
+  }
+
+  public static class ColumnWidthInfo {
+private final OutputWidthExpression outputExpression;
+private final int width;
 
 Review comment:
   Thanks for looking at this in detail. Fixed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring bre

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007237#comment-17007237
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-570465345
 
 
   
   Longer term, it seems we might want to restructure how we generate code. 
Today, if we run a query across, say, 20 minor fragments in the same Drillbit, 
and they all see the same schema (the normal case), then all 20 will generate 
exactly the same code. Then, down in the compiler layer, the first thread will 
compile the code and put it into the code cache. Threads 2 through 20 will get 
the compiled code from the cache.
   
   But, today, the process is **very** inefficient. Each thread:
   
   * Does the semantic expression analysis (function lookup, type resolution, 
etc.)
   * Generate the entire Java code.
   * Rewrite the Java code to remove the varying bits (the generated class 
name).
   * Hash the entire code to get a hash code to look up in the code cache.
   * If a match is found, compare the entire code block byte-by-byte to verify 
a match.
   * If new, generate code, cache it, and use the source code (which can be 
huge) as the cache key.
   
   The only real benefit of this approach is that it has worked all these 
years. 
   
   The better approach is to:
   
   * Create a parameterized "descriptor" object that holds all the factors 
needed to generate the code. (Input schema, expressions, relevant session 
options.)
   * Use that descriptor as a key into the code lookup table. If found, reuse 
the compiled code without any code gen.
   * If not found, only then tell the descriptor to generate the needed code, 
which will then be shared by all fragments.
   
   The work I did back in the managed sort, and that I'm starting here, at 
least splits code gen from the operator.
   
   I suspect one (very long term) improvement would be to introduce another 
layer of abstraction like we had in Impala. The code gen code tries to do the 
kind of logical type analysis normally done in a planner. But, because the goal 
is Java code gen, it tends to mix SQL type analysis with Java implementation 
details, leading to overly complex code. (That's what I'm fighting with the 
typeof/UNION issue.).
   
   Such an approach would be doubly useful a we roll out the schema 
improvements your team has been doing. If we know the schema (types) at plan 
time, we can work out all the type conversion stuff at that time. In fact, we 
can even play the Spark trick: generate Java once in the planner and send it to 
the workers.
   
   I have only vague ideas here; have not spent much time on it. Sounds like 
you've looked at this some. What do you suggest we do?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007236#comment-17007236
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-570465320
 
 
   @ihuzenko, thanks for your thorough review, as usual. Here is my general 
philosophy on refactoring: do it in bite-size chunks. The first step here was 
simply to break code gen out of the operator itself. By leaving the bulk of the 
code unchanged, one can do a review by diffing the old and new files (not in 
GitHub, sadly), and see that blocks of code are identical, they have just been 
moved; in some cases from one huge function to smaller functions. Then, once 
we're satisfied that this simple refactoring works, we can think about the next 
step.
   
   If you agree with the step-by-step approach, then perhaps we can commit this 
first step to put us in position to make subsequent changes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007603#comment-17007603
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-570618977
 
 
   @paul-rogers thanks for the quick update. I agree that using the already 
generated and compiled code whenever possible would be very good in the long 
run. Also, I agree with the step-by-step refactoring approach. With respect to 
this PR, I propose to extract the ```VectorState``` into a separate class 
together with all the ```ProjectRecordBatch```'s fields that can be 
encapsulated. I believe that the extraction of ```VectorState``` won't allow 
using mixed access to fields in the future and opens more ways for later 
improvements. In order to better express my thought, I created the sample 
[commit](https://github.com/ihuzenko/drill/commit/80fb0065cf4d11b836383b031dffc07b92a5ad91).
 Sorry for the late reply.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17007935#comment-17007935
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-570763354
 
 
   @ihuzenko, good suggestion on `VectorState`: my original thought was to make 
it a static class, then move it to its own file. Then I thought better of it. 
You'll notice that, at present, it is kind of a "shim" class to expose certain 
operations on the vector objects to the "materializer" without making the 
entire project operator visible. In particular, it is an inner class so it can 
tinker with project operator state.
   
   We could accomplish the same thing by defining an interface which 
`ProjectRecordBatch` would implement. But, in that case, all the complex setup 
logic would be back in the project operator, undoing the separation we just 
accomplished. So, the interface approach feels not quite right.
   
   I agree that this can be improved, but I don't yet see an obvious move that 
leaves the code simpler. So, I thought to just leave it as is for now.
   
   Maybe it can evolve to own the allocation vectors, complex writers and so 
on. These are the kinds of questions we can ask now that we can start to see 
the pieces rather than just a big mess. Suggestions?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008590#comment-17008590
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571034273
 
 
   Hello @paul-rogers , if you don't want to extract the ```VectorState``` 
class then at least please put related fields into it, so it'll be visible 
which state is managed by the class. Although extraction is better way for me, 
since it won't allow anybody to easily mix access to related fields 
(```transfers, allocationVectors, complexWriters, complexFieldReferences 
etc...```). PS. Here there is no quick solution without serious rewriting to 
make everything look good:)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009046#comment-17009046
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571235610
 
 
   @ihuzenko, the goal of this refactoring was simply to pull out the code gen 
part. The `VectorState` is an admitted hack to create a clean interface on the 
CG side, though it is messy on the operator side.
   
   Since we don't like this approach, for now, I'll remove `VectorState` and 
simply move its methods onto the project operator, and pass the project 
operator to the CG component. Not ideal, but we can revisit the issue later 
when we want to do further cleanup.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009058#comment-17009058
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571241220
 
 
   One more thought about the CG. In working on another issue, it slowly dawned 
on me a couple of other worthwhile goals in addition to those mentioned earlier.
   
   We mentioned above that CD combines logical planning and code gen, creating 
a very complex bit of code. Again, some of us are not smart enough to be able 
to quickly understand the combined logic, so it would make sense to separate 
out the two steps so that average folks can more easily understand and modify 
them.
   
   Then, we need to think about debug and testing. As originally implemented, 
CG was direct path from operator to running byte codes. The original authors 
were smart enough to be able to debug the result by stepping through the 
compiled byte codes. Most of us are not that smart. So,  few years back we 
added the "plain Java" and "save code for debugging" modes which made it 
possible for us newbies to view and step through the Java code.
   
   Still, in order to debug, we need to run Drill, find a useful unit test, run 
the query, and then debug CG in the context of running a query. We are limited 
in what we can test based on what queries we can find or create to set up the 
scenarios of interest.
   
   Better would be to apply unit testing principles: separate out CG so we can 
set up a set of inputs, run CG, and diff the resulting code against a golden 
copy. This is how we debugged the Impala planner and it worked very well. This 
way, we can easily set up every data type without needing to use a zillion 
different input file formats. That is:
   
   ```
   (operator config, schema, options) --> CG --> (Java code, exec plan)
   ```
   
   Where the "exec plan" would be things like the knick-lacks that the 
`VectorState` currently wrap. That is, rather than having CG set up vectors, 
maybe CG creates a plan that says, "here are the vectors you will need", then 
the operator creates the vectors. Still pretty hand-wavey at this point.
   
   This refactoring can be seen as a naive grasping toward that goal. We need 
to pull out CG, but it is not clear yet the final form. The same can be said of 
the earlier refactoring of the external sort to separate out CG. Sometimes, 
just by taking a step, we can get a bit better understanding of where we need 
to go.
   
   All that said, the goal here is just to take a step, not to implement a 
final solution. I realize, however, that makes this PR hard to review since it 
is incomplete: it takes a step, but does not clearly state "a step to where?"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009066#comment-17009066
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571235610
 
 
   @ihuzenko, the goal of this refactoring was simply to pull out the code gen 
part. The `VectorState` is an admitted hack to create a clean interface on the 
CG side, though it is messy on the operator side.
   
   I was really hoping to not change the execution part of the project operator 
in this PR in order to limit the scope of changes. Maybe I'll go with the 
interface route (discussed in an earlier note), which will help with the longer 
term goal outlined below. Let me play with the code to see how that approach 
might work.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009333#comment-17009333
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571414414
 
 
   @ihuzenko, went ahead and adopted the interface solution. The interface is 
probably a better idea: if we try to isolate code gen for testing, we can 
create a mock implementation that skips the actual vector work. Please let me 
know if this satisfies your concerns. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009376#comment-17009376
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-571437864
 
 
   @ihuzenko, refactored some additional steps to adopt the solution you 
suggested. This version still uses the interface, with an implementation in a 
class other than the project record batch. This turns out to be handy because, 
oddly, the project operator generates code for two separate incoming batches: 
the "real" one and a "fake" empty one. The `ProjectRecordBatchBuilder` holds 
onto the input batch so we don't have to pass it into the materializer and back 
out.
   
   This version tries to eliminate all references to the incoming batch in the 
materializer, and instead work with the batch schema. Annoyingly, the 
`ExpressionTreeMaterializer` needs the input batch so it can iterate over the 
vectors to get their schemas. If all we need is the schema, we don't need 
actual vectors. So, if we can pass in a schema, we can completely separate code 
gen from physical vectors.
   
   The next refactoring move is to change this code to work with a schema (or 
interface to obtain the schema) rather than the actual vectors. Now, as it 
turns out, the batch schema has limitations for complex types, which is one of 
the reasons we created the `TupleMetadata` family of classes. So, perhaps we 
can convert the incoming batch to a `TupleMetadata` schema and use that. (The 
code to do that already exists in the `RowSet` classes.)
   
   Or, we can just pass an interface which will return the `TypedFieldId` for 
each column. Or, do that conversion ahead of time, and pass in the results. 
Will have to play with it some to see which solution is the simplest.
   
   Since the required work will be rather large; I propose we do that as a 
separate PR.
   
   Have we done enough for this one PR?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010732#comment-17010732
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364210835
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/OutputWidthVisitor.java
 ##
 @@ -43,238 +43,213 @@
 
 import java.util.ArrayList;
 
-public class OutputWidthVisitor extends 
AbstractExecExprVisitor {
-
-@Override
-public OutputWidthExpression visitVarDecimalConstant(VarDecimalExpression 
varDecimalExpression,
- 
OutputWidthVisitorState state) throws RuntimeException {
-
Preconditions.checkArgument(varDecimalExpression.getMajorType().hasPrecision());
-return new 
FixedLenExpr(varDecimalExpression.getMajorType().getPrecision());
-}
-
-
-/**
- *
- * Records the {@link IfExpression} as a {@link IfElseWidthExpr}. 
IfElseWidthExpr will be reduced to
- * a {@link FixedLenExpr} by taking the max of the if-expr-width and the 
else-expr-width.
- *
- * @param ifExpression
- * @param state
- * @return IfElseWidthExpr
- * @throws RuntimeException
- */
-@Override
-public OutputWidthExpression visitIfExpression(IfExpression ifExpression, 
OutputWidthVisitorState state)
-throws 
RuntimeException {
-IfExpression.IfCondition condition = ifExpression.ifCondition;
-LogicalExpression ifExpr = condition.expression;
-LogicalExpression elseExpr = ifExpression.elseExpression;
-
-OutputWidthExpression ifWidthExpr = ifExpr.accept(this, state);
-OutputWidthExpression elseWidthExpr = null;
-if (elseExpr != null) {
-elseWidthExpr = elseExpr.accept(this, state);
-}
-return new IfElseWidthExpr(ifWidthExpr, elseWidthExpr);
+public class OutputWidthVisitor
+extends AbstractExecExprVisitor {
+
+  @Override
+  public OutputWidthExpression visitVarDecimalConstant(VarDecimalExpression 
varDecimalExpression,
+   OutputWidthVisitorState 
state) throws RuntimeException {
+
Preconditions.checkArgument(varDecimalExpression.getMajorType().hasPrecision());
+return new 
FixedLenExpr(varDecimalExpression.getMajorType().getPrecision());
+  }
+
+  /**
+   *
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010731#comment-17010731
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364208438
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/OutputWidthVisitor.java
 ##
 @@ -43,238 +43,213 @@
 
 import java.util.ArrayList;
 
-public class OutputWidthVisitor extends 
AbstractExecExprVisitor {
-
-@Override
-public OutputWidthExpression visitVarDecimalConstant(VarDecimalExpression 
varDecimalExpression,
- 
OutputWidthVisitorState state) throws RuntimeException {
-
Preconditions.checkArgument(varDecimalExpression.getMajorType().hasPrecision());
-return new 
FixedLenExpr(varDecimalExpression.getMajorType().getPrecision());
-}
-
-
-/**
- *
- * Records the {@link IfExpression} as a {@link IfElseWidthExpr}. 
IfElseWidthExpr will be reduced to
- * a {@link FixedLenExpr} by taking the max of the if-expr-width and the 
else-expr-width.
- *
- * @param ifExpression
- * @param state
- * @return IfElseWidthExpr
- * @throws RuntimeException
- */
-@Override
-public OutputWidthExpression visitIfExpression(IfExpression ifExpression, 
OutputWidthVisitorState state)
-throws 
RuntimeException {
-IfExpression.IfCondition condition = ifExpression.ifCondition;
-LogicalExpression ifExpr = condition.expression;
-LogicalExpression elseExpr = ifExpression.elseExpression;
-
-OutputWidthExpression ifWidthExpr = ifExpr.accept(this, state);
-OutputWidthExpression elseWidthExpr = null;
-if (elseExpr != null) {
-elseWidthExpr = elseExpr.accept(this, state);
-}
-return new IfElseWidthExpr(ifWidthExpr, elseWidthExpr);
+public class OutputWidthVisitor
+extends AbstractExecExprVisitor {
 
 Review comment:
   Please reformat to: 
   
   ```java
   public class OutputWidthVisitor extends
   AbstractExecExprVisitor {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010735#comment-17010735
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364234843
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectionMaterializer.java
 ##
 @@ -0,0 +1,625 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.commons.collections.map.CaseInsensitiveMap;
+import org.apache.drill.common.expression.ConvertExpression;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionPosition;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.FunctionCallFactory;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.PathSegment.NameSegment;
+import org.apache.drill.common.expression.fn.FunctionReplacementUtils;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.DrillFuncHolderExpr;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.expr.fn.FunctionLookupContext;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.StarColumnHelper;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.carrotsearch.hppc.IntHashSet;
+
+/**
+ * Plans the projection given the incoming and requested outgoing schemas. 
Works
+ * with the {@link VectorState} to create required vectors, writers and so on.
 
 Review comment:
   please update the comment since ```VectorState``` is gone.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combine

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010733#comment-17010733
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364218502
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectBatchBuilder.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class ProjectBatchBuilder implements 
ProjectionMaterializer.BatchBuilder {
+  private final ProjectRecordBatch projectBatch;
+  private final VectorContainer container;
+  private final SchemaChangeCallBack callBack;
+  private final RecordBatch incomingBatch;
+  final List transfers = new ArrayList<>();
 
 Review comment:
   please make also private + add getter + move ```= new ArrayList<>()``` into 
constructor. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010734#comment-17010734
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

ihuzenko commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364223072
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectBatchBuilder.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class ProjectBatchBuilder implements 
ProjectionMaterializer.BatchBuilder {
+  private final ProjectRecordBatch projectBatch;
+  private final VectorContainer container;
+  private final SchemaChangeCallBack callBack;
+  private final RecordBatch incomingBatch;
+  final List transfers = new ArrayList<>();
+
+  public ProjectBatchBuilder(ProjectRecordBatch projectBatch, VectorContainer 
container,
+  SchemaChangeCallBack callBack, RecordBatch incomingBatch) {
+this.projectBatch = projectBatch;
+this.container = container;
+this.callBack = callBack;
+this.incomingBatch = incomingBatch;
+  }
+
+  @Override
+  public void addTransferField(String name, ValueVector vvIn) {
+FieldReference ref = new FieldReference(name);
+ValueVector vvOut = 
container.addOrGet(MaterializedField.create(ref.getAsNamePart().getName(),
+  vvIn.getField().getType()), callBack);
+projectBatch.memoryManager.addTransferField(vvIn, 
vvIn.getField().getName(), vvOut.getField().getName());
+transfers.add(vvIn.makeTransferPair(vvOut));
+  }
+
+  @Override
+  public int addDirectTransfer(FieldReference ref, ValueVectorReadExpression 
vectorRead) {
+TypedFieldId id = vectorRead.getFieldId();
+ValueVector vvIn = 
incomingBatch.getValueAccessorById(id.getIntermediateClass(), 
id.getFieldIds()).getValueVector();
+Preconditions.checkNotNull(incomingBatch);
+
+ValueVector vvOut =
+
container.addOrGet(MaterializedField.create(ref.getLastSegment().getNameSegment().getPath(),
+vectorRead.getMajorType()), callBack);
+TransferPair tp = vvIn.makeTransferPair(vvOut);
+projectBatch.memoryManager.addTransferField(vvIn, TypedFieldId.getPath(id, 
incomingBatch), vvOut.getField().getName());
+transfers.add(tp);
+return vectorRead.getFieldId().getFieldIds()[0];
+  }
+
+  @Override
+  public ValueVectorWriteExpression addOutputVector(String name, 
LogicalExpression expr) {
+MaterializedField outputField = MaterializedField.create(name, 
expr.getMajorType());
+ValueVector vv = container.addOrGet(outputField, callBack);
+projectBatch.allocationVectors.add(vv);
+TypedFieldId fid = 
container.getValueVectorId(SchemaPath.getSimplePath(outputField.getName()));
+ValueVectorWriteExpression write = new ValueVectorWriteExpression(fid, 
expr, true);
+projectBatch.memoryManager.addNewField(vv, write);
+return write;
+  }
+
+  @Override
+  public void addComplexField(FieldReference ref) {
+initComplexWriters();
+if (projectBatch.complexFieldReferencesList == null) {
+  projectBatch.complexFieldReferencesList = 

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010769#comment-17010769
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

arina-ielchiieva commented on pull request #1944: DRILL-7503: Refactor the 
project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364288964
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/TypedFieldId.java
 ##
 @@ -241,7 +241,7 @@ public TypedFieldId build() {
 secondaryFinal = finalType;
   }
 
-  MajorType actualFinalType = finalType;
+  //MajorType actualFinalType = finalType;
 
 Review comment:
   Should we keep this commented code?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010775#comment-17010775
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

arina-ielchiieva commented on pull request #1944: DRILL-7503: Refactor the 
project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364290284
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectionMaterializer.java
 ##
 @@ -0,0 +1,625 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.commons.collections.map.CaseInsensitiveMap;
+import org.apache.drill.common.expression.ConvertExpression;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionPosition;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.FunctionCallFactory;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.PathSegment.NameSegment;
+import org.apache.drill.common.expression.fn.FunctionReplacementUtils;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.DrillFuncHolderExpr;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.expr.fn.FunctionLookupContext;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.StarColumnHelper;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.carrotsearch.hppc.IntHashSet;
+
+/**
+ * Plans the projection given the incoming and requested outgoing schemas. 
Works
+ * with the {@link VectorState} to create required vectors, writers and so on.
+ * Populates the code generator with the "projector" expressions.
+ */
+class ProjectionMaterializer {
+  private static final Logger logger = 
LoggerFactory.getLogger(ProjectionMaterializer.class);
+  private static final String EMPTY_STRING = "";
+
+  /**
+   * Abstracts the physical vector setup operations to separate
+   * the physical setup, in ProjectRecordBatch, from the
+   * logical setup in the materializer class.
+   */
+  public interface BatchBuilder {
+void addTransferField(String name, ValueVector vvIn);
+ValueVectorWriteExpression addOutputVector(String name, LogicalExpression 
expr);
+int addDirectTransfer(FieldReference ref, ValueVectorReadExpression 
vectorRead);
+void addComplexField(FieldReference ref);
+ValueVectorWriteExpression addEvalVector(String outputName,
+LogicalExpression expr);
+  }
+
+  private static class ClassifierResult {
+private boolean isStar;
+private List outputNames;
+private String p

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010774#comment-17010774
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

arina-ielchiieva commented on pull request #1944: DRILL-7503: Refactor the 
project operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364289943
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectionMaterializer.java
 ##
 @@ -0,0 +1,625 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+
+import org.apache.commons.collections.map.CaseInsensitiveMap;
+import org.apache.drill.common.expression.ConvertExpression;
+import org.apache.drill.common.expression.ErrorCollector;
+import org.apache.drill.common.expression.ErrorCollectorImpl;
+import org.apache.drill.common.expression.ExpressionPosition;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.FunctionCall;
+import org.apache.drill.common.expression.FunctionCallFactory;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.expression.ValueExpressions;
+import org.apache.drill.common.expression.PathSegment.NameSegment;
+import org.apache.drill.common.expression.fn.FunctionReplacementUtils;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.expr.DrillFuncHolderExpr;
+import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.expr.fn.FunctionLookupContext;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.planner.StarColumnHelper;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.VectorAccessible;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.carrotsearch.hppc.IntHashSet;
+
+/**
+ * Plans the projection given the incoming and requested outgoing schemas. 
Works
+ * with the {@link VectorState} to create required vectors, writers and so on.
+ * Populates the code generator with the "projector" expressions.
+ */
+class ProjectionMaterializer {
+  private static final Logger logger = 
LoggerFactory.getLogger(ProjectionMaterializer.class);
+  private static final String EMPTY_STRING = "";
+
+  /**
+   * Abstracts the physical vector setup operations to separate
+   * the physical setup, in ProjectRecordBatch, from the
+   * logical setup in the materializer class.
+   */
+  public interface BatchBuilder {
+void addTransferField(String name, ValueVector vvIn);
+ValueVectorWriteExpression addOutputVector(String name, LogicalExpression 
expr);
+int addDirectTransfer(FieldReference ref, ValueVectorReadExpression 
vectorRead);
+void addComplexField(FieldReference ref);
+ValueVectorWriteExpression addEvalVector(String outputName,
+LogicalExpression expr);
+  }
+
+  private static class ClassifierResult {
+private boolean isStar;
+private List outputNames;
+private String p

[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011399#comment-17011399
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364554333
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectBatchBuilder.java
 ##
 @@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.project;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.LogicalExpression;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.exec.expr.ValueVectorReadExpression;
+import org.apache.drill.exec.expr.ValueVectorWriteExpression;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.FixedWidthVector;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
+
+public class ProjectBatchBuilder implements 
ProjectionMaterializer.BatchBuilder {
+  private final ProjectRecordBatch projectBatch;
+  private final VectorContainer container;
+  private final SchemaChangeCallBack callBack;
+  private final RecordBatch incomingBatch;
+  final List transfers = new ArrayList<>();
 
 Review comment:
   Made private. Added getter. But. left initializer with the field since this 
is a final member and we often initialize such objects this way as a way of 
saying that the field does not depend on constructor input.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011402#comment-17011402
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944#discussion_r364554593
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/TypedFieldId.java
 ##
 @@ -241,7 +241,7 @@ public TypedFieldId build() {
 secondaryFinal = finalType;
   }
 
-  MajorType actualFinalType = finalType;
+  //MajorType actualFinalType = finalType;
 
 Review comment:
   Not sure. Not sure what the comments here are about, so I thought I'd leave 
them. Just commenting out an unused variable.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011406#comment-17011406
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

paul-rogers commented on issue #1944: DRILL-7503: Refactor the project operator
URL: https://github.com/apache/drill/pull/1944#issuecomment-572380826
 
 
   Thank you @ihuzenko and @arina-ielchiieva for your reviews. Addressed 
remaining minor issues. Squashed commits.
   
   @ihuzenko, your many suggestions made this a much better solution. It is 
almost, but not quite, clean enough that I could write code gen unit tests. 
Need to clean up that pesky `ExpressionTreeMaterializer` issue, then we'll be 
able to write such tests. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7503) Refactor project operator

2020-01-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011982#comment-17011982
 ] 

ASF GitHub Bot commented on DRILL-7503:
---

asfgit commented on pull request #1944: DRILL-7503: Refactor the project 
operator
URL: https://github.com/apache/drill/pull/1944
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor project operator
> -
>
> Key: DRILL-7503
> URL: https://issues.apache.org/jira/browse/DRILL-7503
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.17.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.18.0
>
>
> Work on another ticket revealed that the Project operator ("record batch") 
> has grown quite complex. The setup phase lives in the operator as one huge 
> function. The function combines the "logical" tasks of working out the 
> projection expressions and types, the code gen for those expressions, and the 
> physical setup of vectors.
> The refactoring breaks up the logic so that it is easier to focus on the 
> specific bits of interest.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)