[GitHub] [flink] haodang commented on a change in pull request #10000: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods
haodang commented on a change in pull request #1: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods URL: https://github.com/apache/flink/pull/1#discussion_r340094625 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/runtime/stream/sql/SqlITCase.scala ## @@ -870,6 +871,45 @@ class SqlITCase extends StreamingWithStateTestBase { assertEquals(List(expected.toString()), StreamITCase.testResults.sorted) } + + @Test + def testProjectionWithManyColumns(): Unit = { + +val env = StreamExecutionEnvironment.getExecutionEnvironment +val tEnv = StreamTableEnvironment.create(env) +StreamITCase.clear + +// force code split +tEnv.getConfig.setMaxGeneratedCodeLength(1) Review comment: makes sense that functionally it will be the same after removing this explicit setup. i think we might want to explicitly set it so that this unit test does not depend on the default value of `maxGeneratedCodeLength`, which hypothetically can change, similarly to the test `testSelectExpressionWithSplitFromTable()`. Plus, we don't have any perf issue adding this. what do you think? i'm also okay to remove it if you see that's not an issue, but think setting it explicitly is more robust. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] haodang commented on a change in pull request #10000: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods
haodang commented on a change in pull request #1: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods URL: https://github.com/apache/flink/pull/1#discussion_r340094625 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/runtime/stream/sql/SqlITCase.scala ## @@ -870,6 +871,45 @@ class SqlITCase extends StreamingWithStateTestBase { assertEquals(List(expected.toString()), StreamITCase.testResults.sorted) } + + @Test + def testProjectionWithManyColumns(): Unit = { + +val env = StreamExecutionEnvironment.getExecutionEnvironment +val tEnv = StreamTableEnvironment.create(env) +StreamITCase.clear + +// force code split +tEnv.getConfig.setMaxGeneratedCodeLength(1) Review comment: makes sense that functionally it will be the same after removing this explicit setup. i think we might want to explicitly set it so that this unit test does not depend on the default value of `maxGeneratedCodeLength`, which hypothetically can change, similarly to the test `testSelectExpressionWithSplitFromTable()`. Plus, we don't have any perf issue adding this. what do you think? i'm also okay to remove it if you see that's not an issue, but think setting it explicitly is more resilient. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] haodang commented on a change in pull request #10000: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods
haodang commented on a change in pull request #1: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods URL: https://github.com/apache/flink/pull/1#discussion_r340094625 ## File path: flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/runtime/stream/sql/SqlITCase.scala ## @@ -870,6 +871,45 @@ class SqlITCase extends StreamingWithStateTestBase { assertEquals(List(expected.toString()), StreamITCase.testResults.sorted) } + + @Test + def testProjectionWithManyColumns(): Unit = { + +val env = StreamExecutionEnvironment.getExecutionEnvironment +val tEnv = StreamTableEnvironment.create(env) +StreamITCase.clear + +// force code split +tEnv.getConfig.setMaxGeneratedCodeLength(1) Review comment: makes sense that functionally it will be the same after removing this explicit setup. i think we might want to explicitly set it so that this unit test does not depend on the default value of `maxGeneratedCodeLength`, which hypothetically can change, similarly to the test `testSelectExpressionWithSplitFromTable()`. Plus, we don't have any perf issue adding this. what do you think? i'm okay to remove it if you see that's not an issue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] haodang commented on a change in pull request #10000: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods
haodang commented on a change in pull request #1: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods URL: https://github.com/apache/flink/pull/1#discussion_r340091314 ## File path: flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/codegen/CollectorCodeGenerator.scala ## @@ -73,19 +73,13 @@ class CollectorCodeGenerator( val input1TypeClass = boxedTypeTermForTypeInfo(input1) val input2TypeClass = boxedTypeTermForTypeInfo(collectedType) -// declaration in case of code splits -val recordMember = if (hasCodeSplits) { - s"private $input2TypeClass $input2Term;" -} else { - "" -} +val inputDecleration = + List(s"private $input1TypeClass $input1Term;", + s"private $input2TypeClass $input2Term;") -// assignment in case of code splits -val recordAssignment = if (hasCodeSplits) { - s"$input2Term" // use member -} else { - s"$input2TypeClass $input2Term" // local variable -} +val inputAssignment = + List(s"$input1Term = ($input1TypeClass) getInput();", + s"$input2Term = ($input2TypeClass) record;") Review comment: Okay, I'll write them explicitly in the generated class code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] haodang commented on a change in pull request #10000: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods
haodang commented on a change in pull request #1: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods URL: https://github.com/apache/flink/pull/1#discussion_r340090729 ## File path: flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/codegen/MatchCodeGenerator.scala ## @@ -574,13 +584,14 @@ class MatchCodeGenerator( private def generateMeasurePatternVariableExp(patternName: String): GeneratedPatternList = { val listName = newName("patternEvents") +reusableMemberStatements.add(s"java.util.List $listName = new java.util.ArrayList();") val code = if (patternName == ALL_PATTERN_VARIABLE) { addReusablePatternNames() val patternTerm = newName("pattern") j""" - |java.util.List $listName = new java.util.ArrayList(); + |$listName.clear(); Review comment: About `$listName = new java.util.ArrayList();` and things can be buffered downstream, that's a good point, I'll make the change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] haodang commented on a change in pull request #10000: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods
haodang commented on a change in pull request #1: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods URL: https://github.com/apache/flink/pull/1#discussion_r340089928 ## File path: flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/codegen/MatchCodeGenerator.scala ## @@ -574,13 +584,14 @@ class MatchCodeGenerator( private def generateMeasurePatternVariableExp(patternName: String): GeneratedPatternList = { val listName = newName("patternEvents") +reusableMemberStatements.add(s"java.util.List $listName = new java.util.ArrayList();") val code = if (patternName == ALL_PATTERN_VARIABLE) { addReusablePatternNames() val patternTerm = newName("pattern") j""" - |java.util.List $listName = new java.util.ArrayList(); + |$listName.clear(); Review comment: On the test coverage, I think this test should have covered it, because our change is executed with the rest of the code split: https://github.com/haodang/flink/blob/hdang/split_input_unboxing_exprs/flink-table/flink-table-planner/src/test/scala/org/apache/flink/table/runtime/stream/sql/MatchRecognizeITCase.scala#L139 Let me know if you think we should add another test to target our change in this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [flink] haodang commented on a change in pull request #10000: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods
haodang commented on a change in pull request #1: [FLINK-14398][SQL/Legacy Planner]Further split input unboxing code into separate methods URL: https://github.com/apache/flink/pull/1#discussion_r339884830 ## File path: flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/codegen/CollectorCodeGenerator.scala ## @@ -73,19 +73,13 @@ class CollectorCodeGenerator( val input1TypeClass = boxedTypeTermForTypeInfo(input1) val input2TypeClass = boxedTypeTermForTypeInfo(collectedType) -// declaration in case of code splits -val recordMember = if (hasCodeSplits) { - s"private $input2TypeClass $input2Term;" -} else { - "" -} +val inputDecleration = + List(s"private $input1TypeClass $input1Term;", + s"private $input2TypeClass $input2Term;") -// assignment in case of code splits -val recordAssignment = if (hasCodeSplits) { - s"$input2Term" // use member -} else { - s"$input2TypeClass $input2Term" // local variable -} +val inputAssignment = + List(s"$input1Term = ($input1TypeClass) getInput();", + s"$input2Term = ($input2TypeClass) record;") Review comment: Any particular reason you'd prefer to write it out directly inside the class code? I was thinking grouping them into `inputAssignment` and `inputDeclaration` might make generated class code look more succinct, organized, and thus easier to read. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services