[ https://issues.apache.org/jira/browse/BEAM-7545?focusedWorklogId=277230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-277230 ]
ASF GitHub Bot logged work on BEAM-7545:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Jul/19 05:45
            Start Date: 16/Jul/19 05:45
    Worklog Time Spent: 10m
      Work Description: amaliujia commented on pull request #9040: [BEAM-7545] Reordering Beam Joins
URL: https://github.com/apache/beam/pull/9040#discussion_r303734439

 ##########
 File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java
 ##########
 @@ -103,6 +106,10 @@
            // join rules
            JoinPushExpressionsRule.INSTANCE,
 +          JoinCommuteRule.INSTANCE,

 Review comment:
Because this PR implements join reordering, a useful test would be one in which a three-way join is reordered. As join reordering is already measured in https://docs.google.com/document/d/1DM_bcfFbIoc_vEoqQxhC7AvHBUDVCAwToC8TYGukkII/edit#, wouldn't it be straightforward to have a similar test? Without such a test, how do we even know whether join reordering is working?

In terms of checking output plans, Flink has been running many tests on Calcite optimization rules (see [here](https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/plan/rules/logical/RewriteMultiJoinConditionRuleTest.scala) and [here](https://github.com/apache/flink/tree/master/flink-table/flink-table-planner-blink/src/test/resources/org/apache/flink/table/plan)). Flink's practice has shown that verifying the output plan is deterministic and stable.

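As a rough illustration of the kind of test described above, here is a minimal, self-contained sketch of a three-way join whose output plan is checked. It is a toy model and does not use Calcite's or Beam's planner API. Everything in it is an assumption made for illustration: the names `JoinReorderSketch`, `Rule`, `Plan`, `rewrites`, `optimize` and `threeWayJoin`, the table names and row counts, the cross-product row estimate, and the cost formula. `Rule.COMMUTE` and `Rule.ASSOCIATE` are only stand-ins for `JoinCommuteRule` and `JoinAssociateRule`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of rule-driven join reordering. This is NOT Calcite's or Beam's API. */
final class JoinReorderSketch {

  /** Hypothetical stand-ins for JoinCommuteRule and JoinAssociateRule. */
  enum Rule { COMMUTE, ASSOCIATE }

  /** A plan node: a table scan (left == null) or a join of two sub-plans. */
  static final class Plan {
    final String name;      // table name for scans, null for joins
    final long rows;        // estimated output rows (cross-product model for joins)
    final Plan left, right; // null for scans

    private Plan(String name, long rows, Plan left, Plan right) {
      this.name = name; this.rows = rows; this.left = left; this.right = right;
    }
    static Plan scan(String name, long rows) { return new Plan(name, rows, null, null); }
    static Plan join(Plan l, Plan r) { return new Plan(null, l.rows * r.rows, l, r); }
    boolean isJoin() { return left != null; }

    /** Assumed cost: rows produced by every join plus the rows of each join's right input. */
    long cost() { return isJoin() ? left.cost() + right.cost() + rows + right.rows : 0; }

    @Override public String toString() {
      return isJoin() ? "Join(" + left + ", " + right + ")" : name;
    }
  }

  /** All plans obtained by applying one enabled rule at one node of the given plan. */
  static List<Plan> rewrites(Plan p, Set<Rule> rules) {
    List<Plan> out = new ArrayList<>();
    if (!p.isJoin()) return out;
    if (rules.contains(Rule.COMMUTE)) out.add(Plan.join(p.right, p.left));
    if (rules.contains(Rule.ASSOCIATE) && p.left.isJoin()) {
      // (a JOIN b) JOIN c  ->  a JOIN (b JOIN c)
      out.add(Plan.join(p.left.left, Plan.join(p.left.right, p.right)));
    }
    for (Plan l : rewrites(p.left, rules)) out.add(Plan.join(l, p.right));
    for (Plan r : rewrites(p.right, rules)) out.add(Plan.join(p.left, r));
    return out;
  }

  /** Explores every plan reachable with the enabled rules and returns the cheapest. */
  static Plan optimize(Plan initial, Set<Rule> rules) {
    Set<String> seen = new HashSet<>();
    Deque<Plan> queue = new ArrayDeque<>();
    queue.add(initial);
    seen.add(initial.toString());
    Plan best = initial;
    while (!queue.isEmpty()) {
      Plan p = queue.poll();
      // Tie-break on the plan text so the chosen plan is deterministic.
      if (p.cost() < best.cost()
          || (p.cost() == best.cost() && p.toString().compareTo(best.toString()) < 0)) {
        best = p;
      }
      for (Plan next : rewrites(p, rules)) {
        if (seen.add(next.toString())) queue.add(next);
      }
    }
    return best;
  }

  /** The three-way join "mid JOIN big JOIN small" in its original, left-deep order. */
  static Plan threeWayJoin() {
    Plan big = Plan.scan("big", 1000), mid = Plan.scan("mid", 100), small = Plan.scan("small", 5);
    return Plan.join(Plan.join(mid, big), small);
  }

  public static void main(String[] args) {
    // Both rules enabled: prints Join(big, Join(mid, small))
    System.out.println(optimize(threeWayJoin(), EnumSet.allOf(Rule.class)));
    // No rules enabled: prints Join(Join(mid, big), small)
    System.out.println(optimize(threeWayJoin(), EnumSet.noneOf(Rule.class)));
  }
}
```

In this toy setup the expected plan is reached only when both rules are enabled, so a test asserting on the plan text would fail if either rule were dropped from the rule set. A real test would compare the planner's actual output plan instead of this simplified model.
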
The basic idea is that if you want to test an optimization, you enable only the relevant rules in the test case (so it is known which rules are hit).

With Flink's approach, you can test rules even when rules can be disabled and enabled independently:

In [BeamRuleSets.java](https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamRuleSets.java)
```
// Rules that together implement join reordering
static final RuleSet BeamJoinReorderingRelSet =
    RuleSets.ofList(JoinCommuteRule.INSTANCE, JoinAssociateRule.INSTANCE);
```

In JoinReorderingTest.java
```
// Pseudocode: createTestConfig, PlanLoader, getPlan and verifyPlan are illustrative helpers
FrameworkConfig testConfig = createTestConfig(BeamJoinReorderingRelSet);
Planner testPlanner = Frameworks.getPlanner(testConfig);
// set up input tables
String expectedPlan = PlanLoader.load(testcase.class);
verifyPlan(testPlanner.getPlan(sql), expectedPlan);
```

By doing so, if a relevant rule is disabled (e.g. `JoinCommuteRule.INSTANCE`), the existing join reordering tests will break, which guards join ordering for us. It also verifies that this PR is actually doing join reordering.

Because we are starting the effort to add more optimization rules to BeamSQL, Flink's testing practice is a great example that we can learn from and apply to BeamSQL to keep our codebase healthy.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 277230)
    Time Spent: 7h 50m  (was: 7h 40m)

> Row Count Estimation for CSV TextTable
> --------------------------------------
>
>                 Key: BEAM-7545
>                 URL: https://issues.apache.org/jira/browse/BEAM-7545
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Alireza Samadianzakaria
>            Assignee: Alireza Samadianzakaria
>            Priority: Major
>             Fix For: Not applicable
>
>          Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Implementing Row Count Estimation for CSV Tables by reading the first few
> lines of the file and estimating the number of records based on the length of
> these lines and the total length of the file.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)