[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2018-07-27 Thread maryannxue
Github user maryannxue closed the pull request at:

https://github.com/apache/phoenix/pull/281


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-28 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r153594426
  
--- Diff: 
phoenix-core/src/it/java/org/apache/phoenix/end2end/CostBasedDecisionIT.java ---
@@ -0,0 +1,171 @@
+package org.apache.phoenix.end2end;
+
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertTrue;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.util.Map;
+import java.util.Properties;
+
+import org.apache.phoenix.query.BaseTest;
+import org.apache.phoenix.query.QueryServices;
+import org.apache.phoenix.util.PropertiesUtil;
+import org.apache.phoenix.util.QueryUtil;
+import org.apache.phoenix.util.ReadOnlyProps;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import com.google.common.collect.Maps;
+
+public class CostBasedDecisionIT extends BaseUniqueNamesOwnClusterIT {
--- End diff --

3 tests is better than 1, but more would be better. Here are some ideas:
- test that DELETE is optimized through cost-based as expected
- same for UPSERT SELECT
- how about a UNION query?
- same for some join queries
- how about a case where one plan would have a filter on a single leading 
column while another would have a filter on the first two leading columns, but 
the former plan would scan less data. In theory, the cost-based approach would 
choose the former.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-28 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r153574645
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java ---
@@ -91,8 +91,23 @@ public QueryPlan optimize(PhoenixStatement statement, 
SelectStatement select, Co
 }
 
 public QueryPlan optimize(QueryPlan dataPlan, PhoenixStatement 
statement, List targetColumns, ParallelIteratorFactory 
parallelIteratorFactory) throws SQLException {
-Listplans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, true);
-return plans.get(0);
+List plans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, false);
+if (plans.size() == 1) {
+return plans.get(0);
+}
+
+// Get the best plan based on their costs. Costs will be ZERO if 
stats are not
+// available, thus the first plan will be returned.
+Cost minCost = null;
+QueryPlan bestPlan = null;
+for (QueryPlan plan : plans) {
+Cost cost = plan.getCost();
+if (minCost == null || cost.compareTo(minCost) < 0) {
+minCost = cost;
+bestPlan = plan;
+}
+}
+return bestPlan;
--- End diff --

I'm still not seeing this change. The only existing method that should 
change in QueryOptimizer is this private one:

private List getApplicablePlans(QueryPlan dataPlan, 
PhoenixStatement statement, List targetColumns, 
ParallelIteratorFactory parallelIteratorFactory, boolean stopAtBestPlan) throws 
SQLException {

The only change necessary should be to call a new method that orders the 
plans according to cost.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-27 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r153404439
  
--- Diff: 
phoenix-core/src/it/java/org/apache/phoenix/end2end/CostBasedDecisionIT.java ---
@@ -0,0 +1,176 @@
+package org.apache.phoenix.end2end;
+
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertTrue;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.util.Map;
+import java.util.Properties;
+
+import org.apache.phoenix.query.BaseTest;
+import org.apache.phoenix.query.QueryServices;
+import org.apache.phoenix.util.PropertiesUtil;
+import org.apache.phoenix.util.QueryUtil;
+import org.apache.phoenix.util.ReadOnlyProps;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import com.google.common.collect.Maps;
+
+public class CostBasedDecisionIT extends BaseTest {
--- End diff --

Thanks for pointing this out, @JamesRTaylor! Changed in my latest CI as 
suggested.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-27 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r153378386
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java ---
@@ -91,8 +91,23 @@ public QueryPlan optimize(PhoenixStatement statement, 
SelectStatement select, Co
 }
 
 public QueryPlan optimize(QueryPlan dataPlan, PhoenixStatement 
statement, List targetColumns, ParallelIteratorFactory 
parallelIteratorFactory) throws SQLException {
-Listplans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, true);
-return plans.get(0);
+List plans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, false);
+if (plans.size() == 1) {
+return plans.get(0);
+}
+
+// Get the best plan based on their costs. Costs will be ZERO if 
stats are not
+// available, thus the first plan will be returned.
+Cost minCost = null;
+QueryPlan bestPlan = null;
+for (QueryPlan plan : plans) {
+Cost cost = plan.getCost();
+if (minCost == null || cost.compareTo(minCost) < 0) {
+minCost = cost;
+bestPlan = plan;
+}
+}
+return bestPlan;
--- End diff --

QueryOptimize.optimize() is not the only entry point to the optimizer. For 
example, DeleteCompiler calls both getApplicablePlans() and getBestPlan(). I 
think it'd be best if you hid this logic at the end of the private 
getApplicablePlans since all calls channel through that. That way all optimizer 
decisions will use cost if it's enabled. You could conditional order the plans 
based on their cost if cost-based is enabled here:

return hintedPlan != null ? plans : isCostBasedEnabled ? 
orderPlansBasedOnCost(plans) : orderPlansBestToWorst(select, plans, 
stopAtBestPlan);



---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-27 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r153377074
  
--- Diff: 
phoenix-core/src/it/java/org/apache/phoenix/end2end/CostBasedDecisionIT.java ---
@@ -0,0 +1,176 @@
+package org.apache.phoenix.end2end;
+
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertTrue;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.util.Map;
+import java.util.Properties;
+
+import org.apache.phoenix.query.BaseTest;
+import org.apache.phoenix.query.QueryServices;
+import org.apache.phoenix.util.PropertiesUtil;
+import org.apache.phoenix.util.QueryUtil;
+import org.apache.phoenix.util.ReadOnlyProps;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import com.google.common.collect.Maps;
+
+public class CostBasedDecisionIT extends BaseTest {
+
+@BeforeClass
+public static void doSetup() throws Exception {
+Map props = Maps.newHashMapWithExpectedSize(1);
+props.put(QueryServices.STATS_GUIDEPOST_WIDTH_BYTES_ATTRIB, 
Long.toString(20));
+props.put(QueryServices.STATS_UPDATE_FREQ_MS_ATTRIB, 
Long.toString(5));
+props.put(QueryServices.USE_STATS_FOR_PARALLELIZATION, 
Boolean.toString(true));
+props.put(QueryServices.COST_BASED_OPTIMIZER_ENABLED, 
Boolean.toString(true));
+setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));
+}
+
+@AfterClass
+public static void tearDownMiniCluster() throws Exception {
--- End diff --

This method shouldn't be necessary


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-27 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r153377003
  
--- Diff: 
phoenix-core/src/it/java/org/apache/phoenix/end2end/CostBasedDecisionIT.java ---
@@ -0,0 +1,176 @@
+package org.apache.phoenix.end2end;
+
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertTrue;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.util.Map;
+import java.util.Properties;
+
+import org.apache.phoenix.query.BaseTest;
+import org.apache.phoenix.query.QueryServices;
+import org.apache.phoenix.util.PropertiesUtil;
+import org.apache.phoenix.util.QueryUtil;
+import org.apache.phoenix.util.ReadOnlyProps;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import com.google.common.collect.Maps;
+
+public class CostBasedDecisionIT extends BaseTest {
--- End diff --

Derive from BaseUniqueNamesOwnClusterIT instead so that it runs when mvn 
verify is run (it's based on the annotation which this won't have)


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-17 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151829519
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java ---
@@ -91,8 +91,23 @@ public QueryPlan optimize(PhoenixStatement statement, 
SelectStatement select, Co
 }
 
 public QueryPlan optimize(QueryPlan dataPlan, PhoenixStatement 
statement, List targetColumns, ParallelIteratorFactory 
parallelIteratorFactory) throws SQLException {
-Listplans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, true);
-return plans.get(0);
+List plans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, false);
+if (plans.size() == 1) {
+return plans.get(0);
+}
+
+// Get the best plan based on their costs. Costs will be ZERO if 
stats are not
+// available, thus the first plan will be returned.
+Cost minCost = null;
+QueryPlan bestPlan = null;
+for (QueryPlan plan : plans) {
+Cost cost = plan.getCost();
+if (minCost == null || cost.compareTo(minCost) < 0) {
+minCost = cost;
+bestPlan = plan;
+}
+}
+return bestPlan;
--- End diff --

Thank you both for the input. I've change the logic for hinted plans as 
well as added new tests. Please review again.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-17 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151623809
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java ---
@@ -91,8 +91,23 @@ public QueryPlan optimize(PhoenixStatement statement, 
SelectStatement select, Co
 }
 
 public QueryPlan optimize(QueryPlan dataPlan, PhoenixStatement 
statement, List targetColumns, ParallelIteratorFactory 
parallelIteratorFactory) throws SQLException {
-Listplans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, true);
-return plans.get(0);
+List plans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, false);
+if (plans.size() == 1) {
+return plans.get(0);
+}
+
+// Get the best plan based on their costs. Costs will be ZERO if 
stats are not
+// available, thus the first plan will be returned.
+Cost minCost = null;
+QueryPlan bestPlan = null;
+for (QueryPlan plan : plans) {
+Cost cost = plan.getCost();
+if (minCost == null || cost.compareTo(minCost) < 0) {
+minCost = cost;
+bestPlan = plan;
+}
+}
+return bestPlan;
--- End diff --

I agree with @katameru. If you plug in your optimization code to impact the 
sort order done by QueryOptimizer.orderPlansBestToWorst() when 
getApplicablePlans() is called, we'd keep the behavior that the hints would 
override the optimizer (whether cost-based is enabled or not).

I still think we need more tests. We've found that if the tests don't get 
added with the initial commit, they unfortunately never get added. 


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-16 Thread katameru
Github user katameru commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151620890
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java ---
@@ -91,8 +91,23 @@ public QueryPlan optimize(PhoenixStatement statement, 
SelectStatement select, Co
 }
 
 public QueryPlan optimize(QueryPlan dataPlan, PhoenixStatement 
statement, List targetColumns, ParallelIteratorFactory 
parallelIteratorFactory) throws SQLException {
-Listplans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, true);
-return plans.get(0);
+List plans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, false);
+if (plans.size() == 1) {
+return plans.get(0);
+}
+
+// Get the best plan based on their costs. Costs will be ZERO if 
stats are not
+// available, thus the first plan will be returned.
+Cost minCost = null;
+QueryPlan bestPlan = null;
+for (QueryPlan plan : plans) {
+Cost cost = plan.getCost();
+if (minCost == null || cost.compareTo(minCost) < 0) {
+minCost = cost;
+bestPlan = plan;
+}
+}
+return bestPlan;
--- End diff --

I'd argue that the observable behavior should stay the same, so a hinted 
query plan should be used above all others. Currently we could just sidestep 
the cost calculations in favor of a hinted plan, but I think that having a way 
to influence the optimizer via hints could be very useful.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-16 Thread maryannxue
Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151580119
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java ---
@@ -91,8 +91,23 @@ public QueryPlan optimize(PhoenixStatement statement, 
SelectStatement select, Co
 }
 
 public QueryPlan optimize(QueryPlan dataPlan, PhoenixStatement 
statement, List targetColumns, ParallelIteratorFactory 
parallelIteratorFactory) throws SQLException {
-Listplans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, true);
-return plans.get(0);
+List plans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, false);
+if (plans.size() == 1) {
+return plans.get(0);
+}
+
+// Get the best plan based on their costs. Costs will be ZERO if 
stats are not
+// available, thus the first plan will be returned.
+Cost minCost = null;
+QueryPlan bestPlan = null;
+for (QueryPlan plan : plans) {
+Cost cost = plan.getCost();
+if (minCost == null || cost.compareTo(minCost) < 0) {
+minCost = cost;
+bestPlan = plan;
+}
+}
+return bestPlan;
--- End diff --

Thanks for the feedback, @katameru! We need to define a strategy how hints 
work in the cost-based optimizer mode. Your input would be very much 
appreciated!
Since this is the first issue in which I've brought "cost-based" in, the 
idea is to plug-in the framework and pieces of code here and there needed to 
make it work in general. We can see there's still a lot to do to refine this 
framework, and as I dig further into other issues under PHOENIX-4313. Right now 
I think adding a option to disable cost-based optimizer (pls see my latest ci) 
should be enough.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-16 Thread katameru
Github user katameru commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151421537
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java ---
@@ -91,8 +91,23 @@ public QueryPlan optimize(PhoenixStatement statement, 
SelectStatement select, Co
 }
 
 public QueryPlan optimize(QueryPlan dataPlan, PhoenixStatement 
statement, List targetColumns, ParallelIteratorFactory 
parallelIteratorFactory) throws SQLException {
-Listplans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, true);
-return plans.get(0);
+List plans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, false);
+if (plans.size() == 1) {
+return plans.get(0);
+}
+
+// Get the best plan based on their costs. Costs will be ZERO if 
stats are not
+// available, thus the first plan will be returned.
+Cost minCost = null;
+QueryPlan bestPlan = null;
+for (QueryPlan plan : plans) {
+Cost cost = plan.getCost();
+if (minCost == null || cost.compareTo(minCost) < 0) {
+minCost = cost;
+bestPlan = plan;
+}
+}
+return bestPlan;
--- End diff --

This will ignore any index hints used in the query.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151013728
  
--- Diff: 
phoenix-core/src/it/java/org/apache/phoenix/end2end/CostBasedDecisionIT.java ---
@@ -0,0 +1,50 @@
+package org.apache.phoenix.end2end;
+
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertTrue;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.util.Properties;
+
+import org.apache.phoenix.util.PropertiesUtil;
+import org.apache.phoenix.util.QueryUtil;
+import org.junit.Test;
+
+public class CostBasedDecisionIT extends ParallelStatsEnabledIT {
--- End diff --

I think we need more tests for this.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151015089
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/execute/AggregatePlan.java ---
@@ -112,7 +114,26 @@ private AggregatePlan(StatementContext context, 
FilterableStatement statement, T
 public Expression getHaving() {
 return having;
 }
-
+
+@Override
+public Cost getCost() throws SQLException {
+Long byteCount = getEstimatedBytesToScan();
+if (byteCount == null) {
+return Cost.ZERO;
+}
+
--- End diff --

What does parallelLevel represent? Is that the level of parallelization 
done on the client? Is that derivable from thread pool size or cluster size? 
For now, maybe have a constant and doc it a bit.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151014919
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/execute/AggregatePlan.java ---
@@ -112,7 +114,26 @@ private AggregatePlan(StatementContext context, 
FilterableStatement statement, T
 public Expression getHaving() {
 return having;
 }
-
+
+@Override
+public Cost getCost() throws SQLException {
+Long byteCount = getEstimatedBytesToScan();
+if (byteCount == null) {
+return Cost.ZERO;
+}
+
+int parallelLevel = 10;
--- End diff --

How about passing in groupBy instead of just groupBy.isUngroupedAggregate() 
as I think we'd want to get more info eventually out of the groupBy (for 
example, if it's ordered or unordered which impacts memory usage substantially)?


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151016226
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/execute/AggregatePlan.java ---
@@ -112,7 +114,26 @@ private AggregatePlan(StatementContext context, 
FilterableStatement statement, T
 public Expression getHaving() {
 return having;
 }
-
+
+@Override
+public Cost getCost() throws SQLException {
+Long byteCount = getEstimatedBytesToScan();
+if (byteCount == null) {
+return Cost.ZERO;
+}
+
+int parallelLevel = 10;
--- End diff --

How about passing in groupBy instead of just groupBy.isUngroupedAggregate() 
as I think we'd want to get more info eventually out of the groupBy (for 
example, if it's ordered or unordered which impacts memory usage 
substantially)? This might help keep the interfaces a little more stable.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151015944
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java ---
@@ -91,8 +91,23 @@ public QueryPlan optimize(PhoenixStatement statement, 
SelectStatement select, Co
 }
 
 public QueryPlan optimize(QueryPlan dataPlan, PhoenixStatement 
statement, List targetColumns, ParallelIteratorFactory 
parallelIteratorFactory) throws SQLException {
-Listplans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, true);
-return plans.get(0);
+List plans = getApplicablePlans(dataPlan, statement, 
targetColumns, parallelIteratorFactory, false);
--- End diff --

We'll need to have a new master boolean config option (like 
phoenix.optimize.costBased in QueryServices) that controls whether or not we're 
doing cost based optimization to make sure users can disable if they want to 
keep the existing behavior the same.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151015329
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/execute/ClientAggregatePlan.java 
---
@@ -87,6 +89,25 @@ public ClientAggregatePlan(StatementContext context, 
FilterableStatement stateme
 }
 
 @Override
+public Cost getCost() throws SQLException {
+Long byteCount = getEstimatedBytesToScan();
+if (byteCount == null) {
+return Cost.ZERO;
+}
+
+int parallelLevel = 1;
--- End diff --

So parallelLevel is 1 because aggregation is done client-side instead of 
pushed to the cluster, right? Would be good to doc the thinking behind 
calculating the cost throughout.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151012514
  
--- Diff: 
phoenix-core/src/it/java/org/apache/phoenix/end2end/CostBasedDecisionIT.java ---
@@ -0,0 +1,50 @@
+package org.apache.phoenix.end2end;
+
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertTrue;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.util.Properties;
+
+import org.apache.phoenix.util.PropertiesUtil;
+import org.apache.phoenix.util.QueryUtil;
+import org.junit.Test;
+
+public class CostBasedDecisionIT extends ParallelStatsEnabledIT {
+
+@Test
+public void testBug4288() throws Exception {
+Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
+Connection conn = DriverManager.getConnection(getUrl(), props);
+conn.setAutoCommit(true);
+try {
+conn.createStatement().execute("CREATE TABLE test4288 (\n" + 
+"rowkey VARCHAR PRIMARY KEY,\n" + 
+"c1 VARCHAR,\n" + 
+"c2 VARCHAR)");
+conn.createStatement().execute("CREATE LOCAL INDEX 
test4388_c1_idx ON test4288 (c1)");
+
--- End diff --

How about getting the explain plan here and verify that it's doing a full 
table scan (i.e. before there are any stats available)?


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151014126
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/execute/HashJoinPlan.java ---
@@ -290,6 +291,25 @@ public FilterableStatement getStatement() {
 return statement;
 }
 
+@Override
+public Cost getCost() throws SQLException {
+Long byteCount = getEstimatedBytesToScan();
+if (byteCount == null) {
+return Cost.ZERO;
+}
+
+Cost cost = new Cost(0, 0, byteCount);
+Cost lhsCost = delegate.getCost();
+if (keyRangeExpressions != null) {
+lhsCost = lhsCost.multiplyBy(0.01);
--- End diff --

Why the 0.01? More comments would be good.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151012292
  
--- Diff: 
phoenix-core/src/it/java/org/apache/phoenix/end2end/CostBasedDecisionIT.java ---
@@ -0,0 +1,50 @@
+package org.apache.phoenix.end2end;
+
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertTrue;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.util.Properties;
+
+import org.apache.phoenix.util.PropertiesUtil;
+import org.apache.phoenix.util.QueryUtil;
+import org.junit.Test;
+
+public class CostBasedDecisionIT extends ParallelStatsEnabledIT {
+
+@Test
+public void testBug4288() throws Exception {
+Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
+Connection conn = DriverManager.getConnection(getUrl(), props);
+conn.setAutoCommit(true);
+try {
+conn.createStatement().execute("CREATE TABLE test4288 (\n" + 
--- End diff --

Make sure to generate a unique name for your table (see other tests for 
examples)


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-14 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/281#discussion_r151014001
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/execute/BaseQueryPlan.java ---
@@ -500,7 +500,10 @@ public ExplainPlan getExplainPlan() throws 
SQLException {
 if (context.getScanRanges() == ScanRanges.NOTHING) {
 return new ExplainPlan(Collections.singletonList("DEGENERATE 
SCAN OVER " + getTableRef().getTable().getName().getString()));
 }
-
+
+// Initialize dummy iterator to get the stats.
--- End diff --

Comment that you need the stats prior to optimization as they help drive 
cost based decisions.


---


[GitHub] phoenix pull request #281: PHOENIX-4288 Indexes not used when ordering by pr...

2017-11-07 Thread maryannxue
GitHub user maryannxue opened a pull request:

https://github.com/apache/phoenix/pull/281

PHOENIX-4288 Indexes not used when ordering by primary key

1. Add class Cost.
2. Add method getCost() in QueryPlan.
3. Let QueryOptimizer choose the best plan based on Cost; meanwhile if 
stats are not available the QueryOptimizer will keep the original behavior.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maryannxue/phoenix phoenix-4388

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #281


commit f75c1b095683c153f8749107c5505304309dcb36
Author: maryannxue 
Date:   2017-11-07T21:40:37Z

PHOENIX-4288 Indexes not used when ordering by primary key




---