[
https://issues.apache.org/jira/browse/PHOENIX-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095523#comment-15095523
]
James Taylor commented on PHOENIX-2143:
---------------------------------------
Thanks for the patch, [~ankit.singhal]. Overall, it looks very good. Just a
couple of questions and minor nits (and I agree with [~samarthjain] about the
formatting changes, but we can give you a free pass this time as the formatting
changes are mostly around code you've already changed).
One question about the upgrade code in ConnectionQueryServicesImpl:
* Is this change (using a timestamp of MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0 - 1) necessary? If not, please revert that line.
{code}
         metaConnection = addColumn(metaConnection,
                 PhoenixDatabaseMetaData.SYSTEM_CATALOG,
-                MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0, columnsToAdd, false);
+                MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP_4_7_0 - 1, columnsToAdd, false);
{code}
* Also, did you test the upgrade path manually with a pre-4.7.0 client and a 4.7.0 server?
Minor nit here in MetaDataUtil: can you create a new ScanUtil.setTimeRange(long minTime, long maxTime) that throws a RuntimeException instead of an IOException, and then use that instead of scan.setTimeRange(), so that this function no longer has to throw an IOException?
{code}
--- a/phoenix-core/src/main/java/org/apache/phoenix/util/MetaDataUtil.java
+++ b/phoenix-core/src/main/java/org/apache/phoenix/util/MetaDataUtil.java
@@ -436,24 +436,32 @@ public class MetaDataUtil {
     public static final String IS_LOCAL_INDEX_TABLE_PROP_NAME = "IS_LOCAL_INDEX_TABLE";
     public static final byte[] IS_LOCAL_INDEX_TABLE_PROP_BYTES = Bytes.toBytes(IS_LOCAL_INDEX_TABLE_PROP_NAME);
-    public static Scan newTableRowsScan(byte[] key, long startTimeStamp, long stopTimeStamp)
+    public static Scan newTableRowsScan(byte[] key, long startTimeStamp, long stopTimeStamp) throws IOException {
+        return newTableRowsScan(key, null, startTimeStamp, stopTimeStamp);
+    }
+
+    public static Scan newTableRowsScan(byte[] startKey, byte[] stopKey, long startTimeStamp, long stopTimeStamp)
             throws IOException {
         Scan scan = new Scan();
         scan.setTimeRange(startTimeStamp, stopTimeStamp);
-        scan.setStartRow(key);
-        byte[] stopKey = ByteUtil.concat(key, QueryConstants.SEPARATOR_BYTE_ARRAY);
-        ByteUtil.nextKey(stopKey, stopKey.length);
+        scan.setStartRow(startKey);
+        if (stopKey == null) {
+            stopKey = ByteUtil.concat(startKey, QueryConstants.SEPARATOR_BYTE_ARRAY);
+            ByteUtil.nextKey(stopKey, stopKey.length);
+        }
         scan.setStopRow(stopKey);
         return scan;
     }
{code}
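Something along these lines is what I have in mind for the helper (just a sketch, not the final API; the extra Scan parameter is an assumption on my part, since only the minTime/maxTime arguments are spelled out above):
{code}
// Sketch only: a ScanUtil helper that wraps the checked IOException from HBase's
// Scan.setTimeRange in a RuntimeException so that callers such as
// MetaDataUtil.newTableRowsScan no longer need to declare it.
import java.io.IOException;

import org.apache.hadoop.hbase.client.Scan;

public class ScanUtil {
    public static void setTimeRange(Scan scan, long minTime, long maxTime) {
        try {
            scan.setTimeRange(minTime, maxTime);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
{code}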
Also, it looks like you won't need this line anymore in StatisticsUtil.readStatistics, since the util function is already setting the time range:
{code}
+ s.setTimeRange(MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
{code}
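To illustrate why (a sketch, not the actual readStatistics code; the startKey/stopKey variables are placeholders): with the new overload, the time range is applied inside newTableRowsScan itself, so the caller only needs to pass it in:
{code}
// Illustrative fragment only -- variable names are placeholders, not the real
// StatisticsUtil.readStatistics code.
Scan s = MetaDataUtil.newTableRowsScan(startKey, stopKey,
        MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
// No extra s.setTimeRange(...) call is needed here: newTableRowsScan already
// invoked scan.setTimeRange(startTimeStamp, stopTimeStamp) on the scan it returned.
{code}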
> Use guidepost bytes instead of region name in stats primary key
> ---------------------------------------------------------------
>
> Key: PHOENIX-2143
> URL: https://issues.apache.org/jira/browse/PHOENIX-2143
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Assignee: Ankit Singhal
> Attachments: PHOENIX-2143.patch, PHOENIX-2143_v2.patch,
> PHOENIX-2143_wip.patch, PHOENIX-2143_wip_2.patch
>
>
> Our current SYSTEM.STATS table uses the region name as the last column in the
> primary key constraint. Instead, we should use the MIN_KEY column (which
> corresponds to the region start key). The advantage would be that the stats
> would then be ordered by region start key allowing us to approximate the
> number of guideposts which would be traversed given the start/stop row of a
> scan:
> {code}
> SELECT SUM(guide_posts_count) FROM SYSTEM.STATS WHERE min_key > :1 AND min_key < :2
> {code}
> where :1 is the start row and :2 is the stop row of the scan. With an UNNEST
> operator for ARRAYs, we could get a better approximation.
> As part of the upgrade to the new Phoenix version containing this fix, stats
> could simply be dropped and they'd be recalculated with the new schema.
> An alternative, even more granular approach would be to *not* use arrays to
> store the guide posts, but instead store them as individual rows with a
> schema like this.
> |PHYSICAL_NAME|VARCHAR|
> |COLUMN_FAMILY|VARCHAR|
> |GUIDE_POST_KEY|VARBINARY|
> In this alternative, the maintenance during compaction is higher, though, as
> you'd need to run a separate query to do the deletion of the old guideposts,
> followed by a commit of the new guideposts. The other disadvantage (besides
> requiring multiple queries) is that this couldn't be done transactionally.
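For illustration of the maintenance cost called out in the description above, here is a rough sketch of what the compaction-time rewrite could look like under the alternative per-row schema (table and column names come from the proposal; the method, connection URL, and Phoenix JDBC usage are assumptions, not part of the proposal):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class GuidePostMaintenanceSketch {
    // Hypothetical sketch: replace the guideposts for one physical table / column
    // family under the per-row schema by deleting the old rows and upserting new ones.
    public static void replaceGuidePosts(String jdbcUrl, String physicalName,
            String columnFamily, List<byte[]> newGuidePosts) throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl)) {
            // 1) Delete the old guideposts with a separate query, then commit.
            try (PreparedStatement delete = conn.prepareStatement(
                    "DELETE FROM SYSTEM.STATS WHERE PHYSICAL_NAME = ? AND COLUMN_FAMILY = ?")) {
                delete.setString(1, physicalName);
                delete.setString(2, columnFamily);
                delete.executeUpdate();
            }
            conn.commit();
            // 2) Upsert the freshly collected guideposts as individual rows and commit.
            try (PreparedStatement upsert = conn.prepareStatement(
                    "UPSERT INTO SYSTEM.STATS (PHYSICAL_NAME, COLUMN_FAMILY, GUIDE_POST_KEY) VALUES (?, ?, ?)")) {
                for (byte[] guidePostKey : newGuidePosts) {
                    upsert.setString(1, physicalName);
                    upsert.setString(2, columnFamily);
                    upsert.setBytes(3, guidePostKey);
                    upsert.executeUpdate();
                }
            }
            // The delete and the new guideposts are committed separately, which is why
            // the description notes this maintenance couldn't be done transactionally.
            conn.commit();
        }
    }
}
{code}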