[jira] [Commented] (PHOENIX-5795) Supporting selective queries for index rows updated concurrently

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066594#comment-17066594
 ] 

Hadoop QA commented on PHOENIX-5795:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997641/PHOENIX-5795.4.x-HBase-1.5.002.patch
  against 4.x-HBase-1.5 branch at commit 
61589a903f8c5176ce46e5af0a83729f4f4c90ec.
  ATTACHMENT ID: 12997641

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the master's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+diff --git 
a/phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsExtendedIT.java
 
b/phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsExtendedIT.java
+-
org.apache.hadoop.hbase.TableName.valueOf(pIndexTable.getPhysicalName().getBytes());
+diff --git 
a/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
 
b/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
++ * IndexMaintainer.getIndexedColumns() returns the data column references 
for indexed columns. The data columns are
++ * grouped into three classes, pk columns (data table pk columns), the 
indexed columns (the columns for which
++ * we want to have indexing; they form the prefix for the primary key for 
the index table (after salt and tenant id))
++ * and covered columns. The purpose of this method is to find out if all 
the indexed columns are included in the
+ private boolean hasAllIndexedColumns(IndexMaintainer indexMaintainer, 
MultiMutation multiMutation) {
+-Bytes.compareTo(CellUtil.cloneQualifier(cell), 
columnReference.getQualifier() ) == 0) {
+-   BatchMutateContext context, long 
now, PhoenixIndexMetaData indexMetaData)

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3651//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3651//artifact/patchprocess/patchReleaseAuditWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3651//console

This message is automatically generated.

> Supporting selective queries for index rows updated concurrently
> 
>
> Key: PHOENIX-5795
> URL: https://issues.apache.org/jira/browse/PHOENIX-5795
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Kadir OZDEMIR
>Assignee: Kadir OZDEMIR
>Priority: Critical
> Attachments: PHOENIX-5795.4.x-HBase-1.5.001.patch, 
> PHOENIX-5795.4.x-HBase-1.5.002.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From the consistent indexing design (PHOENIX-5156) perspective, two or more 
> pending updates from different batches on the same data row are concurrent if 
> and only if, for all of these updates, the data table row state is read from 
> HBase under the row lock, and for none of them has the row lock been acquired 
> a second time to update the data table. In other words, all of them are 
> in the first update phase concurrently. For concurrent updates, the first two 
> update phases are done but the last update phase is skipped. This means the 
> data table row will be updated by these updates, but the corresponding index 
> table rows will be left in the unverified status. Then, the read repair 
> process will repair these unverified index rows during scans.
> In addition to leaving index rows unverified, the concurrent updates may 
> generate index rows with incorrect row keys. For example, consider that an 
> application issues the very first two upserts on the same row concurrently 
> and the second update does not include one or more of the indexed columns. 
> When these updates arrive concurrently at IndexRegionObserver, the existing 
> row state would be null for both of these updates. This means the index 
> updates will be generated solely from the pending updates. The partial upsert 
> with missing indexed columns will generate an index row by assuming the missing 
> indexed columns have null values, and this assumption may not be true, as the 
> other concurrent upsert may have non-null values for indexed columns. After 
> issuing the concurrent update, if
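
The patch fragments quoted in the QA output above introduce a helper, 
hasAllIndexedColumns(IndexMaintainer, MultiMutation), for exactly this case: 
detecting a pending mutation that omits indexed columns. A minimal sketch of 
what such a check could look like (the method signature and the 
Bytes/CellUtil comparison are taken from the quoted fragments; the rest of 
the body is an assumption, not the actual patch):

{code:java}
// Sketch only: returns true only if every indexed column reference appears
// among the cells of the pending mutation. Assumed context is
// org.apache.phoenix.hbase.index.IndexRegionObserver, where
// IndexMaintainer.getIndexedColumns() returns the Set<ColumnReference> that
// the javadoc quoted above describes.
private boolean hasAllIndexedColumns(IndexMaintainer indexMaintainer,
                                     MultiMutation multiMutation) {
    for (ColumnReference ref : indexMaintainer.getIndexedColumns()) {
        boolean found = false;
        for (List<Cell> cells : multiMutation.getFamilyCellMap().values()) {
            for (Cell cell : cells) {
                if (Bytes.compareTo(CellUtil.cloneFamily(cell), ref.getFamily()) == 0
                        && Bytes.compareTo(CellUtil.cloneQualifier(cell),
                                ref.getQualifier()) == 0) {
                    found = true;
                    break;
                }
            }
            if (found) {
                break;
            }
        }
        if (!found) {
            // A partial upsert that omits an indexed column; flagging it avoids
            // building an index row key that wrongly assumes null values.
            return false;
        }
    }
    return true;
}
{code}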

[jira] [Commented] (PHOENIX-5797) RVC Offset does not work with tenant views on global indexes

2020-03-25 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066596#comment-17066596
 ] 

Hudson commented on PHOENIX-5797:
-

FAILURE: Integrated in Jenkins build PreCommit-PHOENIX-Build #3651 (See 
[https://builds.apache.org/job/PreCommit-PHOENIX-Build/3651/])
PHOENIX-5797 RVC Offset does not work with tenant views on global (yanxinyi: 
rev 61589a903f8c5176ce46e5af0a83729f4f4c90ec)
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/compile/RVCOffsetCompiler.java
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java


> RVC Offset does not work with tenant views on global indexes
> 
>
> Key: PHOENIX-5797
> URL: https://issues.apache.org/jira/browse/PHOENIX-5797
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.16.0
>Reporter: Daniel Wong
>Assignee: Daniel Wong
>Priority: Minor
> Attachments: PHOENIX-5797_4.x.patch
>
>
> Found this during use of RVC Offset in Phoenix-4845.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-5795) Supporting selective queries for index rows updated concurrently

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066602#comment-17066602
 ] 

Hadoop QA commented on PHOENIX-5795:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997641/PHOENIX-5795.4.x-HBase-1.5.002.patch
  against 4.x-HBase-1.5 branch at commit 
61589a903f8c5176ce46e5af0a83729f4f4c90ec.
  ATTACHMENT ID: 12997641

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the master's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+diff --git 
a/phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsExtendedIT.java
 
b/phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsExtendedIT.java
+-
org.apache.hadoop.hbase.TableName.valueOf(pIndexTable.getPhysicalName().getBytes());
+diff --git 
a/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
 
b/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
++ * IndexMaintainer.getIndexedColumns() returns the data column references 
for indexed columns. The data columns are
++ * grouped into three classes, pk columns (data table pk columns), the 
indexed columns (the columns for which
++ * we want to have indexing; they form the prefix for the primary key for 
the index table (after salt and tenant id))
++ * and covered columns. The purpose of this method is to find out if all 
the indexed columns are included in the
+ private boolean hasAllIndexedColumns(IndexMaintainer indexMaintainer, 
MultiMutation multiMutation) {
+-Bytes.compareTo(CellUtil.cloneQualifier(cell), 
columnReference.getQualifier() ) == 0) {
+-   BatchMutateContext context, long 
now, PhoenixIndexMetaData indexMetaData)

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.IndexRebuildTaskIT

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3652//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3652//artifact/patchprocess/patchReleaseAuditWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3652//console

This message is automatically generated.

> Supporting selective queries for index rows updated concurrently
> 
>
> Key: PHOENIX-5795
> URL: https://issues.apache.org/jira/browse/PHOENIX-5795
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Kadir OZDEMIR
>Assignee: Kadir OZDEMIR
>Priority: Critical
> Attachments: PHOENIX-5795.4.x-HBase-1.5.001.patch, 
> PHOENIX-5795.4.x-HBase-1.5.002.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From the consistent indexing design (PHOENIX-5156) perspective, two or more 
> pending updates from different batches on the same data row are concurrent if 
> and only if, for all of these updates, the data table row state is read from 
> HBase under the row lock, and for none of them has the row lock been acquired 
> a second time to update the data table. In other words, all of them are 
> in the first update phase concurrently. For concurrent updates, the first two 
> update phases are done but the last update phase is skipped. This means the 
> data table row will be updated by these updates, but the corresponding index 
> table rows will be left in the unverified status. Then, the read repair 
> process will repair these unverified index rows during scans.
> In addition to leaving index rows unverified, the concurrent updates may 
> generate index rows with incorrect row keys. For example, consider that an 
> application issues the very first two upserts on the same row concurrently 
> and the second update does not include one or more of the indexed columns. 
> When these updates arrive concurrently at IndexRegionObserver, the existing 
> row state would be null for both of these updates. This means the index 
> updates will be generated solely from the pending updates. The partial upsert 
> with missing indexed columns will generate an index row by assuming the missing 
> indexed columns have null values, and this assumption may not be true, as the 
> other concurrent upsert may have non-null values for indexed columns. After 
> issuing the concurrent update, if

[jira] [Commented] (PHOENIX-5781) Add components in JIRA

2020-03-25 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066785#comment-17066785
 ] 

Istvan Toth commented on PHOENIX-5781:
--

[~elserj] do you have an opinion on this? (No code, so I cannot ask for a 
review.)

> Add components in JIRA
> --
>
> Key: PHOENIX-5781
> URL: https://issues.apache.org/jira/browse/PHOENIX-5781
> Project: Phoenix
>  Issue Type: Task
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> Phoenix has been split into sub projects, and adopted some projects, so I 
> think that it is time to reflect this in Jira as well.
> I propose adding the following components to the project (one per repo)
>  * core
>  * queryserver
>  * connectors
>  * tephra
>  * omid
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [phoenix] gokceni commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade tool command line improvements

2020-03-25 Thread GitBox
gokceni commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade 
tool command line improvements
URL: https://github.com/apache/phoenix/pull/743#discussion_r397989831
 
 

 ##
 File path: 
phoenix-core/src/it/java/org/apache/phoenix/end2end/ParameterizedIndexUpgradeToolIT.java
 ##
 @@ -298,20 +299,22 @@ private void checkOldIndexingCoprocessors(String [] 
indexList, String [] tableLi
 }
 }
 
-@Parameters(name 
="IndexUpgradeToolIT_mutable={0},upgrade={1},isNamespaceEnabled={2}")
+@Parameters(name 
="IndexUpgradeToolIT_mutable={0},upgrade={1},isNamespaceEnabled={2}, 
rebuild={3}")
 public static synchronized Collection<Object[]> data() {
 return Arrays.asList(new Object[][] {
-{false, false, true},
-{true, false, false},
-{false, true, false},
-{true, true, true}
+{false, false, true, false},
 
 Review comment:
   Don't you want to test false, false, true, true  and
   true, false, false, false?
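
  For reference, a sketch of data() with the suggested rows added (the 
parameter order (mutable, upgrade, isNamespaceEnabled, rebuild) and the first 
row come from the diff above; the two added rows are this review's 
suggestion, not the patch):

{code:java}
// Sketch only: the first row appears in the diff above; the next two are the
// combinations suggested in this review comment.
@Parameters(name = "IndexUpgradeToolIT_mutable={0},upgrade={1},isNamespaceEnabled={2},rebuild={3}")
public static synchronized Collection<Object[]> data() {
    return Arrays.asList(new Object[][] {
        {false, false, true, false},
        {false, false, true, true},
        {true, false, false, false},
    });
}
{code}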


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] gjacoby126 commented on a change in pull request #735: PHOENIX-5734 - IndexScrutinyTool should not report rows beyond maxLoo…

2020-03-25 Thread GitBox
gjacoby126 commented on a change in pull request #735: PHOENIX-5734 - 
IndexScrutinyTool should not report rows beyond maxLoo…
URL: https://github.com/apache/phoenix/pull/735#discussion_r397990506
 
 

 ##
 File path: 
phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexScrutinyWithMaxLookbackIT.java
 ##
 @@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.end2end;
+
+import com.google.common.collect.Maps;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.regionserver.ScanInfoUtil;
+import org.apache.hadoop.mapreduce.Counters;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.phoenix.mapreduce.index.IndexScrutinyMapper;
+import org.apache.phoenix.mapreduce.index.IndexScrutinyTool;
+import org.apache.phoenix.query.QueryServices;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ManualEnvironmentEdge;
+import org.apache.phoenix.util.MetaDataUtil;
+import org.apache.phoenix.util.PropertiesUtil;
+import org.apache.phoenix.util.ReadOnlyProps;
+import org.apache.phoenix.util.SchemaUtil;
+import org.apache.phoenix.util.TestUtil;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.List;
+import java.util.Map;
+
+import static 
org.apache.phoenix.mapreduce.index.IndexScrutinyMapperForTest.MAX_LOOKBACK;
+import static 
org.apache.phoenix.mapreduce.index.PhoenixScrutinyJobCounters.INVALID_ROW_COUNT;
+import static 
org.apache.phoenix.mapreduce.index.PhoenixScrutinyJobCounters.BEYOND_MAX_LOOKBACK;
+import static 
org.apache.phoenix.mapreduce.index.PhoenixScrutinyJobCounters.VALID_ROW_COUNT;
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+public class IndexScrutinyWithMaxLookbackIT extends IndexScrutinyToolBaseIT {
+
+private static PreparedStatement upsertDataStmt;
+private static String dataTableFullName;
+private static String schema;
+private static String dataTableName;
+private static String indexTableName;
+private static String viewName;
+private static boolean isViewIndex;
+private static ManualEnvironmentEdge testClock;
+public static final String UPSERT_DATA = "UPSERT INTO %s VALUES (?, ?, ?)";
+
+@BeforeClass
+public static synchronized void doSetup() throws Exception {
+Map<String, String> props = Maps.newHashMapWithExpectedSize(1);
+
props.put(QueryServices.GLOBAL_INDEX_ROW_AGE_THRESHOLD_TO_DELETE_MS_ATTRIB, 
Long.toString(0));
+props.put(ScanInfoUtil.PHOENIX_MAX_LOOKBACK_AGE_CONF_KEY,
+Integer.toString(MAX_LOOKBACK));
+setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));
+}
+
+@Test
+public void testScrutinyOnRowsBeyondMaxLookBack() throws Exception {
+schema = generateUniqueName();
+dataTableName = generateUniqueName();
+indexTableName = generateUniqueName();
+dataTableFullName = SchemaUtil.getTableName(schema, dataTableName);
+isViewIndex = false;
+String dataTableDDL = "CREATE TABLE %s (ID INTEGER NOT NULL PRIMARY 
KEY, NAME VARCHAR, "
++ "ZIP INTEGER) COLUMN_ENCODED_BYTES=0, VERSIONS=1";
+String indexTableDDL = "CREATE INDEX %s ON %s (NAME) INCLUDE (ZIP)";
+testClock = new ManualEnvironmentEdge();
+
+try (Connection conn =
+ DriverManager.getConnection(getUrl(), 
PropertiesUtil.deepCopy(TEST_PROPERTIES))) {
+conn.createStatement().execute(String.format(dataTableDDL, 
dataTableFullName));
+conn.createStatement().execute(String.format(indexTableDDL, 
indexTableName,
+dataTableFullName));
+conn.commit();
+}
+upsertDataAndScrutinize(dataTableName, dataTableFullName, testClock);
+}
+
+@Test
+public void testScrutinyOnRowsBeyondMaxLookback_viewIndex() throws 
Exception {
+schema = "S"+generateUniqueName();
+dataTableName = "T"+generateUniqueName();
+dataTableFullName

[GitHub] [phoenix] gokceni commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade tool command line improvements

2020-03-25 Thread GitBox
gokceni commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade 
tool command line improvements
URL: https://github.com/apache/phoenix/pull/743#discussion_r397990857
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexUpgradeTool.java
 ##
 @@ -98,16 +98,14 @@
 private static final Option LOG_FILE_OPTION = new Option("lf", "logfile",
 true,
 "[Optional] Log file path where the logs are written");
-private static final Option INDEX_SYNC_REBUILD_OPTION = new Option("sr",
-"index-sync-rebuild",
+private static final Option INDEX_REBUILD_OPTION = new Option("rb",
+"index-rebuild",
 false,
-"[Optional] Whether or not synchronously rebuild the indexes; "
-+ "default rebuild asynchronous");
-
-private static final Option INDEX_VERIFY_OPTION = new Option("v",
-"verify",
+"[Optional] Rebuild the indexes");
 
 Review comment:
   Can you change this description to include what type of build this is 
(async)?
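
  For context, a minimal sketch of how a boolean commons-cli option like the 
one in this diff is declared and consumed (assumed usage for illustration, 
not the tool's actual wiring):

{code:java}
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;

public class RebuildOptionDemo {
    public static void main(String[] args) throws Exception {
        // Mirrors the Option from the diff: short name "rb", long name
        // "index-rebuild", no argument, plus the description string.
        Options options = new Options();
        options.addOption(new Option("rb", "index-rebuild", false,
                "[Optional] Rebuild the indexes"));
        CommandLine cmd = new DefaultParser().parse(options, new String[] {"-rb"});
        System.out.println(cmd.hasOption("rb")); // true
    }
}
{code}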


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] gokceni commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade tool command line improvements

2020-03-25 Thread GitBox
gokceni commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade 
tool command line improvements
URL: https://github.com/apache/phoenix/pull/743#discussion_r397990857
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexUpgradeTool.java
 ##
 @@ -98,16 +98,14 @@
 private static final Option LOG_FILE_OPTION = new Option("lf", "logfile",
 true,
 "[Optional] Log file path where the logs are written");
-private static final Option INDEX_SYNC_REBUILD_OPTION = new Option("sr",
-"index-sync-rebuild",
+private static final Option INDEX_REBUILD_OPTION = new Option("rb",
+"index-rebuild",
 false,
-"[Optional] Whether or not synchronously rebuild the indexes; "
-+ "default rebuild asynchronous");
-
-private static final Option INDEX_VERIFY_OPTION = new Option("v",
-"verify",
+"[Optional] Rebuild the indexes");
 
 Review comment:
   Can you change this description to include that the indexToolOption is tied 
to this one? And vice versa


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services



[GitHub] [phoenix] gokceni commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade tool command line improvements

2020-03-25 Thread GitBox
gokceni commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade 
tool command line improvements
URL: https://github.com/apache/phoenix/pull/743#discussion_r397992405
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexUpgradeTool.java
 ##
 @@ -577,6 +582,10 @@ private void rebuildIndexes(Connection conn, 
Configuration conf, ArrayList

[GitHub] [phoenix] stoty opened a new pull request #744: PHOENIX-5780 Add mvn dependency:analyze to build process

2020-03-25 Thread GitBox
stoty opened a new pull request #744: PHOENIX-5780 Add mvn dependency:analyze 
to build process
URL: https://github.com/apache/phoenix/pull/744
 
 
   cleanup/reorganize dependency handling
   also includes the following java code changes:
   * remove accidental jline usages
   * fix accidental shaded guava import
   * replace commons-lang with commons-lang3 everywhere
   * replace incidental commons-logging usages
   * replace netty ThreadLocalRandom with j.u.c.ThreadLocalRandom
   * replace com.sun.istack.NotNull with edu.umd.cs.findbugs.annotations.NonNull


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (PHOENIX-5780) Add mvn dependency:analyze to build process

2020-03-25 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066842#comment-17066842
 ] 

Istvan Toth commented on PHOENIX-5780:
--

The attached patch is quite large and noisy, so I will try to summarize the 
changes.

The main goal was to make dependency:analyze pass, i.e. have the poms reflect 
the actually used dependencies.

The secondary goal was to have all dependency versions consolidated to 
properties/dependencyManagement.

Upgrading dependency versions was a non-goal, for this ticket.

I have also made some minor changes to the java code that should not affect 
the functionality adversely, but that let me remove dependencies:
 * replaced commons-lang with commons-lang3
 * replaced commons-logging with slf4j (this looks like an oversight/bug)
 * replaced com.sun.istack.NotNull with edu.umd.cs.findbugs.annotations.NonNull 
(this actually may do something with findbugs)
 * replaced io.netty.util.internal.ThreadLocalRandom with 
java.util.concurrent.ThreadLocalRandom (see the sketch after this list)
 * removed spurious jline.internal.TestAccessible import
 * fixed a test in pherf that failed on Mac
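
As an illustration of the ThreadLocalRandom swap noted in the list above (the 
JDK call shown is real; the surrounding class is illustrative, and the actual 
call sites are in the patch):

{code:java}
import java.util.concurrent.ThreadLocalRandom;

public class RandomSwapDemo {
    public static void main(String[] args) {
        // java.util.concurrent.ThreadLocalRandom is a drop-in replacement for
        // io.netty.util.internal.ThreadLocalRandom, with no netty dependency.
        long value = ThreadLocalRandom.current().nextLong();
        System.out.println(value);
    }
}
{code}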

The resulting dependencies are by design very close to the dependencies without 
the patch, as this is meant to be a foundation on which further  
build/dependency work can be built.

Most of these changes would apply to 4.x, but it would have to be re-done, 
instead of cherry-picking, so the patch is for master only for now.

> Add mvn dependency:analyze to build process
> ---
>
> Key: PHOENIX-5780
> URL: https://issues.apache.org/jira/browse/PHOENIX-5780
> Project: Phoenix
>  Issue Type: Task
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Attachments: PHOENIX-5780.master.v1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> mvn dependency:analyze has shown that the dependency definitions in Phoenix 
> are in a bad shape.
> Include it in the build process, so that we can keep the dependencies true 
> and up to date.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-5781) Add components in JIRA

2020-03-25 Thread Josh Elser (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066866#comment-17066866
 ] 

Josh Elser commented on PHOENIX-5781:
-

Sounds like a good DISCUSS thread on the dev list to me :)

> Add components in JIRA
> --
>
> Key: PHOENIX-5781
> URL: https://issues.apache.org/jira/browse/PHOENIX-5781
> Project: Phoenix
>  Issue Type: Task
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
>
> Phoenix has been split into sub projects, and adopted some projects, so I 
> think that it is time to reflect this in Jira as well.
> I propose adding the following components to the project (one per repo)
>  * core
>  * queryserver
>  * connectors
>  * tephra
>  * omid
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-5796) Possible query optimization when projecting uncovered columns and querying on indexed columns

2020-03-25 Thread Chinmay Kulkarni (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066934#comment-17066934
 ] 

Chinmay Kulkarni commented on PHOENIX-5796:
---

[~dbwong] [~comnetwork] Thanks for looking through the code and analyzing this 
in more detail! Makes sense. So it looks like the behavior after you provide an 
index hint is optimal. IMO, this should be the default behavior if such an 
index exists (even when an index hint is not provided), so there is still room 
for improvement in our query optimizer. 

Also, perhaps the index hint query should not show up as a FULL TABLE SCAN in 
the explain plan. We can improve the explain plan to indicate the semi join, or 
at the very least, add more documentation around understanding a 
full-table-scan in the context of a "SKIP-SCAN-JOIN" on the website for users.

Overall, it looks to me that the improvement is still a valid one, though 
luckily it is already done when you provide the index hint, WDYT [~dbwong] 
[~comnetwork]?
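
For reference, a sketch of the hinted form discussed above, assuming the 
standard Phoenix INDEX hint syntax and the table/index names from the issue 
description quoted below (the connection URL is a placeholder):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HintedQueryDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "EXPLAIN SELECT /*+ INDEX(t uncov_index_t) */ c FROM t WHERE b='abc'")) {
            while (rs.next()) {
                // Expect a plan that scans uncov_index_t and joins back to t
                // (the SKIP-SCAN-JOIN discussed above) rather than a pure FTS.
                System.out.println(rs.getString(1));
            }
        }
    }
}
{code}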

> Possible query optimization when projecting uncovered columns and querying on 
> indexed columns
> -
>
> Key: PHOENIX-5796
> URL: https://issues.apache.org/jira/browse/PHOENIX-5796
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.0.0, 4.15.0
>Reporter: Chinmay Kulkarni
>Priority: Major
> Attachments: Screen Shot 2020-03-23 at 3.25.38 PM.png, Screen Shot 
> 2020-03-23 at 3.32.24 PM.png, Screen Shot 2020-03-24 at 11.51.12 AM.png
>
>
> Start HBase-1.3 server with Phoenix-4.15.0-HBase-1.3 server jar. Connect to 
> it using sqlline.py which has Phoenix-4.15.0-HBase-1.3 Phoenix client.
> Create a base table like:
> {code:sql}
> create table t (a integer primary key, b varchar(10), c integer);
> {code}
> Create an uncovered index on top of it like:
> {code:sql}
> create index uncov_index_t on t(b);
> {code}
> Now if you issue the query:
> {code:sql}
> explain select c from t where b='abc';
> {code}
> You'd see the following explain plan:
>  !Screen Shot 2020-03-23 at 3.25.38 PM.png|height=150,width=700!
> *Which is a full table scan on the base table 't'* since we cannot use the 
> global index as 'c' is not a covered column in the global index.
> *However, projecting columns contained fully within the index pk is correctly 
> a range scan:*
> {code:sql}
> explain select a,b from t where b='abc';
> {code}
> produces the following explain plan:
>  !Screen Shot 2020-03-23 at 3.32.24 PM.png|height=150,width=700! 
> In the first query, can there be an optimization to *query the index table, 
> get the start and stop keys of the base table and then issue a range 
> scan/(bunch of point lookups) on the base table* instead of doing a full 
> table scan on the base table like we currently do?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-5796) Possible query optimization when projecting uncovered columns and querying on indexed columns

2020-03-25 Thread Daniel Wong (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066944#comment-17066944
 ] 

Daniel Wong commented on PHOENIX-5796:
--

It's not necessarily optimal.

There are a bunch of possible Jiras around optimization that we could consider. 
If the selectivity of the index query is high, that is, we select a large 
portion of the rows, it will be faster to full table scan the base table. 
With stats or with region boundaries we can better estimate the selectivity of 
predicates and thus decide if we want to join or FTS.

One easy one I want to open is small scan optimizations, where, for example, if 
the user provides a limit, we do stuff like this. This exact case plus a 
limit with an index join on uncovered columns; I forget if I opened a Jira 
already. Using region boundaries would possibly be enough in this case, as the 
index is on B only, so it's meaningful enough; if the index had been on C,B it 
would not be sufficient to estimate selectivity and we would need stats.

> Possible query optimization when projecting uncovered columns and querying on 
> indexed columns
> -
>
> Key: PHOENIX-5796
> URL: https://issues.apache.org/jira/browse/PHOENIX-5796
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.0.0, 4.15.0
>Reporter: Chinmay Kulkarni
>Priority: Major
> Attachments: Screen Shot 2020-03-23 at 3.25.38 PM.png, Screen Shot 
> 2020-03-23 at 3.32.24 PM.png, Screen Shot 2020-03-24 at 11.51.12 AM.png
>
>
> Start HBase-1.3 server with Phoenix-4.15.0-HBase-1.3 server jar. Connect to 
> it using sqlline.py which has Phoenix-4.15.0-HBase-1.3 Phoenix client.
> Create a base table like:
> {code:sql}
> create table t (a integer primary key, b varchar(10), c integer);
> {code}
> Create an uncovered index on top of it like:
> {code:sql}
> create index uncov_index_t on t(b);
> {code}
> Now if you issue the query:
> {code:sql}
> explain select c from t where b='abc';
> {code}
> You'd see the following explain plan:
>  !Screen Shot 2020-03-23 at 3.25.38 PM.png|height=150,width=700!
> *Which is a full table scan on the base table 't'* since we cannot use the 
> global index as 'c' is not a covered column in the global index.
> *However, projecting columns contained fully within the index pk is correctly 
> a range scan:*
> {code:sql}
> explain select a,b from t where b='abc';
> {code}
> produces the following explain plan:
>  !Screen Shot 2020-03-23 at 3.32.24 PM.png|height=150,width=700! 
> In the first query, can there be an optimization to *query the index table, 
> get the start and stop keys of the base table and then issue a range 
> scan/(bunch of point lookups) on the base table* instead of doing a full 
> table scan on the base table like we currently do?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [phoenix] joshelser commented on a change in pull request #744: PHOENIX-5780 Add mvn dependency:analyze to build process

2020-03-25 Thread GitBox
joshelser commented on a change in pull request #744: PHOENIX-5780 Add mvn 
dependency:analyze to build process
URL: https://github.com/apache/phoenix/pull/744#discussion_r398019604
 
 

 ##
 File path: phoenix-core/pom.xml
 ##
 @@ -167,8 +167,117 @@
   
 
   
+
+    <dependency>
+      <groupId>org.apache.phoenix</groupId>
+      <artifactId>phoenix-hbase-compat-${hbase.compat.version}</artifactId>
+    </dependency>
+
 
 Review comment:
   nit: "Hadoop"


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] joshelser commented on a change in pull request #744: PHOENIX-5780 Add mvn dependency:analyze to build process

2020-03-25 Thread GitBox
joshelser commented on a change in pull request #744: PHOENIX-5780 Add mvn 
dependency:analyze to build process
URL: https://github.com/apache/phoenix/pull/744#discussion_r398025213
 
 

 ##
 File path: phoenix-pherf/pom.xml
 ##
 @@ -15,463 +15,546 @@
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
-<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
-   <modelVersion>4.0.0</modelVersion>
-   <parent>
-      <groupId>org.apache.phoenix</groupId>
-      <artifactId>phoenix</artifactId>
-      <version>5.1.0-SNAPSHOT</version>
-   </parent>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.phoenix</groupId>
+    <artifactId>phoenix</artifactId>
+    <version>5.1.0-SNAPSHOT</version>
+  </parent>

-   <artifactId>phoenix-pherf</artifactId>
-   <packaging>jar</packaging>
-   <name>Phoenix - Pherf</name>
+  <artifactId>phoenix-pherf</artifactId>
+  <packaging>jar</packaging>
+  <name>Phoenix - Pherf</name>

-   <properties>
-      <top.dir>${project.basedir}/..</top.dir>
-      <shaded.package>org.apache.phoenix.shaded</shaded.package>
-   </properties>
+  <properties>
+    <top.dir>${project.basedir}/..</top.dir>
+    <shaded.package>org.apache.phoenix.shaded</shaded.package>

-   <dependencies>
-      <dependency>
-         <groupId>org.apache.phoenix</groupId>
-         <artifactId>phoenix-core</artifactId>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.phoenix</groupId>
-         <artifactId>phoenix-core</artifactId>
-         <type>test-jar</type>
-         <scope>test</scope>
-      </dependency>
-      <dependency>
-         <groupId>com.googlecode.java-diff-utils</groupId>
-         <artifactId>diffutils</artifactId>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.commons</groupId>
-         <artifactId>commons-lang3</artifactId>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.commons</groupId>
-         <artifactId>commons-math3</artifactId>
-      </dependency>
+
+    1.2.1
+    3.3
+    1.1
+    0.15
+    1.8.0
+    1.0.1
+    2.2.11
+  </properties>
+
+  <dependencies>
+
+    <dependency>
+      <groupId>com.googlecode.java-diff-utils</groupId>
+      <artifactId>diffutils</artifactId>
+      <version>${diffutils.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.commons</groupId>
+      <artifactId>commons-math3</artifactId>
+      <version>${commons-math3.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>javax.activation</groupId>
+      <artifactId>activation</artifactId>
+      <version>${activation.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>stax</groupId>
+      <artifactId>stax-api</artifactId>
+      <version>${stax.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>javax.xml.bind</groupId>
+      <artifactId>jaxb-api</artifactId>
+      <version>${jaxb.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.glassfish.jaxb</groupId>
+      <artifactId>jaxb-runtime</artifactId>
+      <version>${jaxb.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.jcabi</groupId>
+      <artifactId>jcabi-jdbc</artifactId>
 
 Review comment:
   This one caught my eye. Looks like we have this dependency for a single IT 
in phoenix-pherf. Not something to change in this PR, but seems silly :)


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (PHOENIX-5796) Possible query optimization when projecting uncovered columns and querying on indexed columns

2020-03-25 Thread Chinmay Kulkarni (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17066956#comment-17066956
 ] 

Chinmay Kulkarni commented on PHOENIX-5796:
---

[~dbwong] Agreed that in general, we can use stats for selecting more optimal 
queries. However, if stats are turned off and we have a scenario such as this 
particular example, more often than not a range scan on the global index 
followed by a range scan on the base table may be more optimal than a full 
table scan on the base table (we would need stats to be sure) in normal 
production use-cases, given a good indexing scheme.
This is debatable and possibly a config that a client would want to be able to 
toggle, which is possibly why we never do this by default in the first place.

Can you point me towards that Jira you opened? I'd like to track such 
stats-based query optimization Jiras, since such micro-improvements can 
contribute a lot to the overall query time.


> Possible query optimization when projecting uncovered columns and querying on 
> indexed columns
> -
>
> Key: PHOENIX-5796
> URL: https://issues.apache.org/jira/browse/PHOENIX-5796
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.0.0, 4.15.0
>Reporter: Chinmay Kulkarni
>Priority: Major
>  Labels: query-optimization
> Attachments: Screen Shot 2020-03-23 at 3.25.38 PM.png, Screen Shot 
> 2020-03-23 at 3.32.24 PM.png, Screen Shot 2020-03-24 at 11.51.12 AM.png
>
>
> Start HBase-1.3 server with Phoenix-4.15.0-HBase-1.3 server jar. Connect to 
> it using sqlline.py which has Phoenix-4.15.0-HBase-1.3 Phoenix client.
> Create a base table like:
> {code:sql}
> create table t (a integer primary key, b varchar(10), c integer);
> {code}
> Create an uncovered index on top of it like:
> {code:sql}
> create index uncov_index_t on t(b);
> {code}
> Now if you issue the query:
> {code:sql}
> explain select c from t where b='abc';
> {code}
> You'd see the following explain plan:
>  !Screen Shot 2020-03-23 at 3.25.38 PM.png|height=150,width=700!
> *Which is a full table scan on the base table 't'* since we cannot use the 
> global index as 'c' is not a covered column in the global index.
> *However, projecting columns contained fully within the index pk is correctly 
> a range scan:*
> {code:sql}
> explain select a,b from t where b='abc';
> {code}
> produces the following explain plan:
>  !Screen Shot 2020-03-23 at 3.32.24 PM.png|height=150,width=700! 
> In the first query, can there be an optimization to *query the index table, 
> get the start and stop keys of the base table and then issue a range 
> scan/(bunch of point lookups) on the base table* instead of doing a full 
> table scan on the base table like we currently do?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [phoenix] joshelser commented on a change in pull request #744: PHOENIX-5780 Add mvn dependency:analyze to build process

2020-03-25 Thread GitBox
joshelser commented on a change in pull request #744: PHOENIX-5780 Add mvn 
dependency:analyze to build process
URL: https://github.com/apache/phoenix/pull/744#discussion_r398103417
 
 

 ##
 File path: phoenix-core/pom.xml
 ##
 @@ -212,30 +335,16 @@
      <groupId>org.apache.tephra</groupId>
      <artifactId>tephra-core</artifactId>
    </dependency>
-    <dependency>
-      <groupId>org.apache.tephra</groupId>
-      <artifactId>tephra-core</artifactId>
-      <type>test-jar</type>
-      <scope>test</scope>
-    </dependency>
    <dependency>
      <groupId>org.apache.tephra</groupId>
      <artifactId>tephra-hbase-compat-2.0</artifactId>
    </dependency>

-
+
    <dependency>
      <groupId>org.antlr</groupId>
      <artifactId>antlr-runtime</artifactId>
    </dependency>
-    <dependency>
-      <groupId>jline</groupId>
-      <artifactId>jline</artifactId>
-    </dependency>
-    <dependency>
-      <groupId>sqlline</groupId>
-      <artifactId>sqlline</artifactId>
-    </dependency>
 Review comment:
   Dropping this here, removes sqlline from phoenix-client jar (via 
phoenix-core).
   
   Either need to keep it here with an exclusion, or add sqlline as a 
dependency on phoenix-client.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] swaroopak commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade tool command line improvements

2020-03-25 Thread GitBox
swaroopak commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade 
tool command line improvements
URL: https://github.com/apache/phoenix/pull/743#discussion_r398107999
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexUpgradeTool.java
 ##
 @@ -577,6 +582,10 @@ private void rebuildIndexes(Connection conn, 
Configuration conf, ArrayList

[GitHub] [phoenix] swaroopak commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade tool command line improvements

2020-03-25 Thread GitBox
swaroopak commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade 
tool command line improvements
URL: https://github.com/apache/phoenix/pull/743#discussion_r398105421
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexUpgradeTool.java
 ##
 @@ -707,12 +716,12 @@ private int startIndexRebuilds(HashMap indexInfos,
 list.add("-tenant");
 list.add(tenantId);
 }
-if (syncRebuild) {
-list.add("-runfg");
-}
-if(!Strings.isNullOrEmpty(verify)) {
-list.add("-v");
-list.add(verify);
+
+if(!Strings.isNullOrEmpty(indexToolOpts)) {
 
 Review comment:
   nit: spaces instead of tabs


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] swaroopak commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade tool command line improvements

2020-03-25 Thread GitBox
swaroopak commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade 
tool command line improvements
URL: https://github.com/apache/phoenix/pull/743#discussion_r398106829
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexUpgradeTool.java
 ##
 @@ -707,12 +716,12 @@ private int startIndexRebuilds(HashMap indexInfos,
 list.add("-tenant");
 list.add(tenantId);
 }
-if (syncRebuild) {
-list.add("-runfg");
-}
-if(!Strings.isNullOrEmpty(verify)) {
-list.add("-v");
-list.add(verify);
+
+if(!Strings.isNullOrEmpty(indexToolOpts)) {
+String[] options = indexToolOpts.split("\\s+");
 
 Review comment:
   how would this work in a bash script? Does bash identify consecutive options 
like this?
   -tool -v ONLY 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] swaroopak commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade tool command line improvements

2020-03-25 Thread GitBox
swaroopak commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade 
tool command line improvements
URL: https://github.com/apache/phoenix/pull/743#discussion_r398122180
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexUpgradeTool.java
 ##
 @@ -707,12 +716,12 @@ private int startIndexRebuilds(HashMap indexInfos,
 list.add("-tenant");
 list.add(tenantId);
 }
-if (syncRebuild) {
-list.add("-runfg");
-}
-if(!Strings.isNullOrEmpty(verify)) {
-list.add("-v");
-list.add(verify);
+
+if(!Strings.isNullOrEmpty(indexToolOpts)) {
+String[] options = indexToolOpts.split("\\s+");
 
 Review comment:
   should be fine, but good to double-check?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] swaroopak commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
swaroopak commented on a change in pull request #741: PHOENIX-5791 Eliminate 
false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398126994
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation 
mutation) {
 return getMutationsWithSameTS(put, del);
 }
 
+private void repairActualMutationList(List<Mutation> actualMutationList, 
List<Mutation> expectedMutationList)
+throws IOException {
+// find the first (latest) actual unverified put mutation
+Mutation actual = null;
+for (Mutation mutation : actualMutationList) {
+if (mutation instanceof Put && !isVerified((Put) mutation)) {
+actual = mutation;
+break;
+}
+}
+if (actual == null) {
+return;
+}
+long ts = getTimestamp(actual);
+int expectedIndex;
+int expectedListSize = expectedMutationList.size();
+for (expectedIndex = 0; expectedIndex < expectedListSize; 
expectedIndex++) {
+if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+if (expectedIndex > 0) {
+expectedIndex--;
+}
+break;
+}
+}
+if (expectedIndex == expectedListSize) {
+return;
+}
+for (; expectedIndex < expectedListSize; expectedIndex++) {
+Mutation mutation = expectedMutationList.get(expectedIndex);
+if (mutation instanceof Put) {
+mutation = new Put((Put) mutation);
+} else {
+mutation = new Delete((Delete) mutation);
+}
+actualMutationList.add(mutation);
+}
+Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+}
+
+private void cleanUpActualMutationList(List<Mutation> actualMutationList)
+throws IOException {
+Iterator<Mutation> iterator = actualMutationList.iterator();
+Mutation previous = null;
+while (iterator.hasNext()) {
+Mutation mutation = iterator.next();
+if ((mutation instanceof Put && !isVerified((Put) mutation)) ||
+(mutation instanceof Delete && 
isDeleteFamilyVersion(mutation))) {
+iterator.remove();
+} else {
+if (previous != null && getTimestamp(previous) == 
getTimestamp(mutation) &&
+((previous instanceof Put && mutation instanceof Put) 
||
+previous instanceof Delete && mutation 
instanceof Delete)) {
+iterator.remove();
+} else {
+previous = mutation;
+}
+}
+}
+}
+
 /**
- * indexRow is the set of all cells of all the row version of an index row 
from the index table. These are actual
- * cells. We group these cells based on timestamp and type (put vs 
delete), and form the actual set of
- * index mutations. indexKeyToMutationMap is a map from an index row key 
to a set of mutations that are generated
- * using the rebuild process (i.e., by replaying raw data table 
mutations). These sets are sets of expected
- * index mutations, one set for each index row key. Since not all 
mutations in the index table have both phase
- * (i.e., pre and post data phase) mutations, we cannot compare actual 
index mutations with expected one by one
- * and expect to find them identical. We need to consider concurrent data 
mutation effects, data table row write
- * failures, post index write failures. Thus, we need to allow some 
expected and actual mutations to be skipped
- * during comparing actual mutations to index mutations.
+ * There are two types of verification: without repair and with repair. 
Without-repair verification is done before
+ * or after index rebuild. It is done before index rebuild to identify the 
rows to be rebuilt. It is done after
+ * index rebuild to verify the rows that have been rebuilt. With-repair 
verification can be done anytime using
+ * the “-v ONLY” option to check the consistency of the index table.
+ *
+ * Unverified Rows
+ *
+ * For each mutable data table mutation during regular data table updates, 
two operations are done on the data table.
+ * One is to read the existing row state, and the second is to update the 
data table for this row. The processing of
+ * concurrent data mutations are serialized once for reading the existing 
row states, and then serialized again
+ * for updating the data table. In other words, they go through locking 
twice, i.e., [lock, read, unlock] and
+ * [lock, write, unlock]. Because of this two phase locking, for a pair of 
concurrent mutations (for 
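
To make the locking pattern described in the javadoc above concrete, a toy 
model (hypothetical names, not the actual IndexRegionObserver code) of the 
two lock acquisitions per mutation:

{code:java}
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the [lock, read, unlock] / [lock, write, unlock] pattern the
// javadoc above describes; all names here are illustrative, not Phoenix code.
class TwoPhaseRowUpdate {
    private final ReentrantLock rowLock = new ReentrantLock();
    private String rowState; // stands in for the data table row

    // Phase 1: read the existing row state under the row lock.
    String readPhase() {
        rowLock.lock();
        try {
            return rowState;
        } finally {
            // Once unlocked, a concurrent mutation may read the same (stale)
            // state before this mutation reaches its write phase.
            rowLock.unlock();
        }
    }

    // Phase 2: acquire the row lock a second time to update the data table.
    void writePhase(String newState) {
        rowLock.lock();
        try {
            rowState = newState;
        } finally {
            rowLock.unlock();
        }
    }
}
{code}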

[GitHub] [phoenix] tkhurana commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade tool command line improvements

2020-03-25 Thread GitBox
tkhurana commented on a change in pull request #743: PHOENIX-5798 IndexUpgrade 
tool command line improvements
URL: https://github.com/apache/phoenix/pull/743#discussion_r398128008
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexUpgradeTool.java
 ##
 @@ -707,12 +716,12 @@ private int startIndexRebuilds(HashMap indexInfos,
 list.add("-tenant");
 list.add(tenantId);
 }
-if (syncRebuild) {
-list.add("-runfg");
-}
-if(!Strings.isNullOrEmpty(verify)) {
-list.add("-v");
-list.add(verify);
+
+if(!Strings.isNullOrEmpty(indexToolOpts)) {
+String[] options = indexToolOpts.split("\\s+");
 
 Review comment:
   They will have to be quoted. Like -tool '-v ONLY -runfg -starttime 1'
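
  A standalone sketch (hypothetical values, matching the split("\\s+") call 
in the diff above) of how such a quoted argument is broken into individual 
options:

{code:java}
import java.util.Arrays;

public class SplitOptsDemo {
    public static void main(String[] args) {
        // The shell passes the quoted string through as a single argument,
        // which the tool then splits on whitespace into individual options.
        String indexToolOpts = "-v ONLY -runfg -starttime 1"; // hypothetical value
        String[] options = indexToolOpts.split("\\s+");
        System.out.println(Arrays.toString(options));
        // prints: [-v, ONLY, -runfg, -starttime, 1]
    }
}
{code}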


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] stoty commented on a change in pull request #744: PHOENIX-5780 Add mvn dependency:analyze to build process

2020-03-25 Thread GitBox
stoty commented on a change in pull request #744: PHOENIX-5780 Add mvn 
dependency:analyze to build process
URL: https://github.com/apache/phoenix/pull/744#discussion_r398136606
 
 

 ##
 File path: phoenix-core/pom.xml
 ##
 @@ -167,8 +167,117 @@
   
 
   
+
+    <dependency>
+      <groupId>org.apache.phoenix</groupId>
+      <artifactId>phoenix-hbase-compat-${hbase.compat.version}</artifactId>
+    </dependency>
+
 
 Review comment:
   fixed
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] stoty commented on a change in pull request #744: PHOENIX-5780 Add mvn dependency:analyze to build process

2020-03-25 Thread GitBox
stoty commented on a change in pull request #744: PHOENIX-5780 Add mvn 
dependency:analyze to build process
URL: https://github.com/apache/phoenix/pull/744#discussion_r398136615
 
 

 ##
 File path: phoenix-pherf/pom.xml
 ##
 @@ -15,463 +15,546 @@
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
-<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-   xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
-   <modelVersion>4.0.0</modelVersion>
-   <parent>
-      <groupId>org.apache.phoenix</groupId>
-      <artifactId>phoenix</artifactId>
-      <version>5.1.0-SNAPSHOT</version>
-   </parent>
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.phoenix</groupId>
+    <artifactId>phoenix</artifactId>
+    <version>5.1.0-SNAPSHOT</version>
+  </parent>

-   <artifactId>phoenix-pherf</artifactId>
-   <packaging>jar</packaging>
-   <name>Phoenix - Pherf</name>
+  <artifactId>phoenix-pherf</artifactId>
+  <packaging>jar</packaging>
+  <name>Phoenix - Pherf</name>

-   <properties>
-      <top.dir>${project.basedir}/..</top.dir>
-      <shaded.package>org.apache.phoenix.shaded</shaded.package>
-   </properties>
+  <properties>
+    <top.dir>${project.basedir}/..</top.dir>
+    <shaded.package>org.apache.phoenix.shaded</shaded.package>

-   <dependencies>
-      <dependency>
-         <groupId>org.apache.phoenix</groupId>
-         <artifactId>phoenix-core</artifactId>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.phoenix</groupId>
-         <artifactId>phoenix-core</artifactId>
-         <type>test-jar</type>
-         <scope>test</scope>
-      </dependency>
-      <dependency>
-         <groupId>com.googlecode.java-diff-utils</groupId>
-         <artifactId>diffutils</artifactId>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.commons</groupId>
-         <artifactId>commons-lang3</artifactId>
-      </dependency>
-      <dependency>
-         <groupId>org.apache.commons</groupId>
-         <artifactId>commons-math3</artifactId>
-      </dependency>
+
+    1.2.1
+    3.3
+    1.1
+    0.15
+    1.8.0
+    1.0.1
+    2.2.11
+  </properties>
+
+  <dependencies>
+
+    <dependency>
+      <groupId>com.googlecode.java-diff-utils</groupId>
+      <artifactId>diffutils</artifactId>
+      <version>${diffutils.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.commons</groupId>
+      <artifactId>commons-math3</artifactId>
+      <version>${commons-math3.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>javax.activation</groupId>
+      <artifactId>activation</artifactId>
+      <version>${activation.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>stax</groupId>
+      <artifactId>stax-api</artifactId>
+      <version>${stax.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>javax.xml.bind</groupId>
+      <artifactId>jaxb-api</artifactId>
+      <version>${jaxb.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.glassfish.jaxb</groupId>
+      <artifactId>jaxb-runtime</artifactId>
+      <version>${jaxb.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.jcabi</groupId>
+      <artifactId>jcabi-jdbc</artifactId>
 
 Review comment:
   Yes, there are some more of those, but I had to draw the line somewhere.
   My favourite so far is having four places in the code where we process JSON, 
and using three different libraries for that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] stoty commented on a change in pull request #744: PHOENIX-5780 Add mvn dependency:analyze to build process

2020-03-25 Thread GitBox
stoty commented on a change in pull request #744: PHOENIX-5780 Add mvn 
dependency:analyze to build process
URL: https://github.com/apache/phoenix/pull/744#discussion_r398137415
 
 

 ##
 File path: phoenix-core/pom.xml
 ##
 @@ -212,30 +335,16 @@
   org.apache.tephra
   tephra-core
 
-
-  org.apache.tephra
-  tephra-core
-  test-jar
-  test
-
 
   org.apache.tephra
   tephra-hbase-compat-2.0
 
 
-
+
 
   org.antlr
   antlr-runtime
 
-
-  jline
-  jline
-
-
-  sqlline
-  sqlline
 
 Review comment:
   Right, I forgot because the code does not refer to it.
   I feel that it belongs in the client. I also updated it to 1.9 now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] stoty commented on issue #744: PHOENIX-5780 Add mvn dependency:analyze to build process

2020-03-25 Thread GitBox
stoty commented on issue #744: PHOENIX-5780 Add mvn dependency:analyze to build 
process
URL: https://github.com/apache/phoenix/pull/744#issuecomment-604060771
 
 
   > Trying to run a `mvn verify -DskipTests` locally (also failed in the same 
way with the 2.1 profile)
   
   That's embarrassing. At least we know that the check works :). Fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (PHOENIX-5780) Add mvn dependency:analyze to build process

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067077#comment-17067077
 ] 

Hadoop QA commented on PHOENIX-5780:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997704/PHOENIX-5780.master.v1.patch
  against master branch at commit 61589a903f8c5176ce46e5af0a83729f4f4c90ec.
  ATTACHMENT ID: 12997704

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 0 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+public UpdateStatisticsStatement(NamedTableNode table, @NonNull 
StatisticsCollectionScope scope, Map props) {
+private static void addCustomAnnotationsToSpan(@Nullable Span span, 
@NonNull PhoenixConnection conn) {
+public static Map getAnnotations(@NonNull String url, 
@NonNull Properties info) {
+  
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"
 />
+  
implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
+  
implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">
+  
implementation="org.apache.maven.plugins.shade.resource.IncludeResourceTransformer">

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3653//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3653//console

This message is automatically generated.

> Add mvn dependency:analyze to build process
> ---
>
> Key: PHOENIX-5780
> URL: https://issues.apache.org/jira/browse/PHOENIX-5780
> Project: Phoenix
>  Issue Type: Task
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Attachments: PHOENIX-5780.master.v1.patch, 
> PHOENIX-5780.master.v2.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> mvn dependency:analyze has shown that the dependency definitions in Phoenix 
> are in a bad shape.
> Include it in the build process, so that we can keep the dependencies true 
> and up to date.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-5698) Phoenix Query with RVC IN list expression generates wrong scan with non-pk ordered pks

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067116#comment-17067116
 ] 

Hadoop QA commented on PHOENIX-5698:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997715/PHOENIX-5698-4.x.v6.patch
  against 4.x branch at commit 61589a903f8c5176ce46e5af0a83729f4f4c90ec.
  ATTACHMENT ID: 12997715

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 23 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+private static final String TENANT_SPECIFIC_URL1 = getUrl() + ';' + 
TENANT_ID_ATTRIB + "=tenant1";
+String tableName = generateUniqueName() + "in_test" + 
pkType.getSqlTypeName() + saltBuckets + (isMultiTenant ? "_multi" : "_single");
+private void buildSchema(String fullTableName, String fullViewName, 
boolean isDecOrder) throws Exception {
+stmt.execute("CREATE TABLE " + fullTableName + "(\n" + " 
TENANT_ID CHAR(15) NOT NULL,\n" + " KEY_PREFIX CHAR(3) NOT NULL, ID5 BIGINT \n" 
+
+" CONSTRAINT PK PRIMARY KEY (\n" + " TENANT_ID," + " 
KEY_PREFIX" + ")) MULTI_TENANT=TRUE");
+stmt.execute("CREATE VIEW " + fullViewName + "(\n" + " ID1 
VARCHAR NOT NULL,\n" + " ID2 VARCHAR NOT NULL,\n" + " ID3 BIGINT, ID4 BIGINT 
\n" +
+" CONSTRAINT PKVIEW PRIMARY KEY\n" + " (\n" + " 
ID1, ID2 DESC\n" + ")) " +
+tenantStmt.execute("CREATE VIEW IF NOT EXISTS " + 
this.descViewName + " AS SELECT * FROM " + fullViewName);
+stmt.execute("CREATE VIEW " + fullViewName + "(\n" + " ID1 
VARCHAR NOT NULL,\n" + " ID2 VARCHAR NOT NULL,\n" + " ID3 BIGINT, ID4 BIGINT 
\n" +
+tenantStmt.execute("CREATE VIEW IF NOT EXISTS " + 
this.ascViewName + " AS SELECT * FROM " + fullViewName);

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3654//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3654//console

This message is automatically generated.

> Phoenix Query with RVC IN list expression generates wrong scan with non-pk 
> ordered pks
> --
>
> Key: PHOENIX-5698
> URL: https://issues.apache.org/jira/browse/PHOENIX-5698
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.15.0, 4.14.3
>Reporter: Daniel Wong
>Assignee: Xinyi Yan
>Priority: Major
>  Labels: DESC
> Attachments: PHOENIX-5698-4.14-HBase-1.3.patch, 
> PHOENIX-5698-4.x-HBase-1.3.patch, PHOENIX-5698-4.x.patch, 
> PHOENIX-5698-4.x.v3.patch, PHOENIX-5698-4.x.v4.patch, 
> PHOENIX-5698-4.x.v5.patch, PHOENIX-5698-4.x.v6.patch, 
> PHOENIX-5698-master.v2.patch, PHOENIX-5698.patch
>
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> In the code below ideally we'd expect a SINGLE ROW DELETE plan client side. 
> However, this generates an incorrect scan with range ['tenant1
> 0CY005xx01Sv6o'). If the order of the RVCs is changed to row key order 
> Phoenix correctly generates a SINGLE ROW SCAN.  As we provide the full PK, 
> we expect either a tightly bounded range scan or a client-side delete.  
> Instead we get a range scan on composite leading edge 
> TENANT_ID,KEY_PREFIX,ID1.
>  
> {code:java}
> @Test
>  public void testInListExpressionWithDescAgain() throws Exception {
>  String fullTableName = generateUniqueName();
>  String fullViewName = generateUniqueName();
>  String tenantView = generateUniqueName();
>  // create base table and global view using global connection
>  try (Connection conn = DriverManager.getConnection(getUrl()))
> { conn.setAutoCommit(true); Statement stmt = conn.createStatement(); 
> stmt.execute("CREATE TABLE " + fullTableName + "(\n" + " TENANT_ID CHAR(15) 
> NOT NULL,\n" + " KEY_PREFIX CHAR(3) NOT NULL,\n" + " CONSTRAINT PK PRIMARY 
> KEY (\n" + " TENANT_ID," + " KEY_PREFIX" + ")) MULTI_TENANT=TRUE"); 
> stmt.execute("CREATE VIEW " + fullViewName + "(\n" + " ID1 VARCHAR NOT 
> NULL,\n" + " ID2 VARCHAR NOT NULL,\n" + " EVENT_DATE DATE NOT NULL,\n" + " 
> CONSTRAINT PKVIEW PRIMARY KEY\n" + " (\n" + " ID1, ID2 DESC, EVENT_DATE 
> DESC\n" + ")) AS SELECT * FROM " + fullTableName + " WHERE KEY_PREFIX = 
> '0CY'"); 

[jira] [Commented] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067212#comment-17067212
 ] 

Hadoop QA commented on PHOENIX-5791:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997761/PHOENIX-5791.4.x-HBase-1.5.001.patch
  against 4.x-HBase-1.5 branch at commit 
61589a903f8c5176ce46e5af0a83729f4f4c90ec.
  ATTACHMENT ID: 12997761

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3656//console

This message is automatically generated.

> Eliminate false invalid row detection due to concurrent updates 
> 
>
> Key: PHOENIX-5791
> URL: https://issues.apache.org/jira/browse/PHOENIX-5791
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Kadir OZDEMIR
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-5791.4.x-HBase-1.5.001.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> IndexTool verification generates an expected list of index mutations from the 
> data table rows and uses this list to check if index table rows are 
> consistent with the data table. To do that, it follows these steps:
>  # The data table rows are scanned with a raw scan. This raw scan is 
> configured to read all versions of rows. 
>  # For each scanned row, the cells that are scanned are grouped into two 
> sets: put and delete. The put set is the set of put cells and the delete set 
> is the set of delete cells.
>  # The put and delete sets for a given row are further grouped based on their 
> timestamps into put and delete mutations such that all the cells in a 
> mutation have the same timestamp. 
>  # The put and delete mutations are then sorted within a single list. 
> Mutations in this list are sorted in ascending order of their timestamp. 
> The above process assumes that for each data table update, the index table 
> will be updated with the correct index row key. However, this assumption does 
> not hold in the presence of concurrent updates.
> From the consistent indexing design (PHOENIX-5156) perspective, two or more 
> pending updates from different batches on the same data row are concurrent if 
> and only if for all of these updates the data table row state is read from 
> HBase under the row lock and for none of them the row lock has been acquired 
> the second time for updating the data table. In other words, all of them are 
> in the first update phase concurrently. For concurrent updates, the first two 
> update phases are done but the last update phase is skipped. This means the 
> data table row will be updated by these updates but the corresponding index 
> table rows will be left with the unverified status. Then, the read repair 
> process will repair these unverified index rows during scans.
> Since expected index mutations are derived from the data table row after 
> these concurrent mutations are applied, the expected list would not match 
> with the actual list of index mutations.  
>  
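A minimal sketch of steps 2 to 4 above (illustrative only; this is not the
IndexTool's actual code, and the class and method names here are invented):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;

public class MutationGroupingSketch {
    // all cells in a grouped mutation share one timestamp, so any cell will do
    static long tsOf(Mutation m) {
        return m.getFamilyCellMap().firstEntry().getValue().get(0).getTimestamp();
    }

    // steps 2-3: split one row's raw-scan cells into put and delete sets, then
    // group each set by timestamp into per-version mutations
    static List<Mutation> groupAndSort(List<Cell> rowCells) throws IOException {
        Map<Long, Put> puts = new HashMap<>();
        Map<Long, Delete> deletes = new HashMap<>();
        for (Cell cell : rowCells) {
            long ts = cell.getTimestamp();
            byte[] row = CellUtil.cloneRow(cell);
            if (CellUtil.isDelete(cell)) {
                deletes.computeIfAbsent(ts, t -> new Delete(row)).addDeleteMarker(cell);
            } else {
                puts.computeIfAbsent(ts, t -> new Put(row)).add(cell);
            }
        }
        // step 4: one list, sorted in ascending order of timestamp
        List<Mutation> all = new ArrayList<>(puts.values());
        all.addAll(deletes.values());
        all.sort(Comparator.comparingLong(MutationGroupingSketch::tsOf));
        return all;
    }
}
{code}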



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [phoenix] kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate 
false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398227615
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         return getMutationsWithSameTS(put, del);
     }
 
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
 
 Review comment:
   This simulates the read repair, given the lists of the actual and the 
expected (repaired form) mutations. It is much simpler than the read repair 
process in GlobalIndexChecker, which works on one index row at a time and 
leverages HBase scans on the actual table, which is not the case here. 
GlobalIndexChecker also purges the invalid rows from HBase in addition to 
rebuilding unverified rows. So, they are very different in nature and I am 
not sure how they can be unified. 
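   As a hypothetical trace of this simulation (numbers invented, based on the 
repairActualMutationList code quoted below): if the latest unverified put in 
the actual list has timestamp 100 and the expected list, newest first, holds 
mutations at timestamps 120, 100, and 80, the scan lands on the 100 entry, 
steps back one slot to include 120, and copies the expected mutations from 120 
downward into the actual list before re-sorting it newest first.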


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate 
false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398228255
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         return getMutationsWithSameTS(put, del);
     }
 
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // find the first (latest) actual unverified put mutation
+        Mutation actual = null;
+        for (Mutation mutation : actualMutationList) {
+            if (mutation instanceof Put && !isVerified((Put) mutation)) {
+                actual = mutation;
+                break;
+            }
+        }
+        if (actual == null) {
+            return;
+        }
+        long ts = getTimestamp(actual);
+        int expectedIndex;
+        int expectedListSize = expectedMutationList.size();
+        for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+            if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                if (expectedIndex > 0) {
+                    expectedIndex--;
+                }
+                break;
+            }
+        }
+        if (expectedIndex == expectedListSize) {
+            return;
+        }
+        for (; expectedIndex < expectedListSize; expectedIndex++) {
+            Mutation mutation = expectedMutationList.get(expectedIndex);
+            if (mutation instanceof Put) {
+                mutation = new Put((Put) mutation);
+            } else {
+                mutation = new Delete((Delete) mutation);
+            }
+            actualMutationList.add(mutation);
+        }
+        Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+    }
+
+    private void cleanUpActualMutationList(List<Mutation> actualMutationList)
 
 Review comment:
   @wangweiming800, please see the updated code above and let me know if 
you have any questions on this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate 
false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398230498
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         return getMutationsWithSameTS(put, del);
     }
 
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // find the first (latest) actual unverified put mutation
+        Mutation actual = null;
+        for (Mutation mutation : actualMutationList) {
+            if (mutation instanceof Put && !isVerified((Put) mutation)) {
+                actual = mutation;
+                break;
+            }
+        }
+        if (actual == null) {
+            return;
+        }
+        long ts = getTimestamp(actual);
+        int expectedIndex;
+        int expectedListSize = expectedMutationList.size();
+        for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+            if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                if (expectedIndex > 0) {
+                    expectedIndex--;
+                }
+                break;
+            }
+        }
+        if (expectedIndex == expectedListSize) {
+            return;
+        }
+        for (; expectedIndex < expectedListSize; expectedIndex++) {
+            Mutation mutation = expectedMutationList.get(expectedIndex);
+            if (mutation instanceof Put) {
+                mutation = new Put((Put) mutation);
+            } else {
+                mutation = new Delete((Delete) mutation);
+            }
+            actualMutationList.add(mutation);
+        }
+        Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+    }
+
+    private void cleanUpActualMutationList(List<Mutation> actualMutationList)
+            throws IOException {
+        Iterator<Mutation> iterator = actualMutationList.iterator();
+        Mutation previous = null;
+        while (iterator.hasNext()) {
+            Mutation mutation = iterator.next();
+            if ((mutation instanceof Put && !isVerified((Put) mutation)) ||
+                    (mutation instanceof Delete && isDeleteFamilyVersion(mutation))) {
+                iterator.remove();
+            } else {
+                if (previous != null && getTimestamp(previous) == getTimestamp(mutation) &&
+                        ((previous instanceof Put && mutation instanceof Put) ||
+                                (previous instanceof Delete && mutation instanceof Delete))) {
+                    iterator.remove();
+                } else {
+                    previous = mutation;
+                }
+            }
+        }
+    }
+
     /**
-     * indexRow is the set of all cells of all the row versions of an index row from the index table. These are
-     * actual cells. We group these cells based on timestamp and type (put vs delete), and form the actual set of
-     * index mutations. indexKeyToMutationMap is a map from an index row key to a set of mutations that are
-     * generated using the rebuild process (i.e., by replaying raw data table mutations). These sets are sets of
-     * expected index mutations, one set for each index row key. Since not all mutations in the index table have
-     * both phase (i.e., pre and post data phase) mutations, we cannot compare actual index mutations with the
-     * expected ones one by one and expect to find them identical. We need to consider concurrent data mutation
-     * effects, data table row write failures, and post index write failures. Thus, we need to allow some expected
-     * and actual mutations to be skipped when comparing actual mutations to expected index mutations.
+     * There are two types of verification: without repair and with repair. Without-repair verification is done
+     * before or after index rebuild. It is done before index rebuild to identify the rows to be rebuilt. It is
+     * done after index rebuild to verify the rows that have been rebuilt. With-repair verification can be done
+     * anytime using the “-v ONLY” option to check the consistency of the index table.
+     *
+     * Unverified Rows
+     *
+     * For each mutable data table mutation during regular data table updates, two operations are done on the
+     * data table. One is to read the existing row state, and the second is to update the data table for this
+     * row. The processing of concurrent data mutations is serialized once for reading the existing row states,
+     * and then serialized again for updating the data table. In other words, they go through locking twice,
+     * i.e., [lock, read, unlock] and [lock, write, unlock]. Because of this two-phase locking, for a pair of
+     * concurrent mutations (for 
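The quoted javadoc is cut off above. As a rough outline of the three update
phases it describes (every name below is invented for illustration; this is
not IndexRegionObserver's actual API):

{code:java}
import java.io.IOException;

// Hypothetical outline of the update phases described in the javadoc above.
abstract class TwoPhaseLockingSketch {
    interface RowLockService {
        void lock(byte[] row);
        void unlock(byte[] row);
    }

    static class RowState { /* prior row cells would live here */ }

    abstract RowState readRowState(byte[] row) throws IOException;
    abstract void writeIndexRowsUnverified(RowState state) throws IOException;
    abstract void writeDataRow(byte[] row, RowState state) throws IOException;
    abstract void markIndexRowsVerified(RowState state) throws IOException;

    void applyDataTableMutation(RowLockService locks, byte[] rowKey) throws IOException {
        locks.lock(rowKey);                   // phase 1: [lock, read, unlock]
        RowState state = readRowState(rowKey);
        locks.unlock(rowKey);

        writeIndexRowsUnverified(state);      // index rows land with unverified status

        locks.lock(rowKey);                   // phase 2: [lock, write, unlock]
        writeDataRow(rowKey, state);
        locks.unlock(rowKey);

        // phase 3: mark index rows verified; skipped for concurrent mutations,
        // which leaves their index rows unverified until read repair
        markIndexRowsVerified(state);
    }
}
{code}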

[jira] [Commented] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067247#comment-17067247
 ] 

Hadoop QA commented on PHOENIX-5791:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997762/PHOENIX-5791.4.x-HBase-1.5.001.patch
  against 4.x-HBase-1.5 branch at commit 
61589a903f8c5176ce46e5af0a83729f4f4c90ec.
  ATTACHMENT ID: 12997762

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the master's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+diff --git 
a/phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsExtendedIT.java
 
b/phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsExtendedIT.java
+-
org.apache.hadoop.hbase.TableName.valueOf(pIndexTable.getPhysicalName().getBytes());
+diff --git 
a/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
 
b/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
++ * IndexMaintainer.getIndexedColumns() returns the data column references 
for indexed columns. The data columns are
++ * grouped into three classes, pk columns (data table pk columns), the 
indexed columns (the columns for which
++ * we want to have indexing; they form the prefix for the primary key for 
the index table (after salt and tenant id))
++ * and covered columns. The purpose of this method is to find out if all 
the indexed columns are included in the
+ private boolean hasAllIndexedColumns(IndexMaintainer indexMaintainer, 
MultiMutation multiMutation) {
+-Bytes.compareTo(CellUtil.cloneQualifier(cell), 
columnReference.getQualifier() ) == 0) {
+-   BatchMutateContext context, long 
now, PhoenixIndexMetaData indexMetaData)

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.phoenix.index.VerifySingleIndexRowTest

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3658//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3658//artifact/patchprocess/patchReleaseAuditWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3658//console

This message is automatically generated.

> Eliminate false invalid row detection due to concurrent updates 
> 
>
> Key: PHOENIX-5791
> URL: https://issues.apache.org/jira/browse/PHOENIX-5791
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Kadir OZDEMIR
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-5791.4.x-HBase-1.5.001.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> IndexTool verification generates an expected list of index mutations from the 
> data table rows and uses this list to check if index table rows are 
> consistent with the data table. To do that, it follows these steps:
>  # The data table rows are scanned with a raw scan. This raw scan is 
> configured to read all versions of rows. 
>  # For each scanned row, the cells that are scanned are grouped into two 
> sets: put and delete. The put set is the set of put cells and the delete set 
> is the set of delete cells.
>  # The put and delete sets for a given row are further grouped based on their 
> timestamps into put and delete mutations such that all the cells in a 
> mutation have the same timestamp. 
>  # The put and delete mutations are then sorted within a single list. 
> Mutations in this list are sorted in ascending order of their timestamp. 
> The above process assumes that for each data table update, the index table 
> will be updated with the correct index row key. However, this assumption does 
> not hold in the presence of concurrent updates.
> From the consistent indexing design (PHOENIX-5156) perspective, two or more 
> pending updates from different batches on the same data row are concurrent if 
> and only if for all of these updates the data table row state is read from 
> HBase under the row lock and for none of them the row lock has been acquired 
> the second time for updating the data table. In other words, all of them are 
> in the first update phase concurrently. For concurrent updates, the first two 
> update

[jira] [Commented] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067248#comment-17067248
 ] 

Hadoop QA commented on PHOENIX-5791:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997762/PHOENIX-5791.4.x-HBase-1.5.001.patch
  against 4.x-HBase-1.5 branch at commit 
61589a903f8c5176ce46e5af0a83729f4f4c90ec.
  ATTACHMENT ID: 12997762

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the master's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+diff --git 
a/phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsExtendedIT.java
 
b/phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsExtendedIT.java
+-
org.apache.hadoop.hbase.TableName.valueOf(pIndexTable.getPhysicalName().getBytes());
+diff --git 
a/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
 
b/phoenix-core/src/main/java/org/apache/phoenix/hbase/index/IndexRegionObserver.java
++ * IndexMaintainer.getIndexedColumns() returns the data column references 
for indexed columns. The data columns are
++ * grouped into three classes, pk columns (data table pk columns), the 
indexed columns (the columns for which
++ * we want to have indexing; they form the prefix for the primary key for 
the index table (after salt and tenant id))
++ * and covered columns. The purpose of this method is to find out if all 
the indexed columns are included in the
+ private boolean hasAllIndexedColumns(IndexMaintainer indexMaintainer, 
MultiMutation multiMutation) {
+-Bytes.compareTo(CellUtil.cloneQualifier(cell), 
columnReference.getQualifier() ) == 0) {
+-   BatchMutateContext context, long 
now, PhoenixIndexMetaData indexMetaData)

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.phoenix.index.VerifySingleIndexRowTest

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3659//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3659//artifact/patchprocess/patchReleaseAuditWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3659//console

This message is automatically generated.

> Eliminate false invalid row detection due to concurrent updates 
> 
>
> Key: PHOENIX-5791
> URL: https://issues.apache.org/jira/browse/PHOENIX-5791
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Kadir OZDEMIR
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-5791.4.x-HBase-1.5.001.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> IndexTool verification generates an expected list of index mutations from the 
> data table rows and uses this list to check if index table rows are 
> consistent with the data table. To do that, it follows these steps:
>  # The data table rows are scanned with a raw scan. This raw scan is 
> configured to read all versions of rows. 
>  # For each scanned row, the cells that are scanned are grouped into two 
> sets: put and delete. The put set is the set of put cells and the delete set 
> is the set of delete cells.
>  # The put and delete sets for a given row are further grouped based on their 
> timestamps into put and delete mutations such that all the cells in a 
> mutation have the same timestamp. 
>  # The put and delete mutations are then sorted within a single list. 
> Mutations in this list are sorted in ascending order of their timestamp. 
> The above process assumes that for each data table update, the index table 
> will be updated with the correct index row key. However, this assumption does 
> not hold in the presence of concurrent updates.
> From the consistent indexing design (PHOENIX-5156) perspective, two or more 
> pending updates from different batches on the same data row are concurrent if 
> and only if for all of these updates the data table row state is read from 
> HBase under the row lock and for none of them the row lock has been acquired 
> the second time for updating the data table. In other words, all of them are 
> in the first update phase concurrently. For concurrent updates, the first two 
> update

[GitHub] [phoenix] wangweiming800 commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
wangweiming800 commented on a change in pull request #741: PHOENIX-5791 
Eliminate false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r397523167
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         return getMutationsWithSameTS(put, del);
     }
 
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // find the first (latest) actual unverified put mutation
+        Mutation actual = null;
+        for (Mutation mutation : actualMutationList) {
+            if (mutation instanceof Put && !isVerified((Put) mutation)) {
+                actual = mutation;
+                break;
+            }
+        }
+        if (actual == null) {
+            return;
+        }
+        long ts = getTimestamp(actual);
+        int expectedIndex;
+        int expectedListSize = expectedMutationList.size();
+        for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+            if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                if (expectedIndex > 0) {
+                    expectedIndex--;
+                }
+                break;
+            }
+        }
+        if (expectedIndex == expectedListSize) {
+            return;
+        }
+        for (; expectedIndex < expectedListSize; expectedIndex++) {
+            Mutation mutation = expectedMutationList.get(expectedIndex);
+            if (mutation instanceof Put) {
+                mutation = new Put((Put) mutation);
+            } else {
+                mutation = new Delete((Delete) mutation);
+            }
+            actualMutationList.add(mutation);
+        }
+        Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+    }
+
+    private void cleanUpActualMutationList(List<Mutation> actualMutationList)
+            throws IOException {
+        Iterator<Mutation> iterator = actualMutationList.iterator();
+        Mutation previous = null;
+        while (iterator.hasNext()) {
+            Mutation mutation = iterator.next();
+            if ((mutation instanceof Put && !isVerified((Put) mutation)) ||
+                    (mutation instanceof Delete && isDeleteFamilyVersion(mutation))) {
+                iterator.remove();
+            } else {
+                if (previous != null && getTimestamp(previous) == getTimestamp(mutation) &&
+                        ((previous instanceof Put && mutation instanceof Put) ||
+                                (previous instanceof Delete && mutation instanceof Delete))) {
+                    iterator.remove();
+                } else {
+                    previous = mutation;
+                }
+            }
+        }
+    }
+
     /**
-     * indexRow is the set of all cells of all the row versions of an index row from the index table. These are
-     * actual cells. We group these cells based on timestamp and type (put vs delete), and form the actual set of
-     * index mutations. indexKeyToMutationMap is a map from an index row key to a set of mutations that are
-     * generated using the rebuild process (i.e., by replaying raw data table mutations). These sets are sets of
-     * expected index mutations, one set for each index row key. Since not all mutations in the index table have
-     * both phase (i.e., pre and post data phase) mutations, we cannot compare actual index mutations with the
-     * expected ones one by one and expect to find them identical. We need to consider concurrent data mutation
-     * effects, data table row write failures, and post index write failures. Thus, we need to allow some expected
-     * and actual mutations to be skipped when comparing actual mutations to expected index mutations.
+     * There are two types of verification: without repair and with repair. Without-repair verification is done
+     * before or after index rebuild. It is done before index rebuild to identify the rows to be rebuilt. It is
+     * done after index rebuild to verify the rows that have been rebuilt. With-repair verification can be done
+     * anytime using the “-v ONLY” option to check the consistency of the index table.
+     *
+     * Unverified Rows
+     *
+     * For each mutable data table mutation during regular data table updates, two operations are done on the
+     * data table. One is to read the existing row state, and the second is to update the data table for this
+     * row. The processing of concurrent data mutations is serialized once for reading the existing row states,
+     * and then serialized again for updating the data table. In other words, they go through locking twice,
+     * i.e., [lock, read, unlock] and [lock, write, unlock]. Because of this two-phase locking, for a pair of
+     * concurrent mutations 

[GitHub] [phoenix] yanxinyi closed pull request #736: PHOENIX-5698 Phoenix Query with RVC IN list expression generates wron…

2020-03-25 Thread GitBox
yanxinyi closed pull request #736: PHOENIX-5698 Phoenix Query with RVC IN list 
expression generates wron…
URL: https://github.com/apache/phoenix/pull/736
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] wangweiming800 commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
wangweiming800 commented on a change in pull request #741: PHOENIX-5791 
Eliminate false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398250136
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         return getMutationsWithSameTS(put, del);
     }
 
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // find the first (latest) actual unverified put mutation
+        Mutation actual = null;
+        for (Mutation mutation : actualMutationList) {
+            if (mutation instanceof Put && !isVerified((Put) mutation)) {
+                actual = mutation;
+                break;
+            }
+        }
+        if (actual == null) {
+            return;
+        }
+        long ts = getTimestamp(actual);
+        int expectedIndex;
+        int expectedListSize = expectedMutationList.size();
+        for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+            if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                if (expectedIndex > 0) {
+                    expectedIndex--;
+                }
+                break;
+            }
+        }
+        if (expectedIndex == expectedListSize) {
+            return;
+        }
+        for (; expectedIndex < expectedListSize; expectedIndex++) {
+            Mutation mutation = expectedMutationList.get(expectedIndex);
+            if (mutation instanceof Put) {
+                mutation = new Put((Put) mutation);
+            } else {
+                mutation = new Delete((Delete) mutation);
+            }
+            actualMutationList.add(mutation);
+        }
+        Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+    }
+
+    private void cleanUpActualMutationList(List<Mutation> actualMutationList)
+            throws IOException {
+        Iterator<Mutation> iterator = actualMutationList.iterator();
+        Mutation previous = null;
+        while (iterator.hasNext()) {
+            Mutation mutation = iterator.next();
+            if ((mutation instanceof Put && !isVerified((Put) mutation)) ||
+                    (mutation instanceof Delete && isDeleteFamilyVersion(mutation))) {
+                iterator.remove();
+            } else {
+                if (previous != null && getTimestamp(previous) == getTimestamp(mutation) &&
+                        ((previous instanceof Put && mutation instanceof Put) ||
+                                (previous instanceof Delete && mutation instanceof Delete))) {
+                    iterator.remove();
+                } else {
+                    previous = mutation;
+                }
+            }
+        }
+    }
+
     /**
-     * indexRow is the set of all cells of all the row versions of an index row from the index table. These are
-     * actual cells. We group these cells based on timestamp and type (put vs delete), and form the actual set of
-     * index mutations. indexKeyToMutationMap is a map from an index row key to a set of mutations that are
-     * generated using the rebuild process (i.e., by replaying raw data table mutations). These sets are sets of
-     * expected index mutations, one set for each index row key. Since not all mutations in the index table have
-     * both phase (i.e., pre and post data phase) mutations, we cannot compare actual index mutations with the
-     * expected ones one by one and expect to find them identical. We need to consider concurrent data mutation
-     * effects, data table row write failures, and post index write failures. Thus, we need to allow some expected
-     * and actual mutations to be skipped when comparing actual mutations to expected index mutations.
+     * There are two types of verification: without repair and with repair. Without-repair verification is done
+     * before or after index rebuild. It is done before index rebuild to identify the rows to be rebuilt. It is
+     * done after index rebuild to verify the rows that have been rebuilt. With-repair verification can be done
+     * anytime using the “-v ONLY” option to check the consistency of the index table.
+     *
+     * Unverified Rows
+     *
+     * For each mutable data table mutation during regular data table updates, two operations are done on the
+     * data table. One is to read the existing row state, and the second is to update the data table for this
+     * row. The processing of concurrent data mutations is serialized once for reading the existing row states,
+     * and then serialized again for updating the data table. In other words, they go through locking twice,
+     * i.e., [lock, read, unlock] and [lock, write, unlock]. Because of this two-phase locking, for a pair of
+     * concurrent mutations 

[GitHub] [phoenix] wangweiming800 commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
wangweiming800 commented on a change in pull request #741: PHOENIX-5791 
Eliminate false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398250816
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         return getMutationsWithSameTS(put, del);
     }
 
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
 
 Review comment:
   thanks


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067267#comment-17067267
 ] 

Hadoop QA commented on PHOENIX-5791:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997767/PHOENIX-5791.4.x-HBase-1.5.001.patch
  against 4.x-HBase-1.5 branch at commit 
14a17d8f9b23089bea4c2a910430a73c669bc0fb.
  ATTACHMENT ID: 12997767

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+private long verifyIndexTable(String tableName, String indexName, 
Connection conn) throws Exception {
+// Now we rebuild the entire index table and expect that it is still 
good after the full rebuild
+long actualRowCountAfterCompaction = 
IndexScrutiny.scrutinizeIndex(conn, tableName, indexName);
++ "(k1 INTEGER NOT NULL, k2 INTEGER NOT NULL, a.v1 INTEGER, 
b.v2 INTEGER, c.v3 INTEGER, d.v4 INTEGER," +
+conn.createStatement().execute("CREATE INDEX " + indexName + " ON " + 
tableName + "(v1) INCLUDE(v2, v3)");
++ (RAND.nextBoolean() ? null : 
(RAND.nextInt() % nIndexValues)) + ", "
++ "(k1 INTEGER NOT NULL, k2 INTEGER NOT NULL, a.v1 INTEGER, 
b.v2 INTEGER, c.v3 INTEGER, d.v4 INTEGER," +
+conn.createStatement().execute("CREATE INDEX " + indexName + " ON " + 
tableName + "(v1) INCLUDE(v2, v3)");
+"UPSERT INTO " + tableName + " (k1, 
k2, b.v2, c.v3, d.v4) VALUES ("
++ (RAND.nextBoolean() ? null : 
RAND.nextInt()) + ", "

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.phoenix.index.VerifySingleIndexRowTest

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3660//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3660//console

This message is automatically generated.

> Eliminate false invalid row detection due to concurrent updates 
> 
>
> Key: PHOENIX-5791
> URL: https://issues.apache.org/jira/browse/PHOENIX-5791
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Kadir OZDEMIR
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-5791.4.x-HBase-1.5.001.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> IndexTool verification generates an expected list of index mutations from the 
> data table rows and uses this list to check if index table rows are 
> consistent with the data table. To do that, it follows these steps:
>  # The data table rows are scanned with a raw scan. This raw scan is 
> configured to read all versions of rows. 
>  # For each scanned row, the cells that are scanned are grouped into two 
> sets: put and delete. The put set is the set of put cells and the delete set 
> is the set of delete cells.
>  # The put and delete sets for a given row are further grouped based on their 
> timestamps into put and delete mutations such that all the cells in a 
> mutation have the same timestamp. 
>  # The put and delete mutations are then sorted within a single list. 
> Mutations in this list are sorted in ascending order of their timestamp. 
> The above process assumes that for each data table update, the index table 
> will be updated with the correct index row key. However, this assumption does 
> not hold in the presence of concurrent updates.
> From the consistent indexing design (PHOENIX-5156) perspective, two or more 
> pending updates from different batches on the same data row are concurrent if 
> and only if for all of these updates the data table row state is read from 
> HBase under the row lock and for none of them the row lock has been acquired 
> the second time for updating the data table. In other words, all of them are 
> in the first update phase concurrently. For concurrent updates, the first two 
> update phases are done but the last update phase is skipped. This means the 
> data table row will be updated by these updates but the corresponding index 
> table rows will be left with the unverified status. Then, the read repair 
> process will repair these unverified index rows during scans.
> Since expected index mutations are der

[jira] [Commented] (PHOENIX-5698) Phoenix Query with RVC IN list expression generates wrong scan with non-pk ordered pks

2020-03-25 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067268#comment-17067268
 ] 

Hudson commented on PHOENIX-5698:
-

FAILURE: Integrated in Jenkins build PreCommit-PHOENIX-Build #3660 (See 
[https://builds.apache.org/job/PreCommit-PHOENIX-Build/3660/])
PHOENIX-5698 Phoenix Query with RVC IN list expression generates wrong 
(yanxinyi: rev 14a17d8f9b23089bea4c2a910430a73c669bc0fb)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/iterate/ExplainTable.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/InListExpression.java
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/InListIT.java
* (edit) 
phoenix-core/src/test/java/org/apache/phoenix/expression/InListExpressionTest.java


> Phoenix Query with RVC IN list expression generates wrong scan with non-pk 
> ordered pks
> --
>
> Key: PHOENIX-5698
> URL: https://issues.apache.org/jira/browse/PHOENIX-5698
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.15.0, 4.14.3
>Reporter: Daniel Wong
>Assignee: Xinyi Yan
>Priority: Major
>  Labels: DESC
> Attachments: PHOENIX-5698-4.14-HBase-1.3.patch, 
> PHOENIX-5698-4.x-HBase-1.3.patch, PHOENIX-5698-4.x.patch, 
> PHOENIX-5698-4.x.v3.patch, PHOENIX-5698-4.x.v4.patch, 
> PHOENIX-5698-4.x.v5.patch, PHOENIX-5698-4.x.v6.patch, 
> PHOENIX-5698-master.v2.patch, PHOENIX-5698.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> In the code below ideally we'd expect a SINGLE ROW DELETE plan client side. 
> However, this generates an incorrect scan with range ['tenant1
> 0CY005xx01Sv6o'). If the order of the RVCs is changed to row key order 
> Phoenix correctly generates a SINGLE ROW SCAN.  As we provide the full PK, 
> we expect either a tightly bounded range scan or a client-side delete.  
> Instead we get a range scan on composite leading edge 
> TENANT_ID,KEY_PREFIX,ID1.
>  
> {code:java}
> @Test
>  public void testInListExpressionWithDescAgain() throws Exception {
>  String fullTableName = generateUniqueName();
>  String fullViewName = generateUniqueName();
>  String tenantView = generateUniqueName();
>  // create base table and global view using global connection
>  try (Connection conn = DriverManager.getConnection(getUrl()))
> { conn.setAutoCommit(true); Statement stmt = conn.createStatement(); 
> stmt.execute("CREATE TABLE " + fullTableName + "(\n" + " TENANT_ID CHAR(15) 
> NOT NULL,\n" + " KEY_PREFIX CHAR(3) NOT NULL,\n" + " CONSTRAINT PK PRIMARY 
> KEY (\n" + " TENANT_ID," + " KEY_PREFIX" + ")) MULTI_TENANT=TRUE"); 
> stmt.execute("CREATE VIEW " + fullViewName + "(\n" + " ID1 VARCHAR NOT 
> NULL,\n" + " ID2 VARCHAR NOT NULL,\n" + " EVENT_DATE DATE NOT NULL,\n" + " 
> CONSTRAINT PKVIEW PRIMARY KEY\n" + " (\n" + " ID1, ID2 DESC, EVENT_DATE 
> DESC\n" + ")) AS SELECT * FROM " + fullTableName + " WHERE KEY_PREFIX = 
> '0CY'"); }
> // create and use a tenant specific view to write data
>  try (Connection viewConn = DriverManager.getConnection(TENANT_SPECIFIC_URL1) 
> ) {
>  viewConn.setAutoCommit(true); //need autocommit for serverside deletion
>  Statement stmt = viewConn.createStatement();
>  stmt.execute("CREATE VIEW IF NOT EXISTS " + tenantView + " AS SELECT * FROM 
> " + fullViewName );
>  viewConn.createStatement().execute("UPSERT INTO " + tenantView + "(ID1, ID2, 
> EVENT_DATE) VALUES ('005xx01Sv6o', '300', 153245823)");
>  viewConn.createStatement().execute("UPSERT INTO " + tenantView + "(ID1, ID2, 
> EVENT_DATE) VALUES ('005xx01Sv6o', '400', 153245824)");
>  viewConn.createStatement().execute("UPSERT INTO " + tenantView + "(ID1, ID2, 
> EVENT_DATE) VALUES ('005xx01Sv6o', '500', 153245825)");
>  viewConn.commit();
> ResultSet rs = stmt.executeQuery("SELECT ID1, ID2, EVENT_DATE FROM " + 
> tenantView );
>  printResultSet(rs);
> System.out.println("Delete Start");
> rs = stmt.executeQuery("EXPLAIN DELETE FROM " + tenantView + " WHERE (ID1, 
> EVENT_DATE, ID2) IN (('005xx01Sv6o', 153245824, 
> '400'),('005xx01Sv6o', 153245823, '300'))");
>  printResultSet(rs); // THIS SHOULD BE A SINGLE ROW SCAN
> stmt.execute("DELETE FROM " + tenantView + " WHERE (ID1, EVENT_DATE, ID2) IN 
> (('005xx01Sv6o', 153245824, '400'),('005xx01Sv6o', 
> 153245823, '300'))");
>  viewConn.commit();
>  System.out.println("Delete End");
> rs = stmt.executeQuery("SELECT ID1, ID2, EVENT_DATE FROM " + tenantView );
>  printResultSet(rs);
> }
>  }
> private void printResultSet(ResultSet rs) throws SQLException {
>  StringBuilder builder = new StringBuilder();
>  while(rs.next()) {
>  for(int i = 0; i < rs.getMetaData().getColumn

[jira] [Commented] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067373#comment-17067373
 ] 

Hadoop QA commented on PHOENIX-5791:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997778/PHOENIX-5791.4.x-HBase-1.5.001.patch
  against 4.x-HBase-1.5 branch at commit 
14a17d8f9b23089bea4c2a910430a73c669bc0fb.
  ATTACHMENT ID: 12997778

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+private long verifyIndexTable(String tableName, String indexName, 
Connection conn) throws Exception {
+// Now we rebuild the entire index table and expect that it is still 
good after the full rebuild
+long actualRowCountAfterCompaction = 
IndexScrutiny.scrutinizeIndex(conn, tableName, indexName);
++ "(k1 INTEGER NOT NULL, k2 INTEGER NOT NULL, a.v1 INTEGER, 
b.v2 INTEGER, c.v3 INTEGER, d.v4 INTEGER," +
+conn.createStatement().execute("CREATE INDEX " + indexName + " ON " + 
tableName + "(v1) INCLUDE(v2, v3)");
++ (RAND.nextBoolean() ? null : 
(RAND.nextInt() % nIndexValues)) + ", "
++ "(k1 INTEGER NOT NULL, k2 INTEGER NOT NULL, a.v1 INTEGER, 
b.v2 INTEGER, c.v3 INTEGER, d.v4 INTEGER," +
+conn.createStatement().execute("CREATE INDEX " + indexName + " ON " + 
tableName + "(v1) INCLUDE(v2, v3)");
+"UPSERT INTO " + tableName + " (k1, 
k2, b.v2, c.v3, d.v4) VALUES ("
++ (RAND.nextBoolean() ? null : 
RAND.nextInt()) + ", "

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.ConcurrentMutationsExtendedIT

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3661//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3661//console

This message is automatically generated.

> Eliminate false invalid row detection due to concurrent updates 
> 
>
> Key: PHOENIX-5791
> URL: https://issues.apache.org/jira/browse/PHOENIX-5791
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Kadir OZDEMIR
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-5791.4.x-HBase-1.5.001.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> IndexTool verification generates an expected list of index mutations from the 
> data table rows and uses this list to check if index table rows are 
> consistent with the data table. To do that, it follows these steps:
>  # The data table rows are scanned with a raw scan. This raw scan is 
> configured to read all versions of rows. 
>  # For each scanned row, the cells that are scanned are grouped into two 
> sets: put and delete. The put set is the set of put cells and the delete set 
> is the set of delete cells.
>  # The put and delete sets for a given row are further grouped based on their 
> timestamps into put and delete mutations such that all the cells in a 
> mutation have the same timestamp. 
>  # The put and delete mutations are then sorted within a single list. 
> Mutations in this list are sorted in ascending order of their timestamp. 
> The above process assumes that for each data table update, the index table 
> will be updated with the correct index row key. However, this assumption does 
> not hold in the presence of concurrent updates.
> From the consistent indexing design (PHOENIX-5156) perspective, two or more 
> pending updates from different batches on the same data row are concurrent if 
> and only if for all of these updates the data table row state is read from 
> HBase under the row lock and for none of them the row lock has been acquired 
> the second time for updating the data table. In other words, all of them are 
> in the first update phase concurrently. For concurrent updates, the first two 
> update phases are done but the last update phase is skipped. This means the 
> data table row will be updated by these updates but the corresponding index 
> table rows will be left with the unverified status. Then, the read repair 
> process will repair these unverified index rows during scans.
> Sinc

[GitHub] [phoenix] wangweiming800 commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
wangweiming800 commented on a change in pull request #741: PHOENIX-5791 
Eliminate false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398332277
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         return getMutationsWithSameTS(put, del);
     }
 
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // find the first (latest) actual unverified put mutation
+        Mutation actual = null;
+        for (Mutation mutation : actualMutationList) {
+            if (mutation instanceof Put && !isVerified((Put) mutation)) {
+                actual = mutation;
+                break;
+            }
+        }
+        if (actual == null) {
+            return;
+        }
+        long ts = getTimestamp(actual);
+        int expectedIndex;
+        int expectedListSize = expectedMutationList.size();
+        for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+            if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                if (expectedIndex > 0) {
+                    expectedIndex--;
+                }
+                break;
+            }
+        }
+        if (expectedIndex == expectedListSize) {
+            return;
+        }
+        for (; expectedIndex < expectedListSize; expectedIndex++) {
+            Mutation mutation = expectedMutationList.get(expectedIndex);
+            if (mutation instanceof Put) {
+                mutation = new Put((Put) mutation);
+            } else {
+                mutation = new Delete((Delete) mutation);
+            }
+            actualMutationList.add(mutation);
+        }
+        Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+    }
+
+    private void cleanUpActualMutationList(List<Mutation> actualMutationList)
 
 Review comment:
   @kadirozde , I am wondering whether it is possible to replay the actual 
mutations while ignoring the unverified mutations, and to compare the result 
with the expected mutations. 
   
   If I understand correctly, there are only 4 types of mutations as a result 
of a raw scan on the index table:
   unverified put: introduced by a failure in the 2nd or 3rd phase of a data 
mutation, by a successful data row deletion, or by an ongoing data mutation.  
   verified put: introduced by a successful data put mutation or by read repair.
   deleteFamily: introduced by a successful data mutation which leads to the 
change or removal of the index row key.
   deleteFamilyVersion: introduced by the read repair.
   
   Since a verified put has the full information of the index mutation, even 
when it is generated by the read repair, if we replay all these 3 kinds of 
mutations in time order, we should end up with the expected mutation list. 
   
   Do I misunderstand something here?
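   A minimal sketch of the replay idea proposed in this comment (an 
assumption, not the patch's code; isVerified and isDeleteFamilyVersion are 
the patch's own helpers, and the method relies on the surrounding class's 
imports, e.g. java.util.LinkedList and the HBase client types):
   
{code:java}
// Drop unverified puts and deleteFamilyVersion markers, keep verified puts
// and deleteFamily markers, and replay the survivors oldest-first. The input
// list is assumed to be sorted newest-first, as in the patch, so prepending
// while iterating yields ascending (replay) time order.
private List<Mutation> replayVerified(List<Mutation> actualMutationList) throws IOException {
    LinkedList<Mutation> replayed = new LinkedList<>();
    for (Mutation mutation : actualMutationList) {
        if (mutation instanceof Put && !isVerified((Put) mutation)) {
            continue; // ignore unverified puts
        }
        if (mutation instanceof Delete && isDeleteFamilyVersion(mutation)) {
            continue; // ignore read-repair version deletes
        }
        replayed.addFirst(mutation);
    }
    return replayed;
}
{code}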


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] wangweiming800 commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
wangweiming800 commented on a change in pull request #741: PHOENIX-5791 
Eliminate false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398332277
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation 
mutation) {
 return getMutationsWithSameTS(put, del);
 }
 
+private void repairActualMutationList(List actualMutationList, 
List expectedMutationList)
+throws IOException {
+// find the first (latest) actual unverified put mutation
+Mutation actual = null;
+for (Mutation mutation : actualMutationList) {
+if (mutation instanceof Put && !isVerified((Put) mutation)) {
+actual = mutation;
+break;
+}
+}
+if (actual == null) {
+return;
+}
+long ts = getTimestamp(actual);
+int expectedIndex;
+int expectedListSize = expectedMutationList.size();
+for (expectedIndex = 0; expectedIndex < expectedListSize; 
expectedIndex++) {
+if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+if (expectedIndex > 0) {
+expectedIndex--;
+}
+break;
+}
+}
+if (expectedIndex == expectedListSize) {
+return;
+}
+for (; expectedIndex < expectedListSize; expectedIndex++) {
+Mutation mutation = expectedMutationList.get(expectedIndex);
+if (mutation instanceof Put) {
+mutation = new Put((Put) mutation);
+} else {
+mutation = new Delete((Delete) mutation);
+}
+actualMutationList.add(mutation);
+}
+Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+}
+
+private void cleanUpActualMutationList(List actualMutationList)
 
 Review comment:
   @kadirozde , I am wondering whether it is possible to replay the actual 
mutations while ignoring the unverified mutations and compare the result with 
the expected mutations. 
   
   If I understand correctly, a raw scan on the index table can produce only 
four types of mutations.
   unverified put: introduced by a failure in the 2nd or 3rd phase of a data 
mutation, by a successful data row deletion, or by an ongoing data mutation.
   verified put: introduced by a successful data put mutation or by read repair.
   deleteFamily: introduced by a successful data mutation that changes or 
removes the index row key. 
   deleteFamilyVersion: introduced by read repair.
   
   Since a verified put carries the full information of the index mutation, 
even when it is generated by read repair, replaying the remaining three kinds 
of mutations in timestamp order should yield the expected mutation list 
(sketched below). 
   
   Do I misunderstand something here?
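   The replay idea can be illustrated in a few lines. This is a minimal model 
with invented stand-in types (Kind, IndexMutation carrying only a timestamp), 
not the actual HBase Put/Delete classes or the Phoenix helpers:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    public class ReplaySketch {

        enum Kind { UNVERIFIED_PUT, VERIFIED_PUT, DELETE_FAMILY, DELETE_FAMILY_VERSION }

        static final class IndexMutation {
            final Kind kind;
            final long ts;
            IndexMutation(Kind kind, long ts) { this.kind = kind; this.ts = ts; }
            @Override public String toString() { return kind + "@" + ts; }
        }

        // Replay the raw-scan output oldest-first, dropping unverified puts;
        // the survivors are what the comment proposes comparing with the
        // expected mutation list.
        static List<IndexMutation> replay(List<IndexMutation> rawScanResult) {
            List<IndexMutation> replayed = new ArrayList<>();
            for (IndexMutation m : rawScanResult) {
                if (m.kind != Kind.UNVERIFIED_PUT) {
                    replayed.add(m);
                }
            }
            replayed.sort(Comparator.comparingLong((IndexMutation m) -> m.ts));
            return replayed;
        }

        public static void main(String[] args) {
            List<IndexMutation> raw = Arrays.asList(
                    new IndexMutation(Kind.UNVERIFIED_PUT, 40L),  // e.g. a concurrent update, skipped
                    new IndexMutation(Kind.VERIFIED_PUT, 30L),
                    new IndexMutation(Kind.DELETE_FAMILY, 20L),
                    new IndexMutation(Kind.VERIFIED_PUT, 10L));
            // prints [VERIFIED_PUT@10, DELETE_FAMILY@20, VERIFIED_PUT@30]
            System.out.println(replay(raw));
        }
    }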


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Commented] (PHOENIX-5791) Eliminate false invalid row detection due to concurrent updates

2020-03-25 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17067388#comment-17067388
 ] 

Hadoop QA commented on PHOENIX-5791:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12997778/PHOENIX-5791.4.x-HBase-1.5.001.patch
  against 4.x-HBase-1.5 branch at commit 
14a17d8f9b23089bea4c2a910430a73c669bc0fb.
  ATTACHMENT ID: 12997778

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 25 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+private long verifyIndexTable(String tableName, String indexName, Connection conn) throws Exception {
+// Now we rebuild the entire index table and expect that it is still good after the full rebuild
+long actualRowCountAfterCompaction = IndexScrutiny.scrutinizeIndex(conn, tableName, indexName);
++ "(k1 INTEGER NOT NULL, k2 INTEGER NOT NULL, a.v1 INTEGER, b.v2 INTEGER, c.v3 INTEGER, d.v4 INTEGER," +
+conn.createStatement().execute("CREATE INDEX " + indexName + " ON " + tableName + "(v1) INCLUDE(v2, v3)");
++ (RAND.nextBoolean() ? null : (RAND.nextInt() % nIndexValues)) + ", "
++ "(k1 INTEGER NOT NULL, k2 INTEGER NOT NULL, a.v1 INTEGER, b.v2 INTEGER, c.v3 INTEGER, d.v4 INTEGER," +
+conn.createStatement().execute("CREATE INDEX " + indexName + " ON " + tableName + "(v1) INCLUDE(v2, v3)");
+"UPSERT INTO " + tableName + " (k1, k2, b.v2, c.v3, d.v4) VALUES ("
++ (RAND.nextBoolean() ? null : RAND.nextInt()) + ", "

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3662//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/3662//console

This message is automatically generated.

> Eliminate false invalid row detection due to concurrent updates 
> 
>
> Key: PHOENIX-5791
> URL: https://issues.apache.org/jira/browse/PHOENIX-5791
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Kadir OZDEMIR
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-5791.4.x-HBase-1.5.001.patch
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> IndexTool verification generates an expected list of index mutations from the 
> data table rows and uses this list to check if index table rows are 
> consistent with the data table. To do that, it performs the following steps:
>  # The data table rows are scanned with a raw scan. This raw scan is 
> configured to read all versions of rows. 
>  # For each scanned row, the scanned cells are grouped into two sets: the 
> put set, containing the put cells, and the delete set, containing the delete 
> cells.
>  # The put and delete sets for a given row are further grouped based on their 
> timestamps into put and delete mutations such that all the cells in a 
> mutation have the same timestamp. 
>  # The put and delete mutations are then sorted within a single list. 
> Mutations in this list are sorted in ascending order of their timestamp. 
> The above process assumes that for each data table update, the index table 
> will be updated with the correct index row key. However, this assumption does 
> not hold in the presence of concurrent updates.
> From the consistent indexing design (PHOENIX-5156) perspective, two or more 
> pending updates from different batches on the same data row are concurrent if 
> and only if for all of these updates the data table row state is read from 
> HBase under the row lock and for none of them the row lock has been acquired 
> the second time for updating the data table. In other words, all of them are 
> in the first update phase concurrently. For concurrent updates, the first two 
> update phases are done but the last update phase is skipped. This means the 
> data table row will be updated by these updates but the corresponding index 
> table rows will be left with the unverified status. Then, the read repair 
> process will repair these unverified index rows during scans.
> Since expected index mutations are derived from the data table row after 
> these concurrent mutations are applied, 
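
For illustration, steps 2-4 of the verification described above can be 
sketched with an invented SimpleCell stand-in in place of the HBase Cell type; 
the real scanner of course works on actual cells and their qualifiers:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class GroupingSketch {

        // Stand-in for an HBase Cell: just a timestamp and a put/delete flag.
        static final class SimpleCell {
            final long ts;
            final boolean delete;
            SimpleCell(long ts, boolean delete) { this.ts = ts; this.delete = delete; }
        }

        // A "mutation" here is the group of same-timestamp cells of one type.
        static final class SimpleMutation {
            final long ts;
            final boolean delete;
            final List<SimpleCell> cells = new ArrayList<>();
            SimpleMutation(long ts, boolean delete) { this.ts = ts; this.delete = delete; }
        }

        static List<SimpleMutation> groupAndSort(List<SimpleCell> rawScanCells) {
            // Step 2: split the scanned cells into a put set and a delete set.
            // Step 3: within each set, group cells by timestamp into mutations.
            Map<Long, SimpleMutation> putsByTs = new HashMap<>();
            Map<Long, SimpleMutation> deletesByTs = new HashMap<>();
            for (SimpleCell cell : rawScanCells) {
                Map<Long, SimpleMutation> bucket = cell.delete ? deletesByTs : putsByTs;
                bucket.computeIfAbsent(cell.ts, ts -> new SimpleMutation(ts, cell.delete))
                        .cells.add(cell);
            }
            // Step 4: merge both kinds into one list, ascending by timestamp.
            List<SimpleMutation> mutations = new ArrayList<>(putsByTs.values());
            mutations.addAll(deletesByTs.values());
            mutations.sort(Comparator.comparingLong((SimpleMutation m) -> m.ts));
            return mutations;
        }

        public static void main(String[] args) {
            List<SimpleCell> raw = new ArrayList<>();
            raw.add(new SimpleCell(20L, false));  // put cell @20
            raw.add(new SimpleCell(20L, false));  // second put cell @20, same mutation
            raw.add(new SimpleCell(10L, true));   // delete marker @10
            System.out.println(groupAndSort(raw).size());  // 2 mutations: delete@10, put@20
        }
    }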

[GitHub] [phoenix] priyankporwal commented on a change in pull request #735: PHOENIX-5734 - IndexScrutinyTool should not report rows beyond maxLoo…

2020-03-25 Thread GitBox
priyankporwal commented on a change in pull request #735: PHOENIX-5734 - 
IndexScrutinyTool should not report rows beyond maxLoo…
URL: https://github.com/apache/phoenix/pull/735#discussion_r398340808
 
 

 ##
 File path: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexScrutinyTableOutput.java
 ##
 @@ -68,14 +68,18 @@
 "SOURCE_TS BIGINT,\n" +
 "TARGET_TS BIGINT,\n" +
 "HAS_TARGET_ROW BOOLEAN,\n" +
+"BEYOND_MAX_LOOKBACK BOOLEAN,\n" +
 "CONSTRAINT PK PRIMARY KEY\n" +
 "(\n" +
 "" + SOURCE_TABLE_COL_NAME + ",\n" +
 "" + TARGET_TABLE_COL_NAME + ",\n" +
 "" + SCRUTINY_EXECUTE_TIME_COL_NAME + ",\n" + // time at 
which the scrutiny ran
 "SOURCE_ROW_PK_HASH\n" + //  this hash makes the PK unique
 ")\n" + // dynamic columns consisting of the source and target 
columns will follow
-")";
+")  COLUMN_ENCODED_BYTES = 0 "; //column encoding not supported 
with dyn columns (PHOENIX-5107)
 
 Review comment:
   Do we need to do anything to disable COLUMN_ENCODING for clusters with this 
table already existing?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] wangweiming800 commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
wangweiming800 commented on a change in pull request #741: PHOENIX-5791 
Eliminate false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398344146
 
 

 ##
 File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -613,39 +605,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         }
         return getMutationsWithSameTS(put, del);
     }
+    /**
+     * In this method, the actual list is repaired in memory using the expected list which is actually the output of
+     * rebuilding the index table row. The result of this repair is used only for verification.
+     */
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // Find the first (latest) actual unverified put mutation
+        for (Mutation actual : actualMutationList) {
+            if (actual instanceof Put && !isVerified((Put) actual)) {
+                long ts = getTimestamp(actual);
+                int expectedIndex;
+                int expectedListSize = expectedMutationList.size();
+                for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+                    if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                        if (expectedIndex > 0) {
+                            expectedIndex--;
+                        }
+                        break;
+                    }
+                }
+                if (expectedIndex == expectedListSize) {
+                    continue;
+                }
+                for (; expectedIndex < expectedListSize; expectedIndex++) {
+                    Mutation mutation = expectedMutationList.get(expectedIndex);
+                    if (mutation instanceof Put) {
+                        mutation = new Put((Put) mutation);
+                    } else {
+                        mutation = new Delete((Delete) mutation);
+                    }
+                    actualMutationList.add(mutation);
 
 Review comment:
   @kadirozde, will it be a problem to modify the list while iterating over 
it? I am not an expert on Java, just a guess here.
   
https://stackoverflow.com/questions/6866238/concurrent-modification-exception-adding-to-an-arraylist
   https://www.baeldung.com/java-concurrentmodificationexception
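
   The hazard raised here is real for java.util.ArrayList and easy to 
reproduce in isolation; nothing Phoenix-specific is needed:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class CmeDemo {
        public static void main(String[] args) {
            List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
            // ArrayList's iterator is fail-fast: structurally modifying the
            // list while the enhanced for loop iterates it makes the next
            // call to Iterator.next() throw ConcurrentModificationException.
            for (String s : list) {
                if (s.equals("a")) {
                    list.add("c");
                }
            }
        }
    }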


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate 
false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398346481
 
 

 ##
 File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -614,38 +606,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         return getMutationsWithSameTS(put, del);
     }
 
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // find the first (latest) actual unverified put mutation
+        Mutation actual = null;
+        for (Mutation mutation : actualMutationList) {
+            if (mutation instanceof Put && !isVerified((Put) mutation)) {
+                actual = mutation;
+                break;
+            }
+        }
+        if (actual == null) {
+            return;
+        }
+        long ts = getTimestamp(actual);
+        int expectedIndex;
+        int expectedListSize = expectedMutationList.size();
+        for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+            if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                if (expectedIndex > 0) {
+                    expectedIndex--;
+                }
+                break;
+            }
+        }
+        if (expectedIndex == expectedListSize) {
+            return;
+        }
+        for (; expectedIndex < expectedListSize; expectedIndex++) {
+            Mutation mutation = expectedMutationList.get(expectedIndex);
+            if (mutation instanceof Put) {
+                mutation = new Put((Put) mutation);
+            } else {
+                mutation = new Delete((Delete) mutation);
+            }
+            actualMutationList.add(mutation);
+        }
+        Collections.sort(actualMutationList, MUTATION_TS_DESC_COMPARATOR);
+    }
+
+    private void cleanUpActualMutationList(List<Mutation> actualMutationList)
 
 Review comment:
   @wangweiming800, your understanding of mutation types is correct. However, I 
have no idea what replaying an index mutation means. Let's discuss this offline.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [phoenix] kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate false invalid row detection due to concurrent …

2020-03-25 Thread GitBox
kadirozde commented on a change in pull request #741: PHOENIX-5791 Eliminate 
false invalid row detection due to concurrent …
URL: https://github.com/apache/phoenix/pull/741#discussion_r398349866
 
 

 ##
 File path: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/IndexRebuildRegionScanner.java
 ##
 @@ -613,39 +605,132 @@ private boolean isDeleteFamilyVersion(Mutation mutation) {
         }
         return getMutationsWithSameTS(put, del);
     }
+    /**
+     * In this method, the actual list is repaired in memory using the expected list which is actually the output of
+     * rebuilding the index table row. The result of this repair is used only for verification.
+     */
+    private void repairActualMutationList(List<Mutation> actualMutationList, List<Mutation> expectedMutationList)
+            throws IOException {
+        // Find the first (latest) actual unverified put mutation
+        for (Mutation actual : actualMutationList) {
+            if (actual instanceof Put && !isVerified((Put) actual)) {
+                long ts = getTimestamp(actual);
+                int expectedIndex;
+                int expectedListSize = expectedMutationList.size();
+                for (expectedIndex = 0; expectedIndex < expectedListSize; expectedIndex++) {
+                    if (getTimestamp(expectedMutationList.get(expectedIndex)) <= ts) {
+                        if (expectedIndex > 0) {
+                            expectedIndex--;
+                        }
+                        break;
+                    }
+                }
+                if (expectedIndex == expectedListSize) {
+                    continue;
+                }
+                for (; expectedIndex < expectedListSize; expectedIndex++) {
+                    Mutation mutation = expectedMutationList.get(expectedIndex);
+                    if (mutation instanceof Put) {
+                        mutation = new Put((Put) mutation);
+                    } else {
+                        mutation = new Delete((Delete) mutation);
+                    }
+                    actualMutationList.add(mutation);
 
 Review comment:
   Yes, this should be a problem. Good catch! I will fix it.
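
   One conventional remedy is to buffer the additions and merge them once 
iteration has finished; whether the eventual fix took exactly this form is not 
shown in this thread. A minimal sketch:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class CmeFixSketch {
        public static void main(String[] args) {
            List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
            List<String> additions = new ArrayList<>();
            for (String s : list) {
                if (s.equals("a")) {
                    additions.add("c");  // buffer instead of mutating the list being iterated
                }
            }
            list.addAll(additions);      // safe: iteration has already finished
            System.out.println(list);    // prints [a, b, c]
        }
    }

   Note that the earlier revision of repairActualMutationList quoted in this 
thread avoids the issue differently, by breaking out of the scan loop before 
appending to the list.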


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services