[jira] [Commented] (PHOENIX-4247) Phoenix/Spark/ZK connection

2017-10-23 Thread Jepson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216370#comment-16216370
 ] 

Jepson commented on PHOENIX-4247:
-

I am hitting this problem as well.
The ZooKeeper connections are not closed automatically.


> Phoenix/Spark/ZK connection
> ---
>
> Key: PHOENIX-4247
> URL: https://issues.apache.org/jira/browse/PHOENIX-4247
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
> Environment: HBase 1.2 
> Spark 1.6 
> Phoenix 4.10 
>Reporter: Kumar Palaniappan
>
> After upgrading to CDH 5.9.1/Phoenix 4.10/Spark 1.6 from CDH 5.5.2/Phoenix 
> 4.6/Spark 1.5, streaming jobs that read data from Phoenix no longer release 
> their ZooKeeper connections, meaning that the number of connections from the 
> driver grows with each batch until the ZooKeeper limit on connections per IP 
> address is reached, at which point the Spark streaming job can no longer read 
> data from Phoenix.
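
For illustration, the usual client-side mitigation is to close each Phoenix 
connection deterministically once a batch has been processed. A minimal sketch 
(the table name, quorum string, and class are placeholders, not from the 
ticket):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixBatchReader {

    // Read one batch over plain JDBC. try-with-resources guarantees
    // conn.close() runs even on failure, so the client at least does not
    // leak connections of its own on top of whatever the driver fails to
    // release.
    public static void readBatch(String zkQuorum) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:" + zkQuorum);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM MY_TABLE LIMIT 10")) {
            while (rs.next()) {
                // process the row...
            }
        }
    }
}
{code}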



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-727) UPSERT SELECT does not work with JOIN

2017-10-23 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216335#comment-16216335
 ] 

Sergey Soldatov commented on PHOENIX-727:
-

commit be8c2fb500b017f4b8300f8afd2152aea8a70905
Author: maryannxue 
Date:   Tue Jan 28 19:22:32 2014 -0500

Fix UPSERT SELECT does not work with JOIN (github #596)


> UPSERT SELECT does not work with JOIN
> -
>
> Key: PHOENIX-727
> URL: https://issues.apache.org/jira/browse/PHOENIX-727
> Project: Phoenix
>  Issue Type: Task
>Affects Versions: 3.0-Release
>Reporter: Nitin Kumar
>Assignee: Maryann Xue
>
> Issue:
> UPSERT SELECT did not work with JOIN. :-(
> UPSERT INTO ATTRIBUTE(ENTITY,ATTRIBUTE) select EM.entity,EM.EVCOL_NO_OF_PARTS 
> from EVENT_MANAGEMENT EM(EVCOL_NO_OF_PARTS INTEGER,EVCOL_STANDARD_COST 
> FLOAT,EVCOL_INVENTORY INTEGER,EVCOL_TYPE VARCHAR) INNER JOIN TEMPENTITY MA ON 
> EM.EVENTTIME = MA.EVENTTIME AND EM.ENTITY = MA.ENTITY;
> Error: Joins not supported (state=,code=0)
> But select worked.
>  select EM.entity,EM.EVCOL_NO_OF_PARTS from EVENT_MANAGEMENT 
> EM(EVCOL_NO_OF_PARTS INTEGER,EVCOL_STANDARD_COST FLOAT,EVCOL_INVENTORY 
> INTEGER,EVCOL_TYPE VARCHAR) INNER JOIN TEMPENTITY MA ON EM.EVENTTIME = 
> MA.EVENTTIME AND EM.ENTITY = MA.ENTITY;



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-2187) Creating a front-end application for Phoenix Tracing Web App

2017-10-23 Thread Nishani (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishani  updated PHOENIX-2187:
--
Summary: Creating a front-end application for Phoenix Tracing Web App  
(was: Creating the front-end application for Phoenix Tracing Web App)

> Creating a front-end application for Phoenix Tracing Web App
> 
>
> Key: PHOENIX-2187
> URL: https://issues.apache.org/jira/browse/PHOENIX-2187
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Nishani 
>Assignee: Nishani 
> Fix For: 4.6.0
>
> Attachments: 2187-master.patch
>
>
> This will include the following tracing visualization features.
> List - lists the traces with their attributes
> Trace Count - chart view over the trace description
> Dependency Tree - tree view of trace IDs
> Timeline - timeline of trace IDs
> Trace Distribution - distribution chart of hosts of traces



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-727) UPSERT SELECT does not work with JOIN

2017-10-23 Thread Zhuofu Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216222#comment-16216222
 ] 

Zhuofu Chen commented on PHOENIX-727:
-

May I know where the patch is?

> UPSERT SELECT does not work with JOIN
> -
>
> Key: PHOENIX-727
> URL: https://issues.apache.org/jira/browse/PHOENIX-727
> Project: Phoenix
>  Issue Type: Task
>Affects Versions: 3.0-Release
>Reporter: Nitin Kumar
>Assignee: Maryann Xue
>
> Issue:
> UPSERT SELECT did not work with JOIN. :-(
> UPSERT INTO ATTRIBUTE(ENTITY,ATTRIBUTE) select EM.entity,EM.EVCOL_NO_OF_PARTS 
> from EVENT_MANAGEMENT EM(EVCOL_NO_OF_PARTS INTEGER,EVCOL_STANDARD_COST 
> FLOAT,EVCOL_INVENTORY INTEGER,EVCOL_TYPE VARCHAR) INNER JOIN TEMPENTITY MA ON 
> EM.EVENTTIME = MA.EVENTTIME AND EM.ENTITY = MA.ENTITY;
> Error: Joins not supported (state=,code=0)
> But select worked.
>  select EM.entity,EM.EVCOL_NO_OF_PARTS from EVENT_MANAGEMENT 
> EM(EVCOL_NO_OF_PARTS INTEGER,EVCOL_STANDARD_COST FLOAT,EVCOL_INVENTORY 
> INTEGER,EVCOL_TYPE VARCHAR) INNER JOIN TEMPENTITY MA ON EM.EVENTTIME = 
> MA.EVENTTIME AND EM.ENTITY = MA.ENTITY;



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-3999) Optimize inner joins as SKIP-SCAN-JOIN when possible

2017-10-23 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216078#comment-16216078
 ] 

Maryann Xue commented on PHOENIX-3999:
--

[~jamestaylor], it would still be a hash join, just executed on the client side 
instead (a nested-loop join is sure to be more costly). In theory the difference 
between a server-side and a client-side hash join is that the former is (or 
should be) done in parallel but has the extra cost of broadcasting the RHS over 
the network to all servers. If the transfer of the RHS gets costly enough to 
undo the benefit of executing the join in parallel, it's probably not a good 
idea to do a hash join anyway. 
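
To make the trade-off concrete: a client-side hash join is just a build over 
the RHS followed by a probe over the LHS on the client, with no broadcast but 
also no server-side parallelism. A toy sketch (illustrative only, not Phoenix's 
implementation; the row layout is an assumption):

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ClientHashJoinSketch {

    // lhs rows are {BATCH_ID, ITEM_ID}; rhs rows are {BATCH_ID, BATCH_SEQUENCE_NUM}.
    static List<long[]> join(List<long[]> lhs, List<long[]> rhs) {
        // Build phase: hash the (smaller) RHS once on the client.
        Map<Long, Long> seqByBatch = new HashMap<>();
        for (long[] r : rhs) {
            seqByBatch.put(r[0], r[1]);
        }
        // Probe phase: stream the LHS and look up each key. This part runs
        // single-threaded on the client, unlike the server-side variant.
        List<long[]> out = new ArrayList<>();
        for (long[] l : lhs) {
            Long seq = seqByBatch.get(l[0]);
            if (seq != null) {
                out.add(new long[] { l[0], l[1], seq });
            }
        }
        return out;
    }
}
{code}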

> Optimize inner joins as SKIP-SCAN-JOIN when possible
> 
>
> Key: PHOENIX-3999
> URL: https://issues.apache.org/jira/browse/PHOENIX-3999
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>
> Semi joins on the leading part of the primary key end up doing batches of 
> point queries (as opposed to a broadcast hash join), however inner joins do 
> not.
> Here's a set of example schemas that executes a skip scan on the inner query:
> {code}
> CREATE TABLE COMPLETED_BATCHES (
> BATCH_SEQUENCE_NUM BIGINT NOT NULL,
> BATCH_ID   BIGINT NOT NULL,
> CONSTRAINT PK PRIMARY KEY
> (
> BATCH_SEQUENCE_NUM,
> BATCH_ID
> )
> );
> CREATE TABLE ITEMS (
>BATCH_ID BIGINT NOT NULL,
>ITEM_ID BIGINT NOT NULL,
>ITEM_TYPE BIGINT,
>ITEM_VALUE VARCHAR,
>CONSTRAINT PK PRIMARY KEY
>(
> BATCH_ID,
> ITEM_ID
>)
> );
> CREATE TABLE COMPLETED_ITEMS (
>ITEM_TYPE  BIGINT NOT NULL,
>BATCH_SEQUENCE_NUM BIGINT NOT NULL,
>ITEM_IDBIGINT NOT NULL,
>ITEM_VALUE VARCHAR,
>CONSTRAINT PK PRIMARY KEY
>(
>   ITEM_TYPE,
>   BATCH_SEQUENCE_NUM,  
>   ITEM_ID
>)
> );
> {code}
> The explain plan of these indicate that a dynamic filter will be performed 
> like this:
> {code}
> UPSERT SELECT
> CLIENT PARALLEL 1-WAY FULL SCAN OVER ITEMS
> SKIP-SCAN-JOIN TABLE 0
> CLIENT PARALLEL 1-WAY RANGE SCAN OVER COMPLETED_BATCHES [1] - [2]
> SERVER FILTER BY FIRST KEY ONLY
> SERVER AGGREGATE INTO DISTINCT ROWS BY [BATCH_ID]
> CLIENT MERGE SORT
> DYNAMIC SERVER FILTER BY I.BATCH_ID IN ($8.$9)
> {code}
> We should also be able to leverage this optimization when an inner join is 
> used such as this:
> {code}
> UPSERT INTO COMPLETED_ITEMS (ITEM_TYPE, BATCH_SEQUENCE_NUM, ITEM_ID, 
> ITEM_VALUE)
>SELECT i.ITEM_TYPE, b.BATCH_SEQUENCE_NUM, i.ITEM_ID, i.ITEM_VALUE   
>FROM  ITEMS i, COMPLETED_BATCHES b
>WHERE b.BATCH_ID = i.BATCH_ID AND  
>b.BATCH_SEQUENCE_NUM > 1000 AND b.BATCH_SEQUENCE_NUM < 2000;
> {code}
> A complete unit test looks like this:
> {code}
> @Test
> public void testNestedLoopJoin() throws Exception {
> try (Connection conn = DriverManager.getConnection(getUrl())) {
> String t1="COMPLETED_BATCHES";
> String ddl1 = "CREATE TABLE " + t1 + " (\n" + 
> "BATCH_SEQUENCE_NUM BIGINT NOT NULL,\n" + 
> "BATCH_ID   BIGINT NOT NULL,\n" + 
> "CONSTRAINT PK PRIMARY KEY\n" + 
> "(\n" + 
> "BATCH_SEQUENCE_NUM,\n" + 
> "BATCH_ID\n" + 
> ")\n" + 
> ")" + 
> "";
> conn.createStatement().execute(ddl1);
> 
> String t2="ITEMS";
> String ddl2 = "CREATE TABLE " + t2 + " (\n" + 
> "   BATCH_ID BIGINT NOT NULL,\n" + 
> "   ITEM_ID BIGINT NOT NULL,\n" + 
> "   ITEM_TYPE BIGINT,\n" + 
> "   ITEM_VALUE VARCHAR,\n" + 
> "   CONSTRAINT PK PRIMARY KEY\n" + 
> "   (\n" + 
> "BATCH_ID,\n" + 
> "ITEM_ID\n" + 
> "   )\n" + 
> ")";
> conn.createStatement().execute(ddl2);
> String t3="COMPLETED_ITEMS";
> String ddl3 = "CREATE TABLE " + t3 + "(\n" + 
> "   ITEM_TYPE  BIGINT NOT NULL,\n" + 
> "   BATCH_SEQUENCE_NUM BIGINT NOT NULL,\n" + 
> "   ITEM_IDBIGINT NOT NULL,\n" + 
> "   ITEM_VALUE VARCHAR,\n" + 
> "   CONSTRAINT PK PRIMARY KEY\n" + 
> "   (\n" + 
> "  ITEM_TYPE,\n" + 
> "  BATCH_SEQUENCE_NUM,  \n" + 
> "  ITEM_ID\n" + 
> "   )\

[jira] [Created] (PHOENIX-4316) Local Index - Splitting a local index on multi-tenant view fails with TNF exception

2017-10-23 Thread Mujtaba Chohan (JIRA)
Mujtaba Chohan created PHOENIX-4316:
---

 Summary: Local Index - Splitting a local index on multi-tenant 
view fails with TNF exception
 Key: PHOENIX-4316
 URL: https://issues.apache.org/jira/browse/PHOENIX-4316
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.12.0
Reporter: Mujtaba Chohan


In the following logs, TM is the base multi-tenant table and TV is the 
tenant-specific view. A local index is created on the tenant-specific view. 
The region server aborts when the table is split.
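
A plausible repro sketch (the DDL, tenant id, and index name are assumptions; 
only the TM/TV naming comes from the logs below):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class LocalIndexSplitRepro {
    public static void main(String[] args) throws Exception {
        // Global connection: create the multi-tenant base table TM.
        try (Connection global = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            global.createStatement().execute(
                "CREATE TABLE TM (TENANT_ID VARCHAR NOT NULL, K VARCHAR NOT NULL, V VARCHAR" +
                " CONSTRAINT PK PRIMARY KEY (TENANT_ID, K)) MULTI_TENANT=true");
        }
        // Tenant-specific connection: create the view TV and a local index on it.
        Properties props = new Properties();
        props.setProperty("TenantId", "tenant1");
        try (Connection tenant = DriverManager.getConnection("jdbc:phoenix:localhost", props)) {
            tenant.createStatement().execute("CREATE VIEW TV AS SELECT * FROM TM");
            tenant.createStatement().execute("CREATE LOCAL INDEX TV_IDX ON TV (V)");
        }
        // Splitting TM afterwards (e.g. `split 'TM'` in the HBase shell) is
        // what produces the TableNotFoundException for TV in the logs below.
    }
}
{code}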

{noformat}
2017-10-23 16:25:42,263 ERROR 
[localhost,34512,1508783072608-daughterOpener=420df77ad7317fcf213772974498e192] 
regionserver.HRegion: Could not initialize all stores for the 
region=TM,X\x001024163863277142772527737277482810122859132922143041163750200364,1508801142038.420df77ad7317fcf213772974498e192.
2017-10-23 16:25:42,275 INFO  
[localhost,34512,1508783072608-daughterOpener=420df77ad7317fcf213772974498e192] 
regionserver.HStore: Closed 0
2017-10-23 16:25:42,275 ERROR 
[localhost,34512,1508783072608-daughterOpener=c753e6d674dd5c797ea6cf23941ce9f3] 
regionserver.HRegion: Could not initialize all stores for the 
region=TM,,1508801142038.c753e6d674dd5c797ea6cf23941ce9f3.
2017-10-23 16:25:42,286 INFO  
[localhost,34512,1508783072608-daughterOpener=c753e6d674dd5c797ea6cf23941ce9f3] 
regionserver.HStore: Closed 0
2017-10-23 16:25:42,286 INFO  [RS:0;localhost:34512-splits-1508783666402] 
regionserver.SplitRequest: Running rollback/cleanup of failed split of 
TM,,1508799266515.f3c6ebcb4e605e0b5c2098633967d73e.; Failed 
localhost,34512,1508783072608-daughterOpener=c753e6d674dd5c797ea6cf23941ce9f3
java.io.IOException: Failed 
localhost,34512,1508783072608-daughterOpener=c753e6d674dd5c797ea6cf23941ce9f3
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.openDaughters(SplitTransactionImpl.java:499)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsAfterPONR(SplitTransactionImpl.java:597)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:580)
at 
org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
at 
org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:154)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.io.IOException: java.io.IOException: 
org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
undefined. tableName=TV
at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:952)
at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:827)
at 
org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:802)
at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6708)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.openDaughterRegion(SplitTransactionImpl.java:731)
at 
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl$DaughterOpener.run(SplitTransactionImpl.java:711)
... 1 more
Caused by: java.io.IOException: java.io.IOException: 
org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
undefined. tableName=TV
at 
org.apache.hadoop.hbase.regionserver.HStore.openStoreFiles(HStore.java:560)
at 
org.apache.hadoop.hbase.regionserver.HStore.loadStoreFiles(HStore.java:514)
at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:277)
at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5185)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:926)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:923)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: java.io.IOException: 
org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
undefined. tableName=TV
at 
org.apache.hadoop.hbase.regionserver.IndexHalfStoreFileReaderGenerator.preStoreFileReaderOpen(IndexHalfStoreFileReaderGenerator.java:174)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$64.call(RegionCoprocessorHost.java:1580)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
at 

[jira] [Commented] (PHOENIX-3757) System mutex table not being created in SYSTEM namespace when namespace mapping is enabled

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215918#comment-16215918
 ] 

ASF GitHub Bot commented on PHOENIX-3757:
-

Github user karanmehta93 commented on the issue:

https://github.com/apache/phoenix/pull/277
  
@twdsilva Please review.
@aertoria 


> System mutex table not being created in SYSTEM namespace when namespace 
> mapping is enabled
> --
>
> Key: PHOENIX-3757
> URL: https://issues.apache.org/jira/browse/PHOENIX-3757
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Karan Mehta
>Priority: Critical
>  Labels: namespaces
> Fix For: 4.13.0
>
> Attachments: PHOENIX-3757.001.patch, PHOENIX-3757.002.patch
>
>
> Noticed this issue while writing a test for PHOENIX-3756:
> The SYSTEM.MUTEX table is always created in the default namespace, even when 
> {{phoenix.schema.isNamespaceMappingEnabled=true}}. At a glance, it looks like 
> the logic for the other system tables isn't applied to the mutex table.
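
For reference, a minimal sketch of the client-side configuration involved (both 
property names are standard Phoenix settings and must match what hbase-site.xml 
sets on the server; the helper class is illustrative):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class NamespaceMappingExample {

    // With namespace mapping on, system tables live in the SYSTEM namespace
    // (e.g. SYSTEM:CATALOG); the fix should make SYSTEM.MUTEX follow the
    // same rule instead of staying in the default namespace.
    public static Connection connect(String zkQuorum) throws Exception {
        Properties props = new Properties();
        props.setProperty("phoenix.schema.isNamespaceMappingEnabled", "true");
        props.setProperty("phoenix.schema.mapSystemTablesToNamespace", "true");
        return DriverManager.getConnection("jdbc:phoenix:" + zkQuorum, props);
    }
}
{code}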



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] phoenix issue #277: PHOENIX-3757 System mutex table not being created in SYS...

2017-10-23 Thread karanmehta93
Github user karanmehta93 commented on the issue:

https://github.com/apache/phoenix/pull/277
  
@twdsilva Please review.
@aertoria 


---


[jira] [Commented] (PHOENIX-3757) System mutex table not being created in SYSTEM namespace when namespace mapping is enabled

2017-10-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215917#comment-16215917
 ] 

ASF GitHub Bot commented on PHOENIX-3757:
-

GitHub user karanmehta93 opened a pull request:

https://github.com/apache/phoenix/pull/277

PHOENIX-3757 System mutex table not being created in SYSTEM namespace…

… when namespace mapping is enabled
On a fresh cluster with system table namespace mapping enabled, SYSMUTEX will 
be created directly in the SYSTEM namespace.
On an existing cluster, it will be migrated to the SYSTEM namespace if that 
property is enabled.
The migration also requires locking via the SYSMUTEX table, since it disables 
and re-enables SYSCAT.
SYSCAT upgrades will acquire the lock in the appropriate SYSMUTEX table based 
on the client properties.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/karanmehta93/phoenix PHOENIX-3757

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #277


commit 58c70f6ea46b76ed6474121b23e9756b7c5d9912
Author: Karan Mehta 
Date:   2017-10-23T21:44:21Z

PHOENIX-3757 System mutex table not being created in SYSTEM namespace when 
namespace mapping is enabled




> System mutex table not being created in SYSTEM namespace when namespace 
> mapping is enabled
> --
>
> Key: PHOENIX-3757
> URL: https://issues.apache.org/jira/browse/PHOENIX-3757
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Karan Mehta
>Priority: Critical
>  Labels: namespaces
> Fix For: 4.13.0
>
> Attachments: PHOENIX-3757.001.patch, PHOENIX-3757.002.patch
>
>
> Noticed this issue while writing a test for PHOENIX-3756:
> The SYSTEM.MUTEX table is always created in the default namespace, even when 
> {{phoenix.schema.isNamespaceMappingEnabled=true}}. At a glance, it looks like 
> the logic for the other system tables isn't applied to the mutex table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] phoenix pull request #277: PHOENIX-3757 System mutex table not being created...

2017-10-23 Thread karanmehta93
GitHub user karanmehta93 opened a pull request:

https://github.com/apache/phoenix/pull/277

PHOENIX-3757 System mutex table not being created in SYSTEM namespace…

… when namespace mapping is enabled
On a fresh cluster with system table namespace mapping enabled, SYSMUTEX will 
be created directly in the SYSTEM namespace.
On an existing cluster, it will be migrated to the SYSTEM namespace if that 
property is enabled.
The migration also requires locking via the SYSMUTEX table, since it disables 
and re-enables SYSCAT.
SYSCAT upgrades will acquire the lock in the appropriate SYSMUTEX table based 
on the client properties.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/karanmehta93/phoenix PHOENIX-3757

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #277


commit 58c70f6ea46b76ed6474121b23e9756b7c5d9912
Author: Karan Mehta 
Date:   2017-10-23T21:44:21Z

PHOENIX-3757 System mutex table not being created in SYSTEM namespace when 
namespace mapping is enabled




---


[jira] [Commented] (PHOENIX-3999) Optimize inner joins as SKIP-SCAN-JOIN when possible

2017-10-23 Thread Ethan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215840#comment-16215840
 ] 

Ethan Wang commented on PHOENIX-3999:
-

(Maybe irrelevant) Is the timeout enforced for the query as a whole? If we only 
care about the timeout of each RPC call for each parallel scan, wouldn't it be 
more efficient to leave it to the parallel scanner to decide how to chunk the 
scans?

> Optimize inner joins as SKIP-SCAN-JOIN when possible
> 
>
> Key: PHOENIX-3999
> URL: https://issues.apache.org/jira/browse/PHOENIX-3999
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>
> Semi joins on the leading part of the primary key end up doing batches of 
> point queries (as opposed to a broadcast hash join), however inner joins do 
> not.
> Here's a set of example schemas that executes a skip scan on the inner query:
> {code}
> CREATE TABLE COMPLETED_BATCHES (
> BATCH_SEQUENCE_NUM BIGINT NOT NULL,
> BATCH_ID   BIGINT NOT NULL,
> CONSTRAINT PK PRIMARY KEY
> (
> BATCH_SEQUENCE_NUM,
> BATCH_ID
> )
> );
> CREATE TABLE ITEMS (
>BATCH_ID BIGINT NOT NULL,
>ITEM_ID BIGINT NOT NULL,
>ITEM_TYPE BIGINT,
>ITEM_VALUE VARCHAR,
>CONSTRAINT PK PRIMARY KEY
>(
> BATCH_ID,
> ITEM_ID
>)
> );
> CREATE TABLE COMPLETED_ITEMS (
>ITEM_TYPE  BIGINT NOT NULL,
>BATCH_SEQUENCE_NUM BIGINT NOT NULL,
>ITEM_IDBIGINT NOT NULL,
>ITEM_VALUE VARCHAR,
>CONSTRAINT PK PRIMARY KEY
>(
>   ITEM_TYPE,
>   BATCH_SEQUENCE_NUM,  
>   ITEM_ID
>)
> );
> {code}
> The explain plan of these indicate that a dynamic filter will be performed 
> like this:
> {code}
> UPSERT SELECT
> CLIENT PARALLEL 1-WAY FULL SCAN OVER ITEMS
> SKIP-SCAN-JOIN TABLE 0
> CLIENT PARALLEL 1-WAY RANGE SCAN OVER COMPLETED_BATCHES [1] - [2]
> SERVER FILTER BY FIRST KEY ONLY
> SERVER AGGREGATE INTO DISTINCT ROWS BY [BATCH_ID]
> CLIENT MERGE SORT
> DYNAMIC SERVER FILTER BY I.BATCH_ID IN ($8.$9)
> {code}
> We should also be able to leverage this optimization when an inner join is 
> used such as this:
> {code}
> UPSERT INTO COMPLETED_ITEMS (ITEM_TYPE, BATCH_SEQUENCE_NUM, ITEM_ID, 
> ITEM_VALUE)
>SELECT i.ITEM_TYPE, b.BATCH_SEQUENCE_NUM, i.ITEM_ID, i.ITEM_VALUE   
>FROM  ITEMS i, COMPLETED_BATCHES b
>WHERE b.BATCH_ID = i.BATCH_ID AND  
>b.BATCH_SEQUENCE_NUM > 1000 AND b.BATCH_SEQUENCE_NUM < 2000;
> {code}
> A complete unit test looks like this:
> {code}
> @Test
> public void testNestedLoopJoin() throws Exception {
> try (Connection conn = DriverManager.getConnection(getUrl())) {
> String t1="COMPLETED_BATCHES";
> String ddl1 = "CREATE TABLE " + t1 + " (\n" + 
> "BATCH_SEQUENCE_NUM BIGINT NOT NULL,\n" + 
> "BATCH_ID   BIGINT NOT NULL,\n" + 
> "CONSTRAINT PK PRIMARY KEY\n" + 
> "(\n" + 
> "BATCH_SEQUENCE_NUM,\n" + 
> "BATCH_ID\n" + 
> ")\n" + 
> ")" + 
> "";
> conn.createStatement().execute(ddl1);
> 
> String t2="ITEMS";
> String ddl2 = "CREATE TABLE " + t2 + " (\n" + 
> "   BATCH_ID BIGINT NOT NULL,\n" + 
> "   ITEM_ID BIGINT NOT NULL,\n" + 
> "   ITEM_TYPE BIGINT,\n" + 
> "   ITEM_VALUE VARCHAR,\n" + 
> "   CONSTRAINT PK PRIMARY KEY\n" + 
> "   (\n" + 
> "BATCH_ID,\n" + 
> "ITEM_ID\n" + 
> "   )\n" + 
> ")";
> conn.createStatement().execute(ddl2);
> String t3="COMPLETED_ITEMS";
> String ddl3 = "CREATE TABLE " + t3 + "(\n" + 
> "   ITEM_TYPE  BIGINT NOT NULL,\n" + 
> "   BATCH_SEQUENCE_NUM BIGINT NOT NULL,\n" + 
> "   ITEM_IDBIGINT NOT NULL,\n" + 
> "   ITEM_VALUE VARCHAR,\n" + 
> "   CONSTRAINT PK PRIMARY KEY\n" + 
> "   (\n" + 
> "  ITEM_TYPE,\n" + 
> "  BATCH_SEQUENCE_NUM,  \n" + 
> "  ITEM_ID\n" + 
> "   )\n" + 
> ")";
> conn.createStatement().execute(ddl3);
> conn.createStatement().execute("UPSERT INTO 
> "+t1+"(BATCH_SEQUENCE_NUM, batch_id) VALUES (1,2)");
> conn.createStatement().execute("UPSERT INTO 

[jira] [Commented] (PHOENIX-3999) Optimize inner joins as SKIP-SCAN-JOIN when possible

2017-10-23 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215796#comment-16215796
 ] 

James Taylor commented on PHOENIX-3999:
---

bq. The new approach caches the BATCH_SEQUENCE_NUM values as variables in Java, 
so no join operation is needed now
Right - so this might be a general, viable solution: chunk it up and execute 
the join on the client instead of doing a broadcast join. Is it like a 
nested-loop join? I'm not sure under what conditions this could be done or when 
it would be more efficient.

> Optimize inner joins as SKIP-SCAN-JOIN when possible
> 
>
> Key: PHOENIX-3999
> URL: https://issues.apache.org/jira/browse/PHOENIX-3999
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>
> Semi joins on the leading part of the primary key end up doing batches of 
> point queries (as opposed to a broadcast hash join), however inner joins do 
> not.
> Here's a set of example schemas that executes a skip scan on the inner query:
> {code}
> CREATE TABLE COMPLETED_BATCHES (
> BATCH_SEQUENCE_NUM BIGINT NOT NULL,
> BATCH_ID   BIGINT NOT NULL,
> CONSTRAINT PK PRIMARY KEY
> (
> BATCH_SEQUENCE_NUM,
> BATCH_ID
> )
> );
> CREATE TABLE ITEMS (
>BATCH_ID BIGINT NOT NULL,
>ITEM_ID BIGINT NOT NULL,
>ITEM_TYPE BIGINT,
>ITEM_VALUE VARCHAR,
>CONSTRAINT PK PRIMARY KEY
>(
> BATCH_ID,
> ITEM_ID
>)
> );
> CREATE TABLE COMPLETED_ITEMS (
>ITEM_TYPE  BIGINT NOT NULL,
>BATCH_SEQUENCE_NUM BIGINT NOT NULL,
>ITEM_IDBIGINT NOT NULL,
>ITEM_VALUE VARCHAR,
>CONSTRAINT PK PRIMARY KEY
>(
>   ITEM_TYPE,
>   BATCH_SEQUENCE_NUM,  
>   ITEM_ID
>)
> );
> {code}
> The explain plan of these indicate that a dynamic filter will be performed 
> like this:
> {code}
> UPSERT SELECT
> CLIENT PARALLEL 1-WAY FULL SCAN OVER ITEMS
> SKIP-SCAN-JOIN TABLE 0
> CLIENT PARALLEL 1-WAY RANGE SCAN OVER COMPLETED_BATCHES [1] - [2]
> SERVER FILTER BY FIRST KEY ONLY
> SERVER AGGREGATE INTO DISTINCT ROWS BY [BATCH_ID]
> CLIENT MERGE SORT
> DYNAMIC SERVER FILTER BY I.BATCH_ID IN ($8.$9)
> {code}
> We should also be able to leverage this optimization when an inner join is 
> used such as this:
> {code}
> UPSERT INTO COMPLETED_ITEMS (ITEM_TYPE, BATCH_SEQUENCE_NUM, ITEM_ID, 
> ITEM_VALUE)
>SELECT i.ITEM_TYPE, b.BATCH_SEQUENCE_NUM, i.ITEM_ID, i.ITEM_VALUE   
>FROM  ITEMS i, COMPLETED_BATCHES b
>WHERE b.BATCH_ID = i.BATCH_ID AND  
>b.BATCH_SEQUENCE_NUM > 1000 AND b.BATCH_SEQUENCE_NUM < 2000;
> {code}
> A complete unit test looks like this:
> {code}
> @Test
> public void testNestedLoopJoin() throws Exception {
> try (Connection conn = DriverManager.getConnection(getUrl())) {
> String t1="COMPLETED_BATCHES";
> String ddl1 = "CREATE TABLE " + t1 + " (\n" + 
> "BATCH_SEQUENCE_NUM BIGINT NOT NULL,\n" + 
> "BATCH_ID   BIGINT NOT NULL,\n" + 
> "CONSTRAINT PK PRIMARY KEY\n" + 
> "(\n" + 
> "BATCH_SEQUENCE_NUM,\n" + 
> "BATCH_ID\n" + 
> ")\n" + 
> ")" + 
> "";
> conn.createStatement().execute(ddl1);
> 
> String t2="ITEMS";
> String ddl2 = "CREATE TABLE " + t2 + " (\n" + 
> "   BATCH_ID BIGINT NOT NULL,\n" + 
> "   ITEM_ID BIGINT NOT NULL,\n" + 
> "   ITEM_TYPE BIGINT,\n" + 
> "   ITEM_VALUE VARCHAR,\n" + 
> "   CONSTRAINT PK PRIMARY KEY\n" + 
> "   (\n" + 
> "BATCH_ID,\n" + 
> "ITEM_ID\n" + 
> "   )\n" + 
> ")";
> conn.createStatement().execute(ddl2);
> String t3="COMPLETED_ITEMS";
> String ddl3 = "CREATE TABLE " + t3 + "(\n" + 
> "   ITEM_TYPE  BIGINT NOT NULL,\n" + 
> "   BATCH_SEQUENCE_NUM BIGINT NOT NULL,\n" + 
> "   ITEM_IDBIGINT NOT NULL,\n" + 
> "   ITEM_VALUE VARCHAR,\n" + 
> "   CONSTRAINT PK PRIMARY KEY\n" + 
> "   (\n" + 
> "  ITEM_TYPE,\n" + 
> "  BATCH_SEQUENCE_NUM,  \n" + 
> "  ITEM_ID\n" + 
> "   )\n" + 
> ")";
> conn.createStatement().execute(ddl3);
> conn.createStatement().exec

[jira] [Commented] (PHOENIX-3999) Optimize inner joins as SKIP-SCAN-JOIN when possible

2017-10-23 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215785#comment-16215785
 ] 

Maryann Xue commented on PHOENIX-3999:
--

I think the purpose of the above client logic is, as [~wdumaresq] said, to 
avoid timeouts. The whole query is divided and executed in small ranges. Just 
to confirm though, would the first query that selects from COMPLETED_BATCHES 
also time out if not executed by ranges?
[~jamestaylor], "IN" queries like the one in Will's case have already been 
optimized. The only thing that triggered the broadcast join operation in the 
original query is "BATCH_SEQUENCE_NUM", which is not available from the ITEMS 
table and has to be joined in from COMPLETED_BATCHES. The new approach caches 
the BATCH_SEQUENCE_NUM values as variables in Java, so no join operation is 
needed now.
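
A sketch of that client-driven rewrite (schema from the description; the 
chunking strategy and class name are assumptions): cache the qualifying 
(BATCH_SEQUENCE_NUM, BATCH_ID) pairs in Java, then run one small UPSERT SELECT 
per batch with the sequence number inlined as a constant, so no server-side 
join is needed:

{code}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class ClientDrivenUpsert {
    static void run(Connection conn) throws Exception {
        // Step 1: cache the qualifying (BATCH_SEQUENCE_NUM, BATCH_ID) pairs.
        List<long[]> batches = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT BATCH_SEQUENCE_NUM, BATCH_ID FROM COMPLETED_BATCHES" +
                 " WHERE BATCH_SEQUENCE_NUM > 1000 AND BATCH_SEQUENCE_NUM < 2000")) {
            while (rs.next()) {
                batches.add(new long[] { rs.getLong(1), rs.getLong(2) });
            }
        }
        // Step 2: one UPSERT SELECT per batch. BATCH_ID is the leading PK
        // column of ITEMS, so each statement is a simple range scan, and the
        // cached BATCH_SEQUENCE_NUM is inlined instead of joined in.
        try (Statement stmt = conn.createStatement()) {
            for (long[] b : batches) {
                stmt.executeUpdate(
                    "UPSERT INTO COMPLETED_ITEMS (ITEM_TYPE, BATCH_SEQUENCE_NUM, ITEM_ID, ITEM_VALUE)" +
                    " SELECT ITEM_TYPE, " + b[0] + ", ITEM_ID, ITEM_VALUE" +
                    " FROM ITEMS WHERE BATCH_ID = " + b[1]);
            }
        }
        conn.commit();
    }
}
{code}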

> Optimize inner joins as SKIP-SCAN-JOIN when possible
> 
>
> Key: PHOENIX-3999
> URL: https://issues.apache.org/jira/browse/PHOENIX-3999
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>
> Semi joins on the leading part of the primary key end up doing batches of 
> point queries (as opposed to a broadcast hash join), however inner joins do 
> not.
> Here's a set of example schemas that executes a skip scan on the inner query:
> {code}
> CREATE TABLE COMPLETED_BATCHES (
> BATCH_SEQUENCE_NUM BIGINT NOT NULL,
> BATCH_ID   BIGINT NOT NULL,
> CONSTRAINT PK PRIMARY KEY
> (
> BATCH_SEQUENCE_NUM,
> BATCH_ID
> )
> );
> CREATE TABLE ITEMS (
>BATCH_ID BIGINT NOT NULL,
>ITEM_ID BIGINT NOT NULL,
>ITEM_TYPE BIGINT,
>ITEM_VALUE VARCHAR,
>CONSTRAINT PK PRIMARY KEY
>(
> BATCH_ID,
> ITEM_ID
>)
> );
> CREATE TABLE COMPLETED_ITEMS (
>ITEM_TYPE  BIGINT NOT NULL,
>BATCH_SEQUENCE_NUM BIGINT NOT NULL,
>ITEM_IDBIGINT NOT NULL,
>ITEM_VALUE VARCHAR,
>CONSTRAINT PK PRIMARY KEY
>(
>   ITEM_TYPE,
>   BATCH_SEQUENCE_NUM,  
>   ITEM_ID
>)
> );
> {code}
> The explain plan of these indicate that a dynamic filter will be performed 
> like this:
> {code}
> UPSERT SELECT
> CLIENT PARALLEL 1-WAY FULL SCAN OVER ITEMS
> SKIP-SCAN-JOIN TABLE 0
> CLIENT PARALLEL 1-WAY RANGE SCAN OVER COMPLETED_BATCHES [1] - [2]
> SERVER FILTER BY FIRST KEY ONLY
> SERVER AGGREGATE INTO DISTINCT ROWS BY [BATCH_ID]
> CLIENT MERGE SORT
> DYNAMIC SERVER FILTER BY I.BATCH_ID IN ($8.$9)
> {code}
> We should also be able to leverage this optimization when an inner join is 
> used such as this:
> {code}
> UPSERT INTO COMPLETED_ITEMS (ITEM_TYPE, BATCH_SEQUENCE_NUM, ITEM_ID, 
> ITEM_VALUE)
>SELECT i.ITEM_TYPE, b.BATCH_SEQUENCE_NUM, i.ITEM_ID, i.ITEM_VALUE   
>FROM  ITEMS i, COMPLETED_BATCHES b
>WHERE b.BATCH_ID = i.BATCH_ID AND  
>b.BATCH_SEQUENCE_NUM > 1000 AND b.BATCH_SEQUENCE_NUM < 2000;
> {code}
> A complete unit test looks like this:
> {code}
> @Test
> public void testNestedLoopJoin() throws Exception {
> try (Connection conn = DriverManager.getConnection(getUrl())) {
> String t1="COMPLETED_BATCHES";
> String ddl1 = "CREATE TABLE " + t1 + " (\n" + 
> "BATCH_SEQUENCE_NUM BIGINT NOT NULL,\n" + 
> "BATCH_ID   BIGINT NOT NULL,\n" + 
> "CONSTRAINT PK PRIMARY KEY\n" + 
> "(\n" + 
> "BATCH_SEQUENCE_NUM,\n" + 
> "BATCH_ID\n" + 
> ")\n" + 
> ")" + 
> "";
> conn.createStatement().execute(ddl1);
> 
> String t2="ITEMS";
> String ddl2 = "CREATE TABLE " + t2 + " (\n" + 
> "   BATCH_ID BIGINT NOT NULL,\n" + 
> "   ITEM_ID BIGINT NOT NULL,\n" + 
> "   ITEM_TYPE BIGINT,\n" + 
> "   ITEM_VALUE VARCHAR,\n" + 
> "   CONSTRAINT PK PRIMARY KEY\n" + 
> "   (\n" + 
> "BATCH_ID,\n" + 
> "ITEM_ID\n" + 
> "   )\n" + 
> ")";
> conn.createStatement().execute(ddl2);
> String t3="COMPLETED_ITEMS";
> String ddl3 = "CREATE TABLE " + t3 + "(\n" + 
> "   ITEM_TYPE  BIGINT NOT NULL,\n" + 
> "   BATCH_SEQUENCE_NUM BIGINT NOT NULL,\n" + 
> "   ITEM_IDBIGINT NOT NULL,\n" + 
> "   ITEM_VALUE VARCHAR,\n" + 
> "   CONSTRAINT PK PRIMARY KEY\n" + 
> "   (\n" + 
>

[jira] [Commented] (PHOENIX-4277) Treat delete markers consistently with puts for point-in-time scans

2017-10-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215770#comment-16215770
 ] 

Hadoop QA commented on PHOENIX-4277:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12893571/PHOENIX-4277.test.patch
  against master branch at commit 7cdcb2313b08d2eaeb775f0c989642f8d416cfb6.
  ATTACHMENT ID: 12893571

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation, build,
or dev patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+setUpTestDriver(new 
ReadOnlyProps(serverProps.entrySet().iterator()), new 
ReadOnlyProps(clientProps.entrySet().iterator()));
+private String[] getArgValues(String schemaName, String dataTable, String 
indxTable, Long batchSize,
+private List runScrutinyCurrentSCN(String schemaName, String 
dataTableName, String indexTableName, Long scrutinyTS) throws Exception {
+return runScrutiny(getArgValues(schemaName, dataTableName, 
indexTableName, null, SourceTable.BOTH, false, null, null, scrutinyTS));

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.IndexScrutinyToolIT

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1566//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1566//console

This message is automatically generated.

> Treat delete markers consistently with puts for point-in-time scans
> ---
>
> Key: PHOENIX-4277
> URL: https://issues.apache.org/jira/browse/PHOENIX-4277
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.13.0
>
> Attachments: PHOENIX-4277.test.patch, PHOENIX-4277_v2.patch, 
> PHOENIX-4277_v3.patch, PHOENIX-4277_wip.patch
>
>
> The IndexScrutinyTool relies on doing point-in-time scans to determine 
> consistency between the index and data tables. Unfortunately, deletes to the 
> tables cause a problem with this approach, since delete markers take effect 
> even if they're at a later time stamp than the point-in-time at which the 
> scan is being done (unless KEEP_DELETED_CELLS is true). The logic of this is 
> that scans should get the same results before and after a compaction takes 
> place.
> Taking snapshots does not help with this since they cannot be taken at a 
> point-in-time and the delete markers will act the same way - there's no way 
> to guarantee that the index and data table snapshots have the same "logical" 
> set of data.
> Using raw scans would allow us to see the delete markers and do the correct 
> point-in-time filtering ourselves. We'd need to write the filters to do this 
> correctly (see the Tephra TransactionVisibilityFilter for an implementation 
> of this that could be adapted). We'd also need to hook this into Phoenix or 
> potentially dip down to the HBase level  to do this.
> Thanks for brainstorming on this with me, [~lhofhansl].
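
To make the raw-scan idea concrete, a sketch of the scan setup at the HBase 
level (this is only the setup, not the point-in-time visibility filter the 
ticket calls for; the table name is a placeholder):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class RawPointInTimeScan {
    public static void scan(long pointInTimeTs) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("MY_TABLE"))) {
            Scan scan = new Scan();
            scan.setRaw(true);                        // return delete markers as cells
            scan.setMaxVersions();                    // keep all versions so markers can be matched
            scan.setTimeRange(0, pointInTimeTs + 1);  // everything up to the point in time
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    // org.apache.hadoop.hbase.CellUtil.isDelete(cell) would
                    // distinguish markers from puts; the filtering logic
                    // itself still has to apply them per column and timestamp.
                }
            }
        }
    }
}
{code}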



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (PHOENIX-4076) Move master branch up to HBase 1.4.0-SNAPSHOT

2017-10-23 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215715#comment-16215715
 ] 

Andrew Purtell edited comment on PHOENIX-4076 at 10/23/17 7:32 PM:
---

I ran the test suite against latest 1.3.2-SNAPSHOT, and applied this patch and 
ran it again against latest 1.4.0-SNAPSHOT.

The 1.3.2-SNAPSHOT results:
{noformat}
HBaseManagedTimeTests

[ERROR] Failures: 
[ERROR]   SequenceIT.testDuplicateSequences:102 Duplicate sequences
[ERROR] Errors: 
[ERROR]   SequenceIT.testSequenceDefault:761 » SequenceAlreadyExists ERROR 1200 
(42Z00):...

ParallelStatsEnabledTests

[ERROR] Failures: 
[ERROR]   PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild:220 Ran out 
of time
{noformat}

The 1.4.0-SNAPSHOT results:
{noformat}
HBaseManagedTimeTests

[ERROR] Failures: 
[ERROR]   SequenceIT.testDuplicateSequences:102 Duplicate sequences
[ERROR] Errors: 
[ERROR]   SequenceIT.testSequenceDefault:761 » SequenceAlreadyExists ERROR 1200 
(42Z00):...

ParallelStatsEnabledTest

[ERROR] Failures: 
[ERROR]   LocalIndexSplitMergeIT.testLocalIndexScanAfterRegionsMerge:236 
expected:<[h]> but was:<[i]>
{noformat}

The results are, basically, identical in terms of number of failed tests. Using 
1.4.0-SNAPSHOT instead of 1.3.2-SNAPSHOT does not appear to introduce more 
instability. The local index test failure on 1.4.0-SNAPSHOT may be germane to 
changes impacting that feature specifically.


was (Author: apurtell):
I ran the test suite against latest 1.3.2-SNAPSHOT, and applied this patch and 
ran it again against latest 1.4.0-SNAPSHOT.

The 1.3.2-SNAPSHOT results:
{noformat}
HBaseManagedTimeTests

[ERROR] Failures: 
[ERROR]   SequenceIT.testDuplicateSequences:102 Duplicate sequences
[ERROR] Errors: 
[ERROR]   SequenceIT.testSequenceDefault:761 » SequenceAlreadyExists ERROR 1200 
(42Z00):...

ParallelStatsEnabledTests

[ERROR] Failures: 
[ERROR]   PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild:220 Ran out 
of time
{noformat}

The 1.4.0-SNAPSHOT results:
{noformat}
HBaseManagedTimeTests

[ERROR] Failures: 
[ERROR]   SequenceIT.testDuplicateSequences:102 Duplicate sequences
[ERROR] Errors: 
[ERROR]   SequenceIT.testSequenceDefault:761 » SequenceAlreadyExists ERROR 1200 
(42Z00):...

ParallelStatsEnabledTest

[ERROR] Failures: 
[ERROR]   LocalIndexSplitMergeIT.testLocalIndexScanAfterRegionsMerge:236 
expected:<[h]> but was:<[i]>
{noformat}

The results are, basically, identical. 

> Move master branch up to HBase 1.4.0-SNAPSHOT
> -
>
> Key: PHOENIX-4076
> URL: https://issues.apache.org/jira/browse/PHOENIX-4076
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Attachments: PHOENIX-4076.patch
>
>
> Move master branch up to HBase 1.4.0-SNAPSHOT. 
> There are some compilation problems. 
> Valid compatibility breaks are addressed and fixed by HBASE-18431. This 
> analysis is a compilation attempt of Phoenix master branch against 
> 1.4.0-SNAPSHOT artifacts including the HBASE-18431 changes. 
> HBASE-16584 removed PayloadCarryingRpcController, breaking compilation of 
> MetadataRpcController, InterRegionServerIndexRpcControllerFactory, 
> IndexRpcController, ClientRpcControllerFactory, and 
> InterRegionServerMetadataRpcControllerFactory. This class was annotated as 
> Private so was fair game to remove. It will be gone in HBase 1.4.x and up. 
> DelegateRegionObserver needs to implement added interface method 
> postCommitStoreFile.
> DelegateHTable, TephraTransactionTable, and OmidTransactionTable need to 
> implement added interface methods for getting and setting read and write 
> timeouts. 
> PhoenixRpcScheduler needs to implement added interface methods for getting 
> handler counts. 
> Store file readers/writers/scanners have been refactored and the local index 
> implementation, which implements or overrides parts of this refactored 
> hierarchy will have to also be refactored.
> DelegateRegionCoprocessorEnvironment needs to implement added method 
> getMetricRegistryForRegionServer
> Another issue with IndexRpcController: incompatible types: int cannot be 
> converted to org.apache.hadoop.hbase.TableName
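
For the timeout items, the shape of the fix in the delegate classes is plain 
forwarding. A sketch against the read/write RPC timeout accessors on the HBase 
1.4 Table interface (HBASE-15866); the wrapper class here is illustrative, not 
the actual Phoenix delegate:

{code}
import org.apache.hadoop.hbase.client.Table;

// Forward the read/write RPC timeout accessors added to the Table interface.
public abstract class DelegateTimeoutSupport implements Table {
    protected final Table delegate;

    protected DelegateTimeoutSupport(Table delegate) {
        this.delegate = delegate;
    }

    @Override
    public int getReadRpcTimeout() {
        return delegate.getReadRpcTimeout();
    }

    @Override
    public void setReadRpcTimeout(int timeout) {
        delegate.setReadRpcTimeout(timeout);
    }

    @Override
    public int getWriteRpcTimeout() {
        return delegate.getWriteRpcTimeout();
    }

    @Override
    public void setWriteRpcTimeout(int timeout) {
        delegate.setWriteRpcTimeout(timeout);
    }
}
{code}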



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4076) Move master branch up to HBase 1.4.0-SNAPSHOT

2017-10-23 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215715#comment-16215715
 ] 

Andrew Purtell commented on PHOENIX-4076:
-

I ran the test suite against latest 1.3.2-SNAPSHOT, and applied this patch and 
ran it again against latest 1.4.0-SNAPSHOT.

The 1.3.2-SNAPSHOT results:
{noformat}
HBaseManagedTimeTests

[ERROR] Failures: 
[ERROR]   SequenceIT.testDuplicateSequences:102 Duplicate sequences
[ERROR] Errors: 
[ERROR]   SequenceIT.testSequenceDefault:761 » SequenceAlreadyExists ERROR 1200 
(42Z00):...

ParallelStatsEnabledTests

[ERROR] Failures: 
[ERROR]   PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild:220 Ran out 
of time
{noformat}

The 1.4.0-SNAPSHOT results:
{noformat}
HBaseManagedTimeTests

[ERROR] Failures: 
[ERROR]   SequenceIT.testDuplicateSequences:102 Duplicate sequences
[ERROR] Errors: 
[ERROR]   SequenceIT.testSequenceDefault:761 » SequenceAlreadyExists ERROR 1200 
(42Z00):...

ParallelStatsEnabledTest

[ERROR] Failures: 
[ERROR]   LocalIndexSplitMergeIT.testLocalIndexScanAfterRegionsMerge:236 
expected:<[h]> but was:<[i]>
{noformat}

The results are, basically, identical. 

> Move master branch up to HBase 1.4.0-SNAPSHOT
> -
>
> Key: PHOENIX-4076
> URL: https://issues.apache.org/jira/browse/PHOENIX-4076
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Attachments: PHOENIX-4076.patch
>
>
> Move master branch up to HBase 1.4.0-SNAPSHOT. 
> There are some compilation problems. 
> Valid compatibility breaks are addressed and fixed by HBASE-18431. This 
> analysis is a compilation attempt of Phoenix master branch against 
> 1.4.0-SNAPSHOT artifacts including the HBASE-18431 changes. 
> HBASE-16584 removed PayloadCarryingRpcController, breaking compilation of 
> MetadataRpcController, InterRegionServerIndexRpcControllerFactory, 
> IndexRpcController, ClientRpcControllerFactory, and 
> InterRegionServerMetadataRpcControllerFactory. This class was annotated as 
> Private so was fair game to remove. It will be gone in HBase 1.4.x and up. 
> DelegateRegionObserver needs to implement added interface method 
> postCommitStoreFile.
> DelegateHTable, TephraTransactionTable, and OmidTransactionTable need to 
> implement added interface methods for getting and setting read and write 
> timeouts. 
> PhoenixRpcScheduler needs to implement added interface methods for getting 
> handler counts. 
> Store file readers/writers/scanners have been refactored and the local index 
> implementation, which implements or overrides parts of this refactored 
> hierarchy will have to also be refactored.
> DelegateRegionCoprocessorEnvironment needs to implement added method 
> getMetricRegistryForRegionServer
> Another issue with IndexRpcController: incompatible types: int cannot be 
> converted to org.apache.hadoop.hbase.TableName



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4277) Treat delete markers consistently with puts for point-in-time scans

2017-10-23 Thread Vincent Poon (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vincent Poon updated PHOENIX-4277:
--
Attachment: PHOENIX-4277.test.patch

+1 [~jamestaylor], this test fails before your patch and passes after it.

> Treat delete markers consistently with puts for point-in-time scans
> ---
>
> Key: PHOENIX-4277
> URL: https://issues.apache.org/jira/browse/PHOENIX-4277
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.13.0
>
> Attachments: PHOENIX-4277.test.patch, PHOENIX-4277_v2.patch, 
> PHOENIX-4277_v3.patch, PHOENIX-4277_wip.patch
>
>
> The IndexScrutinyTool relies on doing point-in-time scans to determine 
> consistency between the index and data tables. Unfortunately, deletes to the 
> tables cause a problem with this approach, since delete markers take effect 
> even if they're at a later time stamp than the point-in-time at which the 
> scan is being done (unless KEEP_DELETED_CELLS is true). The logic of this is 
> that scans should get the same results before and after a compaction takes 
> place.
> Taking snapshots does not help with this since they cannot be taken at a 
> point-in-time and the delete markers will act the same way - there's no way 
> to guarantee that the index and data table snapshots have the same "logical" 
> set of data.
> Using raw scans would allow us to see the delete markers and do the correct 
> point-in-time filtering ourselves. We'd need to write the filters to do this 
> correctly (see the Tephra TransactionVisibilityFilter for an implementation 
> of this that could be adapted). We'd also need to hook this into Phoenix or 
> potentially dip down to the HBase level  to do this.
> Thanks for brainstorming on this with me, [~lhofhansl].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)