[jira] [Created] (PHOENIX-4734) SQL Query with an RVC expression lexicographically higher than all values in an OR clause causes query to blow up

2018-05-10 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-4734:
-

 Summary: SQL Query with an RVC expression lexicographically higher 
than all values in an OR clause causes query to blow up
 Key: PHOENIX-4734
 URL: https://issues.apache.org/jira/browse/PHOENIX-4734
 Project: Phoenix
  Issue Type: Bug
Reporter: Jan Fernando
 Attachments: SqlInClauseIssueIT.java

See the attached unit test for a repro.
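
For illustration only (the attached SqlInClauseIssueIT.java is the actual repro): below is a 
minimal, hypothetical sketch of the query shape the summary describes, i.e. an OR clause that 
combines an RVC/IN condition with an RVC literal that sorts lexicographically above every row 
in the table. The table, columns, values, and connection URL are made up for the sketch.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical sketch only; not the attached SqlInClauseIssueIT.java.
public class RvcOrClauseSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS TEST.ITEMS ("
                    + " PK1 VARCHAR NOT NULL, PK2 VARCHAR NOT NULL, VAL VARCHAR,"
                    + " CONSTRAINT PK PRIMARY KEY (PK1, PK2))");
            stmt.execute("UPSERT INTO TEST.ITEMS VALUES ('a', 'a', 'v1')");
            stmt.execute("UPSERT INTO TEST.ITEMS VALUES ('a', 'b', 'v2')");
            conn.commit();
            // The RVC literal ('zzz', 'zzz') sorts lexicographically higher than every
            // stored key, so only the IN branch of the OR should match any rows.
            String sql = "SELECT * FROM TEST.ITEMS"
                    + " WHERE (PK1, PK2) IN (('a', 'a'), ('a', 'b'))"
                    + " OR (PK1, PK2) > ('zzz', 'zzz')";
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString("PK1") + "/" + rs.getString("PK2"));
                }
            }
        }
    }
}
{code}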





[jira] [Updated] (PHOENIX-4721) Issuing ALTER TABLE to add a PK Column to a table with secondary indexes fails

2018-04-30 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-4721:
--
Attachment: AlterTableExtendPk.java

> Issuing ALTER TABLE to add a PK Column to a table with secondary indexes fails
> --
>
> Key: PHOENIX-4721
> URL: https://issues.apache.org/jira/browse/PHOENIX-4721
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0
>Reporter: Jan Fernando
>Priority: Major
> Attachments: AlterTableExtendPk.java
>
>
> The expected behavior when adding a PK column to a table is that the column 
> will successfully be added, even if the table has secondary indexes.
> For example:
> {code:java}
> ALTER TABLE TEST.ACTIVITY ADD SOURCE VARCHAR(25) NULL PRIMARY KEY
> {code}
> should execute successfully even if the table has secondary indexes defined.
> However, issuing the above ALTER statement on a table with secondary indexes 
> throws the following Exception:
> {code:java}
> java.util.NoSuchElementException
> at java.util.ArrayList$Itr.next(ArrayList.java:854)
> at 
> org.apache.phoenix.schema.RowKeyValueAccessor.<init>(RowKeyValueAccessor.java:78)
> at 
> org.apache.phoenix.schema.MetaDataClient.addColumn(MetaDataClient.java:3452)
> at 
> org.apache.phoenix.schema.MetaDataClient.addColumn(MetaDataClient.java:3120)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableAddColumnStatement$1.execute(PhoenixStatement.java:1328)
> at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:393)
> at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:1)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:375)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:363)
> at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:269)
> at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
> at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
> {code}
> See attached file for a detailed repro.





[jira] [Created] (PHOENIX-4721) Issuing ALTER TABLE to add a PK Column to a table with secondary indexes fails

2018-04-30 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-4721:
-

 Summary: Issuing ALTER TABLE to add a PK Column to a table with 
secondary indexes fails
 Key: PHOENIX-4721
 URL: https://issues.apache.org/jira/browse/PHOENIX-4721
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.13.0
Reporter: Jan Fernando


The expected behavior when adding a PK column to a table is that the column will 
successfully be added, even if the table has secondary indexes.

For example:
{code:java}
ALTER TABLE TEST.ACTIVITY ADD SOURCE VARCHAR(25) NULL PRIMARY KEY
{code}
should execute successfully even if the table has secondary indexes defined.

However, issuing the above ALTER statement on a table with secondary indexes 
throws the following Exception:
{code:java}
java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:854)
at 
org.apache.phoenix.schema.RowKeyValueAccessor.<init>(RowKeyValueAccessor.java:78)
at org.apache.phoenix.schema.MetaDataClient.addColumn(MetaDataClient.java:3452)
at org.apache.phoenix.schema.MetaDataClient.addColumn(MetaDataClient.java:3120)
at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableAddColumnStatement$1.execute(PhoenixStatement.java:1328)
at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:393)
at org.apache.phoenix.jdbc.PhoenixStatement$3.call(PhoenixStatement.java:1)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:375)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:363)
at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:269)
at 
org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
at 
org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
{code}
See attached file for a detailed repro.
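
For illustration, a minimal sketch of the repro sequence described above, assuming a 
simplified, hypothetical TEST.ACTIVITY schema and index name; the attached 
AlterTableExtendPk.java is the authoritative repro.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Minimal sketch of the described repro sequence; schema and index name are made up.
public class AlterTablePkWithIndexSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS TEST.ACTIVITY ("
                    + " ID CHAR(15) NOT NULL, NAME VARCHAR,"
                    + " CONSTRAINT PK PRIMARY KEY (ID))");
            // The secondary index is what triggers the failure described above.
            stmt.execute("CREATE INDEX IF NOT EXISTS ACTIVITY_NAME_IDX"
                    + " ON TEST.ACTIVITY (NAME)");
            // Expected to succeed, but on a table with secondary indexes it throws
            // the java.util.NoSuchElementException shown in the stack trace above.
            stmt.execute("ALTER TABLE TEST.ACTIVITY ADD SOURCE VARCHAR(25) NULL PRIMARY KEY");
        }
    }
}
{code}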





[jira] [Commented] (PHOENIX-4292) Filters on Tables and Views with composite PK of VARCHAR fields with sort direction DESC do not work

2017-10-19 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211704#comment-16211704
 ] 

Jan Fernando commented on PHOENIX-4292:
---

I think things have gotten lost in translation here, and this discussion is no 
longer productive as the intent of my question has gotten lost. I'm not going to 
comment any further on this thread. I apologize if I wasn't clear. All points 
taken.

> Filters on Tables and Views with composite PK of VARCHAR fields with sort 
> direction DESC do not work
> 
>
> Key: PHOENIX-4292
> URL: https://issues.apache.org/jira/browse/PHOENIX-4292
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
>Reporter: Jan Fernando
>Assignee: Thomas D'Silva
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4292.patch
>
>
> We noticed that in certain instances on tables and views that were defined 
> with a Composite PK and where the elements of the PK were all DESC that 
> queries exhibited strange behavior and did not return results when expected. 
> A simple query on the first element of the PK returned 0 results e.g SELECT * 
> FROM MY_TABLE WHERE PK1 = 'myvaluethatexists' would return 0 results.
> After some investigation it appears that querying tables and views with a 
> Composite PK that :
> a) have multiple VARCHAR columns in the PK
> b) the sort direction of all the VARCHAR columns is defined as DESC 
>  does not work correctly and the filters are not honored and SQL appears 
> broken to the end user.
> Detailed repro steps:
> ---
> -- 1. Create Global Base Table
> CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
> TENANT_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE
> CONSTRAINT PK PRIMARY KEY (
> TENANT_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> -- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific 
> connection
> CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
> CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
> CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
> NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, 
> pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
> CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
> NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
> 'ab4';
> -- 3. Test cases that exhibit these issues
> -- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
> values of VARCHAR values DESC
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> SELECT * FROM TEST."abc" WHERE pk1 = 'testa'; -- INCORRECT RESULT: This query 
> returns no records, expected to return 4
> SELECT * FROM TEST."abc"; -- Returns 5 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 >= 'testa'; -- INCORRECT RESULT: This 
> query returns 1 record, expected to return 5
> SELECT * FROM TEST."abc" WHERE pk1 <= 'testa';  -- Returns 4 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 > 'testa'; -- Returns 1 row as expected
> SELECT * FROM TEST."abc" WHERE pk1 < 'testa'; -- INCORRECT RESULT: This query 
> returns 1 record, expected to return 0
> -- The following are cases where everything works as expected and which don't 
> have composite VARCHAR PKs
> -- DEMONSTRATE NO QUERY PROBLEMS WHEN VIEW HAS SINGLE DESC TEXT PK: 
> View with composite PK with single pk value DESC
> upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> 

[jira] [Commented] (PHOENIX-4292) Filters on Tables and Views with composite PK of VARCHAR fields with sort direction DESC do not work

2017-10-19 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211636#comment-16211636
 ] 

Jan Fernando commented on PHOENIX-4292:
---

I'm not arguing with that [~apurtell] at all. I was simply asking if the bug 
fix was complete. If that's not okay to ask, then I stand corrected. FWIW, I 
spent several hours coming up with a detailed repro for this, which was turned 
into a test. I just wasn't sure if that was sufficient, which is why I asked, as 
I'm not super familiar with that code. 

> Filters on Tables and Views with composite PK of VARCHAR fields with sort 
> direction DESC do not work
> 
>
> Key: PHOENIX-4292
> URL: https://issues.apache.org/jira/browse/PHOENIX-4292
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
>Reporter: Jan Fernando
>Assignee: Thomas D'Silva
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4292.patch
>
>
> We noticed that in certain instances on tables and views that were defined 
> with a Composite PK and where the elements of the PK were all DESC that 
> queries exhibited strange behavior and did not return results when expected. 
> A simple query on the first element of the PK returned 0 results e.g SELECT * 
> FROM MY_TABLE WHERE PK1 = 'myvaluethatexists' would return 0 results.
> After some investigation it appears that querying tables and views with a 
> Composite PK that :
> a) have multiple VARCHAR columns in the PK
> b) the sort direction of all the VARCHAR columns is defined as DESC 
>  does not work correctly and the filters are not honored and SQL appears 
> broken to the end user.
> Detailed repro steps:
> ---
> -- 1. Create Global Base Table
> CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
> TENANT_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE
> CONSTRAINT PK PRIMARY KEY (
> TENANT_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> -- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific 
> connection
> CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
> CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
> CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
> NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, 
> pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
> CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
> NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
> 'ab4';
> -- 3. Test cases that exhibit these issues
> -- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
> values of VARCHAR values DESC
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> SELECT * FROM TEST."abc" WHERE pk1 = 'testa'; -- INCORRECT RESULT: This query 
> returns no records, expected to return 4
> SELECT * FROM TEST."abc"; -- Returns 5 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 >= 'testa'; -- INCORRECT RESULT: This 
> query returns 1 record, expected to return 5
> SELECT * FROM TEST."abc" WHERE pk1 <= 'testa';  -- Returns 4 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 > 'testa'; -- Returns 1 row as expected
> SELECT * FROM TEST."abc" WHERE pk1 < 'testa'; -- INCORRECT RESULT: This query 
> returns 1 record, expected to return 0
> -- The following are cases where everything works as expected and which don't 
> have composite VARCHAR PKs
> -- DEMONSTRATE NO QUERY PROBLEMS WHEN VIEW HAS SINGLE DESC TEXT PK: 
> View with composite PK with single pk 

[jira] [Commented] (PHOENIX-4292) Filters on Tables and Views with composite PK of VARCHAR fields with sort direction DESC do not work

2017-10-19 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211612#comment-16211612
 ] 

Jan Fernando commented on PHOENIX-4292:
---

I am not suggesting we test everything. I was just concerned when I looked at 
the review that there was only 1 test. This is the normal kind of feedback I'd 
give any developer whose code I was reviewing. It seems to me this bug might 
warrant more than 1 test; I think it's a fair question to ask any developer, and 
if 1 test makes sense they should be able to argue for that. 

My concern came when the answer to my question was that there might be other 
cases that could still not be addressed and that would be handled by filing 
another JIRA. If the other cases are not pertinent to this particular issue and 
we feel we have locked this one down, then great, and let's state that. I don't 
believe asking community members to submit their own tests is the right 
response here. Everyone committing code should care about quality holistically. 
I was merely trying to ask questions to think about things from that 
perspective.  

I'm simply responding to a thread on the dev list a few weeks back about quality 
principles [~lhofhansl] brought up; my takeaway from that was to try and engage 
more with these kinds of questions when opportunities arose. 

Apologies if this was not how my comments sounded. This was my 2 cents, and I 
appreciate the fast turnaround on the fix. I'm happy to defer to you guys, as 
you are in the code every day and I am not these days.


> Filters on Tables and Views with composite PK of VARCHAR fields with sort 
> direction DESC do not work
> 
>
> Key: PHOENIX-4292
> URL: https://issues.apache.org/jira/browse/PHOENIX-4292
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
>Reporter: Jan Fernando
>Assignee: Thomas D'Silva
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4292.patch
>
>
> We noticed that in certain instances on tables and views that were defined 
> with a Composite PK and where the elements of the PK were all DESC that 
> queries exhibited strange behavior and did not return results when expected. 
> A simple query on the first element of the PK returned 0 results e.g SELECT * 
> FROM MY_TABLE WHERE PK1 = 'myvaluethatexists' would return 0 results.
> After some investigation it appears that querying tables and views with a 
> Composite PK that :
> a) have multiple VARCHAR columns in the PK
> b) the sort direction of all the VARCHAR columns is defined as DESC 
>  does not work correctly and the filters are not honored and SQL appears 
> broken to the end user.
> Detailed repro steps:
> ---
> -- 1. Create Global Base Table
> CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
> TENANT_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE
> CONSTRAINT PK PRIMARY KEY (
> TENANT_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> -- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific 
> connection
> CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
> CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
> CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
> NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, 
> pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
> CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
> NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
> 'ab4';
> -- 3. Test cases that exhibit these issues
> -- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
> values of VARCHAR values DESC
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
> 

[jira] [Commented] (PHOENIX-4292) Filters on Tables and Views with composite PK of VARCHAR fields with sort direction DESC do not work

2017-10-19 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211555#comment-16211555
 ] 

Jan Fernando commented on PHOENIX-4292:
---

I guess my 2 cents is that I disagree with that approach. It's great that we 
address bugs quickly, don't get me wrong, and that's always appreciated. 
However, if we find a bug that exposes holes that have been there from the very 
beginning, I believe part of fixing that bug is exploring where else that bug 
might have implications. As part of this we should make sure there is adequate 
test coverage. That way we really solve the problem and have confidence in the 
quality of what we are shipping. Otherwise we are just plugging leaks as they 
are found. As Phoenix matures and is used in more production environments, I 
believe this kind of approach is becoming even more important. But as I said, 
that's my 2 cents.

In this case, I just wasn't clear if we had looked beyond the couple of repros 
I supplied. Hence my question.

> Filters on Tables and Views with composite PK of VARCHAR fields with sort 
> direction DESC do not work
> 
>
> Key: PHOENIX-4292
> URL: https://issues.apache.org/jira/browse/PHOENIX-4292
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
>Reporter: Jan Fernando
>Assignee: Thomas D'Silva
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4292.patch
>
>
> We noticed that in certain instances on tables and views that were defined 
> with a Composite PK and where the elements of the PK were all DESC that 
> queries exhibited strange behavior and did not return results when expected. 
> A simple query on the first element of the PK returned 0 results e.g SELECT * 
> FROM MY_TABLE WHERE PK1 = 'myvaluethatexists' would return 0 results.
> After some investigation it appears that querying tables and views with a 
> Composite PK that :
> a) have multiple VARCHAR columns in the PK
> b) the sort direction of all the VARCHAR columns is defined as DESC 
>  does not work correctly and the filters are not honored and SQL appears 
> broken to the end user.
> Detailed repro steps:
> ---
> -- 1. Create Global Base Table
> CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
> TENANT_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE
> CONSTRAINT PK PRIMARY KEY (
> TENANT_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> -- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific 
> connection
> CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
> CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
> CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
> NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, 
> pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
> CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
> NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
> 'ab4';
> -- 3. Test cases that exhibit these issues
> -- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
> values of VARCHAR values DESC
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> SELECT * FROM TEST."abc" WHERE pk1 = 'testa'; -- INCORRECT RESULT: This query 
> returns no records, expected to return 4
> SELECT * FROM TEST."abc"; -- Returns 5 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 >= 'testa'; -- INCORRECT RESULT: This 
> query returns 1 record, expected to return 5
> SELECT * FROM TEST."abc" WHERE pk1 <= 

[jira] [Commented] (PHOENIX-4292) Filters on Tables and Views with composite PK of VARCHAR fields with sort direction DESC do not work

2017-10-19 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211516#comment-16211516
 ] 

Jan Fernando commented on PHOENIX-4292:
---

Okay. I'm not a big fan of doing testing as a separate work item; we should 
have everything we need with this patch to make sure we feel we have caught all 
the cases where this might be an issue. Do you feel we have enough test coverage 
with this patch?

> Filters on Tables and Views with composite PK of VARCHAR fields with sort 
> direction DESC do not work
> 
>
> Key: PHOENIX-4292
> URL: https://issues.apache.org/jira/browse/PHOENIX-4292
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
>Reporter: Jan Fernando
>Assignee: Thomas D'Silva
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4292.patch
>
>
> We noticed that in certain instances on tables and views that were defined 
> with a Composite PK and where the elements of the PK were all DESC that 
> queries exhibited strange behavior and did not return results when expected. 
> A simple query on the first element of the PK returned 0 results e.g SELECT * 
> FROM MY_TABLE WHERE PK1 = 'myvaluethatexists' would return 0 results.
> After some investigation it appears that querying tables and views with a 
> Composite PK that :
> a) have multiple VARCHAR columns in the PK
> b) the sort direction of all the VARCHAR columns is defined as DESC 
>  does not work correctly and the filters are not honored and SQL appears 
> broken to the end user.
> Detailed repro steps:
> ---
> -- 1. Create Global Base Table
> CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
> TENANT_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE
> CONSTRAINT PK PRIMARY KEY (
> TENANT_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> -- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific 
> connection
> CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
> CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
> CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
> NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, 
> pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
> CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
> NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
> 'ab4';
> -- 3. Test cases that exhibit these issues
> -- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
> values of VARCHAR values DESC
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> SELECT * FROM TEST."abc" WHERE pk1 = 'testa'; -- INCORRECT RESULT: This query 
> returns no records, expected to return 4
> SELECT * FROM TEST."abc"; -- Returns 5 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 >= 'testa'; -- INCORRECT RESULT: This 
> query returns 1 record, expected to return 5
> SELECT * FROM TEST."abc" WHERE pk1 <= 'testa';  -- Returns 4 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 > 'testa'; -- Returns 1 row as expected
> SELECT * FROM TEST."abc" WHERE pk1 < 'testa'; -- INCORRECT RESULT: This query 
> returns 1 record, expected to return 0
> -- The following are cases where everything works as expected and which don't 
> have composite VARCHAR PKs
> -- DEMONSTRATE NO QUERY PROBLEMS WHEN VIEW HAS SINGLE DESC TEXT PK: 
> View with composite PK with single pk value DESC
> upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> 

[jira] [Commented] (PHOENIX-4292) Filters on Tables and Views with composite PK of VARCHAR fields with sort direction DESC do not work

2017-10-19 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211191#comment-16211191
 ] 

Jan Fernando commented on PHOENIX-4292:
---

[~tdsilva] Should we add more test coverage? This was one example of how this 
was triggered. Does the fact that this slipped through point to some other gaps 
in our test coverage around DESC keys? Should we add tests for other data types 
too, and some combinations of data types, to make sure no regressions are 
introduced in the future?

> Filters on Tables and Views with composite PK of VARCHAR fields with sort 
> direction DESC do not work
> 
>
> Key: PHOENIX-4292
> URL: https://issues.apache.org/jira/browse/PHOENIX-4292
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
>Reporter: Jan Fernando
>Assignee: Thomas D'Silva
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4292.patch
>
>
> We noticed that in certain instances on tables and views that were defined 
> with a Composite PK and where the elements of the PK were all DESC that 
> queries exhibited strange behavior and did not return results when expected. 
> A simple query on the first element of the PK returned 0 results e.g SELECT * 
> FROM MY_TABLE WHERE PK1 = 'myvaluethatexists' would return 0 results.
> After some investigation it appears that querying tables and views with a 
> Composite PK that :
> a) have multiple VARCHAR columns in the PK
> b) the sort direction of all the VARCHAR columns is defined as DESC 
>  does not work correctly and the filters are not honored and SQL appears 
> broken to the end user.
> Detailed repro steps:
> ---
> -- 1. Create Global Base Table
> CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
> TENANT_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE
> CONSTRAINT PK PRIMARY KEY (
> TENANT_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> -- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific 
> connection
> CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
> CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
> CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
> NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, 
> pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
> CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
> NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
> 'ab4';
> -- 3. Test cases that exhibit these issues
> -- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
> values of VARCHAR values DESC
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> SELECT * FROM TEST."abc" WHERE pk1 = 'testa'; -- INCORRECT RESULT: This query 
> returns no records, expected to return 4
> SELECT * FROM TEST."abc"; -- Returns 5 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 >= 'testa'; -- INCORRECT RESULT: This 
> query returns 1 record, expected to return 5
> SELECT * FROM TEST."abc" WHERE pk1 <= 'testa';  -- Returns 4 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 > 'testa'; -- Returns 1 row as expected
> SELECT * FROM TEST."abc" WHERE pk1 < 'testa'; -- INCORRECT RESULT: This query 
> returns 1 record, expected to return 0
> -- The following are cases where everything works as expected and which don't 
> have composite VARCHAR PKs
> -- DEMONSTRATE NO QUERY PROBLEMS WHEN VIEW HAS SINGLE DESC TEXT PK: 
> View with composite PK with single pk value DESC
> upsert into 

[jira] [Created] (PHOENIX-4292) SOQL Filters on Tables and Filters with composite PK of VARCHAR fields with sort direction DESC does not work

2017-10-16 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-4292:
-

 Summary: SOQL Filters on Tables and Filters with composite PK of 
VARCHAR fields with sort direction DESC does not work
 Key: PHOENIX-4292
 URL: https://issues.apache.org/jira/browse/PHOENIX-4292
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.10.0
Reporter: Jan Fernando


We noticed that in certain instances, on tables and views that were defined with 
a Composite PK where the elements of the PK were all DESC, queries exhibited 
strange behavior and did not return results when expected. A simple query on the 
first element of the PK returned 0 results, e.g. SELECT * FROM MY_TABLE WHERE 
PK1 = 'myvaluethatexists' would return 0 results.

After some investigation it appears that querying tables and views with a 
Composite PK that:
a) have multiple VARCHAR columns in the PK, and
b) have the sort direction of all the VARCHAR columns defined as DESC
does not work correctly: the filters are not honored and SQL appears broken 
to the end user.

Detailed repro steps:
---

-- 1. Create Global Base Table
CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
TENANT_ID CHAR(15) NOT NULL, 
KEY_PREFIX CHAR(3) NOT NULL, 
CREATED_DATE DATE,
CREATED_BY CHAR(15),
SYSTEM_MODSTAMP DATE
CONSTRAINT PK PRIMARY KEY (
TENANT_ID, 
KEY_PREFIX 
)
) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1

-- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific connection
CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 VARCHAR(10) 
NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, pk2 
DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 VARCHAR(10) 
NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, pk2 
ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, pk2 
DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
'ab4';

-- 3. Test cases that exhibit these issues
-- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
values of VARCHAR values DESC
upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 

SELECT * FROM TEST."abc" WHERE pk1 = 'testa'; -- INCORRECT RESULT: This query 
returns no records, expected to return 4
SELECT * FROM TEST."abc"; -- Returns 5 rows as expected
SELECT * FROM TEST."abc" WHERE pk1 >= 'testa'; -- INCORRECT RESULT: This query 
returns 1 record, expected to return 5
SELECT * FROM TEST."abc" WHERE pk1 <= 'testa';  -- Returns 4 rows as expected
SELECT * FROM TEST."abc" WHERE pk1 > 'testa'; -- Returns 1 row as expected
SELECT * FROM TEST."abc" WHERE pk1 < 'testa'; -- INCORRECT RESULT: This query 
returns 1 record, expected to return 0

-- The following are cases where everything works as expected and which don't 
have composite VARCHAR PKs
-- DEMONSTRATE NO QUERY PROBLEMS WHEN VIEW HAS SINGLE DESC TEXT PK: View 
with composite PK with single pk value DESC
upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 

SELECT * FROM TEST."ab2" WHERE pk1 = 'testa'; -- This query returns 4 records 
as expected
SELECT * FROM TEST."ab2"; -- Returns 5 rows as expected
SELECT * FROM TEST."ab2" 

[jira] [Updated] (PHOENIX-4292) SOQL Filters on Tables and Views with composite PK of VARCHAR fields with sort direction DESC do not work

2017-10-16 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-4292:
--
Summary: SOQL Filters on Tables and Views with composite PK of VARCHAR 
fields with sort direction DESC do not work  (was: SOQL Filters on Tables and 
Views with composite PK of VARCHAR fields with sort direction DESC does not 
work)

> SOQL Filters on Tables and Views with composite PK of VARCHAR fields with 
> sort direction DESC do not work
> -
>
> Key: PHOENIX-4292
> URL: https://issues.apache.org/jira/browse/PHOENIX-4292
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
>Reporter: Jan Fernando
>Assignee: Jan Fernando
>
> We noticed that in certain instances on tables and views that were defined 
> with a Composite PK and where the elements of the PK were all DESC that 
> queries exhibited strange behavior and did not return results when expected. 
> A simple query on the first element of the PK returned 0 results e.g SELECT * 
> FROM MY_TABLE WHERE PK1 = 'myvaluethatexists' would return 0 results.
> After some investigation it appears that querying tables and views with a 
> Composite PK that :
> a) have multiple VARCHAR columns in the PK
> b) the sort direction of all the VARCHAR columns is defined as DESC 
>  does not work correctly and the filters are not honored and SQL appears 
> broken to the end user.
> Detailed repro steps:
> ---
> -- 1. Create Global Base Table
> CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
> TENANT_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE
> CONSTRAINT PK PRIMARY KEY (
> TENANT_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> -- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific 
> connection
> CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
> CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
> CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
> NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, 
> pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
> CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
> NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
> 'ab4';
> -- 3. Test cases that exhibit these issues
> -- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
> values of VARCHAR values DESC
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> SELECT * FROM TEST."abc" WHERE pk1 = 'testa'; -- INCORRECT RESULT: This query 
> returns no records, expected to return 4
> SELECT * FROM TEST."abc"; -- Returns 5 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 >= 'testa'; -- INCORRECT RESULT: This 
> query returns 1 record, expected to return 5
> SELECT * FROM TEST."abc" WHERE pk1 <= 'testa';  -- Returns 4 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 > 'testa'; -- Returns 1 row as expected
> SELECT * FROM TEST."abc" WHERE pk1 < 'testa'; -- INCORRECT RESULT: This query 
> returns 1 record, expected to return 0
> -- The following are cases where everything works as expected and which don't 
> have composite VARCHAR PKs
> -- DEMONSTRATE NO QUERY PROBLEMS WHEN VIEW HAS SINGLE DESC TEXT PK: 
> View with composite PK with single pk value DESC
> upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', 'yyyy-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> 

[jira] [Updated] (PHOENIX-4292) SOQL Filters on Tables and Views with composite PK of VARCHAR fields with sort direction DESC does not work

2017-10-16 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-4292:
--
Summary: SOQL Filters on Tables and Views with composite PK of VARCHAR 
fields with sort direction DESC does not work  (was: SOQL Filters on Tables and 
Filters with composite PK of VARCHAR fields with sort direction DESC does not 
work)

> SOQL Filters on Tables and Views with composite PK of VARCHAR fields with 
> sort direction DESC does not work
> ---
>
> Key: PHOENIX-4292
> URL: https://issues.apache.org/jira/browse/PHOENIX-4292
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
>Reporter: Jan Fernando
>Assignee: Jan Fernando
>
> We noticed that in certain instances on tables and views that were defined 
> with a Composite PK and where the elements of the PK were all DESC that 
> queries exhibited strange behavior and did not return results when expected. 
> A simple query on the first element of the PK returned 0 results e.g SELECT * 
> FROM MY_TABLE WHERE PK1 = 'myvaluethatexists' would return 0 results.
> After some investigation it appears that querying tables and views with a 
> Composite PK that :
> a) have multiple VARCHAR columns in the PK
> b) the sort direction of all the VARCHAR columns is defined as DESC 
>  does not work correctly and the filters are not honored and SQL appears 
> broken to the end user.
> Detailed repro steps:
> ---
> -- 1. Create Global Base Table
> CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
> TENANT_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE
> CONSTRAINT PK PRIMARY KEY (
> TENANT_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> -- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific 
> connection
> CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
> CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
> CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
> NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, 
> pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
> CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
> NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
> 'ab4';
> -- 3. Test cases that exhibit these issues
> -- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
> values of VARCHAR values DESC
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> SELECT * FROM TEST."abc" WHERE pk1 = 'testa'; -- INCORRECT RESULT: This query 
> returns no records, expected to return 4
> SELECT * FROM TEST."abc"; -- Returns 5 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 >= 'testa'; -- INCORRECT RESULT: This 
> query returns 1 record, expected to return 5
> SELECT * FROM TEST."abc" WHERE pk1 <= 'testa';  -- Returns 4 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 > 'testa'; -- Returns 1 row as expected
> SELECT * FROM TEST."abc" WHERE pk1 < 'testa'; -- INCORRECT RESULT: This query 
> returns 1 record, expected to return 0
> -- The following are cases where everything works as expected and which don't 
> have composite VARCHAR PKs
> -- DEMONSTRATE NO QUERY PROBLEMS WHEN VIEW HAS SINGLE DESC TEXT PK: 
> View with composite PK with single pk value DESC
> upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', 'yyyy-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> 

[jira] [Assigned] (PHOENIX-4292) SOQL Filters on Tables and Filters with composite PK of VARCHAR fields with sort direction DESC does not work

2017-10-16 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando reassigned PHOENIX-4292:
-

Assignee: Jan Fernando

> SOQL Filters on Tables and Filters with composite PK of VARCHAR fields with 
> sort direction DESC does not work
> -
>
> Key: PHOENIX-4292
> URL: https://issues.apache.org/jira/browse/PHOENIX-4292
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
>Reporter: Jan Fernando
>Assignee: Jan Fernando
>
> We noticed that in certain instances on tables and views that were defined 
> with a Composite PK and where the elements of the PK were all DESC that 
> queries exhibited strange behavior and did not return results when expected. 
> A simple query on the first element of the PK returned 0 results e.g SELECT * 
> FROM MY_TABLE WHERE PK1 = 'myvaluethatexists' would return 0 results.
> After some investigation it appears that querying tables and views with a 
> Composite PK that :
> a) have multiple VARCHAR columns in the PK
> b) the sort direction of all the VARCHAR columns is defined as DESC 
>  does not work correctly and the filters are not honored and SQL appears 
> broken to the end user.
> Detailed repro steps:
> ---
> -- 1. Create Global Base Table
> CREATE TABLE IF NOT EXISTS TEST.BASETABLE (
> TENANT_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE
> CONSTRAINT PK PRIMARY KEY (
> TENANT_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> -- 2. Create various TENANT SPECIFIC VIEWS i.e. use a tenant specific 
> connection
> CREATE VIEW IF NOT EXISTS TEST."abc" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'abc';
> CREATE VIEW IF NOT EXISTS TEST."ab2" (pk1 VARCHAR(10) NOT NULL, pk2 
> VARCHAR(10) NOT NULL, col1 DATE,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 ASC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab2';
> CREATE VIEW IF NOT EXISTS TEST."ab3" (pk1 DATE(10) NOT NULL, pk2 DATE(10) NOT 
> NULL, col1 VARCHAR(10),  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 DESC, 
> pk2 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 'ab3';
> CREATE VIEW IF NOT EXISTS TEST."ab4" (pk1 DATE(10) NOT NULL, pk2 DECIMAL NOT 
> NULL, pk3 VARCHAR(10) NOT NULL,  col3 DECIMAL CONSTRAINT PK PRIMARY KEY (pk1 
> DESC, pk2 DESC, pk3 DESC)) AS SELECT * FROM TEST.BASETABLE WHERE KEY_PREFIX = 
> 'ab4';
> -- 3. Test cases that exhibit these issues
> -- SIMULATE EQUALITY QUERY PROBLEMS: View with composite PK with multiple PK 
> values of VARCHAR values DESC
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testa', 'teste', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."abc" (pk1, pk2, col1, col3) VALUES ('testb', 'testa', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> SELECT * FROM TEST."abc" WHERE pk1 = 'testa'; -- INCORRECT RESULT: This query 
> returns no records, expected to return 4
> SELECT * FROM TEST."abc"; -- Returns 5 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 >= 'testa'; -- INCORRECT RESULT: This 
> query returns 1 record, expected to return 5
> SELECT * FROM TEST."abc" WHERE pk1 <= 'testa';  -- Returns 4 rows as expected
> SELECT * FROM TEST."abc" WHERE pk1 > 'testa'; -- Returns 1 row as expected
> SELECT * FROM TEST."abc" WHERE pk1 < 'testa'; -- INCORRECT RESULT: This query 
> returns 1 record, expected to return 0
> -- The following are cases where everything works as expected and which don't 
> have composite VARCHAR PKs
> -- DEMONSTRATE NO QUERY PROBLEMS WHEN VIEW HAS SINGLE DESC TEXT PK: 
> View with composite PK with single pk value DESC
> upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testb', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testc', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> upsert into TEST."ab2" (pk1, pk2, col1, col3) VALUES ('testa', 'testd', 
> TO_DATE('2017-10-16 22:00:00', '-MM-dd HH:mm:ss'), 10); 
> 

[jira] [Commented] (PHOENIX-3787) RVC For Paged Queries not working as expected when PK leads with column defined as 'DATE DESC'

2017-04-13 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968476#comment-15968476
 ] 

Jan Fernando commented on PHOENIX-3787:
---

Best I can tell from other ad hoc testing I have done, this is only an issue 
when the leading column in the PK is declared DESC. If the second column is the 
one declared DESC, the RVCs work as expected.
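
For reference, a minimal sketch of the RVC paging pattern described in the report quoted 
below, written against the TEST.EVENT_MT_VIEW view from the repro steps; the connection URL 
is illustrative, and the tenant-specific connection and row-key-order property follow what 
the report describes.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.Timestamp;
import java.util.Properties;

// Sketch of the RVC paging pattern from the report below; not a definitive repro.
public class RvcPagingSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("TenantId", "Tenant1"); // tenant-specific connection, as in the repro
        props.setProperty("phoenix.query.force.rowkeyorder", "true"); // attribute named in the report
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props)) {
            Timestamp lastDate = null;
            String lastId = null;
            // Page 1: no RVC, just a LIMIT; remember the PK of the last row returned.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT EVENTDATE, EVENTID, EVENTNAME FROM TEST.EVENT_MT_VIEW"
                         + " ORDER BY EVENTDATE DESC, EVENTID LIMIT 3")) {
                while (rs.next()) {
                    lastDate = rs.getTimestamp("EVENTDATE");
                    lastId = rs.getString("EVENTID");
                }
            }
            // Page 2: seed an RVC with the last row of the previous page. Per the report,
            // the '>' is applied literally instead of following the DESC sort order of the
            // leading EVENTDATE column, so earlier rows come back again.
            try (PreparedStatement ps = conn.prepareStatement(
                     "SELECT EVENTDATE, EVENTID, EVENTNAME FROM TEST.EVENT_MT_VIEW"
                         + " WHERE (EVENTDATE, EVENTID) > (?, ?)"
                         + " ORDER BY EVENTDATE DESC, EVENTID")) {
                ps.setTimestamp(1, lastDate);
                ps.setString(2, lastId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getTimestamp(1) + " " + rs.getString(2));
                    }
                }
            }
        }
    }
}
{code}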

> RVC For Paged Queries not working as expected when PK leads with column 
> defined as 'DATE DESC'
> --
>
> Key: PHOENIX-3787
> URL: https://issues.apache.org/jira/browse/PHOENIX-3787
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0, 4.10.0
>Reporter: Jan Fernando
>
> Our application supports paged queries that utilize RVCs. We have a table 
> where the PK leads with a column that is a DATE data type. The PK specifies 
> the sort order of the DATE PK column as DESC.
> The first query doesn't utilize an RVC but uses a limit to restrict the 
> number of records for the first page:
> SELECT TO_CHAR(EVENTDATE, 'YYYY-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME FROM 
> TEST.EVENT_MT_VIEW ORDER BY EVENTDATE DESC, EVENTID LIMIT 3;
> We record the PK values of the last record returned and then issue a query 
> with an RVC to get the next page and so on:
> SELECT TO_CHAR(EVENTDATE, 'YYYY-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME
> FROM TEST.EVENT_MT_VIEW
> WHERE (EVENTDATE, EVENTID) > (TO_DATE('2017-01-28 20:38:00.000', 'yyyy-MM-dd 
> HH:mm:ss.SS'), 'event3')
> ORDER BY EVENTDATE DESC, EVENTID;
> We expect that the RVC clause will honor the sort order of the Date column 
> and page through the data in descending order. However, this does not happen. 
> The greater-than operator appears to be applied literally, and we return the 
> same set of data again, minus the last record.
> This breaks our query paging application and users can't page through their 
> data.
> We utilize a Multi-tenant connection and specify the 
> phoenix.query.force.rowkeyorder=true attribute on the connection. 
> You can repro this behavior as follows:
> 1) Create table with Date Desc data type in PK
> 
> CREATE TABLE IF NOT EXISTS TEST.EVENT (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> EVENTDATE DATE NOT NULL,
> EVENTID CHAR(15) NOT NULL,
> EVENTNAME VARCHAR,
> CONSTRAINT PK PRIMARY KEY 
> (
> ORGANIZATION_ID, 
> EVENTDATE DESC,
> EVENTID
> )
> ) VERSIONS=1,MULTI_TENANT=true
> 2) Insert data into the table
> 
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 23:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event1', 'eventname1');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 22:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event2', 'eventname2');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 20:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event3', 'eventname3');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 19:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event4', 'eventname4');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 18:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event5', 'eventname5');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 17:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event6', 'eventname6');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 16:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event7', 'eventname7');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 15:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event8', 'eventname8');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 14:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event9', 'eventname9');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 13:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event10', 'eventname10');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 12:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event11', 'eventname11');
> 3) Create a Multi-tenant view using an MT connection with TenantId=Tenant1
> -
> CREATE VIEW TEST.EVENT_MT_VIEW AS SELECT * FROM TEST.EVENT;
> 4) Execute initial query - 3 rows returned
> 
> SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME FROM 
> TEST.EVENT_MT_VIEW ORDER BY EVENTDATE DESC, EVENTID LIMIT 3;
> 5) Execute RVC query - expect remaining 8 rows returned, but first 2 are 
> returned again
> -
> SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME
> FROM TEST.EVENT_MT_VIEW
> WHERE (EVENTDATE, EVENTID) > (TO_DATE('2017-01-28 20:38:00.000', 'yyyy-MM-dd 
> HH:mm:ss.SS'), 

[jira] [Commented] (PHOENIX-3787) RVC For Paged Queries not working as expected when PK leads with column defined as 'DATE DESC'

2017-04-13 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968464#comment-15968464
 ] 

Jan Fernando commented on PHOENIX-3787:
---

This is also an issue with other data types if declared DESC. I took the same 
example above and redefined EVENTDATE as an INT and saw the same behavior.
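
For clarity, a quick sketch of that variant (my illustration; the table name 
TEST.EVENT_INT is hypothetical, not from the comment): the same repro with the 
leading DESC PK column declared as an INTEGER shows the identical paging 
behavior.

-- Hypothetical INT variant of the repro table; only the leading PK column's
-- data type changes, the DESC declaration and the RVC paging query stay the same.
CREATE TABLE IF NOT EXISTS TEST.EVENT_INT (
    ORGANIZATION_ID CHAR(15) NOT NULL,
    EVENTDATE INTEGER NOT NULL,
    EVENTID CHAR(15) NOT NULL,
    EVENTNAME VARCHAR,
    CONSTRAINT PK PRIMARY KEY
    (
        ORGANIZATION_ID,
        EVENTDATE DESC,
        EVENTID
    )
) VERSIONS=1, MULTI_TENANT=true;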

> RVC For Paged Queries not working as expected when PK leads with column 
> defined as 'DATE DESC'
> --
>
> Key: PHOENIX-3787
> URL: https://issues.apache.org/jira/browse/PHOENIX-3787
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0, 4.10.0
>Reporter: Jan Fernando
>
> Our application supports paged queries that utilize RVCs. We have a table 
> where the PK leads with a column that is a DATE data type. The PK specifies 
> the sort order of the DATE PK column as DESC.
> The first query doesn't utilize an RVC but uses a limit to restrict the 
> number of records for the first page:
> SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME FROM 
> TEST.EVENT_MT_VIEW ORDER BY EVENTDATE DESC, EVENTID LIMIT 3;
> We record the PK values of the last record returned and then issue a query 
> with an RVC to get the next page, and so on:
> SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME
> FROM TEST.EVENT_MT_VIEW
> WHERE (EVENTDATE, EVENTID) > (TO_DATE('2017-01-28 20:38:00.000', 'yyyy-MM-dd 
> HH:mm:ss.SS'), 'event3')
> ORDER BY EVENTDATE DESC, EVENTID;
> We expect that the RVC clause will honor the sort order of the DATE column 
> and page through the data in descending order. However, this does not happen: 
> the greater-than operator appears to be applied literally, and we return the 
> same set of data again, minus the last record.
> This breaks our query paging application and users can't page through their 
> data.
> We utilize a Multi-tenant connection and specify the 
> phoenix.query.force.rowkeyorder=true attribute on the connection. 
> You can repro this behavior as follows:
> 1) Create table with Date Desc data type in PK
> 
> CREATE TABLE IF NOT EXISTS TEST.EVENT (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> EVENTDATE DATE NOT NULL,
> EVENTID CHAR(15) NOT NULL,
> EVENTNAME VARCHAR,
> CONSTRAINT PK PRIMARY KEY 
> (
> ORGANIZATION_ID, 
> EVENTDATE DESC,
> EVENTID
> )
> ) VERSIONS=1,MULTI_TENANT=true
> 2) Insert data into the table
> 
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 23:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event1', 'eventname1');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 22:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event2', 'eventname2');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 20:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event3', 'eventname3');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 19:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event4', 'eventname4');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 18:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event5', 'eventname5');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 17:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event6', 'eventname6');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 16:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event7', 'eventname7');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 15:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event8', 'eventname8');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 14:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event9', 'eventname9');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 13:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event10', 'eventname10');
> UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 12:38:00.000', 
> 'yyyy-MM-dd HH:mm:ss.SSS'), 'event11', 'eventname11');
> 3) Create a Multi-tenant view using an MT connection with TenantId=Tenant1
> -
> CREATE VIEW TEST.EVENT_MT_VIEW AS SELECT * FROM TEST.EVENT;
> 4) Execute initial query - 3 rows returned
> 
> SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME FROM 
> TEST.EVENT_MT_VIEW ORDER BY EVENTDATE DESC, EVENTID LIMIT 3;
> 5) Execute RVC query - expect remaining 8 rows returned, but first 2 are 
> returned again
> -
> SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME
> FROM TEST.EVENT_MT_VIEW
> WHERE (EVENTDATE, EVENTID) > (TO_DATE('2017-01-28 20:38:00.000', 'yyyy-MM-dd 
> HH:mm:ss.SS'), 'event3')
> ORDER BY EVENTDATE DESC, EVENTID;



--

[jira] [Created] (PHOENIX-3787) RVC For Paged Queries not working as expected when PK leads with column defined as 'DATE DESC'

2017-04-13 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-3787:
-

 Summary: RVC For Paged Queries not working as expected when PK 
leads with column defined as 'DATE DESC'
 Key: PHOENIX-3787
 URL: https://issues.apache.org/jira/browse/PHOENIX-3787
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.9.0, 4.10.0
Reporter: Jan Fernando


Our application supports paged queries that utilize RVCs. We have a table where 
the PK leads with a column that is a DATE data type. The PK specifies the sort 
order of the DATE PK column as DESC.

The first query doesn't utilize an RVC but uses a limit to restrict the number 
of records for the first page:

SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME FROM 
TEST.EVENT_MT_VIEW ORDER BY EVENTDATE DESC, EVENTID LIMIT 3;

We record the PK values of the last record returned and then issue a query with 
an RVC to get the next page, and so on:

SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME
FROM TEST.EVENT_MT_VIEW
WHERE (EVENTDATE, EVENTID) > (TO_DATE('2017-01-28 20:38:00.000', 'yyyy-MM-dd 
HH:mm:ss.SS'), 'event3')
ORDER BY EVENTDATE DESC, EVENTID;

We expect that the RVC clause will honor the sort order of the DATE column and 
page through the data in descending order. However, this does not happen: the 
greater-than operator appears to be applied literally, and we return the same 
set of data again, minus the last record.

This breaks our query paging application and users can't page through their 
data.

We utilize a Multi-tenant connection and specify the 
phoenix.query.force.rowkeyorder=true attribute on the connection. 

You can repro this behavior as follows:
1) Create table with Date Desc data type in PK

CREATE TABLE IF NOT EXISTS TEST.EVENT (
ORGANIZATION_ID CHAR(15) NOT NULL,
EVENTDATE DATE NOT NULL,
EVENTID CHAR(15) NOT NULL,
EVENTNAME VARCHAR,
CONSTRAINT PK PRIMARY KEY 
(
ORGANIZATION_ID, 
EVENTDATE DESC,
EVENTID
)
) VERSIONS=1,MULTI_TENANT=true

2) Insert data into the table

UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 23:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event1', 'eventname1');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 22:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event2', 'eventname2');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 20:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event3', 'eventname3');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 19:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event4', 'eventname4');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 18:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event5', 'eventname5');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 17:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event6', 'eventname6');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 16:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event7', 'eventname7');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 15:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event8', 'eventname8');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 14:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event9', 'eventname9');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 13:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event10', 'eventname10');
UPSERT INTO TEST.EVENT VALUES ('Tenant1', TO_DATE('2017-01-28 12:38:00.000', 
'yyyy-MM-dd HH:mm:ss.SSS'), 'event11', 'eventname11');

3) Create a Multi-tenant view using an MT connection with TenantId=Tenant1
-
CREATE VIEW TEST.EVENT_MT_VIEW AS SELECT * FROM TEST.EVENT;

4) Execute initial query - 3 rows returned

SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME FROM 
TEST.EVENT_MT_VIEW ORDER BY EVENTDATE DESC, EVENTID LIMIT 3;

5) Execute RVC query - expect remaining 8 rows returned, but first 2 are 
returned again
-
SELECT TO_CHAR(EVENTDATE, 'yyyy-MM-DD HH:mm:ss.SSS'), EVENTID, EVENTNAME
FROM TEST.EVENT_MT_VIEW
WHERE (EVENTDATE, EVENTID) > (TO_DATE('2017-01-28 20:38:00.000', 'yyyy-MM-dd 
HH:mm:ss.SS'), 'event3')
ORDER BY EVENTDATE DESC, EVENTID;



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3725) Add support for per VIEW level TTLs in addition to table level TTLs

2017-03-08 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901791#comment-15901791
 ] 

Jan Fernando commented on PHOENIX-3725:
---

[~giacomotaylor] This is an idea [~elilevine] and I have been tossing around 
for a bit, and we are seeing an uptick in use cases where this would be a really 
great feature to have and would simplify developers' lives. I am curious whether 
you and [~lhofhansl] have discussed any variant of this idea, and where you think 
it might fall on the Phoenix roadmap? Very happy to discuss further.

> Add support for per VIEW level TTLs in addition to table level TTLs
> ---
>
> Key: PHOENIX-3725
> URL: https://issues.apache.org/jira/browse/PHOENIX-3725
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Jan Fernando
>
> We are seeing more and more use cases where it would be very helpful to 
> declare a TTL for data at the View level instead of the table level. We have 
> several use cases where a shared Phoenix table is partitioned by views and 
> each view has some shared columns with the base table and its own unique 
> columns. In these cases it's not possible to declare a TTL for data at the 
> table level, as each view may warrant a different TTL. 
> As a result, enforcement of a TTL for data relies on custom jobs that query 
> the data that has expired and then delete it via SQL. This has 2 major 
> disadvantages: a) developers have to write a custom job, and b) we can 
> potentially create a lot of delete markers in HBase.
> From an end-user perspective it would really simplify the development 
> experience if users could instead simply declare a TTL on the view and let 
> Phoenix/HBase take care of deletion at compaction time.
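
For context, a minimal sketch of the custom-job approach the description refers 
to (my illustration; MY_TENANT_VIEW and CREATED_DATE are assumed names, not from 
the issue): today each team has to run a periodic per-view delete such as the 
one below, which is exactly the hand-rolled TTL, and the flood of delete 
markers, that a view-level TTL declaration would make unnecessary.

-- Hypothetical cleanup statement a custom TTL job would issue today,
-- deleting rows in one view that are older than a 30-day retention window.
DELETE FROM MY_TENANT_VIEW
WHERE CREATED_DATE < CURRENT_DATE() - 30;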



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3516) Performance Issues with queries that have compound filters and specify phoenix.query.force.rowkeyorder=true

2016-12-02 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-3516:
--
Description: 
On all our connections we specify the phoenix.query.force.rowkeyorder=true 
property to force serial scans so that we only support queries that will scale 
horizontally with data size. 

In running performance tests, we found that queries with multiple AND'ed range 
filters were slow and not performing as expected. We looked at the query 
plan and noticed that, in the slow query case, the query plan is doing a 
PARALLEL 1-WAY SCAN and doing a SERVER FILTER BY whereas the fast query is 
simply doing a SERIAL 1-WAY RANGE SCAN.

We expect these queries to both have the same plan as we are specifying 
phoenix.query.force.rowkeyorder=true.

You can repro as follows:

1. Use non-tenant specific connection to create the table:
CREATE TABLE IF NOT EXISTS MY_MT_TABLE.TEST_TABLE (
ORGANIZATION_ID CHAR(15) NOT NULL,
PARENT_TYPE CHAR(3) NOT NULL,
PARENT_ID CHAR(15) NOT NULL,
CREATED_DATE DATE NOT NULL
CONSTRAINT PK PRIMARY KEY 
(
ORGANIZATION_ID, 
PARENT_TYPE,
PARENT_ID,
CREATED_DATE DESC
)
) VERSIONS=1,MULTI_TENANT=true,REPLICATION_SCOPE=1;

2. Use non-tenant specific connection to create the index:
CREATE INDEX IF NOT EXISTS MY_TEST_TABLE_INDEX 
ON MY_MT_TABLE.TEST_TABLE (PARENT_TYPE, CREATED_DATE, PARENT_ID);

3. Use a tenant-specific connection to create the View:
CREATE VIEW IF NOT EXISTS MY_TEST_TABLE_VIEW AS SELECT * FROM 
MY_MT_TABLE.TEST_TABLE;

4. Run queries below with tenant-specific connection:
Query with expected plan:

EXPLAIN SELECT PARENT_ID
FROM MY_TEST_TABLE_VIEW
WHERE PARENT_TYPE='001'
AND (CREATED_DATE > to_date('2012-10-21') AND CREATED_DATE < 
to_date('2016-10-31'))
ORDER BY PARENT_TYPE,CREATED_DATE LIMIT 501;

+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER MY_MT_TABLE.MY_TEST_TABLE_INDEX 
['00Dxx001gFA','001','2012-10-21 00:00:00.001'] - 
['00Dxx001gFA','001',' |
| SERVER FILTER BY FIRST KEY ONLY  |
| SERVER 501 ROW LIMIT |
| CLIENT 501 ROW LIMIT |
+--+

Slow query with unexpected plan. Since the date ranges are overlapping, we 
expected Phoenix to consolidate this into the smallest matching range and do 
a serial range scan. It does seem to do the consolidation, but it then does a 
parallel scan and not a serial range scan.

EXPLAIN SELECT PARENT_ID
FROM MY_TEST_TABLE_VIEW
WHERE PARENT_TYPE='001'
AND (CREATED_DATE >= to_date('2011-01-01') AND CREATED_DATE <= 
to_date('2016-01-01'))
AND (CREATED_DATE > to_date('2012-10-21') AND CREATED_DATE < 
to_date('2016-10-31'))
ORDER BY PARENT_TYPE,CREATED_DATE LIMIT 501;

+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER MY_MT_TABLE.MY_TEST_TABLE_INDEX 
['00Dxx001gFA','001','2012-10-21 00:00:00.001'] - ['00Dxx001gFA','001' |
| SERVER FILTER BY FIRST KEY ONLY AND (true AND true) |
| SERVER 501 ROW LIMIT |
| CLIENT 501 ROW LIMIT |
+--+
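
As an aside (my addition, not part of the original description): a manual 
workaround, assuming the same MY_TEST_TABLE_VIEW as above, is to intersect the 
overlapping ranges by hand before issuing the query; with a single consolidated 
range the plan should come out like the fast serial range scan shown earlier.

-- Hedged workaround sketch: the two overlapping date predicates collapsed
-- by hand into their intersection (2012-10-21, 2016-01-01].
EXPLAIN SELECT PARENT_ID
FROM MY_TEST_TABLE_VIEW
WHERE PARENT_TYPE='001'
AND (CREATED_DATE > to_date('2012-10-21') AND CREATED_DATE <= to_date('2016-01-01'))
ORDER BY PARENT_TYPE,CREATED_DATE LIMIT 501;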

  was:
On all our connections we specify the phoenix.query.force.rowkeyorder=true 
property to force serial scans so that we only support queries that will scale 
horizontally with data size. 

In running performance tests, we found that queries with multiple AND'ed range 
filters were slow and not performing as expected. We looked at the query 
plan and noticed that, in the slow query case, the query plan is doing a 
PARALLEL 1-WAY SCAN and doing a SERVER FILTER BY whereas the fast query is 
simply doing a SERIAL 1-WAY RANGE SCAN.

We expect these queries to both have the same plan as we are specifying 
phoenix.query.force.rowkeyorder=true.

You can repro as follows:

CREATE TABLE IF NOT EXISTS MY_MT_TABLE.TEST_TABLE (
ORGANIZATION_ID CHAR(15) NOT NULL,
PARENT_TYPE CHAR(3) NOT NULL,
PARENT_ID CHAR(15) NOT NULL,
CREATED_DATE DATE NOT NULL
CONSTRAINT PK PRIMARY KEY 
(
ORGANIZATION_ID, 
PARENT_TYPE,
PARENT_ID,
CREATED_DATE DESC
)
) VERSIONS=1,MULTI_TENANT=true,REPLICATION_SCOPE=1;

CREATE INDEX IF NOT EXISTS MY_TEST_TABLE_INDEX 
ON MY_MT_TABLE.TEST_TABLE (PARENT_TYPE, CREATED_DATE, PARENT_ID);

CREATE VIEW IF NOT EXISTS MY_TEST_TABLE_VIEW AS SELECT * FROM 
MY_MT_TABLE.TEST_TABLE;

Query with expected plan:

EXPLAIN SELECT PARENT_ID
FROM MY_TEST_TABLE_VIEW
WHERE PARENT_TYPE='001'
AND (CREATED_DATE > to_date('2012-10-21') AND CREATED_DATE < 
to_date('2016-10-31'))
ORDER BY PARENT_TYPE,CREATED_DATE LIMIT 501;

+--+
|

[jira] [Commented] (PHOENIX-3516) Performance Issues with queries that have compound filters and specify phoenix.query.force.rowkeyorder=true

2016-12-02 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15717035#comment-15717035
 ] 

Jan Fernando commented on PHOENIX-3516:
---

One thing I forgot to mention is that the view is a tenant-specific view, and 
all the queries are run using a tenant-specific connection with the 
phoenix.query.force.rowkeyorder=true property added to the JDBC URL.

> Performance Issues with queries that have compound filters and specify 
> phoenix.query.force.rowkeyorder=true
> ---
>
> Key: PHOENIX-3516
> URL: https://issues.apache.org/jira/browse/PHOENIX-3516
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0
>Reporter: Jan Fernando
>
> On all our connections we specify the phoenix.query.force.rowkeyorder=true 
> property to force serial scans so that we only support queries that will 
> scale horizontally with data size. 
> In running performance tests, we found that queries with multiple AND'ed 
> range filters were slow and not performing as expected. We looked at the 
> query plan and noticed that, in the slow query case, the query plan is doing 
> a PARALLEL 1-WAY SCAN and doing a SERVER FILTER BY whereas the fast query is 
> simply doing a SERIAL 1-WAY RANGE SCAN.
> We expect these queries to both have the same plan as we are specifying 
> phoenix.query.force.rowkeyorder=true.
> You can repro as follows:
> CREATE TABLE IF NOT EXISTS MY_MT_TABLE.TEST_TABLE (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> PARENT_TYPE CHAR(3) NOT NULL,
> PARENT_ID CHAR(15) NOT NULL,
> CREATED_DATE DATE NOT NULL
> CONSTRAINT PK PRIMARY KEY 
> (
> ORGANIZATION_ID, 
> PARENT_TYPE,
> PARENT_ID,
> CREATED_DATE DESC
> )
> ) VERSIONS=1,MULTI_TENANT=true,REPLICATION_SCOPE=1;
> CREATE INDEX IF NOT EXISTS MY_TEST_TABLE_INDEX 
> ON MY_MT_TABLE.TEST_TABLE (PARENT_TYPE, CREATED_DATE, PARENT_ID);
> CREATE VIEW IF NOT EXISTS MY_TEST_TABLE_VIEW AS SELECT * FROM 
> MY_MT_TABLE.TEST_TABLE;
> Query with expected plan:
> EXPLAIN SELECT PARENT_ID
> FROM MY_TEST_TABLE_VIEW
> WHERE PARENT_TYPE='001'
> AND (CREATED_DATE > to_date('2012-10-21') AND CREATED_DATE < 
> to_date('2016-10-31'))
> ORDER BY PARENT_TYPE,CREATED_DATE LIMIT 501;
> +--+
> |   PLAN   |
> +--+
> | CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER MY_MT_TABLE.MY_TEST_TABLE_INDEX 
> ['00Dxx001gFA','001','2012-10-21 00:00:00.001'] - 
> ['00Dxx001gFA','001',' |
> | SERVER FILTER BY FIRST KEY ONLY  |
> | SERVER 501 ROW LIMIT |
> | CLIENT 501 ROW LIMIT |
> +--+
> Slow query with unexpected plan. Since the date ranges are overlapping, we 
> expected Phoenix to consolidate this into the smallest matching range and 
> do a serial range scan. It does seem to do the consolidation, but it then does 
> a parallel scan and not a serial range scan.
> EXPLAIN SELECT PARENT_ID
> FROM MY_TEST_TABLE_VIEW
> WHERE PARENT_TYPE='001'
> AND (CREATED_DATE >= to_date('2011-01-01') AND CREATED_DATE <= 
> to_date('2016-01-01'))
> AND (CREATED_DATE > to_date('2012-10-21') AND CREATED_DATE < 
> to_date('2016-10-31'))
> ORDER BY PARENT_TYPE,CREATED_DATE LIMIT 501;
> +--+
> |   PLAN   |
> +--+
> | CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER 
> MY_MT_TABLE.MY_TEST_TABLE_INDEX ['00Dxx001gFA','001','2012-10-21 
> 00:00:00.001'] - ['00Dxx001gFA','001' |
> | SERVER FILTER BY FIRST KEY ONLY AND (true AND true) |
> | SERVER 501 ROW LIMIT |
> | CLIENT 501 ROW LIMIT |
> +--+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3516) Performance Issues with queries that have compound filters and specify phoenix.query.force.rowkeyorder=true

2016-12-02 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-3516:
-

 Summary: Performance Issues with queries that have compound 
filters and specify phoenix.query.force.rowkeyorder=true
 Key: PHOENIX-3516
 URL: https://issues.apache.org/jira/browse/PHOENIX-3516
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.9.0
Reporter: Jan Fernando


On all our connections we specify the phoenix.query.force.rowkeyorder=true 
property to force serial scans so that we only support queries that will scale 
horizontally with data size. 

In running performance tests, we found that queries with multiple AND'ed range 
filters were slow and not performing as expected. We looked at the query 
plan and noticed that, in the slow query case, the query plan is doing a 
PARALLEL 1-WAY SCAN and doing a SERVER FILTER BY whereas the fast query is 
simply doing a SERIAL 1-WAY RANGE SCAN.

We expect these queries to both have the same plan as we are specifying 
phoenix.query.force.rowkeyorder=true.

You can repro as follows:

CREATE TABLE IF NOT EXISTS MY_MT_TABLE.TEST_TABLE (
ORGANIZATION_ID CHAR(15) NOT NULL,
PARENT_TYPE CHAR(3) NOT NULL,
PARENT_ID CHAR(15) NOT NULL,
CREATED_DATE DATE NOT NULL
CONSTRAINT PK PRIMARY KEY 
(
ORGANIZATION_ID, 
PARENT_TYPE,
PARENT_ID,
CREATED_DATE DESC
)
) VERSIONS=1,MULTI_TENANT=true,REPLICATION_SCOPE=1;

CREATE INDEX IF NOT EXISTS MY_TEST_TABLE_INDEX 
ON MY_MT_TABLE.TEST_TABLE (PARENT_TYPE, CREATED_DATE, PARENT_ID);

CREATE VIEW IF NOT EXISTS MY_TEST_TABLE_VIEW AS SELECT * FROM 
MY_MT_TABLE.TEST_TABLE;

Query with expected plan:

EXPLAIN SELECT PARENT_ID
FROM MY_TEST_TABLE_VIEW
WHERE PARENT_TYPE='001'
AND (CREATED_DATE > to_date('2012-10-21') AND CREATED_DATE < 
to_date('2016-10-31'))
ORDER BY PARENT_TYPE,CREATED_DATE LIMIT 501;

+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN OVER MY_MT_TABLE.MY_TEST_TABLE_INDEX 
['00Dxx001gFA','001','2012-10-21 00:00:00.001'] - 
['00Dxx001gFA','001',' |
| SERVER FILTER BY FIRST KEY ONLY  |
| SERVER 501 ROW LIMIT |
| CLIENT 501 ROW LIMIT |
+--+

Slow query with unexpected plan. Since the date ranges are overlapping, we 
expected Phoenix to consolidate this into the smallest matching range and do 
a serial range scan. It does seem to do the consolidation, but it then does a 
parallel scan and not a serial range scan.

EXPLAIN SELECT PARENT_ID
FROM MY_TEST_TABLE_VIEW
WHERE PARENT_TYPE='001'
AND (CREATED_DATE >= to_date('2011-01-01') AND CREATED_DATE <= 
to_date('2016-01-01'))
AND (CREATED_DATE > to_date('2012-10-21') AND CREATED_DATE < 
to_date('2016-10-31'))
ORDER BY PARENT_TYPE,CREATED_DATE LIMIT 501;

+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER MY_MT_TABLE.MY_TEST_TABLE_INDEX 
['00Dxx001gFA','001','2012-10-21 00:00:00.001'] - ['00Dxx001gFA','001' |
| SERVER FILTER BY FIRST KEY ONLY AND (true AND true) |
| SERVER 501 ROW LIMIT |
| CLIENT 501 ROW LIMIT |
+--+



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3439) Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query

2016-11-05 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15640883#comment-15640883
 ] 

Jan Fernando commented on PHOENIX-3439:
---

[~giacomotaylor] Yes via a tenant specific connection. Sorry I wasn't clear.

> Query using an RVC based on the base table PK is incorrectly using an index 
> and doing a full scan instead of a point query
> --
>
> Key: PHOENIX-3439
> URL: https://issues.apache.org/jira/browse/PHOENIX-3439
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.1
>Reporter: Jan Fernando
>Assignee: James Taylor
> Fix For: 4.9.0, 4.8.2
>
> Attachments: PHOENIX-3439.patch
>
>
> We use Phoenix RVCs to support paginated queries. The performance of this 
> functionality relies on Phoenix predictably generating scans against a table 
> or index with a PK that matches the RVC specified for each page.
> What we do is that on the initial query we use 
> PhoenixRuntime.getPkColsDataTypesForSql() to get the list of PK columns and 
> persist that and use those to generate RVCs for paginated queries.
> We have discovered that for queries where:
> a) the user doesn't specify an ORDER BY
> b) for tables where secondary indexes are present
> Phoenix returns pk cols for the base table via getPkColsDataTypesForSql() but 
> then subsequent queries using the RVCs to paginate execute against a 
> secondary index doing a full scan.
> We have a table with a secondary index where this is an issue. The base table 
> has a PK of PKCOL1, PKCOL2, PKCOL3 and 
> PKCOL4. We have an immutable secondary index where the PK is PKCOL1, PKCOL3, 
> PKCOL2, PKCOL4.
> Here's what happens:
> Here is our query we run to get the Query plan from which we generate the 
> RVCs to be used for paging:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> LIMIT 501;
> I get the following explain:
> CLIENT 6-CHUNK SERIAL 1-WAY ROUND ROBIN RANGE SCAN OVER MY_TABLES.MY_TABLE 
> ['00Dxx001gFA']
> SERVER 501 ROW LIMIT
> CLIENT 501 ROW LIMIT
> Therefore the columns we record for RVCs for paging are PK1, PK2, PK3, PK4 
> from MY_TABLES.MY_TABLE
> However when I generate the RVC query to page through the data:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> WHERE (pkcol1, pkcol2, pkcol3, pkcol4) > 
> ('001', '001xx03DHml', to_date('2015-10-21 09:50:55.0'), '017xx022FuI')
> LIMIT 501;
> I get the following explain plan:
> CLIENT 24-CHUNK 7811766 ROWS 6291457403 BYTES PARALLEL 1-WAY ROUND ROBIN 
> RANGE SCAN OVER MY_TABLES.MY_SECONDARY_INDEX ['00Dxx001gFA','001'] - 
> ['00Dxx001gFA',*]
> SERVER FILTER BY ("PKCOL1", "PKCOL2, "PKCOL3", "PKCOL4") > (TO_CHAR('001'), 
> TO_CHAR('001xx03DHml'), DATE '2015-10-21 09 (tel:2015102109):50:55.000', 
> TO_CHAR('017xx022FuI'))
> SERVER 501 ROW LIMIT
> CLIENT 501 ROW LIMIT
> We expected that the second query with RVCs above would execute against the 
> base table as the base table PK is PKCOL1, PKCOL2, PKCOL3, PKCOL4 and the 
> index PK is PKCOL1, PKCOL3, PKCOL2, PKCOL4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3439) Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query

2016-11-05 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15640687#comment-15640687
 ] 

Jan Fernando edited comment on PHOENIX-3439 at 11/5/16 10:21 PM:
-

[~giacomotay...@gmail.com] [~samarthjain] I tested this out with the Phoenix 4.9 
jar and it doesn't seem to be completely fixed. It looks like the fix works 
for standard tables but not for queries against multi-tenant tables.

1. Create the following tables and index
CREATE TABLE IF NOT EXISTS MY_TABLES.MY_TABLE (ORGANIZATION_ID CHAR(15) NOT 
NULL, PKCOL1 CHAR(15) NOT NULL,PKCOL2 CHAR(15) NOT NULL,PKCOL3 CHAR(15) NOT 
NULL,PKCOL4 CHAR(15) NOT NULL,COL1 CHAR(15),COL2 CHAR(15)CONSTRAINT PK PRIMARY 
KEY (ORGANIZATION_ID,PKCOL1,PKCOL2,PKCOL3,PKCOL4)) MULTI_TENANT=true

CREATE INDEX IF NOT EXISTS MY_TABLE_INDEX 
ON MY_TABLES.MY_TABLE (PKCOL1, PKCOL3, PKCOL2, PKCOL4)
INCLUDE (COL1, COL2);

2. Create a MT view as follows:
CREATE VIEW IF NOT EXISTS MY_TABLE_MT_VIEW AS SELECT * FROM MY_TABLES.MY_TABLE;

3. Insert a record into the MT view:
upsert into my_table_mt_view (pkcol1, pkcol2, pkcol3, pkcol4) values ('1', '2', 
'3', '4');

4. Run explain:
explain select * from my_table_mt_view where (pkcol1, pkcol2, pkcol3, pkcol4) > 
('0', '0', '0', '0');
+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER 
MY_TABLES.MY_TABLE_INDEX ['00Dxx001gFA','0  '] - 
['00Dxx001gFA',*] |
| SERVER FILTER BY ("PKCOL1", "PKCOL2", "PKCOL3", "PKCOL4") > 
(TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0')) |
+--+

As you can see the query optimizer still wants to use the secondary index. I 
don't see this behavior when using a non-MT connection and non-multi-tenant 
table.

For the standard table using a non MT connection I see:

0: jdbc:phoenix:localhost>  explain select * from MY_TABLES.MY_TABLE  where 
(organization_id, pkcol1, pkcol2, pkcol3, pkcol4) > ('00Dxx001gFA', '1', 
'2', '3', '4');
+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER MY_TABLES.MY_TABLE |
+--+



was (Author: jfernando_sfdc):
[~giacomotay...@gmail.com] [~samarthjain] I test this out with the Phoenix 4.9 
jar and it doesn't seems to be completely fixed. It looks like the fix works 
for standard tables but not for queries against multi-tenant tables.

1. Create the following tables and index
CREATE TABLE IF NOT EXISTS MY_TABLES.MY_TABLE (ORGANIZATION_ID CHAR(15) NOT 
NULL, PKCOL1 CHAR(15) NOT NULL,PKCOL2 CHAR(15) NOT NULL,PKCOL3 CHAR(15) NOT 
NULL,PKCOL4 CHAR(15) NOT NULL,COL1 CHAR(15),COL2 CHAR(15)CONSTRAINT PK PRIMARY 
KEY (ORGANIZATION_ID,PKCOL1,PKCOL2,PKCOL3,PKCOL4)) MULTI_TENANT=true

CREATE INDEX IF NOT EXISTS MY_TABLE_INDEX 
ON MY_TABLES.MY_TABLE (PKCOL1, PKCOL3, PKCOL2, PKCOL4)
INCLUDE (COL1, COL2);

2. Create a MT view as follows:
CREATE VIEW IF NOT EXISTS MY_TABLE_MT_VIEW AS SELECT * FROM MY_TABLES.MY_TABLE;

3. Insert a record into the MT view:
upsert into my_table_mt_view (pkcol1, pkcol2, pkcol3, pkcol4) values ('1', '2', 
'3', '4');

4. Run explain:
explain select * from my_table_mt_view where (pkcol1, pkcol2, pkcol3, pkcol4) > 
('0', '0', '0', '0');
+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER 
MY_TABLES.MY_TABLE_INDEX ['00Dxx001gFA','0  '] - 
['00Dxx001gFA',*] |
| SERVER FILTER BY ("PKCOL1", "PKCOL2", "PKCOL3", "PKCOL4") > 
(TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0')) |
+--+

As you can see the query optimizer still wants to use the secondary index. I 
don't 

For the standard table using a non MT connection I see:

0: jdbc:phoenix:localhost>  explain select * from MY_TABLES.MY_TABLE  where 
(organization_id, pkcol1, pkcol2, pkcol3, pkcol4) > ('00Dxx001gFA', '1', 
'2', '3', '4');
+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER MY_TABLES.MY_TABLE |
+--+


> Query using an RVC based on the base table PK is incorrectly using an index 
> and doing a full scan instead of a point query
> --
>
> Key: PHOENIX-3439
> URL: https://issues.apache.org/jira/browse/PHOENIX-3439
> Project: Phoenix
>  Issue Type: 

[jira] [Reopened] (PHOENIX-3439) Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query

2016-11-05 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando reopened PHOENIX-3439:
---

[~giacomotay...@gmail.com] [~samarthjain] I tested this out with the Phoenix 4.9 
jar and it doesn't seem to be completely fixed. It looks like the fix works 
for standard tables but not for queries against multi-tenant tables.

1. Create the following tables and index
CREATE TABLE IF NOT EXISTS MY_TABLES.MY_TABLE (ORGANIZATION_ID CHAR(15) NOT 
NULL, PKCOL1 CHAR(15) NOT NULL,PKCOL2 CHAR(15) NOT NULL,PKCOL3 CHAR(15) NOT 
NULL,PKCOL4 CHAR(15) NOT NULL,COL1 CHAR(15),COL2 CHAR(15)CONSTRAINT PK PRIMARY 
KEY (ORGANIZATION_ID,PKCOL1,PKCOL2,PKCOL3,PKCOL4)) MULTI_TENANT=true

CREATE INDEX IF NOT EXISTS MY_TABLE_INDEX 
ON MY_TABLES.MY_TABLE (PKCOL1, PKCOL3, PKCOL2, PKCOL4)
INCLUDE (COL1, COL2);

2. Create a MT view as follows:
CREATE VIEW IF NOT EXISTS MY_TABLE_MT_VIEW AS SELECT * FROM MY_TABLES.MY_TABLE;

3. Insert a record into the MT view:
upsert into my_table_mt_view (pkcol1, pkcol2, pkcol3, pkcol4) values ('1', '2', 
'3', '4');

4. Run explain:
explain select * from my_table_mt_view where (pkcol1, pkcol2, pkcol3, pkcol4) > 
('0', '0', '0', '0');
+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER 
MY_TABLES.MY_TABLE_INDEX ['00Dxx001gFA','0  '] - 
['00Dxx001gFA',*] |
| SERVER FILTER BY ("PKCOL1", "PKCOL2", "PKCOL3", "PKCOL4") > 
(TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0'), TO_CHAR('0')) |
+--+

As you can see the query optimizer still wants to use the secondary index. I 
don't 

For the standard table using a non MT connection I see:

0: jdbc:phoenix:localhost>  explain select * from MY_TABLES.MY_TABLE  where 
(organization_id, pkcol1, pkcol2, pkcol3, pkcol4) > ('00Dxx001gFA', '1', 
'2', '3', '4');
+--+
|   PLAN   |
+--+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER MY_TABLES.MY_TABLE |
+--+


> Query using an RVC based on the base table PK is incorrectly using an index 
> and doing a full scan instead of a point query
> --
>
> Key: PHOENIX-3439
> URL: https://issues.apache.org/jira/browse/PHOENIX-3439
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.1
>Reporter: Jan Fernando
>Assignee: James Taylor
> Fix For: 4.9.0, 4.8.2
>
> Attachments: PHOENIX-3439.patch
>
>
> We use Phoenix RVCs to support paginated queries. The performance of this 
> functionality relies on Phoenix predictably generating scans against a table 
> or index with a PK that matches the RVC specified for each page.
> What we do is that on the initial query we use 
> PhoenixRuntime.getPkColsDataTypesForSql() to get the list of PK columns and 
> persist that and use those to generate RVCs for paginated queries.
> We have discovered that for queries where:
> a) the user doesn't specify an ORDER BY
> b) for tables where secondary indexes are present
> Phoenix returns pk cols for the base table via getPkColsDataTypesForSql() but 
> then subsequent queries using the RVCs to paginate execute against a 
> secondary index doing a full scan.
> We have a table with a secondary index where this is an issue. The base table 
> has a PK of PKCOL1, PKCOL2, PKCOL3 and 
> PKCOL4. We have an immutable secondary index where the PK is PKCOL1, PKCOL3, 
> PKCOL2, PKCOL4.
> Here's what happens:
> Here is our query we run to get the Query plan from which we generate the 
> RVCs to be used for paging:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> LIMIT 501;
> I get the following explain:
> CLIENT 6-CHUNK SERIAL 1-WAY ROUND ROBIN RANGE SCAN OVER MY_TABLES.MY_TABLE 
> ['00Dxx001gFA']
> SERVER 501 ROW LIMIT
> CLIENT 501 ROW LIMIT
> Therefore the columns we record for RVCs for paging are PK1, PK2, PK3, PK4 
> from MY_TABLES.MY_TABLE
> However when I generate the RVC query to page through the data:
> EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
> FROM MY_TABLES."MYTABLE"
> WHERE (pkcol1, pkcol2, pkcol3, pkcol4) > 
> ('001', '001xx03DHml', to_date('2015-10-21 09:50:55.0'), '017xx022FuI')
> LIMIT 501;
> I get the following explain plan:
> CLIENT 24-CHUNK 7811766 ROWS 6291457403 BYTES PARALLEL 1-WAY ROUND ROBIN 
> RANGE SCAN OVER MY_TABLES.MY_SECONDARY_INDEX ['00Dxx001gFA','001'] - 
> ['00Dxx001gFA',*]
> SERVER FILTER BY ("PKCOL1", "PKCOL2, "PKCOL3", "PKCOL4") > (TO_CHAR('001'), 
> 

[jira] [Reopened] (PHOENIX-3421) Column name lookups fail when on an indexed table

2016-11-03 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando reopened PHOENIX-3421:
---

[~giacomotaylor] Unfortunately we discovered today that even with this fix we 
have a significant backwards compatibility issue that will impact our 
production code. 

Our code that generates Phoenix SQL works with the 4.8.0 client and server, but 
with the 4.8.0 client and a 4.8.1 or 4.9 server jar our code that generates 
Phoenix queries no longer works due to this issue. Can we have a fix that is 
backwards compatible with the 4.8.0 client and the code line that runs with it? 
This will prevent a breakage while we have an older client and a newer server jar.

 [~cody.mar...@gmail.com] Can you add some info about the specific method whose 
return value changes are causing backwards compatibility to be broken?

> Column name lookups fail when on an indexed table
> -
>
> Key: PHOENIX-3421
> URL: https://issues.apache.org/jira/browse/PHOENIX-3421
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Cody Marcel
>Assignee: James Taylor
> Fix For: 4.9.0
>
> Attachments: DebuggerWindow.png, PHOENIX-3421.patch
>
>
> Using an index the lookup for encoded values fails.
> This happens on tables when using an index.
> The conflict is essentially between the two methods below. The pkColsList 
> created by getPkColsDataTypesForSql() returns column names without a ":", but 
> the encodeValues() method does a lookup on PTable for the column and cannot 
> find it.
> PhoenixRuntime.getPkColsDataTypesForSql(pkColsList, dataTypesList, queryPlan, 
> connection, true);
> PhoenixRuntime.encodeValues(connection, queryPlanTableName, objects , 
> pkColsList);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3439) Query using an RVC based on the base table PK is incorrectly using an index and doing a full scan instead of a point query

2016-11-03 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-3439:
-

 Summary: Query using an RVC based on the base table PK is 
incorrectly using an index and doing a full scan instead of a point query
 Key: PHOENIX-3439
 URL: https://issues.apache.org/jira/browse/PHOENIX-3439
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Jan Fernando


We use Phoenix RVCs to support paginated queries. The performance of this 
functionality relies on Phoenix predictably generating scans against a table or 
index with a PK that matches the RVC specified for each page.

What we do is that on the initial query we use 
PhoenixRuntime.getPkColsDataTypesForSql() to get the list of PK columns and 
persist that and use those to generate RVCs for paginated queries.

We have discovered that for queries where:
a) the user doesn't specify an ORDER BY
b) for tables where secondary indexes are present

Phoenix returns pk cols for the base table via getPkColsDataTypesForSql() but 
then subsequent queries using the RVCs to paginate execute against a secondary 
index doing a full scan.

We have a table with a secondary index where this is an issue. The base table 
has a PK of PKCOL1, PKCOL2, PKCOL3 and 
PKCOL4. We have an immutable secondary index where the PK is PKCOL1, PKCOL3, 
PKCOL2, PKCOL4.

Here's what happens:

Here is our query we run to get the Query plan from which we generate the RVCs 
to be used for paging:

EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
FROM MY_TABLES."MYTABLE"
LIMIT 501;

I get the following explain:
CLIENT 6-CHUNK SERIAL 1-WAY ROUND ROBIN RANGE SCAN OVER MY_TABLES.MY_TABLE 
['00Dxx001gFA']
SERVER 501 ROW LIMIT
CLIENT 501 ROW LIMIT

Therefore the columns we record for RVCs for paging are PK1, PK2, PK3, PK4 from 
MY_TABLES.MY_TABLE

However when I generate the RVC query to page through the data:

EXPLAIN SELECT pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
FROM MY_TABLES."MYTABLE"
WHERE (pkcol1, pkcol2, pkcol3, pkcol4) > ('001', '001xx03DHml', 
to_date('2015-10-21 09:50:55.0'), '017xx022FuI')
LIMIT 501;

I get the following explain plan:
CLIENT 24-CHUNK 7811766 ROWS 6291457403 BYTES PARALLEL 1-WAY ROUND ROBIN RANGE 
SCAN OVER MY_TABLES.MY_SECONDARY_INDEX ['00Dxx001gFA','001'] - 
['00Dxx001gFA',*]
SERVER FILTER BY ("PKCOL1", "PKCOL2, "PKCOL3", "PKCOL4") > (TO_CHAR('001'), 
TO_CHAR('001xx03DHml'), DATE '2015-10-21 09 (tel:2015102109):50:55.000', 
TO_CHAR('017xx022FuI'))
SERVER 501 ROW LIMIT
CLIENT 501 ROW LIMIT

We expected that the second query with RVCs above would execute against the 
base table as the base table PK is PKCOL1, PKCOL2, PKCOL3, PKCOL4 and the index 
PK is PKCOL1, PKCOL3, PKCOL2, PKCOL4.
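
A possible client-side mitigation until this is fixed (my note, not from the 
report): Phoenix's NO_INDEX hint can keep the paging query on the data table so 
the RVC lines up with the base table PK. A sketch, assuming the same tables as 
above and treating the hint's effect here as something to confirm with EXPLAIN:

-- Hedged workaround sketch: ask the optimizer to ignore secondary indexes
-- so the RVC is evaluated against the base table's PK order.
EXPLAIN SELECT /*+ NO_INDEX */ pkcol1, pkcol2, pkcol3, pkcol4, col1, col2
FROM MY_TABLES."MYTABLE"
WHERE (pkcol1, pkcol2, pkcol3, pkcol4) > 
('001', '001xx03DHml', to_date('2015-10-21 09:50:55.0'), '017xx022FuI')
LIMIT 501;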




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3396) Valid Multi-byte strings whose total byte size is greater than the max char limit cannot be inserted into VARCHAR fields in the PK

2016-10-24 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602805#comment-15602805
 ] 

Jan Fernando commented on PHOENIX-3396:
---

[~giacomotaylor] Was there a reason this works differently with non-PK and PK 
fields? I only see this issue if a field is part of the PK. I think the 
behavior should be consistent in all cases.

> Valid Multi-byte strings whose total byte size is greater than the max char 
> limit cannot be inserted into VARCHAR fields in the PK 
> ---
>
> Key: PHOENIX-3396
> URL: https://issues.apache.org/jira/browse/PHOENIX-3396
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Jan Fernando
>
> We allow users to insert multi-byte characters into VARCHAR columns that are 
> part of a table or view's PK. We noticed that Strings that had a valid number 
> of characters (i.e. were less than the max char length) were causing upserts 
> to fail with the following exception:
> Caused by: java.sql.SQLException: ERROR 206 (22003): The data exceeds the max 
> capacity for the data type. MYTABLE may not exceed 100 bytes 
> ('緓嗝加슪䐤㵞靹疸芬꽣汚佃䘯茵䖻埾巆蕤ⱅ澴粖蟤य褻酃岤豦팑薰鄩脼ժ끦碉ķ窯尬룗㚈Ꝝ퍛爃됰灁ᄠࢥ')
> There appears to be an issue in PTableImpl.newKey() where we check the 
> maxLength in chars against the byte length in this check:
> maxLength != null && !type.isArrayType() && byteValue.length > maxLength
> To reproduce you can run the following:
> CREATE TABLE TEXT_FIELD_VALIDATION_PK (TEXT VARCHAR(20), TEXT1 VARCHAR(20) 
> CONSTRAINT PK PRIMARY KEY (TEXT));
> UPSERT INTO TEXT_FIELD_VALIDATION_PK VALUES ('澴粖蟤य褻酃岤豦팑薰鄩脼ժ끦碉碉', 'test');
> The string we insert into the column TEXT is 20 chars, but greater than 20 
> bytes. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-3396) Valid Multi-byte strings whose total byte size is greater than the max char limit cannot be inserted into VARCHAR fields in the PK

2016-10-24 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602805#comment-15602805
 ] 

Jan Fernando edited comment on PHOENIX-3396 at 10/24/16 6:25 PM:
-

[~giacomotaylor] Was there a reason this works differently with non-PK and PK 
fields? I only see this issue if a field is part of the PK. I think the 
behavior should be consistent in all cases.


was (Author: jfernando_sfdc):
[~giacomotaylor] Was there a reason this works differently with non-PK and PK 
fields. I only see this issue if a fields is part of the PK. I think the 
behavior should be consistent in all cases.

> Valid Multi-byte strings whose total byte size is greater than the max char 
> limit cannot be inserted into VARCHAR fields in the PK 
> ---
>
> Key: PHOENIX-3396
> URL: https://issues.apache.org/jira/browse/PHOENIX-3396
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Jan Fernando
>
> We allow users to insert multi-byte characters into VARCHAR columns that are 
> part of a table or view's PK. We noticed that Strings that had a valid number 
> of characters (i.e. were less than the max char length) were causing upserts 
> to fail with the following exception:
> Caused by: java.sql.SQLException: ERROR 206 (22003): The data exceeds the max 
> capacity for the data type. MYTABLE may not exceed 100 bytes 
> ('緓嗝加슪䐤㵞靹疸芬꽣汚佃䘯茵䖻埾巆蕤ⱅ澴粖蟤य褻酃岤豦팑薰鄩脼ժ끦碉ķ窯尬룗㚈Ꝝ퍛爃됰灁ᄠࢥ')
> There appears to be an issue in PTableImpl.newKey() where we check the 
> maxLength in chars against the byte length in this check:
> maxLength != null && !type.isArrayType() && byteValue.length > maxLength
> To reproduce you can run the following:
> CREATE TABLE TEXT_FIELD_VALIDATION_PK (TEXT VARCHAR(20), TEXT1 VARCHAR(20) 
> CONSTRAINT PK PRIMARY KEY (TEXT));
> UPSERT INTO TEXT_FIELD_VALIDATION_PK VALUES ('澴粖蟤य褻酃岤豦팑薰鄩脼ժ끦碉碉', 'test');
> The string we insert into the column TEXT is 20 chars, but greater than 20 
> bytes. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3396) Valid Multi-byte strings whose total byte size is greater than the max char limit cannot be inserted into VARCHAR fields in the PK

2016-10-21 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-3396:
-

 Summary: Valid Multi-byte strings whose total byte size is greater 
than the max char limit cannot be inserted into VARCHAR fields in the PK 
 Key: PHOENIX-3396
 URL: https://issues.apache.org/jira/browse/PHOENIX-3396
 Project: Phoenix
  Issue Type: Bug
Reporter: Jan Fernando


We allow users to insert multi-byte characters into VARCHAR columns that are 
part of a table or view's PK. We noticed that Strings that had a valid number 
of characters (i.e. were less than the max char length) were causing upserts 
to fail with the following exception:

Caused by: java.sql.SQLException: ERROR 206 (22003): The data exceeds the max 
capacity for the data type. MYTABLE may not exceed 100 bytes 
('緓嗝加슪䐤㵞靹疸芬꽣汚佃䘯茵䖻埾巆蕤ⱅ澴粖蟤य褻酃岤豦팑薰鄩脼ժ끦碉ķ窯尬룗㚈Ꝝ퍛爃됰灁ᄠࢥ')

There appears to be an issue in PTableImpl.newKey() where we check the 
maxLength in chars against the byte length in this check:

maxLength != null && !type.isArrayType() && byteValue.length > maxLength

To reproduce you can run the following:

CREATE TABLE TEXT_FIELD_VALIDATION_PK (TEXT VARCHAR(20), TEXT1 VARCHAR(20) 
CONSTRAINT PK PRIMARY KEY (TEXT));

UPSERT INTO TEXT_FIELD_VALIDATION_PK VALUES ('澴粖蟤य褻酃岤豦팑薰鄩脼ժ끦碉碉', 'test');

The string we insert into the column TEXT is 20 chars, but greater than 20 
bytes. 
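
For illustration (my addition, assuming the limit is currently enforced in 
bytes): a 20-character string can take up to 80 bytes in UTF-8 (worst case 4 
bytes per character; most of the characters above are 3 bytes each, so 20 of 
them already need around 60 bytes). Until the check counts characters, an 
interim workaround is to size the PK column for the byte worst case:

-- Hypothetical widened repro table (name is mine, not from the issue):
-- VARCHAR(80) leaves room for 20 characters at up to 4 bytes each.
CREATE TABLE TEXT_FIELD_VALIDATION_PK_WIDE (TEXT VARCHAR(80), TEXT1 VARCHAR(20) 
CONSTRAINT PK PRIMARY KEY (TEXT));

UPSERT INTO TEXT_FIELD_VALIDATION_PK_WIDE VALUES ('澴粖蟤य褻酃岤豦팑薰鄩脼ժ끦碉碉', 'test');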

 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2640) Make it possible to kill running queries (and ideally list them too)

2016-01-28 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122360#comment-15122360
 ] 

Jan Fernando commented on PHOENIX-2640:
---

From my perspective, the more general purpose solution that would allow for 
building a UI and killing queries from there is much more useful. 

> Make it possible to kill running queries (and ideally list them too)
> 
>
> Key: PHOENIX-2640
> URL: https://issues.apache.org/jira/browse/PHOENIX-2640
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: John Wilkinson
>
> It would be useful to be able to find and kill queries that have gone on
> for a long time.
> Right now, it looks like there is no way other than killing the client, and no
> way to find running queries other than to have the client track them.
> One useful step would be to implement Statement.cancel(), which would allow 
> client code to cleanly kill queries.
> Even better from a dev-ops/support point of view would be the ability to see 
> running queries via a UI or API of some sort and kill them from there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2640) Make it possible to kill running queries (and ideally list them too)

2016-01-28 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122465#comment-15122465
 ] 

Jan Fernando commented on PHOENIX-2640:
---

Absolutely agree the Phoenix community should not be building UIs and advanced 
tooling at this time! What I do think would be useful is to have a way to kill 
a query from a client running on a different JVM. The use case is where we have 
many stateless app servers all running Phoenix and a central administration 
console. If PhoenixRuntime could expose all currently running statements it 
would allow people to build tooling on top of a REST API, for example, to view 
and kill queries from a central administration console. I see this as a small 
additional layer on top of Statement.cancel().  My 2 cents :)

> Make it possible to kill running queries (and ideally list them too)
> 
>
> Key: PHOENIX-2640
> URL: https://issues.apache.org/jira/browse/PHOENIX-2640
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: John Wilkinson
>
> It would be useful to be able to find and kill queries that have gone on
> for a long time.
> Right now, it looks like there is no way other than killing the client, and no
> way to find running queries other than to have the client track them.
> One useful step would be to implement Statement.cancel(), which would allow 
> client code to cleanly kill queries.
> Even better from a dev-ops/support point of view would be the ability to see 
> running queries via a UI or API of some sort and kill them from there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2367) Change PhoenixRecordWriter to use execute instead of executeBatch

2015-12-20 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando resolved PHOENIX-2367.
---
Resolution: Fixed

I have committed the patch to master, 4.x-HBase-0.98, 4.x-HBase-1.0 and 
4.x-HBase-1.1.

> Change PhoenixRecordWriter to use execute instead of executeBatch
> -
>
> Key: PHOENIX-2367
> URL: https://issues.apache.org/jira/browse/PHOENIX-2367
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2367.patch, PHOENIX-2367_2.patch
>
>
> Hey All,
> I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage 
> similar to how the CSVBulkLoad tool has an option of ignoring the bad rows. I 
> did some work on the apache pig code that allows Storers to have a notion of 
> Customizable/Configurable Errors PIG-4704.
> I wanted to plug this behavior for PhoenixHbaseStorage and propose certain 
> changes for the same.
> Current Behavior/Problem:
> PhoenixRecordWriter makes use of executeBatch() to process rows once batch 
> size is reached. If there are any client side validation/syntactical errors 
> like data not fitting the column size, executeBatch() throws an exception and 
> there is no way to retrieve the valid rows from the batch and retry them. We 
> discard the whole batch or fail the job without error handling.
> With auto commit set to false, execute() also serves the purpose of not 
> making any rpc calls but does a bunch of validation client side and adds it 
> to the client cache of mutations.
> On conn.commit() we make a rpc call.
> Proposed Change
> To be able to use Configurable ErrorHandling and ignore only the failed 
> records instead of discarding the whole batch, I want to propose changing the 
> behavior in PhoenixRecordWriter from executeBatch() to execute(), or having a 
> configuration to toggle between the 2 behaviors 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2367) Change PhoenixRecordWriter to use execute instead of executeBatch

2015-12-16 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061327#comment-15061327
 ] 

Jan Fernando commented on PHOENIX-2367:
---

[~giacomotaylor] I can get to it this weekend. I am serving on a jury at the 
moment so am rather slammed this week. Trial should be done this week. Does 
that work?

> Change PhoenixRecordWriter to use execute instead of executeBatch
> -
>
> Key: PHOENIX-2367
> URL: https://issues.apache.org/jira/browse/PHOENIX-2367
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2367.patch, PHOENIX-2367_2.patch
>
>
> Hey All,
> I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage 
> similar to how the CSVBulkLoad tool has an option of ignoring the bad rows. I 
> did some work on the apache pig code that allows Storers to have a notion of 
> Customizable/Configurable Errors PIG-4704.
> I wanted to plug this behavior for PhoenixHbaseStorage and propose certain 
> changes for the same.
> Current Behavior/Problem:
> PhoenixRecordWriter makes use of executeBatch() to process rows once batch 
> size is reached. If there are any client side validation/syntactical errors 
> like data not fitting the column size, executeBatch() throws an exception and 
> there is no way to retrieve the valid rows from the batch and retry them. We 
> discard the whole batch or fail the job without error handling.
> With auto commit set to false, execute() also serves the purpose of not 
> making any rpc calls but does a bunch of validation client side and adds it 
> to the client cache of mutations.
> On conn.commit() we make a rpc call.
> Proposed Change
> To be able to use Configurable ErrorHandling and ignore only the failed 
> records instead of discarding the whole batch, I want to propose changing the 
> behavior in PhoenixRecordWriter from executeBatch() to execute(), or having a 
> configuration to toggle between the 2 behaviors 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2367) Change PhoenixRecordWriter to use execute instead of executeBatch

2015-12-15 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059329#comment-15059329
 ] 

Jan Fernando commented on PHOENIX-2367:
---

+1 Looks good to me.

> Change PhoenixRecordWriter to use execute instead of executeBatch
> -
>
> Key: PHOENIX-2367
> URL: https://issues.apache.org/jira/browse/PHOENIX-2367
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2367.patch, PHOENIX-2367_2.patch
>
>
> Hey All,
> I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage 
> similar to how the CSVBulkLoad tool has an option of ignoring the bad rows. I 
> did some work on the apache pig code that allows Storers to have a notion of 
> Customizable/Configurable Errors PIG-4704.
> I wanted to plug this behavior for PhoenixHbaseStorage and propose certain 
> changes for the same.
> Current Behavior/Problem:
> PhoenixRecordWriter makes use of executeBatch() to process rows once batch 
> size is reached. If there are any client side validation/syntactical errors 
> like data not fitting the column size, executeBatch() throws an exception and 
> there is no way to retrieve the valid rows from the batch and retry them. We 
> discard the whole batch or fail the job without error handling.
> With auto commit set to false, execute() also serves the purpose of not 
> making any rpc calls but does a bunch of validation client side and adds it 
> to the client cache of mutations.
> On conn.commit() we make a rpc call.
> Proposed Change
> To be able to use Configurable ErrorHandling and ignore only the failed 
> records instead of discarding the whole batch I want to propose changing the 
> behavior in PhoenixRecordWriter from execute to executeBatch() or having a 
> configuration to toggle between the 2 behaviors 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2373) Change ReserveNSequence Udf to take in zookeeper and tentantId as param

2015-11-10 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando resolved PHOENIX-2373.
---
Resolution: Fixed

> Change ReserveNSequence Udf to take in zookeeper and tentantId as param
> ---
>
> Key: PHOENIX-2373
> URL: https://issues.apache.org/jira/browse/PHOENIX-2373
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
>Priority: Minor
> Attachments: PHOENIX-2373.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently the UDF reads the zookeeper quorum from the tuple value and the 
> tenantId is passed in from the jobConf.
> Instead I wanted to make a change for the UDF to take both the zookeeper 
> quorum and the tenantId as params passed to the UDF explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2373) Change ReserveNSequence Udf to take in zookeeper and tentantId as param

2015-11-10 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999208#comment-14999208
 ] 

Jan Fernando commented on PHOENIX-2373:
---

Thanks for the contribution [~siddhimehta]! I have committed the changes to 
master, 4.x-HBase-0.98, and 4.x-HBase-1.0.

> Change ReserveNSequence Udf to take in zookeeper and tentantId as param
> ---
>
> Key: PHOENIX-2373
> URL: https://issues.apache.org/jira/browse/PHOENIX-2373
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
>Priority: Minor
> Attachments: PHOENIX-2373.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently the UDF reads zookeeper quorum for tuple value and tenantId is 
> passed in from the jobConf.
> Instead we wanted to change the UDF to take both the zookeeper quorum and 
> tenantId as params passed to the UDF explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2373) Change ReserveNSequence Udf to take in zookeeper and tentantId as param

2015-11-09 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997343#comment-14997343
 ] 

Jan Fernando commented on PHOENIX-2373:
---

+1, changes look good to me!

> Change ReserveNSequence Udf to take in zookeeper and tentantId as param
> ---
>
> Key: PHOENIX-2373
> URL: https://issues.apache.org/jira/browse/PHOENIX-2373
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
>Priority: Minor
> Attachments: PHOENIX-2373.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently the UDF reads zookeeper quorum for tuple value and tenantId is 
> passed in from the jobConf.
> Instead we wanted to change the UDF to take both the zookeeper quorum and 
> tenantId as params passed to the UDF explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2373) Change ReserveNSequence Udf to take in zookeeper and tentantId as param

2015-11-09 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997753#comment-14997753
 ] 

Jan Fernando commented on PHOENIX-2373:
---

[~jamestaylor] In terms of branches, this one should go in master, 
4.x-HBase-0.98, and 4.x-HBase-1.0, I assume?

> Change ReserveNSequence Udf to take in zookeeper and tentantId as param
> ---
>
> Key: PHOENIX-2373
> URL: https://issues.apache.org/jira/browse/PHOENIX-2373
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
>Priority: Minor
> Attachments: PHOENIX-2373.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently the UDF reads zookeeper quorum for tuple value and tenantId is 
> passed in from the jobConf.
> Instead we wanted to change the UDF to take both the zookeeper quorum and 
> tenantId as params passed to the UDF explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2373) Change ReserveNSequence Udf to take in zookeeper and tentantId as param

2015-11-09 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997741#comment-14997741
 ] 

Jan Fernando commented on PHOENIX-2373:
---

[~giacomotaylor] If you are good with this patch I can go ahead and commit it. 
Let me know.

> Change ReserveNSequence Udf to take in zookeeper and tentantId as param
> ---
>
> Key: PHOENIX-2373
> URL: https://issues.apache.org/jira/browse/PHOENIX-2373
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
>Priority: Minor
> Attachments: PHOENIX-2373.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently the UDF reads zookeeper quorum for tuple value and tenantId is 
> passed in from the jobConf.
> Instead we wanted to change the UDF to take both the zookeeper quorum and 
> tenantId as params passed to the UDF explicitly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2299) Support CURRENT_DATE() in Pherf data upserts

2015-11-05 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992502#comment-14992502
 ] 

Jan Fernando commented on PHOENIX-2299:
---

[~cody.mar...@gmail.com] brings up a really great point. I can see this would 
be very useful when using Pherf to load/seed data, but it could be really 
dangerous when calculating query perf stats that use date filtering. What if 
NOW() set the date at the start of the load, stored it with the scenario, and 
the queries could refer to this calculated date using a special variable? That 
way the tests would be repeatable and you wouldn't suddenly think your queries 
were blazingly fast because they were returning zero data.

> Support CURRENT_DATE() in Pherf data upserts
> 
>
> Key: PHOENIX-2299
> URL: https://issues.apache.org/jira/browse/PHOENIX-2299
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: Karan Singhal
>
> Just replace the actual date with "NOW" in the xml. Then check the string for 
> that value in the generator. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2367) Change PhoenixRecordWriter to use execute instead of executeBatch

2015-11-04 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990976#comment-14990976
 ] 

Jan Fernando commented on PHOENIX-2367:
---

I don't believe there is any performance overhead currently since addBatch() 
and executeBatch() don't do anything special in Phoenix. Based on that I am not 
sure it's worth adding the configuration, unless we feel there are some 
optimizations coming for executeBatch() in the near future.

> Change PhoenixRecordWriter to use execute instead of executeBatch
> -
>
> Key: PHOENIX-2367
> URL: https://issues.apache.org/jira/browse/PHOENIX-2367
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Siddhi Mehta
>Assignee: Siddhi Mehta
>
> Hey All,
> I wanted to add a notion of skipping invalid rows for PhoenixHbaseStorage, 
> similar to how the CSVBulkLoad tool has an option of ignoring the bad rows. I 
> did some work on the Apache Pig code that allows Storers to have a notion of 
> Customizable/Configurable Errors (PIG-4704).
> I wanted to plug this behavior into PhoenixHbaseStorage and propose certain 
> changes for the same.
> Current Behavior/Problem:
> PhoenixRecordWriter makes use of executeBatch() to process rows once the batch 
> size is reached. If there are any client-side validation/syntactical errors, 
> like data not fitting the column size, executeBatch() throws an exception and 
> there is no way to retrieve the valid rows from the batch and retry them. We 
> discard the whole batch or fail the job without error handling.
> With auto-commit set to false, execute() also serves the purpose of not 
> making any RPC calls, but it does a bunch of client-side validation and adds 
> the row to the client-side mutation cache.
> On conn.commit() we make an RPC call.
> Proposed Change
> To be able to use Configurable ErrorHandling and ignore only the failed 
> records instead of discarding the whole batch, I want to propose changing the 
> behavior in PhoenixRecordWriter from executeBatch() to execute(), or having a 
> configuration to toggle between the two behaviors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-08 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-2310:
--
Attachment: PHOENIX-2310-4.5.patch

> PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce 
> integration generates incorrect upsert statement for view immediately after 
> issue view DDL
> --
>
> Key: PHOENIX-2310
> URL: https://issues.apache.org/jira/browse/PHOENIX-2310
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2310-4.5.patch, PHOENIX-2310-v1.patch, 
> PHOENIX-2310_v2.patch
>
>
> We ran into what I believe is a corner case that was causing an M/R job using 
> the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT 
> statement being generated. 
> The issue was intermittent. The bug is that the UPSERT statement generated by 
> PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
> PhoenixRecordWriter, would contain for certain columns, instead of the 
> cf.column_name, the class name + hashcode generated by Java's 
> Object.toString(). Since this was not a valid column name the Pig Script 
> would blow-up.
> This only occurs if we are attempting to insert data into a Phoenix View and 
> the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
> cache for this view was populated by MetaDataClient.createTable(). 
> What is occurring is that in this case we wrap the PColumn in a Delegate at 
> lines 1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't 
> implement toString(), so the default Object.toString() is used. If you 
> restart the JVM and force Phoenix to re-read the metadata from SYSTEM.CATALOG 
> this doesn't occur as in this case we don't wrap the PColumn instance.
> I have a test to repro and a possible patch I'll attach shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-08 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948999#comment-14948999
 ] 

Jan Fernando commented on PHOENIX-2310:
---

Committed to 4.5 and 4.x branches as well.

> PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce 
> integration generates incorrect upsert statement for view immediately after 
> issue view DDL
> --
>
> Key: PHOENIX-2310
> URL: https://issues.apache.org/jira/browse/PHOENIX-2310
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2310-4.5.patch, PHOENIX-2310-v1.patch, 
> PHOENIX-2310_v2.patch
>
>
> We ran into what I believe is a corner case that was causing an M/R job using 
> the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT 
> statement being generated. 
> The issue was intermittent. The bug is that the UPSERT statement generated by 
> PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
> PhoenixRecordWriter, would contain for certain columns, instead of the 
> cf.column_name, the class name + hashcode generated by Java's 
> Object.toString(). Since this was not a valid column name the Pig Script 
> would blow-up.
> This only occurs if we are attempting to insert data into a Phoenix View and 
> the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
> cache for this view was populated by MetaDataClient.createTable(). 
> What is occurring is that in this case we wrap the PColumn in a Delegate at 
> lines 1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't 
> implement toString(), so the default Object.toString() is used. If you 
> restart the JVM and force Phoenix to re-read the metadata from SYSTEM.CATALOG 
> this doesn't occur as in this case we don't wrap the PColumn instance.
> I have a test to repro and a possible patch I'll attach shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-08 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando resolved PHOENIX-2310.
---
Resolution: Fixed

> PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce 
> integration generates incorrect upsert statement for view immediately after 
> issue view DDL
> --
>
> Key: PHOENIX-2310
> URL: https://issues.apache.org/jira/browse/PHOENIX-2310
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Fix For: 4.5.3
>
> Attachments: PHOENIX-2310-4.5.patch, PHOENIX-2310-v1.patch, 
> PHOENIX-2310_v2.patch
>
>
> We ran into what I believe is a corner case that was causing an M/R job using 
> the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT 
> statement being generated. 
> The issue was intermittent. The bug is that the UPSERT statement generated by 
> PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
> PhoenixRecordWriter, would contain for certain columns, instead of the 
> cf.column_name, the class name + hashcode generated by Java's 
> Object.toString(). Since this was not a valid column name the Pig Script 
> would blow-up.
> This only occurs if we are attempting to insert data into a Phoenix View and 
> the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
> cache for this view was populated by MetaDataClient.createTable(). 
> What is occurring is that in this case we wrap the PColumn in a Delegate at 
> lines 1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't 
> implement toString(), so the default Object.toString() is used. If you 
> restart the JVM and force Phoenix to re-read the metadata from SYSTEM.CATALOG 
> this doesn't occur as in this case we don't wrap the PColumn instance.
> I have a test to repro and a possible patch I'll attach shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-08 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-2310:
--
Fix Version/s: 4.5.3

> PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce 
> integration generates incorrect upsert statement for view immediately after 
> issue view DDL
> --
>
> Key: PHOENIX-2310
> URL: https://issues.apache.org/jira/browse/PHOENIX-2310
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Fix For: 4.5.3
>
> Attachments: PHOENIX-2310-4.5.patch, PHOENIX-2310-v1.patch, 
> PHOENIX-2310_v2.patch
>
>
> We ran into what I believe is a corner case that was causing an M/R job using 
> the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT 
> statement being generated. 
> The issue was intermittent. The bug is that the UPSERT statement generated by 
> PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
> PhoenixRecordWriter, would contain for certain columns, instead of the 
> cf.column_name, the class name + hashcode generated by Java's 
> Object.toString(). Since this was not a valid column name the Pig Script 
> would blow-up.
> This only occurs if we are attempting to insert data into a Phoenix View and 
> the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
> cache for this view was populated by MetaDataClient.createTable(). 
> What is occurring is that in this case we wrap the PColumn in a Delegate at 
> lines 1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't 
> implement toString(), so the default Object.toString() is used. If you 
> restart the JVM and force Phoenix to re-read the metadata from SYSTEM.CATALOG 
> this doesn't occur as in this case we don't wrap the PColumn instance.
> I have a test to repro and a possible patch I'll attach shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-07 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947950#comment-14947950
 ] 

Jan Fernando commented on PHOENIX-2310:
---

Committed to master, but the patch is not applying cleanly to 4.5 and 4.x. I 
will regenerate a new patch tomorrow and commit to 4.5 and 4.x branches.

> PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce 
> integration generates incorrect upsert statement for view immediately after 
> issue view DDL
> --
>
> Key: PHOENIX-2310
> URL: https://issues.apache.org/jira/browse/PHOENIX-2310
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2310-v1.patch, PHOENIX-2310_v2.patch
>
>
> We ran into what I believe is a corner case that was causing an M/R job using 
> the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT 
> statement being generated. 
> The issue was intermittent. The bug is that the UPSERT statement generated by 
> PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
> PhoenixRecordWriter, would contain for certain columns, instead of the 
> cf.column_name, the class name + hashcode generated by Java's 
> Object.toString(). Since this was not a valid column name the Pig Script 
> would blow-up.
> This only occurs if we are attempting to insert data into a Phoenix View and 
> the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
> cache for this view was populated by MetaDataClient.createTable(). 
> What is occurring is that in this case we wrap the PColumn in a Delegate at 
> lines 1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't 
> implement toString(), so the default Object.toString() is used. If you 
> restart the JVM and force Phoenix to re-read the metadata from SYSTEM.CATALOG 
> this doesn't occur as in this case we don't wrap the PColumn instance.
> I have a test to repro and a possible patch I'll attach shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-07 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-2310:
--
Attachment: PHOENIX-2310_v2.patch

[~jamestaylor] I added the toString() method to DelegateTable. Right now I 
don't have cycles to check on all subclasses of Expression. I've uploaded v2 of 
the patch. Let me know if this looks good to you and I'll commit the changes if 
so.

> PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce 
> integration generates incorrect upsert statement for view immediately after 
> issue view DDL
> --
>
> Key: PHOENIX-2310
> URL: https://issues.apache.org/jira/browse/PHOENIX-2310
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2310-v1.patch, PHOENIX-2310_v2.patch
>
>
> We ran into what I believe is a corner case that was causing an M/R job using 
> the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT 
> statement being generated. 
> The issue was intermittent. The bug is that the UPSERT statement generated by 
> PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
> PhoenixRecordWriter, would contain for certain columns, instead of the 
> cf.column_name, the class name + hashcode generated by Java's 
> Object.toString(). Since this was not a valid column name the Pig Script 
> would blow-up.
> This only occurs if we are attempting to insert data into a Phoenix View and 
> the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
> cache for this view was populated by MetaDataClient.createTable(). 
> What is occurring is that in this case we wrap the PColumn in a Delegate at 
> lines 1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't 
> implement toString(), so the default Object.toString() is used. If you 
> restart the JVM and force Phoenix to re-read the metadata from SYSTEM.CATALOG 
> this doesn't occur as in this case we don't wrap the PColumn instance.
> I have a test to repro and a possible patch I'll attach shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-05 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944502#comment-14944502
 ] 

Jan Fernando commented on PHOENIX-2310:
---

[~jamestaylor] Can you take a look at this patch please? I think the fix is 
really simple and just requires us implementing toString() on 
DelegateColumn.java. By doing this we don't invoke Object.toString() which was 
what was causing the invalid column names to be inserted into the generated 
UPSERT statement.
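For reference, the delegation pattern being described is roughly the following; this is an illustrative standalone class, not the actual DelegateColumn source:

{code:java}
// Illustrative only: a wrapper that forwards toString() to the object it wraps,
// which is the fix being proposed for DelegateColumn.
public class DelegatingWrapper<T> {

    private final T delegate;

    public DelegatingWrapper(T delegate) {
        this.delegate = delegate;
    }

    @Override
    public String toString() {
        // Without this override, Object.toString() yields "ClassName@hashcode",
        // which is exactly the invalid "column name" that leaked into the UPSERT.
        return delegate.toString();
    }
}
{code}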

> PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce 
> integration generates incorrect upsert statement for view immediately after 
> issue view DDL
> --
>
> Key: PHOENIX-2310
> URL: https://issues.apache.org/jira/browse/PHOENIX-2310
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2310-v1.patch
>
>
> We ran into what I believe is a corner case that was causing an M/R job using 
> the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT 
> statement being generated. 
> The issue was intermittent. The bug is that the UPSERT statement generated by 
> PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
> PhoenixRecordWriter, would contain for certain columns, instead of the 
> cf.column_name, the class name + hashcode generated by Java's 
> Object.toString(). Since this was not a valid column name the Pig Script 
> would blow-up.
> This only occurs if we are attempting to insert data into a Phoenix View and 
> the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
> cache for this view was populated by MetaDataClient.createTable(). 
> What is occurring is that in this case we wrap the PColumn in a Delegate at 
> lines 1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't 
> implement toString(), so the default Object.toString() is used. If you 
> restart the JVM and force Phoenix to re-read the metadata from SYSTEM.CATALOG 
> this doesn't occur as in this case we don't wrap the PColumn instance.
> I have a test to repro and a possible patch I'll attach shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-05 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944510#comment-14944510
 ] 

Jan Fernando commented on PHOENIX-2310:
---

Here's a stack trace of when this bug occurs for reference:

java.lang.Exception: java.lang.RuntimeException: 
org.apache.phoenix.exception.BatchUpdateExecution: ERROR 1106 (XCL06): 
Exception while executing batch.
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.RuntimeException: 
org.apache.phoenix.exception.BatchUpdateExecution: ERROR 1106 (XCL06): 
Exception while executing batch.
at 
org.apache.phoenix.mapreduce.PhoenixRecordWriter.close(PhoenixRecordWriter.java:62)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.close(PigOutputFormat.java:153)
at 
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:550)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:629)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.phoenix.exception.BatchUpdateExecution: ERROR 1106 
(XCL06): Exception while executing batch.
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeBatch(PhoenixStatement.java:1264)
at 
org.apache.phoenix.mapreduce.PhoenixRecordWriter.close(PhoenixRecordWriter.java:58)
... 10 more
Caused by: org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 
(42I01): Undefined column family. familyName=org.null
at org.apache.phoenix.schema.PTableImpl.getColumnFamily(PTableImpl.java:787)
at 
org.apache.phoenix.compile.FromCompiler$SingleTableColumnResolver.resolveColumn(FromCompiler.java:361)
at 
org.apache.phoenix.compile.UpsertCompiler.compile(UpsertCompiler.java:344)
at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:546)
at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableUpsertStatement.compilePlan(PhoenixStatement.java:534)
at 
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:314)
at 
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:307)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:305)
at 
org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:235)
at 
org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:173)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeBatch(PhoenixStatement.java:1258)
... 11 more

> PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce 
> integration generates incorrect upsert statement for view immediately after 
> issue view DDL
> --
>
> Key: PHOENIX-2310
> URL: https://issues.apache.org/jira/browse/PHOENIX-2310
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2310-v1.patch
>
>
> We ran into what I believe is a corner case that was causing an M/R job using 
> the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT 
> statement being generated. 
> The issue was intermittent. The bug is that the UPSERT statement generated by 
> PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
> PhoenixRecordWriter, would contain for certain columns, instead of the 
> cf.column_name, the class name + hashcode generated by Java's 
> Object.toString(). Since this was not a valid column name the Pig Script 
> would blow-up.
> This only occurs if we are attempting to insert data into a Phoenix View and 
> the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
> cache for this view was populated by MetaDataClient.createTable(). 
> What is occurring is that in this case we wrap the PColumn in a Delegate at 
> lines 1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't 
> implement toString(), so the default Object.toString() is used. If you 
> restart the JVM 

[jira] [Commented] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-05 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944513#comment-14944513
 ] 

Jan Fernando commented on PHOENIX-2310:
---

[~jamestaylor] One other thought, do you think we should also add the 
toString() method to DelegateTable.java? 

> PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce 
> integration generates incorrect upsert statement for view immediately after 
> issue view DDL
> --
>
> Key: PHOENIX-2310
> URL: https://issues.apache.org/jira/browse/PHOENIX-2310
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2310-v1.patch
>
>
> We ran into what I believe is a corner case that was causing an M/R job using 
> the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT 
> statement being generated. 
> The issue was intermittent. The bug is that the UPSERT statement generated by 
> PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
> PhoenixRecordWriter, would contain for certain columns, instead of the 
> cf.column_name, the class name + hashcode generated by Java's 
> Object.toString(). Since this was not a valid column name the Pig Script 
> would blow-up.
> This only occurs if we are attempting to insert data into a Phoenix View and 
> the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
> cache for this view was populated by MetaDataClient.createTable(). 
> What is occurring is that in this case we wrap the PColumn in a Delegate at 
> lines 1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't 
> implement toString(), so the default Object.toString() is used. If you 
> restart the JVM and force Phoenix to re-read the metadata from SYSTEM.CATALOG 
> this doesn't occur as in this case we don't wrap the PColumn instance.
> I have a test to repro and a possible patch I'll attach shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-2310) PhoenixConfigurationUtil.getUpsertColumnMetadataList() in Phoenix Mapreduce integration generates incorrect upsert statement for view immediately after issue view DDL

2015-10-05 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-2310:
-

 Summary: PhoenixConfigurationUtil.getUpsertColumnMetadataList() in 
Phoenix Mapreduce integration generates incorrect upsert statement for view 
immediately after issue view DDL
 Key: PHOENIX-2310
 URL: https://issues.apache.org/jira/browse/PHOENIX-2310
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.5.2
Reporter: Jan Fernando
Assignee: Jan Fernando


We ran into what I believe is a corner case that was causing an M/R job using 
the Phoenix / Pig integration to fail by virtue of an incorrect UPSERT statement 
being generated. 

The issue was intermittent. The bug is that the UPSERT statement generated by 
PhoenixConfigurationUtil.getUpsertColumnMetadataList(), when invoked from 
PhoenixRecordWriter, would contain for certain columns, instead of the 
cf.column_name, the class name + hashcode generated by Java's 
Object.toString(). Since this was not a valid column name the Pig Script would 
blow-up.

This only occurs if we are attempting to insert data into a Phoenix View and 
the DDL for the Phoenix View was issued recently, such that the MetaDataClient 
cache for this view was populated by MetaDataClient.createTable(). 

What is occurring is that in this case we wrap the PColumn in a Delegate at lines 
1898 and 1909. The DelegateColumn class used to wrap PColumn doesn't implement 
toString(), so the default Object.toString() is used. If you restart the JVM 
and force Phoenix to re-read the metadata from SYSTEM.CATALOG this doesn't 
occur as in this case we don't wrap the PColumn instance.

I have a test to repro and a possible patch I'll attach shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (PHOENIX-2285) phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 1 second

2015-10-03 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando resolved PHOENIX-2285.
---
   Resolution: Fixed
Fix Version/s: 4.5.3

> phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 
> 1 second
> --
>
> Key: PHOENIX-2285
> URL: https://issues.apache.org/jira/browse/PHOENIX-2285
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Fix For: 4.5.3
>
> Attachments: PHOENIX-2285-v1.txt, PHOENIX-2285-v2.txt
>
>
> When creating a Phoenix JDBC connection I have a use case where I want to 
> override the default value of phoenix.query.timeoutMs to a value of 200 ms. 
> Currently if you set phoenix.query.timeoutMs to less than 1000 ms, the 
> timeout gets rounded up to 1000ms. This is because in 
> PhoenixStatement.getDefaultQueryTimeout() we convert the value of 
> phoenix.query.timeoutMs to seconds in order to be compliant with JDBC. In 
> BaseResultIterators we then convert it back to millis. As a result of the 
> conversion we lose the millisecond fidelity.
> A possible solution is to store the timeout value on the 
> PhoenixStatement in both seconds and milliseconds. Then, in 
> BaseResultIterators when we read the value from the statement we can check if 
> the value exists in millisecond fidelity and if so use that value. Otherwise 
> we would use the value in second granularity and convert. 
> This would allow Phoenix to remain JDBC compatible with second level 
> granularity for setting query timeouts on statements, but allow millisecond 
> granularity of timeouts by explicitly setting phoenix.query.timeoutMs on 
> connection properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2285) phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 1 second

2015-10-02 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941724#comment-14941724
 ] 

Jan Fernando commented on PHOENIX-2285:
---

[~jamestaylor] I committed the patch to master, all 4.5, and all 4.x 
branches. Is everything okay with the Jenkins builds? There don't seem to be 
test failures and the build seems to have been red before I committed. Let me 
know if this is related to my commit and whether I should take action.

> phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 
> 1 second
> --
>
> Key: PHOENIX-2285
> URL: https://issues.apache.org/jira/browse/PHOENIX-2285
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2285-v1.txt, PHOENIX-2285-v2.txt
>
>
> When creating a Phoenix JDBC connection I have a use case where I want to 
> override the default value of phoenix.query.timeoutMs to a value of 200 ms. 
> Currently if you set phoenix.query.timeoutMs to less than 1000 ms, the 
> timeout gets rounded up to 1000ms. This is because in 
> PhoenixStatement.getDefaultQueryTimeout() we convert the value of 
> phoenix.query.timeoutMs to seconds in order to be compliant with JDBC. In 
> BaseResultIterators we then convert it back to millis. As a result of the 
> conversion we lose the millisecond fidelity.
> A possible solution is to store the timeout value on the 
> PhoenixStatement in both seconds and milliseconds. Then, in 
> BaseResultIterators when we read the value from the statement we can check if 
> the value exists in millisecond fidelity and if so use that value. Otherwise 
> we would use the value in second granularity and convert. 
> This would allow Phoenix to remain JDBC compatible with second level 
> granularity for setting query timeouts on statements, but allow millisecond 
> granularity of timeouts by explicitly setting phoenix.query.timeoutMs on 
> connection properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2285) phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 1 second

2015-10-01 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940685#comment-14940685
 ] 

Jan Fernando commented on PHOENIX-2285:
---

I'll get the repo set up and tackle this tomorrow.

> phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 
> 1 second
> --
>
> Key: PHOENIX-2285
> URL: https://issues.apache.org/jira/browse/PHOENIX-2285
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2285-v1.txt, PHOENIX-2285-v2.txt
>
>
> When creating a Phoenix JDBC connection I have a use case where I want to 
> override the default value of phoenix.query.timeoutMs to a value of 200 ms. 
> Currently if you set phoenix.query.timeoutMs to less than 1000 ms, the 
> timeout gets rounded up to 1000ms. This is because in 
> PhoenixStatement.getDefaultQueryTimeout() we convert the value of 
> phoenix.query.timeoutMs to seconds in order to be compliant with JDBC. In 
> BaseResultIterators we then convert it back to millis. As a result of the 
> conversion we lose the millisecond fidelity.
> A possible solution is to store the timeout value on the 
> PhoenixStatement in both seconds and milliseconds. Then, in 
> BaseResultIterators when we read the value from the statement we can check if 
> the value exists in millisecond fidelity and if so use that value. Otherwise 
> we would use the value in second granularity and convert. 
> This would allow Phoenix to remain JDBC compatible with second level 
> granularity for setting query timeouts on statements, but allow millisecond 
> granularity of timeouts by explicitly setting phoenix.query.timeoutMs on 
> connection properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-2285) phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 1 second

2015-09-28 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933522#comment-14933522
 ] 

Jan Fernando edited comment on PHOENIX-2285 at 9/28/15 4:23 PM:


[~jamestaylor] I uploaded a second patch file with the unnecessary throws 
clause removed. Let me know if that works. Thanks! 


was (Author: jfernando_sfdc):
[~jamestaylor] I uploaded a second path file the unnecessary throws clause 
removed. Let me know if that works. Thanks! 

> phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 
> 1 second
> --
>
> Key: PHOENIX-2285
> URL: https://issues.apache.org/jira/browse/PHOENIX-2285
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2285-v1.txt, PHOENIX-2285-v2.txt
>
>
> When creating a Phoenix JDBC connection I have a use case where I want to 
> override the default value of phoenix.query.timeoutMs to a value of 200 ms. 
> Currently if you set phoenix.query.timeoutMs to less than 1000 ms, the 
> timeout gets rounded up to 1000ms. This is because in 
> PhoenixStatement.getDefaultQueryTimeout() we convert the value of 
> phoenix.query.timeoutMs to seconds in order to be compliant with JDBC. In 
> BaseResultIterators we then convert it back to millis. As a result of the 
> conversion we lose the millisecond fidelity.
> A possible solution is to store the timeout value on the 
> PhoenixStatement in both seconds and milliseconds. Then, in 
> BaseResultIterators when we read the value from the statement we can check if 
> the value exists in millisecond fidelity and if so use that value. Otherwise 
> we would use the value in second granularity and convert. 
> This would allow Phoenix to remain JDBC compatible with second level 
> granularity for setting query timeouts on statements, but allow millisecond 
> granularity of timeouts by explicitly setting phoenix.query.timeoutMs on 
> connection properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2285) phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 1 second

2015-09-28 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-2285:
--
Attachment: PHOENIX-2285-v2.txt

> phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 
> 1 second
> --
>
> Key: PHOENIX-2285
> URL: https://issues.apache.org/jira/browse/PHOENIX-2285
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2285-v1.txt, PHOENIX-2285-v2.txt
>
>
> When creating a Phoenix JDBC connection I have a use case where I want to 
> override the default value of phoenix.query.timeoutMs to a value of 200 ms. 
> Currently if you set phoenix.query.timeoutMs to less than 1000 ms, the 
> timeout gets rounded up to 1000ms. This is because in 
> PhoenixStatement.getDefaultQueryTimeout() we convert the value of 
> phoenix.query.timeoutMs to seconds in order to be compliant with JDBC. In 
> BaseResultIterators we then convert it back to millis. As a result of the 
> conversion we lose the millisecond fidelity.
> A possible solution is to store the timeout value on the 
> PhoenixStatement in both seconds and milliseconds. Then, in 
> BaseResultIterators when we read the value from the statement we can check if 
> the value exists in millisecond fidelity and if so use that value. Otherwise 
> we would use the value in second granularity and convert. 
> This would allow Phoenix to remain JDBC compatible with second level 
> granularity for setting query timeouts on statements, but allow millisecond 
> granularity of timeouts by explicitly setting phoenix.query.timeoutMs on 
> connection properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2285) phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 1 second

2015-09-24 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-2285:
--
Attachment: PHOENIX-2285-v1.txt

> phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 
> 1 second
> --
>
> Key: PHOENIX-2285
> URL: https://issues.apache.org/jira/browse/PHOENIX-2285
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2285-v1.txt
>
>
> When creating a Phoenix JDBC connection I have a use case where I want to 
> override the default value of phoenix.query.timeoutMs to a value of 200 ms. 
> Currently if you set phoenix.query.timeoutMs to less than 1000 ms, the 
> timeout gets rounded up to 1000ms. This is because in 
> PhoenixStatement.getDefaultQueryTimeout() we convert the value of 
> phoenix.query.timeoutMs to seconds in order to be compliant with JDBC. In 
> BaseResultIterators we then convert it back to millis. As a result of the 
> conversion we lose the millisecond fidelity.
> A possible solution is to store the timeout value on the 
> PhoenixStatement in both seconds and milliseconds. Then, in 
> BaseResultIterators when we read the value from the statement we can check if 
> the value exists in millisecond fidelity and if so use that value. Otherwise 
> we would use the value in second granularity and convert. 
> This would allow Phoenix to remain JDBC compatible with second level 
> granularity for setting query timeouts on statements, but allow millisecond 
> granularity of timeouts by explicitly setting phoenix.query.timeoutMs on 
> connection properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2285) phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 1 second

2015-09-24 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907067#comment-14907067
 ] 

Jan Fernando commented on PHOENIX-2285:
---

[~jamestaylor] I've attached a potential patch that will honor the 
phoenix.query.timeoutMs value to millisecond-level granularity. The main changes 
are in PhoenixStatement, where I made changes to store the timeout value in 
milliseconds and not seconds. To honor the JDBC contract for setting query 
timeouts I now do the conversion on the set and get methods instead of at 
initialization time. BaseResultIterators no longer needs to convert the query 
timeout to millis and just uses the millisecond value stored on the statement.

In terms of testing, I couldn't find any tests specific to the behavior of 
phoenix.query.timeoutMs, so I went ahead and added both unit and IT tests; they 
are in the proposed patch. 

Let me know what you think.

cc [~samarthjain]
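To make the conversion idea concrete, here is a simplified standalone sketch of the seconds/milliseconds handling described above (names and structure are hypothetical, not the actual PhoenixStatement code):

{code:java}
// Keep the timeout in milliseconds internally; convert only at the JDBC
// seconds-based accessors so sub-second values are not lost.
public class QueryTimeoutHolder {

    private long queryTimeoutMillis;

    public QueryTimeoutHolder(long initialTimeoutMillis) {
        this.queryTimeoutMillis = initialTimeoutMillis; // e.g. from phoenix.query.timeoutMs
    }

    // JDBC setQueryTimeout() speaks in seconds.
    public void setQueryTimeoutSeconds(int seconds) {
        this.queryTimeoutMillis = seconds * 1000L;
    }

    // JDBC getQueryTimeout() also speaks in seconds; round up so a 200 ms
    // timeout is not reported as 0.
    public int getQueryTimeoutSeconds() {
        return (int) ((queryTimeoutMillis + 999) / 1000);
    }

    // Internal callers such as the result iterators read the millisecond value directly.
    public long getQueryTimeoutMillis() {
        return queryTimeoutMillis;
    }
}
{code}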

> phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 
> 1 second
> --
>
> Key: PHOENIX-2285
> URL: https://issues.apache.org/jira/browse/PHOENIX-2285
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.2
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2285-v1.txt
>
>
> When creating a Phoenix JDBC connection I have a use case where I want to 
> override the default value of phoenix.query.timeoutMs to a value of 200 ms. 
> Currently if you set phoenix.query.timeoutMs to less than 1000 ms, the 
> timeout gets rounded up to 1000ms. This is because in 
> PhoenixStatement.getDefaultQueryTimeout() we convert the value of 
> phoenix.query.timeoutMs to seconds in order to be compliant with JDBC. In 
> BaseResultIterators we then convert it back to millis. As a result of the 
> conversion we lose the millisecond fidelity.
> A possible solution is to store the timeout value on the 
> PhoenixStatement in both seconds and milliseconds. Then, in 
> BaseResultIterators when we read the value from the statement we can check if 
> the value exists in millisecond fidelity and if so use that value. Otherwise 
> we would use the value in second granularity and convert. 
> This would allow Phoenix to remain JDBC compatible with second level 
> granularity for setting query timeouts on statements, but allow millisecond 
> granularity of timeouts by explicitly setting phoenix.query.timeoutMs on 
> connection properties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-2285) phoenix.query.timeoutMs doesn't allow callers to set the timeout to less than 1 second

2015-09-22 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-2285:
-

 Summary: phoenix.query.timeoutMs doesn't allow callers to set the 
timeout to less than 1 second
 Key: PHOENIX-2285
 URL: https://issues.apache.org/jira/browse/PHOENIX-2285
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.5.2
Reporter: Jan Fernando


When creating a Phoenix JDBC connection I have a use case where I want to 
override the default value of phoenix.query.timeoutMs to a value of 200 ms. 

Currently if you set phoenix.query.timeoutMs to less than 1000 ms, the timeout 
gets rounded up to 1000ms. This is because in 
PhoenixStatement.getDefaultQueryTimeout() we convert the value of 
phoenix.query.timeoutMs to seconds in order to be compliant with JDBC. In 
BaseResultIterators we then convert it back to millis. As a result of the 
conversion we lose the millisecond fidelity.

A possible solution is to store the timeout value on the 
PhoenixStatement in both seconds and milliseconds. Then, in BaseResultIterators 
when we read the value from the statement we can check if the value exists in 
millisecond fidelity and if so use that value. Otherwise we would use the value 
in second granularity and convert. 

This would allow Phoenix to remain JDBC compatible with second level 
granularity for setting query timeouts on statements, but allow millisecond 
granularity of timeouts by explicitly setting phoenix.query.timeoutMs on 
connection properties.
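For example, the intended override would be supplied through connection properties roughly like this (the JDBC URL below is just a placeholder):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class TimeoutOverrideExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Intent: a 200 ms query timeout; today this effectively gets rounded up to 1000 ms.
        props.setProperty("phoenix.query.timeoutMs", "200");
        try (Connection conn =
                 DriverManager.getConnection("jdbc:phoenix:localhost:2181", props)) {
            // run queries with the overridden timeout ...
        }
    }
}
{code}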



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-2234) [Pherf] Add attribute to dataSequence column attribute to support encoding sequence in different formats e.g. base62

2015-09-04 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-2234:
-

 Summary: [Pherf] Add attribute to dataSequence column attribute to 
support encoding sequence in different formats e.g. base62
 Key: PHOENIX-2234
 URL: https://issues.apache.org/jira/browse/PHOENIX-2234
 Project: Phoenix
  Issue Type: Bug
Reporter: Jan Fernando


It would be really useful to be able to specify an encoding format for the 
generated sequence value for a Pherf column in the datamapping section. 
Pherf would still generate an integer as it does now, but before appending it 
to the column it would encode per the configured format.

For example:


CHAR
true
SEQUENTIAL
10
ABCDEFGH
MY_PHERF_GENERATED_ID


So for the 100th sequence Pherf would end up generating something like: ABCDEFGH1c
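For reference, a minimal base62 encoder of the kind Pherf could apply to the generated sequence value (illustrative only, not actual Pherf code):

{code:java}
public final class Base62 {

    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static String encode(long value) {
        if (value == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (value > 0) {
            sb.append(ALPHABET.charAt((int) (value % 62)));
            value /= 62;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        // 100 encodes to "1c", matching the example above
        System.out.println(encode(100));
    }
}
{code}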




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-2235) [Pherf] Allow users to specify a postfix as well as a prefix for CHAR data types and allow sequence placement to be configurable

2015-09-04 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-2235:
-

 Summary: [Pherf] Allow users to specify a postfix as well as a 
prefix for CHAR data types and allow sequence placement to be configurable
 Key: PHOENIX-2235
 URL: https://issues.apache.org/jira/browse/PHOENIX-2235
 Project: Phoenix
  Issue Type: Bug
Reporter: Jan Fernando


When building CHAR data with Pherf with a prefix and sequential values, 
currently Pherf appends the sequence value to the prefix and generates a 
random string to pad the string to the required length.

e.g. myprefix123rndstring

1) It would be great to be able to define a postfix that would be used instead 
of a random string.


CHAR
true
SEQUENTIAL
8
ID
myprefix
0


This would generate the following for example: myprefix1230
The postfix would be repeated and/or truncated to pad the string correctly.

2) It would also be really useful to control whether the sequence appears 
after the prefix or the postfix. 

e.g. to generate myprefixrndstring123 instead we could configure the column to 
use an additional position attribute on datasequence.

 
CHAR
true
SEQUENTIAL
8
ID
myprefix





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-2227) [Pherf] Add ability to execute a DDL statement at the scenario level before any workloads are run

2015-09-03 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-2227:
-

 Summary: [Pherf] Add ability to execute a DDL statement at the 
scenario level before any workloads are run
 Key: PHOENIX-2227
 URL: https://issues.apache.org/jira/browse/PHOENIX-2227
 Project: Phoenix
  Issue Type: Improvement
Reporter: Jan Fernando
Assignee: Jan Fernando


In order to support Pherf scenarios that write and read from multi-tenant views 
it would be useful to add a ddl attribute at the scenario level. This DDL would 
get run before the workload runs. This would provide a nice hook to create a 
multi-tenant view that we are going to read and write from. This is analogous 
to how Pherf allows ddl to be created at the querySet level for enabling 
multi-tenant reads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2227) [Pherf] Add ability to execute a DDL statement at the scenario level before any workloads are run

2015-09-03 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-2227:
--
Attachment: PHOENIX-2227.patch

[~cody.mar...@gmail.com] Here's a possible patch for what I had in mind for 
this. Let me know what you think.

> [Pherf] Add ability to execute a DDL statement at the scenario level before 
> any workloads are run
> -
>
> Key: PHOENIX-2227
> URL: https://issues.apache.org/jira/browse/PHOENIX-2227
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Jan Fernando
>Assignee: Jan Fernando
> Attachments: PHOENIX-2227.patch
>
>
> In order to support Pherf scenarios that write and read from multi-tenant 
> views it would be useful to add a ddl attribute at the scenario level. This DDL 
> would get run before the workload runs. This would provide a nice hook to 
> create a multi-tenant view that we are going to read and write from. This is 
> analogous to how Pherf allows ddl to be created at the querySet level for 
> enabling multi-tenant reads.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2194) order by should not require all PK fields with = constraint

2015-09-02 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-2194:
--
Attachment: PHOENIX-2194-tests2.patch

Added another set of tests; this patch supersedes the first test patch file. It 
adds tests that have more than one predicate in the view definition.

> order by should not require all PK fields with = constraint
> ---
>
> Key: PHOENIX-2194
> URL: https://issues.apache.org/jira/browse/PHOENIX-2194
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.0
> Environment: linux
>Reporter: Gary Horen
>  Labels: AtMention, SFDC
> Attachments: PHOENIX-2194-tests.patch, PHOENIX-2194-tests2.patch, 
> PHOENIX-2194.patch, PHOENIX-2194_v2.patch
>
>
> Here is a table:
> CREATE TABLE IF NOT EXISTS FEEDS.STUFF
> (
> STUFF CHAR(15) NOT NULL,
> NONSENSE CHAR(15) NOT NULL
> CONSTRAINT PK PRIMARY KEY
> (
> STUFF,
> NONSENSE
> 
> )
> ) VERSIONS=1,MULTI_TENANT=TRUE,REPLICATION_SCOPE=1
> Here is a query:
> explain SELECT * FROM feeds.stuff
> where stuff = ' '
> and nonsense > ' '
> order by nonsense
> Here is the plan:
> CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN  
> SERVER FILTER BY FIRST KEY ONLY   
> SERVER TOP 100 ROWS SORTED BY [NONSE  
> CLIENT MERGE SORT   
> If I change to ORDER BY STUFF, NONSENSE I get:
> CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN O  
> SERVER FILTER BY FIRST KEY ONLY AND   
> SERVER 100 ROW LIMIT  
> CLIENT 100 ROW LIMIT  
> Since the leading constraint is =,  ORDER BY will be unaffected by it, so 
> ORDER BY should not need the leading constraint; it should only require the 
> columns whose values would vary (which, since they are ordered by the key, 
> should (and do) result in the client side sort being optimized out.) Having 
> to include the leading = constraints in the ORDER BY clause is very 
> counter-intuitive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-2194) order by should not require all PK fields with = constraint

2015-09-02 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728087#comment-14728087
 ] 

Jan Fernando edited comment on PHOENIX-2194 at 9/2/15 9:49 PM:
---

Attaching some additional IT tests that repro the issues.


was (Author: jfernando_sfdc):
Attaching some addition IT tests that repro the issues.

> order by should not require all PK fields with = constraint
> ---
>
> Key: PHOENIX-2194
> URL: https://issues.apache.org/jira/browse/PHOENIX-2194
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.0
> Environment: linux
>Reporter: Gary Horen
>  Labels: AtMention, SFDC
> Attachments: PHOENIX-2194-tests.patch, PHOENIX-2194.patch, 
> PHOENIX-2194_v2.patch
>
>
> Here is a table:
> CREATE TABLE IF NOT EXISTS FEEDS.STUFF
> (
> STUFF CHAR(15) NOT NULL,
> NONSENSE CHAR(15) NOT NULL
> CONSTRAINT PK PRIMARY KEY
> (
> STUFF,
> NONSENSE
> 
> )
> ) VERSIONS=1,MULTI_TENANT=TRUE,REPLICATION_SCOPE=1
> Here is a query:
> explain SELECT * FROM feeds.stuff
> where stuff = ' '
> and nonsense > ' '
> order by nonsense
> Here is the plan:
> CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN  
> SERVER FILTER BY FIRST KEY ONLY   
> SERVER TOP 100 ROWS SORTED BY [NONSE  
> CLIENT MERGE SORT   
> If I change to ORDER BY STUFF, NONSENSE I get:
> CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN O  
> SERVER FILTER BY FIRST KEY ONLY AND   
> SERVER 100 ROW LIMIT  
> CLIENT 100 ROW LIMIT  
> Since the leading constraint is =,  ORDER BY will be unaffected by it, so 
> ORDER BY should not need the leading constraint; it should only require the 
> columns whose values would vary (which, since they are ordered by the key, 
> should (and do) result in the client side sort being optimized out.) Having 
> to include the leading = constraints in the ORDER BY clause is very 
> counter-intuitive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2194) order by should not require all PK fields with = constraint

2015-09-02 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-2194:
--
Attachment: PHOENIX-2194-tests.patch

Attaching some additional IT tests that repro the issues.

> order by should not require all PK fields with = constraint
> ---
>
> Key: PHOENIX-2194
> URL: https://issues.apache.org/jira/browse/PHOENIX-2194
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.5.0
> Environment: linux
>Reporter: Gary Horen
>  Labels: AtMention, SFDC
> Attachments: PHOENIX-2194-tests.patch, PHOENIX-2194.patch, 
> PHOENIX-2194_v2.patch
>
>
> Here is a table:
> CREATE TABLE IF NOT EXISTS FEEDS.STUFF
> (
> STUFF CHAR(15) NOT NULL,
> NONSENSE CHAR(15) NOT NULL
> CONSTRAINT PK PRIMARY KEY
> (
> STUFF,
> NONSENSE
> 
> )
> ) VERSIONS=1,MULTI_TENANT=TRUE,REPLICATION_SCOPE=1
> Here is a query:
> explain SELECT * FROM feeds.stuff
> where stuff = ' '
> and nonsense > ' '
> order by nonsense
> Here is the plan:
> CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN  
> SERVER FILTER BY FIRST KEY ONLY   
> SERVER TOP 100 ROWS SORTED BY [NONSE  
> CLIENT MERGE SORT   
> If I change to ORDER BY STUFF, NONSENSE I get:
> CLIENT 1-CHUNK SERIAL 1-WAY RANGE SCAN O  
> SERVER FILTER BY FIRST KEY ONLY AND   
> SERVER 100 ROW LIMIT  
> CLIENT 100 ROW LIMIT  
> Since the leading constraint is =,  ORDER BY will be unaffected by it, so 
> ORDER BY should not need the leading constraint; it should only require the 
> columns whose values would vary (which, since they are ordered by the key, 
> should (and do) result in the client side sort being optimized out.) Having 
> to include the leading = constraints in the ORDER BY clause is very 
> counter-intuitive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

2015-08-11 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682234#comment-14682234
 ] 

Jan Fernando commented on PHOENIX-2154:
---

+1 to [~elilevine]'s comments. Resiliency over speed is the right trade off 
IMHO.

 Failure of one mapper should not affect other mappers in MR index build
 ---

 Key: PHOENIX-2154
 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor

 Once a mapper in the MR index job succeeds, it should not need to be re-done 
 in the event of the failure of one of the other mappers. The initial 
 population of an index is based on a snapshot in time, so new rows added 
 *after* the index build has started and/or failed do not impact it.
 Also, there's a 1:1 correspondence between index rows and table rows, so 
 there's really no need to dedup. However, the index rows will have a 
 different row key than the data table, so I'm not sure how the HFiles are 
 split. Will they potentially overlap and is this an issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1791) Pherf - Support data creation to multi-tenant views

2015-08-10 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14680325#comment-14680325
 ] 

Jan Fernando commented on PHOENIX-1791:
---

[~cody.mar...@gmail.com] Is this all good now or do you need me to regenerate 
the patch?

 Pherf - Support data creation to multi-tenant views
 ---

 Key: PHOENIX-1791
 URL: https://issues.apache.org/jira/browse/PHOENIX-1791
 Project: Phoenix
  Issue Type: Bug
Reporter: Jan Fernando
Assignee: Jan Fernando
Priority: Minor
  Labels: Newbie
 Attachments: PHOENIX-1791.patch


 Pherf currently doesn't allow the data creation phase to use a multi-tenant 
 connection write directly to a multi-tenant view. This enhancement would 
 allow users to write data directly to multi-tenant views that have the same 
 underlying table but where each view has a different set of custom columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1791) Pherf - Support data creation to multi-tenant views

2015-08-06 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1791:
--
Attachment: PHOENIX-1791.patch

 Pherf - Support data creation to multi-tenant views
 ---

 Key: PHOENIX-1791
 URL: https://issues.apache.org/jira/browse/PHOENIX-1791
 Project: Phoenix
  Issue Type: Bug
Reporter: Jan Fernando
Assignee: Jan Fernando
Priority: Minor
  Labels: Newbie
 Attachments: PHOENIX-1791.patch


 Pherf currently doesn't allow the data creation phase to use a multi-tenant 
 connection to write directly to a multi-tenant view. This enhancement would 
 allow users to write data directly to multi-tenant views that have the same 
 underlying table but where each view has a different set of custom columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1791) Pherf - Support data creation to multi-tenant views

2015-08-06 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660849#comment-14660849
 ] 

Jan Fernando commented on PHOENIX-1791:
---

[~cody.mar...@gmail.com] I've created a pull request with changes to support 
loading data into multi-tenant views with Pherf.

 Pherf - Support data creation to multi-tenant views
 ---

 Key: PHOENIX-1791
 URL: https://issues.apache.org/jira/browse/PHOENIX-1791
 Project: Phoenix
  Issue Type: Bug
Reporter: Jan Fernando
Assignee: Jan Fernando
Priority: Minor
  Labels: Newbie

 Pherf currently doesn't allow the data creation phase to use a multi-tenant 
 connection to write directly to a multi-tenant view. This enhancement would 
 allow users to write data directly to multi-tenant views that have the same 
 underlying table but where each view has a different set of custom columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2149) MAX Value of Sequences not honored when closing Connection between calls to NEXT VALUE FOR

2015-07-27 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643518#comment-14643518
 ] 

Jan Fernando commented on PHOENIX-2149:
---

[~tdsilva] Thanks. Quick question, I thought with PHOENIX-922 you didn't need 
the FROM clause anymore?

 MAX Value of Sequences not honored when closing Connection between calls to 
 NEXT VALUE FOR
 --

 Key: PHOENIX-2149
 URL: https://issues.apache.org/jira/browse/PHOENIX-2149
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.4.0
Reporter: Jan Fernando
Assignee: Jan Fernando
 Fix For: 4.4.1

 Attachments: PHOENIX-2149-v2.patch, PHOENIX-2149.patch


 There appears to be an issue related to closing connections between calls 
 to NEXT VALUE FOR that causes the MAX sequence value to be ignored. I have 
 found scenarios when I am allocating sequences near the MAX whereby the MAX 
 is not honored and values greater than the max are returned by NEXT VALUE FOR.
 It appears to be related to the logic to return all sequences on connection 
 close. It looks like if you close the connection between each invocation when 
 you hit the max value instead of the expected error being thrown sequence 
 values continue to be doled out. It looks like for some reason the 
 limit_reached_flag is not being set correctly on the SYSTEM.SEQUENCE table 
 for the sequence in this case.
 I added the test below to SequenceBulkAllocationIT that repros the issue.
 If I either a) remove the nextConnection() call that keeps recycling 
 connections in the test below or b) comment out the code in 
 PhoenixConnection.close() that calls services.removeConnection() the test 
 below starts to pass.
 I wasn't able to repro in Squirrel because I guess it doesn't recycle 
 connections.
 {code}
 @Test
 public void testNextValuesForSequenceClosingConnections() throws 
 Exception {
 final SequenceProperties props =
 new 
 SequenceProperties.Builder().incrementBy(1).startsWith(4990).cacheSize(10).minValue(4990).maxValue(5000)
 .numAllocated(4989).build();
 
 // Create Sequence
 nextConnection();
 createSequenceWithMinMax(props);
 nextConnection();
 
 // Try and get next value
 try {
 long val = 0L;
 for (int i = 0; i <= 11; i++) {
 ResultSet rs = conn.createStatement().executeQuery(String.format(SELECT_NEXT_VALUE_SQL, "bulkalloc.alpha"));
 rs.next();
 val = rs.getLong(1);
 nextConnection();
 }
 fail("Expect to fail as this value is greater than seq max " + val);
 } catch (SQLException e) {
 
 assertEquals(SQLExceptionCode.SEQUENCE_VAL_REACHED_MAX_VALUE.getErrorCode(),
 e.getErrorCode());
 assertTrue(e.getNextException() == null);
 }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-2149) MAX Value of Sequences not honored when closing Connection between calls to NEXT VALUE FOR

2015-07-26 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641996#comment-14641996
 ] 

Jan Fernando commented on PHOENIX-2149:
---

I think the bug is caused by the fact we always set the LIMIT_REACHED_FLAG to 
false when return sequences and therefore clobber the value if it was set to 
true on the last allocation.

Here's the offending snippet from Sequence.newReturn()

{code}
familyMap.put(PhoenixDatabaseMetaData.SEQUENCE_FAMILY_BYTES, Arrays.<Cell>asList(
        (Cell)KeyValueUtil.newKeyValue(key, PhoenixDatabaseMetaData.SEQUENCE_FAMILY_BYTES,
                PhoenixDatabaseMetaData.CURRENT_VALUE_BYTES, value.timestamp,
                PLong.INSTANCE.toBytes(value.currentValue)),
        // set LIMIT_REACHED flag to false since we are returning unused sequence values
        (Cell)KeyValueUtil.newKeyValue(key, PhoenixDatabaseMetaData.SEQUENCE_FAMILY_BYTES,
                PhoenixDatabaseMetaData.LIMIT_REACHED_FLAG_BYTES, value.timestamp,
                PDataType.FALSE_BYTES)
        ));
{code}

 MAX Value of Sequences not honored when closing Connection between calls to 
 NEXT VALUE FOR
 --

 Key: PHOENIX-2149
 URL: https://issues.apache.org/jira/browse/PHOENIX-2149
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.4.0
Reporter: Jan Fernando
Assignee: Thomas D'Silva

 There appears to be an issue related to closing connections between calls 
 to NEXT VALUE FOR that causes the MAX sequence value to be ignored. I have 
 found scenarios when I am allocating sequences near the MAX whereby the MAX 
 is not honored and values greater than the max are returned by NEXT VALUE FOR.
 It appears to be related to the logic to return all sequences on connection 
 close. It looks like if you close the connection between each invocation when 
 you hit the max value instead of the expected error being thrown sequence 
 values continue to be doled out. It looks like for some reason the 
 limit_reached_flag is not being set correctly on the SYSTEM.SEQUENCE table 
 for the sequence in this case.
 I added the test below to SequenceBulkAllocationIT that repros the issue.
 If I either a) remove the nextConnection() call that keeps recycling 
 connections in the test below or b) comment out the code in 
 PhoenixConnection.close() that calls services.removeConnection() the test 
 below starts to pass.
 I wasn't able to repro in Squirrel because I guess it doesn't recycle 
 connections.
 {code}
 @Test
 public void testNextValuesForSequenceClosingConnections() throws 
 Exception {
 final SequenceProperties props =
 new 
 SequenceProperties.Builder().incrementBy(1).startsWith(4990).cacheSize(10).minValue(4990).maxValue(5000)
 .numAllocated(4989).build();
 
 // Create Sequence
 nextConnection();
 createSequenceWithMinMax(props);
 nextConnection();
 
 // Try and get next value
 try {
 long val = 0L;
 for (int i = 0; i <= 11; i++) {
 ResultSet rs = conn.createStatement().executeQuery(String.format(SELECT_NEXT_VALUE_SQL, "bulkalloc.alpha"));
 rs.next();
 val = rs.getLong(1);
 nextConnection();
 }
 fail("Expect to fail as this value is greater than seq max " + val);
 } catch (SQLException e) {
 
 assertEquals(SQLExceptionCode.SEQUENCE_VAL_REACHED_MAX_VALUE.getErrorCode(),
 e.getErrorCode());
 assertTrue(e.getNextException() == null);
 }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-2149) MAX Value of Sequences not honored when closing Connection between calls to NEXT VALUE FOR

2015-07-26 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-2149:
--
Attachment: PHOENIX-2149.patch

[~jamestaylor] Here is a patch file with the fix. I think the fix is pretty 
simple: it's a matter of sending the actual client-side value for the 
LIMIT_REACHED_FLAG to the server in the HBase put instead of always sending it 
as false. If you think it might require more involved changes, I probably can't 
get to those for the next few days.
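
A minimal sketch of what that change could look like, based on the offending 
snippet quoted elsewhere in this thread. The limitReached field name and the 
TRUE_BYTES encoding are assumptions for illustration, not necessarily what the 
actual patch does:

{code}
// Illustrative sketch only: send the client-side flag value instead of hard-coding false.
// Assumes the cached sequence value exposes the flag as a boolean field (here called limitReached).
familyMap.put(PhoenixDatabaseMetaData.SEQUENCE_FAMILY_BYTES, Arrays.<Cell>asList(
        (Cell)KeyValueUtil.newKeyValue(key, PhoenixDatabaseMetaData.SEQUENCE_FAMILY_BYTES,
                PhoenixDatabaseMetaData.CURRENT_VALUE_BYTES, value.timestamp,
                PLong.INSTANCE.toBytes(value.currentValue)),
        (Cell)KeyValueUtil.newKeyValue(key, PhoenixDatabaseMetaData.SEQUENCE_FAMILY_BYTES,
                PhoenixDatabaseMetaData.LIMIT_REACHED_FLAG_BYTES, value.timestamp,
                value.limitReached ? PDataType.TRUE_BYTES : PDataType.FALSE_BYTES)
        ));
{code}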

 MAX Value of Sequences not honored when closing Connection between calls to 
 NEXT VALUE FOR
 --

 Key: PHOENIX-2149
 URL: https://issues.apache.org/jira/browse/PHOENIX-2149
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.4.0
Reporter: Jan Fernando
Assignee: Jan Fernando
 Attachments: PHOENIX-2149.patch


 There appears to be an issue related to closing connections between calls 
 to NEXT VALUE FOR that causes the MAX sequence value to be ignored. I have 
 found scenarios when I am allocating sequences near the MAX whereby the MAX 
 is not honored and values greater than the max are returned by NEXT VALUE FOR.
 It appears to be related to the logic to return all sequences on connection 
 close. It looks like if you close the connection between each invocation when 
 you hit the max value instead of the expected error being thrown sequence 
 values continue to be doled out. It looks like for some reason the 
 limit_reached_flag is not being set correctly on the SYSTEM.SEQUENCE table 
 for the sequence in this case.
 I added the test below to SequenceBulkAllocationIT that repros the issue.
 If I either a) remove the nextConnection() call that keeps recycling 
 connections in the test below or b) comment out the code in 
 PhoenixConnection.close() that calls services.removeConnection() the test 
 below starts to pass.
 I wasn't able to repro in Squirrel because I guess it doesn't recycle 
 connections.
 {code}
 @Test
 public void testNextValuesForSequenceClosingConnections() throws 
 Exception {
 final SequenceProperties props =
 new 
 SequenceProperties.Builder().incrementBy(1).startsWith(4990).cacheSize(10).minValue(4990).maxValue(5000)
 .numAllocated(4989).build();
 
 // Create Sequence
 nextConnection();
 createSequenceWithMinMax(props);
 nextConnection();
 
 // Try and get next value
 try {
 long val = 0L;
 for (int i = 0; i <= 11; i++) {
 ResultSet rs = conn.createStatement().executeQuery(String.format(SELECT_NEXT_VALUE_SQL, "bulkalloc.alpha"));
 rs.next();
 val = rs.getLong(1);
 nextConnection();
 }
 fail("Expect to fail as this value is greater than seq max " + val);
 } catch (SQLException e) {
 
 assertEquals(SQLExceptionCode.SEQUENCE_VAL_REACHED_MAX_VALUE.getErrorCode(),
 e.getErrorCode());
 assertTrue(e.getNextException() == null);
 }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (PHOENIX-2149) MAX Value of Sequences not honored when closing Connection between calls to NEXT VALUE FOR

2015-07-26 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando reassigned PHOENIX-2149:
-

Assignee: Jan Fernando  (was: Thomas D'Silva)

 MAX Value of Sequences not honored when closing Connection between calls to 
 NEXT VALUE FOR
 --

 Key: PHOENIX-2149
 URL: https://issues.apache.org/jira/browse/PHOENIX-2149
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.4.0
Reporter: Jan Fernando
Assignee: Jan Fernando

 There appears to be an issue related to closing connections between calls 
 to NEXT VALUE FOR that causes the MAX sequence value to be ignored. I have 
 found scenarios when I am allocating sequences near the MAX whereby the MAX 
 is not honored and values greater than the max are returned by NEXT VALUE FOR.
 It appears to be related to the logic to return all sequences on connection 
 close. It looks like if you close the connection between each invocation when 
 you hit the max value instead of the expected error being thrown sequence 
 values continue to be doled out. It looks like for some reason the 
 limit_reached_flag is not being set correctly on the SYSTEM.SEQUENCE table 
 for the sequence in this case.
 I added the test below to SequenceBulkAllocationIT that repros the issue.
 If I either a) remove the nextConnection() call that keeps recycling 
 connections in the test below or b) comment out the code in 
 PhoenixConnection.close() that calls services.removeConnection() the test 
 below starts to pass.
 I wasn't able to repro in Squirrel because I guess it doesn't recycle 
 connections.
 {code}
 @Test
 public void testNextValuesForSequenceClosingConnections() throws 
 Exception {
 final SequenceProperties props =
 new 
 SequenceProperties.Builder().incrementBy(1).startsWith(4990).cacheSize(10).minValue(4990).maxValue(5000)
 .numAllocated(4989).build();
 
 // Create Sequence
 nextConnection();
 createSequenceWithMinMax(props);
 nextConnection();
 
 // Try and get next value
 try {
 long val = 0L;
 for (int i = 0; i <= 11; i++) {
 ResultSet rs = conn.createStatement().executeQuery(String.format(SELECT_NEXT_VALUE_SQL, "bulkalloc.alpha"));
 rs.next();
 val = rs.getLong(1);
 nextConnection();
 }
 fail("Expect to fail as this value is greater than seq max " + val);
 } catch (SQLException e) {
 
 assertEquals(SQLExceptionCode.SEQUENCE_VAL_REACHED_MAX_VALUE.getErrorCode(),
 e.getErrorCode());
 assertTrue(e.getNextException() == null);
 }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-09 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621645#comment-14621645
 ] 

Jan Fernando commented on PHOENIX-1954:
---

[~jamestaylor] 

#1 Let me look at removing the RESERVE_SEQUENCE enum now and see if there's any 
reason to keep it. I agree it's probably good to get rid of it, since we don't want 
allocations happening on expression evaluation.

#2 The patch file I uploaded had 3 commits in it, you probably looked at 
yesterday's. The latest version seems correct. I thought the multiple commits might 
make things clearer, but I was wrong obviously :) After I look at #1 I'll 
rebase and generate a new patch which should be clearer.

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-rebased.patch, PHOENIX-1954-wip.patch, 
 PHOENIX-1954-wip2.patch.txt, PHOENIX-1954-wip3.patch, 
 PHOENIX-1954-wip4.patch, PHOENIX-1954-wip5-rebased.patch, 
 PHOENIX-1954-wip6.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-09 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: PHOENIX-1954-wip7-rebase.patch

[~jamestaylor] Attached a rebased patch that includes the removal of 
RESERVE_SEQUENCE. Let me know if anything looks off in the patch based on our 
discussions.

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-rebased.patch, PHOENIX-1954-wip.patch, 
 PHOENIX-1954-wip2.patch.txt, PHOENIX-1954-wip3.patch, 
 PHOENIX-1954-wip4.patch, PHOENIX-1954-wip5-rebased.patch, 
 PHOENIX-1954-wip6.patch, PHOENIX-1954-wip7-rebase.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-09 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: PHOENIX-1954-4.x-HBase-0.98.patch

Attaching patch for 4.x-HBase-0.98  branch.

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-4.x-HBase-0.98.patch, 
 PHOENIX-1954-rebased.patch, PHOENIX-1954-wip.patch, 
 PHOENIX-1954-wip2.patch.txt, PHOENIX-1954-wip3.patch, 
 PHOENIX-1954-wip4.patch, PHOENIX-1954-wip5-rebased.patch, 
 PHOENIX-1954-wip6.patch, PHOENIX-1954-wip7-rebase.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-09 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621116#comment-14621116
 ] 

Jan Fernando commented on PHOENIX-1954:
---

[~jamestaylor] Re (1) This is turning out to be a bit more complex than it 
seems on the surface. The complexity comes from overflows and underflows and 
cycles for standard sequences. Here's what I have so far, but this doesn't work 
on sequences that over/underflow with cycles enabled (tests caught this). I'm 
still looking for a solution - the issue stems from the fact that when we 
overflow, the old check of value.currentValue == value.nextValue no longer 
behaves the same: when currentValue was at LONG.MAX, nextValue would have 
overflown, so the check now returns false.

{code}
public long incrementValue(long timestamp, ValueOp op, long numToAllocate) 
throws SQLException {
SequenceValue value = findSequenceValue(timestamp);
if (value == null) {
throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
}

if (isSequenceCacheExhausted(numToAllocate, value)) {
if (op == ValueOp.VALIDATE_SEQUENCE) {
return value.currentValue;
}
throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
}
return increment(value, op, numToAllocate);
}

    /**
     * This method checks whether there are sufficient values in the SequenceValue
     * cached on the client to allocate the requested number of slots. It handles
     * decreasing and increasing sequences, overflows and underflows, and cycles.
     */
    private boolean isSequenceCacheExhausted(final long numToAllocate, final SequenceValue value) {
        // Perform a checkedAdd to detect over/underflow.
        // We don't treat this as the cache being exhausted, as the current value may be valid in the case
        // of no cycle; the logic in increment() will take care of detecting that we've hit the limit of the sequence.
        long targetSequenceValue;
        try {
            targetSequenceValue = LongMath.checkedAdd(value.currentValue, numToAllocate * value.incrementBy);
        } catch (ArithmeticException e) {
            return false;
        }

        if (value.incrementBy > 0) {
            return targetSequenceValue > value.nextValue;
        } else {
            return targetSequenceValue < value.nextValue;
        }
    }
{code}

I am concerned that trying to move away from the equality check for non-bulk 
allocations introduces a lot more complexity and therefore risk of regressions 
into the code. 

I think we could still fix the problem of allowing use of the cache for bulk 
allocations where there are enough values in the cache by only using the new 
logic for bulk allocations. Since we don't allow cycles this simplifies things:

{code}
public long incrementValue(long timestamp, ValueOp op, long numToAllocate) 
throws SQLException {
SequenceValue value = findSequenceValue(timestamp);
if (value == null) {
throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
}

        if (SequenceUtil.isBulkAllocation(numToAllocate) && isSequenceCacheExhausted(numToAllocate, value)) {
return unlessValidateSequenceThrowEmptySequenceCacheException(op, 
value);
} else if (isSequenceCacheExhausted(numToAllocate, value)) {
return unlessValidateSequenceThrowEmptySequenceCacheException(op, 
value);
}
return increment(value, op, numToAllocate);
}

private long unlessValidateSequenceThrowEmptySequenceCacheException(ValueOp 
op,
SequenceValue value) throws EmptySequenceCacheException {
if (op == ValueOp.VALIDATE_SEQUENCE) {
return value.currentValue;
}
throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
}

    /**
     * This method checks whether there are sufficient values in the SequenceValue
     * cached on the client to allocate the requested number of slots. It handles
     * decreasing and increasing sequences, overflows and underflows, and cycles.
     */
    private boolean isSequenceCacheExhausted(final long numToAllocate, final SequenceValue value) {
        long targetSequenceValue;
        try {
            targetSequenceValue = LongMath.checkedAdd(value.currentValue, numToAllocate * value.incrementBy);
        } catch (ArithmeticException e) {
            // Perform a checkedAdd to detect over/underflow.
            // We don't treat this as the cache being exhausted, as the current value may be valid in the case
            // of no cycle; the logic in increment() will take care of detecting that we've hit the limit of the sequence.
            return false;
        }

        if (value.incrementBy > 0) {
            return targetSequenceValue > value.nextValue;
        } else {
            return targetSequenceValue < value.nextValue;
        }
    }
{code}

I'm still digging into the cycle and 

[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-09 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: PHOENIX-1954-wip6.patch

[~jamestaylor] Following from our discussions attached is a patch that 
addresses issues (1) and (2).

1) We now support allocating slots in bulk directly from the client cache if 
the cache has sufficient slots available to service the request. I wasn't able 
to get rid of all the special casing, as we ran into issues trying to 
completely share the code for determining whether the cache is exhausted across 
both the NEXT VALUE FOR and NEXT n VALUES FOR flows. The issues were related to 
cycles, overflows, and underflows that created subtle differences in behavior 
in the two flows that I couldn't genericize without making the code very 
brittle and introducing the risk of hard-to-detect regressions. Therefore the 
existing check below remains, so the NEXT VALUE FOR flow stays the same, which 
significantly reduces the risk of strange regressions like the ones I found.

{code}
value.currentValue == value.nextValue
{code}

2) Added logic to ensure that if NEXT n VALUES FOR performs an illegal 
operation, like requesting slots on a sequence with a cycle or exceeding the 
max or min, we consistently throw a SQLException during expression evaluation 
in all cases. This involved adding client-side validation, in lieu of forcing 
an RPC call each time. I added a few static methods on SequenceUtil to 
consolidate validation logic between server and client (see the illustrative 
sketch below).
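
As an illustration of the kind of client-side check this refers to, here is a 
rough sketch; the helper names and signatures are assumptions and the actual 
SequenceUtil methods in the patch may differ:

{code}
// Rough sketch only; not the actual SequenceUtil API.
public static boolean isBulkAllocation(long numToAllocate) {
    return numToAllocate > 1;
}

// Returns true if allocating numToAllocate slots starting at currentValue would pass the
// sequence's MIN/MAX limit or over/underflow a long, mirroring the server-side validation.
public static boolean checkIfLimitReached(long currentValue, long minValue, long maxValue,
        long incrementBy, long numToAllocate) {
    long targetValue;
    try {
        targetValue = LongMath.checkedAdd(currentValue, (numToAllocate - 1) * incrementBy);
    } catch (ArithmeticException e) {
        return true; // over/underflow: the requested allocation cannot be satisfied
    }
    return incrementBy > 0 ? targetValue > maxValue : targetValue < minValue;
}
{code}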

The attached patch isn't rebased so it's easier for you to see changes. Let me 
know if you'd like a rebased version with all the changes and I'll do that once 
you are ready.

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-rebased.patch, PHOENIX-1954-wip.patch, 
 PHOENIX-1954-wip2.patch.txt, PHOENIX-1954-wip3.patch, 
 PHOENIX-1954-wip4.patch, PHOENIX-1954-wip5-rebased.patch, 
 PHOENIX-1954-wip6.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-08 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: PHOENIX-1954-wip5-rebased.patch

[~jamestaylor] Following on from our discussion I am adding a new patch that
1) Removes the need to cache startValues from Bulk Allocations on the client
2) Changes the flow such that Sequences are only validated when the expression 
is executed and the allocation happens now when the client reads the result of 
the expression via rs.next()

One small implication of this change to call out is that previously if 
executing SELECT NEXT n VALUES FOR seq resulted in a SQL Exception, for 
example if we attempted to invoke it on a sequence with the CYCLE flag set, we 
would not throw out the currently cached batch of sequences on the client. With 
this change we do throw those out, because after the expression runs the client 
state has been updated to reflect that all of them have been used. I think this is fine 
and reflected in my tests, but I wanted to note it here.

Finally, on the client side there are now only 2 places where I was not able to 
remove special handling for Bulk Allocation mode.
1) We need to explicitly check whether we are in bulk allocation mode and 
immediately throw an EMPTY_SEQUENCE_CACHE_EXCEPTION in Sequence.incrementValue(), 
as when a request for a bulk allocation is made we cannot rely on currentValue 
and nextValue being equal to trigger this.

{code}
public long incrementValue(long timestamp, ValueOp op, long numToAllocate) 
throws SQLException {
SequenceValue value = findSequenceValue(timestamp);
if (value == null || SequenceUtil.isBulkAllocation(numToAllocate)) {
throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
}

if (value.currentValue == value.nextValue) {
if (op == ValueOp.VALIDATE_SEQUENCE) {
return value.currentValue;
}
throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
}
return increment(value, op, numToAllocate);
}
{code}

2) When we calculate the current value we can't just take the max of the 
numToAllocate and the cache size. There is a corner case where a client may 
request a bulk allocation for a number of slots that is less than the cache 
size. We handle it as follows:

{code}
if (op != ValueOp.VALIDATE_SEQUENCE) {
// We can't just take the max of numToAllocate and cacheSize
// We need to handle a valid edgecase where a client requests 
bulk allocation of 
// a number of slots that are less than cache size of the 
sequence
currentValue -= incrementBy * 
(SequenceUtil.isBulkAllocation(numToAllocate) ? numToAllocate : cacheSize);
}
{code}

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-rebased.patch, PHOENIX-1954-wip.patch, 
 PHOENIX-1954-wip2.patch.txt, PHOENIX-1954-wip3.patch, 
 PHOENIX-1954-wip4.patch, PHOENIX-1954-wip5-rebased.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-07 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617265#comment-14617265
 ] 

Jan Fernando commented on PHOENIX-1954:
---

I don't think that will work, but maybe I am missing something :) 

The issue stems from the fact that there are 2 incrementValue() methods on 
Sequence.java that are executed in succession and invoked from different JDBC 
methods. The flow is:

 1) Execute expression SELECT NEXT n VALUES FOR seq and update sequence 
on the server  via stmt.executeQuery()
 2) Read back values from expression NEXT n VALUES FOR seq via 
rs.next() and rs.getInt()
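
Put in client terms, the two steps look roughly like this (a sketch only; the 
sequence name and allocation size are just examples reusing the names from the 
tests in this issue):

{code}
// Step 1: executing the expression performs the RPC and updates the sequence on the server.
ResultSet rs = conn.createStatement().executeQuery("SELECT NEXT 100 VALUES FOR bulkalloc.alpha");
// Step 2: reading the result is purely client side and has to hand back the start value
// of the range reserved in step 1, even if NEXT VALUE FOR ran in between.
rs.next();
long startValue = rs.getLong(1);
{code}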

The calls are:

1)  There is incrementValue() that processes the results of the RPC call that 
runs within the code synchronized on Sequence.lock() and is part of the flow 
that executes the expression. 

{code}
public long incrementValue(Result result, ValueOp op, long numToAllocate)
{code}

2) There is incrementValue() that is invoked when the client code executes 
rs.next() and called via SequenceTuple.incrementSequenceValues() and is purely 
client side code and it is here were we actually need to read the start value 
for the bulk allocated range.

{code}
public long incrementValue(long timestamp, ValueOp op, long numToAllocate)
{code}

For this case we need to return the start value from the allocation that 
happened which may not be the same as currentValue if NEXT VALUE FOR has been 
executed in between. Since it's here that we need to return the start value we 
have to cache it on the client separately from currentValue. At the time we get 
the value back from the server with #1, since we are still within the 
synchronized code, the current value on the Result object is correct. However, 
because we read it back asynchronously in #2, we have to store it on the client.

[~jamestaylor] Thoughts?


 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-rebased.patch, PHOENIX-1954-wip.patch, 
 PHOENIX-1954-wip2.patch.txt, PHOENIX-1954-wip3.patch, PHOENIX-1954-wip4.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-06 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: PHOENIX-1954-wip4.patch

Attaching patch that has all the changes I think we need. 

Per my last comment there was indeed a race condition in the code I added. I 
added several multi-threaded tests to verify this is no longer an issue. The 
issue was that we can't rely on the state value.currentValue in Sequence.java 
when executing rs.next() to get the value to return from a NEXT n VALUES FOR seq 
expression. We need to store the Bulk Allocation start values in Sequence.java 
the same way we store the actual values. We store them keyed by the number of 
requested slots, so that clients that requested n slots are guaranteed to get a 
start value for a range that reserved that number of slots. See the tests 
and code changes for more details.

I also made the tests run in both regular and multi-tenant mode to verify there 
were no MT issues.

Tomorrow I'll do a bit more manual verification including some backwards 
compatibility checks locally.

[~jamestaylor] [~tdsilva] [~elilevine]

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-wip.patch, PHOENIX-1954-wip2.patch.txt, 
 PHOENIX-1954-wip3.patch, PHOENIX-1954-wip4.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-06 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: PHOENIX-1954-rebased.patch

Added rebased patch file with all my commits for this change merged.

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-rebased.patch, PHOENIX-1954-wip.patch, 
 PHOENIX-1954-wip2.patch.txt, PHOENIX-1954-wip3.patch, PHOENIX-1954-wip4.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-05 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: PHOENIX-1954-wip3.patch

[~jamestaylor] [~tdsilva] Uploading a new patch that additionally has new 
changes I have made to reintroduce the sorting of the nextSequences in 
SequenceManager. 

I removed the numAllocations array that we were maintaining on SequenceManager 
and introduced a new SequenceAllocation class that combines a SequenceKey and 
the number of slots to allocate. SequenceManager uses a List<SequenceAllocation> 
instead of a List<SequenceKey> to store the ordered list of nextSequences. This 
allows us to preserve the correct number of slots to allocate for the correct 
sequence.
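
A rough sketch of the shape of such a class (illustrative only; the actual 
fields, constructor, and comparison logic in the patch may differ):

{code}
// Illustrative only: pairs a SequenceKey with the number of slots requested for it,
// so sorting the allocations keeps the slot counts attached to the right sequences.
public class SequenceAllocation implements Comparable<SequenceAllocation> {
    private final SequenceKey sequenceKey;
    private final long numAllocations;

    public SequenceAllocation(SequenceKey sequenceKey, long numAllocations) {
        this.sequenceKey = sequenceKey;
        this.numAllocations = numAllocations;
    }

    public SequenceKey getSequenceKey() {
        return sequenceKey;
    }

    public long getNumAllocations() {
        return numAllocations;
    }

    @Override
    public int compareTo(SequenceAllocation other) {
        return sequenceKey.compareTo(other.sequenceKey);
    }
}
{code}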

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-wip.patch, PHOENIX-1954-wip2.patch.txt, 
 PHOENIX-1954-wip3.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-05 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614494#comment-14614494
 ] 

Jan Fernando commented on PHOENIX-1954:
---

In the following method in Sequence.java:

{code}
public long incrementValue(long timestamp, ValueOp op, long numToAllocate)
{code}

I added the following code:

{code}
if (SequenceUtil.isBulkAllocation(numToAllocate)) {
    if (op == ValueOp.INCREMENT_SEQUENCE) {
        // On calling rs.next() for NEXT n VALUES FOR seq, we need to calc the value to return
        // as currentValue was not adjusted for the Bulk Allocation flow, since we essentially
        // reserve and use the values at the same time
        return value.currentValue - (numToAllocate * value.incrementBy);
    } else {
        if (op == ValueOp.RESERVE_SEQUENCE) {
            throw EMPTY_SEQUENCE_CACHE_EXCEPTION;
        } else {
            return value.currentValue;
        }
    }
}
{code}

This handles returning the correct value back from the expression NEXT n 
VALUES FOR SEQUENCE seq when rs.next() is called after we bulk allocate a set 
number of slots. I was thinking about this a bit and I wonder whether there is 
a race condition here. When other clients get values via NEXT VALUE FOR on the 
same sequence concurrently can they move currentValue forward and cause the 
wrong value to be returned? I'll look more deeply tomorrow, but I'm curious if 
you guys have any thoughts on whether this is the case [~jamestaylor] 
[~tdsilva]?

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-wip.patch, PHOENIX-1954-wip2.patch.txt, 
 PHOENIX-1954-wip3.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-03 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: PHOENIX-1954-wip2.patch

[~jamestaylor] [~tdsilva] I have uploaded a second WIP patch that I think 
implements a working version of NEXT n VALUES FOR seq based on the initial 
patch James provided. I've added a new IT test class 
SequenceBulkAllocationIT.java that covers interaction with various sequence 
features including MIN, MAX, CYCLE, different values of INCREMENT BY both 
negative and positive, interplay with NEXT VALUE FOR and CURRENT VALUE. So far 
seems good :) I have been running SequenceIT.java to look for regressions.

Some specific behaviors implemented in this patch I want to call out:

1) When performing a bulk allocation if we hit the MIN or MAX or 
Overflow/Underflow conditions we throw an exception. We don't support partial 
allocations. You either get all the values you request or none.

2) We currently don't support the new syntax on sequences that have the CYCLE 
flag set. If you try to execute NEXT n VALUES FOR seq on such a sequence an 
exception is thrown (see the sketch after this list). [~tdsilva] and I chatted 
about this offline and felt bulk allocation and cycles together raise a whole 
host of weird behaviors that we would have to handle - for example we probably 
wouldn't want to cycle across a bulk allocation, as the semantics of which 
slots were allocated become very complex. I propose deferring this work to a 
separate JIRA.

3) When including multiple NEXT n VALUES FOR and NEXT VALUE FOR expressions in a 
statement we honor the expression which allocates the highest number of values, 
and all expressions (including any CURRENT VALUE FOR expressions) return the 
value for that allocation.

4) [~jamestaylor] One specific change I want to call out that I had to make to 
support the parallel numAllocations array in the SequenceManager: in 
SequenceManager.validateSequences(), at line 206 in the patch, we were invoking 
Collections.sort(nextSequences). This caused problems when we had multiple 
expressions for different sequences in a statement. The number of slots to 
allocate would get out of sync with the sequences and would be applied to the 
wrong sequences. I commented this sort out and it looks to me like it's not 
needed. What do you think? Was there a reason we were doing this that the tests 
in SequenceIT.java don't cover?
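
To illustrate behavior 2 above, roughly the kind of usage the new tests 
exercise (a sketch; the sequence name and error handling here are illustrative 
and not lifted from SequenceBulkAllocationIT):

{code}
// Illustrative only: bulk allocation on a sequence with CYCLE is expected to be rejected.
conn.createStatement().execute(
        "CREATE SEQUENCE bulkalloc.cycling START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE 3 CYCLE");
try {
    conn.createStatement().executeQuery("SELECT NEXT 10 VALUES FOR bulkalloc.cycling").next();
    fail("Expected NEXT n VALUES FOR on a CYCLE sequence to throw");
} catch (SQLException e) {
    // expected: bulk allocation is not supported together with CYCLE (behavior 2 above)
}
{code}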

I still need to do some more testing in addition to the IT tests I added in 
SequenceBulkAllocationIT.java. Specifically I still need to validate:
1) Backwards compatibility with older clients and upgraded server
2) Test Sequences with Multi-Tenant views to make sure there are no regressions 
there and the new semantics works
3) Look at the sequence reclaiming logic we have and make sure there are no 
additional changes needed there to support the new semantics.
4) Run more existing IT tests.

Please take a look when you get a chance.

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-wip.patch, PHOENIX-1954-wip2.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-03 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: (was: PHOENIX-1954-wip2.patch)

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-wip.patch


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-03 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Attachment: PHOENIX-1954-wip2.patch.txt

Okay, uploading the correct patch this time. Please ignore the previous one - it 
was a half-baked patch I used to move between machines.

[~jamestaylor] [~tdsilva]

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-wip.patch, PHOENIX-1954-wip2.patch.txt


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-07-03 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613539#comment-14613539
 ] 

Jan Fernando commented on PHOENIX-1954:
---

Got it. Okay, I'll go ahead and re-add the sort and introduce a new SequenceInfo 
or some such class. I agree that will make things clearer. I'll upload another 
patch when I'm done with that.
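
As an illustration of that idea, a holder class along these lines would keep 
the sequence key and the requested slot count together, so sorting can no 
longer desynchronize the two parallel arrays; the names here are hypothetical, 
not from the patch:

{code}
// Hypothetical sketch (names are illustrative, not from the patch): keep the
// sequence key and its requested slot count in one object so that sorting the
// list can never desynchronize parallel arrays.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SequenceAllocationInfo implements Comparable<SequenceAllocationInfo> {
    final String sequenceKey;   // stand-in for Phoenix's sequence key
    final long numToAllocate;   // slots requested by NEXT n VALUES FOR

    SequenceAllocationInfo(String sequenceKey, long numToAllocate) {
        this.sequenceKey = sequenceKey;
        this.numToAllocate = numToAllocate;
    }

    @Override
    public int compareTo(SequenceAllocationInfo other) {
        return sequenceKey.compareTo(other.sequenceKey);
    }

    public static void main(String[] args) {
        List<SequenceAllocationInfo> allocations = new ArrayList<>();
        allocations.add(new SequenceAllocationInfo("SCHEMA.SEQ_B", 100));
        allocations.add(new SequenceAllocationInfo("SCHEMA.SEQ_A", 1000));
        // Sorting reorders key and slot count together.
        Collections.sort(allocations);
        for (SequenceAllocationInfo a : allocations) {
            System.out.println(a.sequenceKey + " -> " + a.numToAllocate);
        }
    }
}
{code}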

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando
 Attachments: PHOENIX-1954-wip.patch, PHOENIX-1954-wip2.patch.txt


 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1954) Reserve chunks of numbers for a sequence

2015-06-30 Thread Jan Fernando (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Fernando updated PHOENIX-1954:
--
Assignee: Jan Fernando  (was: Thomas D'Silva)

 Reserve chunks of numbers for a sequence
 

 Key: PHOENIX-1954
 URL: https://issues.apache.org/jira/browse/PHOENIX-1954
 Project: Phoenix
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jan Fernando

 In order to be able to generate many ids in bulk (for example in map reduce 
 jobs) we need a way to generate or reserve large sets of ids. We also need to 
 mix ids reserved with incrementally generated ids from other clients. 
 For this we need to atomically increment the sequence and return the value it 
 had when the increment happened.
 If we're OK to throw the current cached set of values away we can do
 {{NEXT VALUE FOR seq(,N)}}, that needs to increment value and return the 
 value it incremented from (i.e. it has to throw the current cache away, and 
 return the next value it found at the server).
 Or we can invent a new syntax {{RESERVE VALUES FOR seq, N}} that does the 
 same, but does not invalidate the cache.
 Note that in either case we won't retrieve the reserved set of values via 
 {{NEXT VALUE FOR}} because we'd need to be idempotent in our case, all we 
 need to guarantee is that after a call to {{RESERVE VALUES FOR seq, N}}, 
 which returns a value M is that the range [M, M+N) won't be used by any 
 other user of the sequence. We might need to reserve 1bn ids this way ahead of a 
 map reduce run.
 Any better ideas?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1819) Report resource consumption per phoenix statement

2015-05-28 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563927#comment-14563927
 ] 

Jan Fernando commented on PHOENIX-1819:
---

[~samarthjain] This looks fantastic! Really great work. 

As we discussed, I think there are 2 small changes that would make it easier 
for clients to consume the metrics:
1) Change:

{code}
public static Map<String, List<Pair<String, Long>>> getReadMetricsForLastCommit(Connection conn)
->
public static Map<String, Map<String, Long>> getReadMetricsForLastCommit(Connection conn)
{code}

and 

{code}
public static Map<String, List<Pair<String, Long>>> getMutationMetricsForLastCommit(Connection conn)
->
public static Map<String, Map<String, Long>> getMutationMetricsForLastCommit(Connection conn)
{code}

in PhoenixRuntime, to make it easier for clients to look up specific metrics by 
name and to make the API consistent with the ResultSet-level metric methods (a 
sketch of consuming the new shape follows this list).

2) Figure out how best to expose the metric names to clients so that they have 
a stable way to look up metrics versus right now relying on the Enum.name().
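
For illustration, with the nested-map shape a client lookup becomes a direct 
get by table and metric name; the table and metric names below are placeholders 
pending item 2:

{code}
// Sketch of client-side consumption of the proposed Map<String, Map<String, Long>>
// shape. "SCAN_BYTES" is a placeholder; the stable metric names depend on how
// item 2 above is resolved.
import java.util.Map;

public class MetricsLookupSketch {
    public static Long scanBytesFor(Map<String, Map<String, Long>> readMetrics, String tableName) {
        Map<String, Long> tableMetrics = readMetrics.get(tableName);
        // Direct lookup by metric name, instead of scanning a List<Pair<String, Long>>.
        return tableMetrics == null ? null : tableMetrics.get("SCAN_BYTES");
    }
}
{code}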


 Report resource consumption per phoenix statement
 -

 Key: PHOENIX-1819
 URL: https://issues.apache.org/jira/browse/PHOENIX-1819
 Project: Phoenix
  Issue Type: New Feature
Reporter: Samarth Jain
Assignee: Samarth Jain
 Fix For: 5.0.0, 4.4.1

 Attachments: PHOENIX-1819-rebased.patch, PHOENIX-1819.patch


 In order to get insight into what phoenix is doing and how much it is doing 
 per request, it would be ideal to get a single log line per phoenix request. 
 The log line could contain request level metrics like:
 1) Number of spool files created.
 2) Number of parallel scans.
 3) Number of serial scans.
 4) Query failed - boolean 
 5) Query time out - boolean 
 6) Query time.
 7) Mutation time.
 8) Mutation size in bytes.
 9) Number of mutations.
 10) Bytes allocated by the memory manager.
 11) Time spent by threads waiting for the memory to be allocated.
 12) Number of tasks submitted to the pool.
 13) Number of tasks rejected.
 14) Time spent by tasks in the queue.
 15) Time taken by tasks to complete - from construction to execution 
 completion.
 16) Time taken by tasks to execute.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-1791) Pherf - Support data creation to multi-tenant views

2015-03-30 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-1791:
-

 Summary: Pherf - Support data creation to multi-tenant views
 Key: PHOENIX-1791
 URL: https://issues.apache.org/jira/browse/PHOENIX-1791
 Project: Phoenix
  Issue Type: Bug
Reporter: Jan Fernando
Assignee: Jan Fernando
Priority: Minor


Pherf currently doesn't allow the data creation phase to use a multi-tenant 
connection to write directly to a multi-tenant view. This enhancement would allow 
users to write data directly to multi-tenant views that have the same 
underlying table but where each view has a different set of custom columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1452) Add Phoenix client-side logging and capture resource utilization metrics

2015-03-05 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349136#comment-14349136
 ] 

Jan Fernando commented on PHOENIX-1452:
---

[~samarthjain] Looks really great to me! Awesome work!

Just 2 nits:
1) I think we should make metrics collection ON by default. It looked to me 
from QueryServicesOptions that the default was false. If we run into perf 
issues we could flip the default, or folks can opt out if they have specific needs.
2) Can you add a few sanity tests to make sure the framework behaves as expected?

This is going to be a really fantastic addition to Phoenix. Thanks for doing 
this!

 Add Phoenix client-side logging and capture resource utilization metrics
 

 Key: PHOENIX-1452
 URL: https://issues.apache.org/jira/browse/PHOENIX-1452
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 5.0.0, 4.2
Reporter: Jan Fernando
Assignee: Samarth Jain
 Attachments: PHOENIX-1452.patch, PHOENIX-1452_v2.patch, 
 PHOENIX-1452_v3.patch, wip.patch


 For performance testing and tuning of features that use Phoenix and for 
 production monitoring it would be really helpful to easily be able to extract 
 statistics about Phoenix's client-side Thread Pool and Queue Depth usage to 
 help with tuning and being able to correlate the impact of tuning these 2 
 parameters to query performance.
 For global per JVM logging one of the following would meet my needs, with a 
 preference for #2:
 1. A simple log line that logs the data in ThreadPoolExecutor.toString() 
 at a configurable interval
 2. Exposing the ThreadPoolExecutor metrics in PhoenixRuntime or other global 
 client exposed class and allow client to do their own logging.
 In addition to this it would also be really valuable to have a single log 
 line per query that provides statistics about the level of parallelism i.e. 
 number of parallel scans being executed. I don't need full explain plan level of 
 data but a good heuristic to be able to track over time how queries are 
 utilizing the thread pool as data size grows etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-1452) Add Phoenix client-side logging and capture resource utilization metrics

2015-03-04 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347364#comment-14347364
 ] 

Jan Fernando edited comment on PHOENIX-1452 at 3/4/15 6:51 PM:
---

This is shaping up nicely, [~samarthjain]!

 A few concrete pieces of feedback:

1) Instead of hardcoding the metric names in individual classes, could we maintain 
a global list of metric names? I think it will make this easier to maintain as 
we manage and add more metrics over time.

2) Can we add a cumulative metric for the # queries executed? I think that 
would be really useful and didn't see it on the list or in the patch.

3) One thing that concerns me about the overall approach - and this is related 
to the fact that I think we must have the ability to toggle metrics collection 
on and off - is that the metrics and calculation logic is sprinkled 
throughout all the classes. 

For example with ScanningResultIterator we have the calculateSize() method and 
static member scanBytesRead. This approach makes it hard to implement 
toggleability. I think a better approach is to push all the calculations, 
metrics and collection down to a single class.

Let's use ScanningResultIterator as an example. The way ScanningResultIterator 
might work is a single call such as:
{code}
PhoenixMetrics.captureScanBytesRead(Result result).
{code}
We can then remove the static member scanBytesRead and the calculateSize() 
method from ScanningResultIterator and push this work down into PhoenixMetrics. 
The instrumented class just has a single line added and the magic all happens 
in PhoenixMetrics. That way each method in PhoenixMetrics can start with a 
quick boolean check as to whether metrics are enabled before doing any 
collection. I think this is more scalable than proliferating boolean checks and 
statistics throughout the code, and it allows easy on-off toggleability. (A 
sketch of this idea follows below.)
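
A rough sketch of that shape, with the enabled check pushed into the metrics 
class; the names follow the example above but are illustrative, and the 
argument is simplified to a raw byte count to keep the sketch self-contained:

{code}
// Rough sketch of centralizing collection behind a single enabled check.
// Names follow the example above but are illustrative; the argument is a raw
// byte count rather than an HBase Result to keep the sketch self-contained.
import java.util.concurrent.atomic.AtomicLong;

public final class PhoenixMetricsSketch {
    private static volatile boolean enabled = false;   // toggled via config
    private static final AtomicLong scanBytesRead = new AtomicLong();

    private PhoenixMetricsSketch() {}

    public static void setEnabled(boolean on) {
        enabled = on;
    }

    // The instrumented class (e.g. ScanningResultIterator) makes this one call;
    // the size calculation and accumulation live here, not in the iterator.
    public static void captureScanBytesRead(long resultSizeInBytes) {
        if (!enabled) {
            return; // single cheap check when metrics are turned off
        }
        scanBytesRead.addAndGet(resultSizeInBytes);
    }

    public static long getScanBytesRead() {
        return scanBytesRead.get();
    }
}
{code}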




was (Author: jfernando_sfdc):
This is shaping up nicely, [~samarthjain]!

 A few concrete pieces of feedback:

1) Instead of hardcoding the metric names in individual classes could maintain 
a global list of metric names? I think it will make this easier to maintain as 
manage as we add more metrics over time.

2) Can we add a cumulative metric for the # queries executed? I think that 
would be really useful and didn't see it on the list or in the patch.

3) One thing that concerns about me the overall approach, and this is related 
to the fact I think we must have the ability to toggle metrics collection on 
and off,  is the fact that all the metrics and calculation logic is sprinkled 
throughout all the classes. 

For example with ScanningResultIterator we have the calculateSize() method and 
static member scanBytesRead. This approach makes it hard to implement 
toggleability. I think a better approach is to push all the calculations, 
metrics and collection down to a single class.

Let's use ScanningResultIterator as an example. The way ScanningResultIterator 
might work is a single call such as:
{code}
PhoenixMetrics.captureScanBytesRead(Result result).
{code}
We can them remove  the static member scanBytesRead for the metric and the 
calculateSize() method from the ScanningResultIterator and push this work down 
into PhoenixMetrics. Instrumented class just have a single line added and the 
magic all happens in PhoenixMetrics. That way you can instrument each method 
with a quick boolean check as to whether metrics is enabled before doing any 
collection. I think this is more scaleable than proliferating boolean checks 
throughout the code and statistics throughout the code and allows easy on-off 
toggleability.



 Add Phoenix client-side logging and capture resource utilization metrics
 

 Key: PHOENIX-1452
 URL: https://issues.apache.org/jira/browse/PHOENIX-1452
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 5.0.0, 4.2
Reporter: Jan Fernando
Assignee: Samarth Jain
 Attachments: PHOENIX-1452.patch, PHOENIX-1452_v2.patch, wip.patch


 For performance testing and tuning of features that use Phoenix and for 
 production monitoring it would be really helpful to easily be able to extract 
 statistics about Phoenix's client-side Thread Pool and Queue Depth usage to 
 help with tuning and being able to correlate the impact of tuning these 2 
 parameters to query performance.
 For global per JVM logging one of the following would meet my needs, with a 
 preference for #2:
 1. A simple log line that logs the data in ThreadPoolExecutor.toString() 
 at a configurable interval
 2. Exposing the ThreadPoolExecutor metrics in PhoenixRuntime or other global 
 client exposed class and allow client to do their own logging.
 

[jira] [Commented] (PHOENIX-1452) Add Phoenix client-side logging and capture resource utilization metrics

2015-03-04 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347364#comment-14347364
 ] 

Jan Fernando commented on PHOENIX-1452:
---

This is shaping up nicely, [~samarthjain]!

 A few concrete pieces of feedback:

1) Instead of hardcoding the metric names in individual classes, could we maintain 
a global list of metric names? I think it will make this easier to maintain and 
manage as we add more metrics over time.

2) Can we add a cumulative metric for the # queries executed? I think that 
would be really useful and didn't see it on the list or in the patch.

3) One thing that concerns me about the overall approach - and this is related 
to the fact that I think we must have the ability to toggle metrics collection 
on and off - is that the metrics and calculation logic is sprinkled 
throughout all the classes. 

For example with ScanningResultIterator we have the calculateSize() method and 
static member scanBytesRead. This approach makes it hard to implement 
toggleability. I think a better approach is to push all the calculations, 
metrics and collection down to a single class.

Let's use ScanningResultIterator as an example. The way ScanningResultIterator 
might work is a single call such as:
{code}
PhoenixMetrics.captureScanBytesRead(Result result).
{code}
We can then remove the static member scanBytesRead and the calculateSize() 
method from ScanningResultIterator and push this work down into PhoenixMetrics. 
The instrumented class just has a single line added and the magic all happens 
in PhoenixMetrics. That way you can instrument each method with a quick boolean 
check as to whether metrics are enabled before doing any collection. I think 
this is more scalable than proliferating boolean checks and statistics 
throughout the code, and it allows easy on-off toggleability.



 Add Phoenix client-side logging and capture resource utilization metrics
 

 Key: PHOENIX-1452
 URL: https://issues.apache.org/jira/browse/PHOENIX-1452
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 5.0.0, 4.2
Reporter: Jan Fernando
Assignee: Samarth Jain
 Attachments: PHOENIX-1452.patch, PHOENIX-1452_v2.patch, wip.patch


 For performance testing and tuning of features that use Phoenix and for 
 production monitoring it would be really helpful to easily be able to extract 
 statistics about Phoenix's client-side Thread Pool and Queue Depth usage to 
 help with tuning and being able to correlate the impact of tuning these 2 
 parameters to query performance.
 For global per JVM logging one of the following would meet my needs, with a 
 preference for #2:
 1. A simple log line that logs the data in ThreadPoolExecutor.toString() 
 at a configurable interval
 2. Exposing the ThreadPoolExecutor metrics in PhoenixRuntime or other global 
 client exposed class and allow client to do their own logging.
 In addition to this it would also be really valuable to have a single log 
 line per query that provides statistics about the level of parallelism i.e. 
 number of parallel scans being executed. I don't need full explain plan level of 
 data but a good heuristic to be able to track over time how queries are 
 utilizing the thread pool as data size grows etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (PHOENIX-1452) Add Phoenix client-side logging and capture resource utilization metrics

2015-02-27 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340363#comment-14340363
 ] 

Jan Fernando edited comment on PHOENIX-1452 at 2/27/15 4:49 PM:


[~samarthjain] In terms of the thread pool, the simple thing that would be 
really useful would just be to know what the max number of threads and the max 
queue depth were for a period of time - that would give me a very useful 
high-level overview of usage relative to configuration. I was thinking of 
something like: every time a task is enqueued, you check the size of the queue 
and the number of threads in use, and if either is > the current max, you 
record the new value on a counter. Do you think something like that is 
lightweight and easy to do? If so it would be a really useful heuristic.
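
A minimal sketch of that high-water-mark idea, assuming the counters live in a 
small helper that is called on every enqueue; all names are illustrative:

{code}
// Minimal sketch of recording high-water marks on enqueue: each submit compares
// the current queue depth and active thread count against running maxima and
// bumps them if exceeded. Names are illustrative.
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.atomic.AtomicInteger;

public final class ThreadPoolHighWaterMarks {
    private final AtomicInteger maxQueueDepth = new AtomicInteger();
    private final AtomicInteger maxActiveThreads = new AtomicInteger();

    // Call this each time a task is enqueued on the executor.
    public void onTaskEnqueued(ThreadPoolExecutor executor) {
        updateMax(maxQueueDepth, executor.getQueue().size());
        updateMax(maxActiveThreads, executor.getActiveCount());
    }

    private static void updateMax(AtomicInteger max, int observed) {
        int current;
        while (observed > (current = max.get())) {
            if (max.compareAndSet(current, observed)) {
                break;
            }
        }
    }

    public int getMaxQueueDepth() { return maxQueueDepth.get(); }
    public int getMaxActiveThreads() { return maxActiveThreads.get(); }
}
{code}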


was (Author: jfernando_sfdc):
[~samarthjain] In terms of the thread pool, the simple thing that would be 
really useful would just be know what the max number of threads and the max 
queue depth size was for - that would provide me with a very useful highlevel 
overview of usage relative to configuration. I was thinking of something like 
every time a task was enqueued you checked the size of the queue and threads in 
use and if  max set size on a counter. Do you think something like that is 
lightweight and easy to do? If so it would be a really useful heuristic.

 Add Phoenix client-side logging and capture resource utilization metrics
 

 Key: PHOENIX-1452
 URL: https://issues.apache.org/jira/browse/PHOENIX-1452
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 5.0.0, 4.2
Reporter: Jan Fernando
Assignee: Samarth Jain
 Attachments: wip.patch


 For performance testing and tuning of features that use Phoenix and for 
 production monitoring it would be really helpful to easily be able to extract 
 statistics about Phoenix's client-side Thread Pool and Queue Depth usage to 
 help with tuning and being able to correlate the impact of tuning these 2 
 parameters to query performance.
 For global per JVM logging one of the following would meet my needs, with a 
 preference for #2:
 1. A simple log line that logs the data in ThreadPoolExecutor.toString() 
 at a configurable interval
 2. Exposing the ThreadPoolExecutor metrics in PhoenixRuntime or other global 
 client exposed class and allow client to do their own logging.
 In addition to this it would also be really valuable to have a single log 
 line per query that provides statistics about the level of parallelism i.e. 
 number of parallel scans being executed. I don't need full explain plan level of 
 data but a good heuristic to be able to track over time how queries are 
 utilizing the thread pool as data size grows etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1452) Add Phoenix client-side logging and capture resource utilization metrics

2015-02-27 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14340363#comment-14340363
 ] 

Jan Fernando commented on PHOENIX-1452:
---

[~samarthjain] In terms of the thread pool, the simple thing that would be 
really useful would just be to know what the max number of threads and the max 
queue depth were - that would give me a very useful high-level overview of 
usage relative to configuration. I was thinking of something like: every time a 
task is enqueued, you check the size of the queue and the number of threads in 
use, and if either is > the current max, you record the new value on a counter. 
Do you think something like that is lightweight and easy to do? If so it would 
be a really useful heuristic.

 Add Phoenix client-side logging and capture resource utilization metrics
 

 Key: PHOENIX-1452
 URL: https://issues.apache.org/jira/browse/PHOENIX-1452
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 5.0.0, 4.2
Reporter: Jan Fernando
Assignee: Samarth Jain
 Attachments: wip.patch


 For performance testing and tuning of features that use Phoenix and for 
 production monitoring it would be really helpful to easily be able to extract 
 statistics about Phoenix's client-side Thread Pool and Queue Depth usage to 
 help with tuning and being able to correlate the impact of tuning these 2 
 parameters to query performance.
 For global per JVM logging one of the following would meet my needs, with a 
 preference for #2:
 1. A simple log line that logs the data in ThreadPoolExecutor.toString() 
 at a configurable interval
 2. Exposing the ThreadPoolExecutor metrics in PhoenixRuntime or other global 
 client exposed class and allow client to do their own logging.
 In addition to this it would also be really valuable to have a single log 
 line per query that provides statistics about the level of parallelism i.e. 
 number of parallel scans being executed. I don't need full explain plan level of 
 data but a good heuristic to be able to track over time how queries are 
 utilizing the thread pool as data size grows etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1452) Add Phoenix client-side logging and capture resource utilization metrics

2015-02-25 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337901#comment-14337901
 ] 

Jan Fernando commented on PHOENIX-1452:
---

[~samarthjain] I took a quick look and the approach looks great! This is 
exactly the kind of thing I had in mind! One small thought: it might be nice to 
let clients decide whether or not to run in StrictlyCovariance mode. I think 
having the default be false, as you do now, is the right approach. In my case 
this data does not need to be consistent; I want stats collection to be as fast 
as possible and I can deal with small drift. 

 Add Phoenix client-side logging and capture resource utilization metrics
 

 Key: PHOENIX-1452
 URL: https://issues.apache.org/jira/browse/PHOENIX-1452
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 5.0.0, 4.2
Reporter: Jan Fernando
Assignee: Samarth Jain
 Attachments: wip.patch


 For performance testing and tuning of features that use Phoenix and for 
 production monitoring it would be really helpful to easily be able to extract 
 statistics about Phoenix's client-side Thread Pool and Queue Depth usage to 
 help with tuning and being able to correlate the impact of tuning these 2 
 parameters to query performance.
 For global per JVM logging one of the following would meet my needs, with a 
 preference for #2:
 1. A simple log line that logs the data in ThreadPoolExecutor.toString() 
 at a configurable interval
 2. Exposing the ThreadPoolExecutor metrics in PhoenixRuntime or other global 
 client exposed class and allow client to do their own logging.
 In addition to this it would also be really valuable to have a single log 
 line per query that provides statistics about the level of parallelism i.e. 
 number of parallel scans being executed. I don't need full explain plan level of 
 data but a good heuristic to be able to track over time how queries are 
 utilizing the thread pool as data size grows etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1609) MR job to populate index tables

2015-02-18 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326206#comment-14326206
 ] 

Jan Fernando commented on PHOENIX-1609:
---

+1 on the ASYNC keyword. For our use cases we have a good sense of data size at 
the outset and want to be explicit about which approach we are taking.

This is important as it lets us plan both the feature and the rollout based on 
one index creation path. If the index build fails it fails and we need to plan 
for and handle that (i.e. how our feature behaves, how we retry etc.).  I 
wouldn't want index creation automatically falling back to the current approach 
if the async M/R approach failed - this could create unexpected surprises on 
the cluster. I'd rather it be a bit more manual as this makes it easier to 
manage and reason about different failure scenarios.
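
For illustration, the explicit path being discussed might look roughly like the 
following from a client's point of view; the DDL keyword placement and the MR 
invocation follow this discussion rather than a finalized spec, and all names 
are hypothetical:

{code}
// Hypothetical sketch of the explicit two-step flow discussed above.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AsyncIndexSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            // Step 1: declare the index ASYNC so no synchronous backfill runs.
            stmt.execute("CREATE INDEX MY_INDEX ON MY_SCHEMA.MASTER_TABLE (A, B, C) ASYNC");
        }
        // Step 2: kick off the index-population MR job manually (per the issue
        // description's "hadoop jar ..." invocation), so a failed build is
        // handled and retried by the caller rather than falling back silently
        // to a synchronous build.
    }
}
{code}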

 MR job to populate index tables 
 

 Key: PHOENIX-1609
 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
 Project: Phoenix
  Issue Type: New Feature
Reporter: maghamravikiran
Assignee: maghamravikiran
 Attachments: 0001-PHOENIX_1609.patch


 Often, we need to create new indexes on master tables way after the data 
 exists on the master tables.  It would be good to have a simple MR job given 
 by the phoenix code that users can call to have indexes in sync with the 
 master table. 
 Users can invoke the MR job using the following command 
 hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt 
 INDEX_TABLE -columns a,b,c
 Is this ideal? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1590) Add an Asynchronous/Deferred Delete Option

2015-01-26 Thread Jan Fernando (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292272#comment-14292272
 ] 

Jan Fernando commented on PHOENIX-1590:
---

[~jamestaylor] Re the corner cases:

 #1 I think it is okay to allow the creation of a new VIEW with a different 
name but with the same or overlapping predicate. I think some of the 
responsibility is on the caller of the DDL statements to not issue conflicting 
statements. I think it's hard to prevent people from shooting themselves in the 
foot if they really want to.  I think making sure the order of DDL operations 
guarantees the data and application behavior is consistent is the 
responsibility of the caller.  I could see scenarios with overlapping views 
where changes are made across release boundaries e.g. version 1 of the software 
uses viewX and version 2 uses viewY and both need to coexist during a 
deployment and you want to delete data based on viewX after a deployment. I 
think that there are a lot of combinations based on unique needs where application 
developers are best equipped to order the DDL operations in these cases as 
opposed to pushing this responsibility to Phoenix.

#2 I think we should allow a view with the exact same name to be created only 
after all the data is deleted. This seems the easiest to reason about and 
forces you to think about a migration strategy.

 Add an Asynchronous/Deferred Delete Option
 --

 Key: PHOENIX-1590
 URL: https://issues.apache.org/jira/browse/PHOENIX-1590
 Project: Phoenix
  Issue Type: New Feature
Reporter: Jan Fernando

 For use cases where we need to delete very large amounts of data from Phoenix 
 tables running a synchronous delete can be problematic. In order to guarantee 
 that the delete completes, handle failure scenarios, and ensure it doesn't 
 put too much load on the HBase cluster and crowd out other queries running we 
 need to build tooling around the longer running delete operations to chunk 
 them up, provide retries in the event of failures, and have ways to throttle 
 delete load if the Region Servers get hot.  
 It would be really great if Phoenix offered a way to invoke a resilient 
 delete that was processed asynchronously and had minimal load on the cluster. 
 An idea mentioned to implement this is to introduce a DEFERRED keyword to the 
 DELETE operation and for such a delete to remove the data at compaction time.
 For our use cases, ideally, we would like to set delete filters that are 
 based on the first 2 elements of the row key (a multi-tenant id and the next 
 item).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-1590) Add an Asynchronous/Deferred Delete Option

2015-01-15 Thread Jan Fernando (JIRA)
Jan Fernando created PHOENIX-1590:
-

 Summary: Add an Asynchronous/Deferred Delete Option
 Key: PHOENIX-1590
 URL: https://issues.apache.org/jira/browse/PHOENIX-1590
 Project: Phoenix
  Issue Type: New Feature
Reporter: Jan Fernando


For use cases where we need to delete very large amounts of data from Phoenix 
tables running a synchronous delete can be problematic. In order to guarantee 
that the delete completes, handle failure scenarios, and ensure it doesn't put 
too much load on the HBase cluster and crowd out other queries running we need 
to build tooling around the longer running delete operations to chunk them up, 
provide retries in the event of failures, and have ways to throttle delete load 
if the Region Servers get hot.  

It would be really great if Phoenix offered a way to invoke a resilient delete 
that was processed asynchronously and had minimal load on the cluster. 
An idea mentioned to implement such a delete has been to introduce a DEFERRED 
keyword to the DELETE statement and have such a delete remove the data at 
compaction time.

For our use cases, ideally, we would like to set delete filters that are based 
on the first 2 elements of the row key (a multi-tenant id and the next item).
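
Purely as an illustration of the proposal (the DEFERRED keyword and the filter 
shape below are hypothetical, not syntax Phoenix supports), the flow from a 
client might look like:

{code}
// Hypothetical illustration only: DEFERRED and the row-key filter shape are the
// idea proposed in this issue, not implemented syntax.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DeferredDeleteSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             PreparedStatement stmt = conn.prepareStatement(
                 // The delete would only be registered here; rows would be
                 // removed lazily at compaction time to keep region server
                 // load low.
                 "DELETE DEFERRED FROM MY_SCHEMA.BIG_TABLE WHERE TENANT_ID = ? AND KEY_PREFIX = ?")) {
            stmt.setString(1, "tenant1");
            stmt.setString(2, "some-prefix");
            stmt.executeUpdate();
            conn.commit();
        }
    }
}
{code}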



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

