[jira] [Commented] (PHOENIX-3452) Secondary index and query using distinct: ORDER BY doesn't work correctly

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15650093#comment-15650093
 ] 

chenglei commented on PHOENIX-3452:
---

[~jamestaylor], I updated my patch and enhanced my test cases following your 
suggestion. 

> Secondary index and query using distinct: ORDER BY doesn't work correctly
> -
>
> Key: PHOENIX-3452
> URL: https://issues.apache.org/jira/browse/PHOENIX-3452
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3452_v2.patch
>
>
> This may be related to PHOENIX-3451 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org1','container1','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org1','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org1','container1','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org1','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org1','container1','entityId2',1.5);
> UPSERT INTO test.test VALUES ('org1','container1','entityId1',1.6);
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org1'
> AND container_id = 'container1'
> ORDER BY score DESC
> Notice that the returned results are not returned in descending score order. 
> Instead they are returned in descending entity_id order. If I remove the 
> DISTINCT or remove the secondary index the result is correct.
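Since the reporter notes the result is correct once the secondary index is out of the picture, one way to sanity-check the behavior is to force the data table into the plan. This is a sketch only, using Phoenix's NO_INDEX hint against the repro schema above:

```sql
-- Sketch: hint the planner to skip TEST_SCORE and scan the data table.
-- Per the report, with the index out of the plan the ordering is correct.
SELECT /*+ NO_INDEX */ DISTINCT entity_id, score
FROM test.test
WHERE organization_id = 'org1'
  AND container_id = 'container1'
ORDER BY score DESC;
-- Expected order per the repro data: entityId1 (1.6) down to entityId6 (1.1)
```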



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3452) Secondary index and query using distinct: ORDER BY doesn't work correctly

2016-11-08 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3452:
--
Attachment: (was: PHOENIX-3452_v1.patch)



[jira] [Updated] (PHOENIX-3452) Secondary index and query using distinct: ORDER BY doesn't work correctly

2016-11-08 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3452:
--
Attachment: PHOENIX-3452_v2.patch



[jira] [Commented] (PHOENIX-3452) Secondary index and query using distinct: ORDER BY doesn't work correctly

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649922#comment-15649922
 ] 

chenglei commented on PHOENIX-3452:
---

[~jamestaylor], OK, I will add more tests.



[jira] [Commented] (PHOENIX-3452) Secondary index and query using distinct: ORDER BY doesn't work correctly

2016-11-08 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649908#comment-15649908
 ] 

James Taylor commented on PHOENIX-3452:
---

NULLS FIRST/LAST can matter if there's an ORDER BY. Try adding a similar test 
with an ORDER BY ORGANIZATION_ID, CONTAINER_ID NULLS LAST. For 
orderPreserving to be true, NULLS FIRST has to be used, or we can do a 
reverse scan. It's a bit tricky and needs more tests.
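A minimal sketch of the kind of test being suggested, against the repro schema. The table, data, and column choices are taken from the repro above; whether the sort can be order-preserving depends on the NULLS FIRST/LAST choice matching the row-key order or a reverse scan being possible:

```sql
-- Hypothetical test case: same shape as the repro, but forcing the NULLS LAST
-- path. An ascending row key naturally yields NULLS FIRST, so NULLS LAST here
-- cannot be order-preserving unless the scan can be reversed.
SELECT DISTINCT entity_id, score
FROM test.test
WHERE container_id = 'container1'
ORDER BY organization_id, container_id NULLS LAST;
```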



[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649886#comment-15649886
 ] 

chenglei edited comment on PHOENIX-3451 at 11/9/16 5:45 AM:


If I remove "MULTI_TENANT=TRUE" from the CREATE TABLE DDL that [~jpalmert] 
described, the final SELECT result is OK in Phoenix 4.8.0, so this issue may 
be related to MULTI_TENANT; it is not caused by the same problem as 
PHOENIX-3452.


was (Author: comnetwork):
If I remove "MULTI_TENANT=TRUE" from the CREATE TABLE DDL as [~jpalmert], 
the SELECT result is OK in Phoenix 4.8.0, so this issue may be related 
to MULTI_TENANT; it is not caused by the same problem as PHOENIX-3452.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5  1.2
> entityId3  1.4
> The expected output would be
> entityId8  1.45
> entityId3  1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the LIMIT clause, which makes me think there 
> are multiple issues here and why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix first 
> gets the rows from the first container and, when it finds that to be enough, 
> stops the scan. What it should be doing is getting 2 results for each 
> container, then merging them, and then applying the limit again.
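The "top 2 per container, merge, then limit again" semantics described above can be sketched as a UNION ALL of per-container subqueries. This is illustrative only and assumes derived-table and UNION ALL support in the Phoenix version at hand; it is not how the planner would actually implement the fix:

```sql
-- Illustrative only: what the client-side merge should be equivalent to.
SELECT entity_id, score FROM (
    SELECT entity_id, score FROM test.test
    WHERE organization_id = 'org2' AND container_id = 'container1'
    ORDER BY score DESC LIMIT 2
)
UNION ALL
SELECT entity_id, score FROM (
    SELECT entity_id, score FROM test.test
    WHERE organization_id = 'org2' AND container_id = 'container2'
    ORDER BY score DESC LIMIT 2
)
UNION ALL
SELECT entity_id, score FROM (
    SELECT entity_id, score FROM test.test
    WHERE organization_id = 'org2' AND container_id = 'container3'
    ORDER BY score DESC LIMIT 2
)
ORDER BY score DESC
LIMIT 2;
-- Per the repro data this should yield entityId8 (1.45) and entityId3 (1.4).
```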





[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649886#comment-15649886
 ] 

chenglei commented on PHOENIX-3451:
---

If I remove "MULTI_TENANT=TRUE" from the CREATE TABLE DDL as [~jpalmert], 
the SELECT result is OK in Phoenix 4.8.0, so this issue may be related 
to MULTI_TENANT; it is not caused by the same problem as PHOENIX-3452.



[jira] [Issue Comment Deleted] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3451:
--
Comment: was deleted

(was: I wrote the IT test as [~jpalmert] described, but it seems the test 
result is OK in 4.8.0; I cannot reproduce the bug.
My explain plan is (T01 is the data table, and T02 is the index):

{code:borderStyle=solid} 
explain SELECT DISTINCT entity_id, score FROM T01 WHERE organization_id = 
'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY 
score DESC LIMIT 2;
+------------------------------------------------------------------------------------------------------+
| PLAN                                                                                                 |
+------------------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 3 KEYS OVER T02 ['container1     '] - ['container3     '] |
| SERVER FILTER BY FIRST KEY ONLY AND "ORGANIZATION_ID" = 'org2'                                       |
| SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID", "SCORE"]                                        |
| CLIENT MERGE SORT                                                                                    |
| CLIENT TOP 2 ROWS SORTED BY ["SCORE" DESC]                                                           |
+------------------------------------------------------------------------------------------------------+

{code} 

PHOENIX-3452 can indeed be reproduced; I fixed PHOENIX-3452.)



[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649826#comment-15649826
 ] 

chenglei edited comment on PHOENIX-3451 at 11/9/16 5:06 AM:


I wrote the IT test as [~jpalmert] described, but it seems the test 
result is OK in 4.8.0; I cannot reproduce the bug.
My explain plan is (T01 is the data table, and T02 is the index):

{code:borderStyle=solid} 
explain SELECT DISTINCT entity_id, score FROM T01 WHERE organization_id = 
'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY 
score DESC LIMIT 2;
+------------------------------------------------------------------------------------------------------+
| PLAN                                                                                                 |
+------------------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 3 KEYS OVER T02 ['container1     '] - ['container3     '] |
| SERVER FILTER BY FIRST KEY ONLY AND "ORGANIZATION_ID" = 'org2'                                       |
| SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID", "SCORE"]                                        |
| CLIENT MERGE SORT                                                                                    |
| CLIENT TOP 2 ROWS SORTED BY ["SCORE" DESC]                                                           |
+------------------------------------------------------------------------------------------------------+

{code} 

PHOENIX-3452 can indeed be reproduced; I fixed PHOENIX-3452.


was (Author: comnetwork):
I wrote the IT test as [~jpalmert], but it seems the test result is 
OK in 4.8.0; I cannot reproduce the bug.
My explain plan is (T01 is the data table, and T02 is the index):

{code:borderStyle=solid} 
explain SELECT DISTINCT entity_id, score FROM T01 WHERE organization_id = 
'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY 
score DESC LIMIT 2;
+------------------------------------------------------------------------------------------------------+
| PLAN                                                                                                 |
+------------------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 3 KEYS OVER T02 ['container1     '] - ['container3     '] |
| SERVER FILTER BY FIRST KEY ONLY AND "ORGANIZATION_ID" = 'org2'                                       |
| SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID", "SCORE"]                                        |
| CLIENT MERGE SORT                                                                                    |
| CLIENT TOP 2 ROWS SORTED BY ["SCORE" DESC]                                                           |
+------------------------------------------------------------------------------------------------------+

{code} 

PHOENIX-3452 can indeed be reproduced; I fixed PHOENIX-3452.


[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649826#comment-15649826
 ] 

chenglei edited comment on PHOENIX-3451 at 11/9/16 5:06 AM:


I wrote the IT test as [~jpalmert], but it seems the test result is 
OK in 4.8.0; I cannot reproduce the bug.
My explain plan is (T01 is the data table, and T02 is the index):

{code:borderStyle=solid} 
explain SELECT DISTINCT entity_id, score FROM T01 WHERE organization_id = 
'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY 
score DESC LIMIT 2;
+------------------------------------------------------------------------------------------------------+
| PLAN                                                                                                 |
+------------------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 3 KEYS OVER T02 ['container1     '] - ['container3     '] |
| SERVER FILTER BY FIRST KEY ONLY AND "ORGANIZATION_ID" = 'org2'                                       |
| SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID", "SCORE"]                                        |
| CLIENT MERGE SORT                                                                                    |
| CLIENT TOP 2 ROWS SORTED BY ["SCORE" DESC]                                                           |
+------------------------------------------------------------------------------------------------------+

{code} 

PHOENIX-3452 can indeed be reproduced; I fixed PHOENIX-3452.


was (Author: comnetwork):
I wrote the IT test as [~jpalmert], but it seems the test result is 
OK in 4.8.0; I cannot reproduce the bug.
My explain plan is:

{code:borderStyle=solid} 
explain SELECT DISTINCT entity_id, score FROM T01 WHERE organization_id = 
'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY 
score DESC LIMIT 2;
+------------------------------------------------------------------------------------------------------+
| PLAN                                                                                                 |
+------------------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 3 KEYS OVER T02 ['container1     '] - ['container3     '] |
| SERVER FILTER BY FIRST KEY ONLY AND "ORGANIZATION_ID" = 'org2'                                       |
| SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID", "SCORE"]                                        |
| CLIENT MERGE SORT                                                                                    |
| CLIENT TOP 2 ROWS SORTED BY ["SCORE" DESC]                                                           |
+------------------------------------------------------------------------------------------------------+

{code} 


[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649826#comment-15649826
 ] 

chenglei commented on PHOENIX-3451:
---

I wrote the IT test as [~jpalmert], but it seems the test result is 
OK in 4.8.0; I cannot reproduce the bug.
My explain plan is:

{code:borderStyle=solid} 
explain SELECT DISTINCT entity_id, score FROM T01 WHERE organization_id = 
'org2' AND container_id IN ( 'container1','container2','container3' ) ORDER BY 
score DESC LIMIT 2;
+------------------------------------------------------------------------------------------------------+
| PLAN                                                                                                 |
+------------------------------------------------------------------------------------------------------+
| CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 3 KEYS OVER T02 ['container1     '] - ['container3     '] |
| SERVER FILTER BY FIRST KEY ONLY AND "ORGANIZATION_ID" = 'org2'                                       |
| SERVER AGGREGATE INTO DISTINCT ROWS BY ["ENTITY_ID", "SCORE"]                                        |
| CLIENT MERGE SORT                                                                                    |
| CLIENT TOP 2 ROWS SORTED BY ["SCORE" DESC]                                                           |
+------------------------------------------------------------------------------------------------------+

{code} 



[jira] [Issue Comment Deleted] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3451:
--
Comment: was deleted

(was: It seems PHOENIX-3451 is OK in Phoenix 4.8.0.
I wrote the IT test as [~jpalmert] described, but the result is OK in 
Phoenix 4.8.0.
PHOENIX-3452 can indeed be reproduced; I fixed PHOENIX-3452.)

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the limit clause, which makes me think there 
> are multiple issues here and why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix is first 
> getting the rows from the first container and, when it finds that to be enough, 
> it stops the scan. What it should be doing is getting 2 results for each 
> container, then merging them, and then applying the limit again.





[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649807#comment-15649807
 ] 

chenglei edited comment on PHOENIX-3451 at 11/9/16 4:52 AM:


It seems PHOENIX-3451 is OK in Phoenix 4.8.0.
I wrote the IT test as [~jpalmert] described, but the result is correct in 
Phoenix 4.8.0.
PHOENIX-3452 can indeed be reproduced; I fixed PHOENIX-3452.


was (Author: comnetwork):
It seems PHOENIX-3451 is OK in Phoenix 4.8.0; I fixed PHOENIX-3452.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the limit clause, which makes me think there 
> are multiple issues here and why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix is first 
> getting the rows from the first container and, when it finds that to be enough, 
> it stops the scan. What it should be doing is getting 2 results for each 
> container, then merging them, and then applying the limit again.





[jira] [Commented] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649807#comment-15649807
 ] 

chenglei commented on PHOENIX-3451:
---

It seems PHOENIX-3451 is OK in Phoenix 4.8.0; I fixed PHOENIX-3452.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the limit clause, which makes me think there 
> are multiple issues here and why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix is first 
> getting the rows from the first container and, when it finds that to be enough, 
> it stops the scan. What it should be doing is getting 2 results for each 
> container, then merging them, and then applying the limit again.





[jira] [Issue Comment Deleted] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3451:
--
Comment: was deleted

(was: I think the problem is caused by GroupByCompiler: the GroupBy.compile 
method calls the OrderPreservingTracker.track method to track the groupBy 
expression's order, as follows (in GroupByCompiler.java):
{code:borderStyle=solid} 
144if (isOrderPreserving) {
145 OrderPreservingTracker tracker = new 
OrderPreservingTracker(context, GroupBy.EMPTY_GROUP_BY, Ordering.UNORDERED, 
expressions.size(), tupleProjector);
146  for (int i = 0; i < expressions.size(); i++) {
147Expression expression = expressions.get(i);
148tracker.track(expression);
149   }
{code}


The track method inappropriately uses sortOrder != SortOrder.getDefault() 
as the third "isNullsLast" parameter, as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid} 
  101 public void track(Expression node) {
  102   SortOrder sortOrder = node.getSortOrder();
  103   track(node, sortOrder, sortOrder != SortOrder.getDefault());
  104 }
  105
  106public void track(Expression node, SortOrder sortOrder, boolean 
isNullsLast) {
{code}

Once the node's SortOrder is SortOrder.DESC, "isNullsLast" is true, which 
affects the GroupBy's isOrderPreserving as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid}
  141if (node.isNullable()) {
  142if (!Boolean.valueOf(isNullsLast).equals(isReverse)) {
  143  isOrderPreserving = false;
  144  isReverse = false;
  145  return;
  146}
  147  }
{code}

Actually, the "isNullsLast" parameter is only relevant to orderBy; it should 
only affect where NULLs appear in the sorted results, and groupBy should not 
be affected by "isNullsLast". I wrote a simple unit test to reproduce this 
problem in my patch:

{code:borderStyle=solid}
@Test
public void testGroupByDesc() throws Exception {
Connection conn = DriverManager.getConnection(getUrl());
try {
conn.createStatement().execute("DROP TABLE IF EXISTS 
GROUPBYDESC_TEST");

String sql="CREATE TABLE IF NOT EXISTS GROUPBYDESC_TEST ( "+
"ORGANIZATION_ID VARCHAR,"+
"CONTAINER_ID VARCHAR,"+
"CONSTRAINT TEST_PK PRIMARY KEY ( "+
"ORGANIZATION_ID DESC,"+
"CONTAINER_ID DESC"+
"))";
conn.createStatement().execute(sql);


sql="SELECT ORGANIZATION_ID, CONTAINER_ID,count(*) FROM 
GROUPBYDESC_TEST group by ORGANIZATION_ID, CONTAINER_ID";
PhoenixPreparedStatement statement = 
conn.prepareStatement(sql).unwrap(PhoenixPreparedStatement.class);

QueryPlan queryPlan = statement.optimizeQuery(sql);
queryPlan.iterator();
assertTrue(queryPlan.getGroupBy().isOrderPreserving());

} finally {
conn.close();
}
}
{code}

I uploaded my patch, [~jamestaylor], please review.

)

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expecte

[jira] [Updated] (PHOENIX-3452) Secondary index and query using distinct: ORDER BY doesn't work correctly

2016-11-08 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3452:
--
Attachment: PHOENIX-3452_v1.patch

> Secondary index and query using distinct: ORDER BY doesn't work correctly
> -
>
> Key: PHOENIX-3452
> URL: https://issues.apache.org/jira/browse/PHOENIX-3452
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3452_v1.patch
>
>
> This may be related to PHOENIX-3451 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org1','container1','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org1','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org1','container1','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org1','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org1','container1','entityId2',1.5);
> UPSERT INTO test.test VALUES ('org1','container1','entityId1',1.6);
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org1'
> AND container_id = 'container1'
> ORDER BY score DESC
> Notice that the returned results are not returned in descending score order. 
> Instead they are returned in descending entity_id order. If I remove the 
> DISTINCT or remove the secondary index the result is correct.





[jira] [Commented] (PHOENIX-3452) Secondary index and query using distinct: ORDER BY doesn't work correctly

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649794#comment-15649794
 ] 

chenglei commented on PHOENIX-3452:
---

I think the problem is caused by GroupByCompiler: the GroupBy.compile 
method calls the OrderPreservingTracker.track method to track the groupBy 
expression's order, as follows (in GroupByCompiler.java):
{code:borderStyle=solid} 
144    if (isOrderPreserving) {
145        OrderPreservingTracker tracker = new OrderPreservingTracker(context,
               GroupBy.EMPTY_GROUP_BY, Ordering.UNORDERED, expressions.size(), tupleProjector);
146        for (int i = 0; i < expressions.size(); i++) {
147            Expression expression = expressions.get(i);
148            tracker.track(expression);
149        }
{code}


The track method inappropriately uses sortOrder != SortOrder.getDefault() 
as the third "isNullsLast" parameter, as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid} 
101    public void track(Expression node) {
102        SortOrder sortOrder = node.getSortOrder();
103        track(node, sortOrder, sortOrder != SortOrder.getDefault());
104    }
105
106    public void track(Expression node, SortOrder sortOrder, boolean isNullsLast) {
{code}

Once the node's SortOrder is SortOrder.DESC, "isNullsLast" is true, which 
affects the GroupBy's isOrderPreserving as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid}
141    if (node.isNullable()) {
142        if (!Boolean.valueOf(isNullsLast).equals(isReverse)) {
143            isOrderPreserving = false;
144            isReverse = false;
145            return;
146        }
147    }
{code}
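The effect of that check can be seen with the condition reduced to plain booleans. The class below is a hypothetical, self-contained mimic of the quoted logic (class and method names are mine, not Phoenix code): a nullable DESC column arrives with isNullsLast=true while isReverse is still false, so order preservation is dropped.

```java
public class NullsLastDemo {
    // Mimics the quoted check: order preservation is abandoned for a
    // nullable column whenever isNullsLast and isReverse disagree.
    static boolean isOrderPreserving(boolean isNullsLast, boolean isReverse, boolean nullable) {
        if (nullable && !Boolean.valueOf(isNullsLast).equals(isReverse)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A nullable DESC column: track() derives isNullsLast = true.
        System.out.println(isOrderPreserving(true, false, true));   // false: preservation lost
        // Same column with isNullsLast decoupled from the sort order.
        System.out.println(isOrderPreserving(false, false, true));  // true: preservation kept
    }
}
```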

Actually, the "isNullsLast" parameter is only relevant to orderBy; it should 
only affect where NULLs appear in the sorted results, and groupBy should not 
be affected by "isNullsLast". I wrote a simple unit test to reproduce this 
problem in my patch:

{code:borderStyle=solid}
@Test
public void testGroupByDesc() throws Exception {
    Connection conn = DriverManager.getConnection(getUrl());
    try {
        conn.createStatement().execute("DROP TABLE IF EXISTS GROUPBYDESC_TEST");

        String sql = "CREATE TABLE IF NOT EXISTS GROUPBYDESC_TEST ( " +
                "ORGANIZATION_ID VARCHAR," +
                "CONTAINER_ID VARCHAR," +
                "CONSTRAINT TEST_PK PRIMARY KEY ( " +
                "ORGANIZATION_ID DESC," +
                "CONTAINER_ID DESC" +
                "))";
        conn.createStatement().execute(sql);

        sql = "SELECT ORGANIZATION_ID, CONTAINER_ID, count(*) FROM " +
                "GROUPBYDESC_TEST group by ORGANIZATION_ID, CONTAINER_ID";
        PhoenixPreparedStatement statement =
                conn.prepareStatement(sql).unwrap(PhoenixPreparedStatement.class);

        QueryPlan queryPlan = statement.optimizeQuery(sql);
        queryPlan.iterator();
        assertTrue(queryPlan.getGroupBy().isOrderPreserving());
    } finally {
        conn.close();
    }
}
{code}

I have uploaded my patch; [~jamestaylor], please review.


> Secondary index and query using distinct: ORDER BY doesn't work correctly
> -
>
> Key: PHOENIX-3452
> URL: https://issues.apache.org/jira/browse/PHOENIX-3452
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3451 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org1','container1','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org1','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org1','container1','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org1','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org1','container1','entityId2',1.5);
> UPSERT INTO test.test VALUES ('org1','container1','entityId1',1.6);
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org1'
> AND container_id = 'container1'
> ORDER BY score DESC
> Notice that the returned results are not returned in descending score order. 
> Instead they are returned in descending entity_id order. If I remove the 
> DISTINCT or remove the 

[jira] [Updated] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3451:
--
Attachment: (was: PHOENIX-3451_v1.patch)

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the limit clause, which makes me think there 
> are multiple issues here and why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix is first 
> getting the rows from the first container and, when it finds that to be enough, 
> it stops the scan. What it should be doing is getting 2 results for each 
> container, then merging them, and then applying the limit again.





[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649650#comment-15649650
 ] 

chenglei edited comment on PHOENIX-3451 at 11/9/16 3:58 AM:


I think the problem is caused by GroupByCompiler: the GroupBy.compile 
method calls the OrderPreservingTracker.track method to track the groupBy 
expression's order, as follows (in GroupByCompiler.java):
{code:borderStyle=solid} 
144if (isOrderPreserving) {
145 OrderPreservingTracker tracker = new 
OrderPreservingTracker(context, GroupBy.EMPTY_GROUP_BY, Ordering.UNORDERED, 
expressions.size(), tupleProjector);
146  for (int i = 0; i < expressions.size(); i++) {
147Expression expression = expressions.get(i);
148tracker.track(expression);
149   }
{code}


The track method inappropriately uses sortOrder != SortOrder.getDefault() 
as the third "isNullsLast" parameter, as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid} 
  101 public void track(Expression node) {
  102   SortOrder sortOrder = node.getSortOrder();
  103   track(node, sortOrder, sortOrder != SortOrder.getDefault());
  104 }
  105
  106public void track(Expression node, SortOrder sortOrder, boolean 
isNullsLast) {
{code}

Once the node's SortOrder is SortOrder.DESC, "isNullsLast" is true, which 
affects the GroupBy's isOrderPreserving as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid}
  141if (node.isNullable()) {
  142if (!Boolean.valueOf(isNullsLast).equals(isReverse)) {
  143  isOrderPreserving = false;
  144  isReverse = false;
  145  return;
  146}
  147  }
{code}

Actually, the "isNullsLast" parameter is only relevant to orderBy; it should 
only affect where NULLs appear in the sorted results, and groupBy should not 
be affected by "isNullsLast". I wrote a simple unit test to reproduce this 
problem in my patch:

{code:borderStyle=solid}
@Test
public void testGroupByDesc() throws Exception {
Connection conn = DriverManager.getConnection(getUrl());
try {
conn.createStatement().execute("DROP TABLE IF EXISTS 
GROUPBYDESC_TEST");

String sql="CREATE TABLE IF NOT EXISTS GROUPBYDESC_TEST ( "+
"ORGANIZATION_ID VARCHAR,"+
"CONTAINER_ID VARCHAR,"+
"CONSTRAINT TEST_PK PRIMARY KEY ( "+
"ORGANIZATION_ID DESC,"+
"CONTAINER_ID DESC"+
"))";
conn.createStatement().execute(sql);


sql="SELECT ORGANIZATION_ID, CONTAINER_ID,count(*) FROM 
GROUPBYDESC_TEST group by ORGANIZATION_ID, CONTAINER_ID";
PhoenixPreparedStatement statement = 
conn.prepareStatement(sql).unwrap(PhoenixPreparedStatement.class);

QueryPlan queryPlan = statement.optimizeQuery(sql);
queryPlan.iterator();
assertTrue(queryPlan.getGroupBy().isOrderPreserving());

} finally {
conn.close();
}
}
{code}

I uploaded my patch, [~jamestaylor], please review.




was (Author: comnetwork):
I think the problem is caused by GroupByCompiler: when it calls the 
OrderPreservingTracker.track method to track the groupBy expression's order, 
it inappropriately uses sortOrder != SortOrder.getDefault() as the third 
"isNullsLast" parameter, as follows (in OrderPreservingTracker.java):

{code:borderStyle=solid} 
  101 public void track(Expression node) {
  102   SortOrder sortOrder = node.getSortOrder();
  103   track(node, sortOrder, sortOrder != SortOrder.getDefault());
  104 }
  105
  106public void track(Expression node, SortOrder sortOrder, boolean 
isNullsLast) {
{code}

Once the node's SortOrder is SortOrder.DESC, "isNullsLast" is true, which 
affects the GroupBy's isOrderPreserving as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid}
  141if (node.isNullable()) {
  142if (!Boolean.valueOf(isNullsLast).equals(isReverse)) {
  143  isOrderPreserving = false;
  144  isReverse = false;
  145  return;
  146}
  147  }
{code}

Actually, the "isNullsLast" parameter is only relevant to orderBy; it should 
only affect where NULLs appear in the sorted results, and groupBy should not 
be affected by "isNullsLast". I wrote a simple unit test to reproduce this 
problem in my patch:

{code:borderStyle=solid}
@Test
public void testGroupByDesc() throws Exception {
Connection conn = DriverManager.getConnection(getUrl());
try {
conn.createStatement().execute("DROP TABLE IF EXISTS 
GROUPBYDESC_TEST");

String sql="CREATE TABLE IF NOT EXISTS GROUPBYDESC_T

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649650#comment-15649650
 ] 

chenglei edited comment on PHOENIX-3451 at 11/9/16 3:53 AM:


I think the problem is caused by GroupByCompiler: when it calls the 
OrderPreservingTracker.track method to track the groupBy expression's order, 
it inappropriately uses sortOrder != SortOrder.getDefault() as the third 
"isNullsLast" parameter, as follows (in OrderPreservingTracker.java):

{code:borderStyle=solid} 
  101 public void track(Expression node) {
  102   SortOrder sortOrder = node.getSortOrder();
  103   track(node, sortOrder, sortOrder != SortOrder.getDefault());
  104 }
  105
  106public void track(Expression node, SortOrder sortOrder, boolean 
isNullsLast) {
{code}

Once the node's SortOrder is SortOrder.DESC, "isNullsLast" is true, which 
affects the GroupBy's isOrderPreserving as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid}
  141if (node.isNullable()) {
  142if (!Boolean.valueOf(isNullsLast).equals(isReverse)) {
  143  isOrderPreserving = false;
  144  isReverse = false;
  145  return;
  146}
  147  }
{code}

Actually, the "isNullsLast" parameter is only relevant to orderBy; it should 
only affect where NULLs appear in the sorted results, and groupBy should not 
be affected by "isNullsLast". I wrote a simple unit test to reproduce this 
problem in my patch:

{code:borderStyle=solid}
@Test
public void testGroupByDesc() throws Exception {
Connection conn = DriverManager.getConnection(getUrl());
try {
conn.createStatement().execute("DROP TABLE IF EXISTS 
GROUPBYDESC_TEST");

String sql="CREATE TABLE IF NOT EXISTS GROUPBYDESC_TEST ( "+
"ORGANIZATION_ID VARCHAR,"+
"CONTAINER_ID VARCHAR,"+
"CONSTRAINT TEST_PK PRIMARY KEY ( "+
"ORGANIZATION_ID DESC,"+
"CONTAINER_ID DESC"+
"))";
conn.createStatement().execute(sql);


sql="SELECT ORGANIZATION_ID, CONTAINER_ID,count(*) FROM 
GROUPBYDESC_TEST group by ORGANIZATION_ID, CONTAINER_ID";
PhoenixPreparedStatement statement = 
conn.prepareStatement(sql).unwrap(PhoenixPreparedStatement.class);

QueryPlan queryPlan = statement.optimizeQuery(sql);
queryPlan.iterator();
assertTrue(queryPlan.getGroupBy().isOrderPreserving());

} finally {
conn.close();
}
}
{code}

I uploaded my patch, [~jamestaylor], please review.




was (Author: comnetwork):
I think the problem is caused by GroupByCompiler: when it calls the 
OrderPreservingTracker.track method, it inappropriately uses 
sortOrder != SortOrder.getDefault() as the third "isNullsLast" 
parameter, as follows (in OrderPreservingTracker.java):

{code:borderStyle=solid} 
  101 public void track(Expression node) {
  102   SortOrder sortOrder = node.getSortOrder();
  103   track(node, sortOrder, sortOrder != SortOrder.getDefault());
  104 }
  105
  106public void track(Expression node, SortOrder sortOrder, boolean 
isNullsLast) {
{code}

Once the node's SortOrder is SortOrder.DESC, "isNullsLast" is true, which 
affects the GroupBy's isOrderPreserving as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid}
  141if (node.isNullable()) {
  142if (!Boolean.valueOf(isNullsLast).equals(isReverse)) {
  143  isOrderPreserving = false;
  144  isReverse = false;
  145  return;
  146}
  147  }
{code}

Actually, the "isNullsLast" parameter is only relevant to orderBy; it should 
only affect where NULLs appear in the sorted results, and groupBy should not 
be affected by "isNullsLast". I wrote a simple unit test to reproduce this 
problem in my patch:

{code:borderStyle=solid}
@Test
public void testGroupByDesc() throws Exception {
Connection conn = DriverManager.getConnection(getUrl());
try {
conn.createStatement().execute("DROP TABLE IF EXISTS 
GROUPBYDESC_TEST");

String sql="CREATE TABLE IF NOT EXISTS GROUPBYDESC_TEST ( "+
"ORGANIZATION_ID VARCHAR,"+
"CONTAINER_ID VARCHAR,"+
"CONSTRAINT TEST_PK PRIMARY KEY ( "+
"ORGANIZATION_ID DESC,"+
"CONTAINER_ID DESC"+
"))";
conn.createStatement().execute(sql);


sql="SELECT ORGANIZATION_ID, CONTAINER_ID,count(*) FROM 
GROUPBYDESC_TEST group by ORGANIZATION_ID, CONTAINER_ID";
PhoenixPreparedStatement statement = 
conn.prepareStatement(sql).unw

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649650#comment-15649650
 ] 

chenglei edited comment on PHOENIX-3451 at 11/9/16 3:20 AM:


I think the problem is caused by GroupByCompiler: when it calls the 
OrderPreservingTracker.track method, it inappropriately uses 
sortOrder != SortOrder.getDefault() as the third "isNullsLast" 
parameter, as follows (in OrderPreservingTracker.java):

{code:borderStyle=solid} 
  101 public void track(Expression node) {
  102   SortOrder sortOrder = node.getSortOrder();
  103   track(node, sortOrder, sortOrder != SortOrder.getDefault());
  104 }
  105
  106public void track(Expression node, SortOrder sortOrder, boolean 
isNullsLast) {
{code}

Once the node's SortOrder is SortOrder.DESC, "isNullsLast" is true, which 
affects the GroupBy's isOrderPreserving as follows (in 
OrderPreservingTracker.java):

{code:borderStyle=solid}
  141if (node.isNullable()) {
  142if (!Boolean.valueOf(isNullsLast).equals(isReverse)) {
  143  isOrderPreserving = false;
  144  isReverse = false;
  145  return;
  146}
  147  }
{code}

Actually, the "isNullsLast" parameter is only relevant to orderBy; it should 
only affect where NULLs appear in the sorted results, and groupBy should not 
be affected by "isNullsLast". I wrote a simple unit test to reproduce this 
problem in my patch:

{code:borderStyle=solid}
@Test
public void testGroupByDesc() throws Exception {
    Connection conn = DriverManager.getConnection(getUrl());
    try {
        conn.createStatement().execute("DROP TABLE IF EXISTS GROUPBYDESC_TEST");

        String sql = "CREATE TABLE IF NOT EXISTS GROUPBYDESC_TEST ( " +
                "ORGANIZATION_ID VARCHAR," +
                "CONTAINER_ID VARCHAR," +
                "CONSTRAINT TEST_PK PRIMARY KEY ( " +
                "ORGANIZATION_ID DESC," +
                "CONTAINER_ID DESC" +
                "))";
        conn.createStatement().execute(sql);

        sql = "SELECT ORGANIZATION_ID, CONTAINER_ID, count(*) FROM " +
                "GROUPBYDESC_TEST GROUP BY ORGANIZATION_ID, CONTAINER_ID";
        PhoenixPreparedStatement statement =
                conn.prepareStatement(sql).unwrap(PhoenixPreparedStatement.class);

        QueryPlan queryPlan = statement.optimizeQuery(sql);
        queryPlan.iterator();
        assertTrue(queryPlan.getGroupBy().isOrderPreserving());
    } finally {
        conn.close();
    }
}
{code}

I uploaded my patch, [~jamestaylor], please review.






[jira] [Updated] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3451:
--
Attachment: PHOENIX-3451_v1.patch

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3451_v1.patch
>
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the LIMIT clause, which makes me think there 
> are multiple issues here and why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix first 
> gets the rows from the first container and, when it finds that to be enough, 
> stops the scan. What it should be doing is getting 2 results for each 
> container, then merging them, and then applying the limit again.
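The merge-then-limit plan described above can be sketched as follows (my own illustration over the repro data, not Phoenix's actual iterator code): take the top LIMIT rows from each container's scan, merge them by score DESC, and only then apply the LIMIT.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MergeThenLimitDemo {
    // Merge per-container top rows (each list already sorted by score DESC)
    // and apply the global LIMIT only after the merge.
    public static List<String> topK(List<List<Map.Entry<String, Double>>> containers, int limit) {
        return containers.stream()
                .flatMap(List::stream)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Top 2 rows per container, taken from the repro data above
        List<List<Map.Entry<String, Double>>> containers = List.of(
                List.of(Map.entry("entityId3", 1.4), Map.entry("entityId5", 1.2)),    // container1
                List.of(Map.entry("entityId4", 1.3), Map.entry("entityId6", 1.1)),    // container2
                List.of(Map.entry("entityId8", 1.45), Map.entry("entityId7", 1.35))); // container3
        System.out.println(topK(containers, 2)); // [entityId8, entityId3]
    }
}
```

With the repro data this yields entityId8 (1.45) and entityId3 (1.4), the expected output, rather than the rows of container1 alone.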



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (PHOENIX-3461) Statistics collection broken if name space mapping enabled for SYSTEM tables

2016-11-08 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649635#comment-15649635
 ] 

Samarth Jain commented on PHOENIX-3461:
---

Just wanted to give an update - writing test cases is taking slightly longer 
than I expected. I hope to have a patch out tonight.

> Statistics collection broken if name space mapping enabled for SYSTEM tables
> 
>
> Key: PHOENIX-3461
> URL: https://issues.apache.org/jira/browse/PHOENIX-3461
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Fix For: 4.9.0
>
> Attachments: PHOENIX-3461_master.patch, PHOENIX-3461_v2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3467) Update Phoenix/Hive storage handler documentation

2016-11-08 Thread Frank Welsch (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank Welsch updated PHOENIX-3467:
--
Description: The documentation on 
https://phoenix.apache.org/hive_storage_handler.html is not clear and has some 
small technical errors. This improvement retitles the page and has significant 
edits of the content.  (was: The documentation on 
https://phoenix.apache.org/hive_storage_handler.html is not clear and has some 
technical errors. This improvement retitles the page and has significant edits 
of the content.)

> Update Phoenix/Hive storage handler documentation
> -
>
> Key: PHOENIX-3467
> URL: https://issues.apache.org/jira/browse/PHOENIX-3467
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Frank Welsch
> Attachments: patch.patch
>
>
> The documentation on https://phoenix.apache.org/hive_storage_handler.html is 
> not clear and has some small technical errors. This improvement retitles the 
> page and has significant edits of the content.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3467) Update Phoenix/Hive storage handler documentation

2016-11-08 Thread Frank Welsch (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank Welsch updated PHOENIX-3467:
--
Attachment: patch.patch

> Update Phoenix/Hive storage handler documentation
> -
>
> Key: PHOENIX-3467
> URL: https://issues.apache.org/jira/browse/PHOENIX-3467
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Frank Welsch
> Attachments: patch.patch
>
>
> The documentation on https://phoenix.apache.org/hive_storage_handler.html is 
> not clear and has some technical errors. This improvement retitles the page 
> and has significant edits of the content.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3467) Update Phoenix/Hive storage handler documentation

2016-11-08 Thread Frank Welsch (JIRA)
Frank Welsch created PHOENIX-3467:
-

 Summary: Update Phoenix/Hive storage handler documentation
 Key: PHOENIX-3467
 URL: https://issues.apache.org/jira/browse/PHOENIX-3467
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 4.8.0
Reporter: Frank Welsch


The documentation on https://phoenix.apache.org/hive_storage_handler.html is 
not clear and has some technical errors. This improvement retitles the page and 
has significant edits of the content.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3333) Support Spark 2.0

2016-11-08 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3333:
--
Summary: Support Spark 2.0  (was: can not load phoenix table by spark 2.0 
which is working well under spark 1.6)

> Support Spark 2.0
> -
>
> Key: PHOENIX-3333
> URL: https://issues.apache.org/jira/browse/PHOENIX-3333
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
> Environment: spark 2.0 ,phoenix 4.8.0 , os is centos 6.7 ,hadoop is 
> hdp 2.5
>Reporter: dalin qin
> Attachments: PHOENIX-3333-interim.patch
>
>
> spark version is  2.0.0.2.5.0.0-1245
> As mentioned by Josh, I believe Spark 2.0 changed its API, which broke 
> Phoenix. Please come up with an updated version to adapt to Spark's change.
> In [1]: df = sqlContext.read \
>...:   .format("org.apache.phoenix.spark") \
>...:   .option("table", "TABLE1") \
>...:   .option("zkUrl", "namenode:2181:/hbase-unsecure") \
>...:   .load()
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 df = sqlContext.read   .format("org.apache.phoenix.spark")   
> .option("table", "TABLE1")   .option("zkUrl", 
> "namenode:2181:/hbase-unsecure")   .load()
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/readwriter.pyc in load(self, 
> path, format, schema, **options)
> 151 return 
> self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
> 152 else:
> --> 153 return self._df(self._jreader.load())
> 154
> 155 @since(1.4)
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
> 931 answer = self.gateway_client.send_command(command)
> 932 return_value = get_return_value(
> --> 933 answer, self.gateway_client, self.target_id, self.name)
> 934
> 935 for temp_arg in temp_args:
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>  61 def deco(*a, **kw):
>  62 try:
> ---> 63 return f(*a, **kw)
>  64 except py4j.protocol.Py4JJavaError as e:
>  65 s = e.java_exception.toString()
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py 
> in get_return_value(answer, gateway_client, target_id, name)
> 310 raise Py4JJavaError(
> 311 "An error occurred while calling {0}{1}{2}.\n".
> --> 312 format(target_id, ".", name), value)
> 313 else:
> 314 raise Py4JError(
> Py4JJavaError: An error occurred while calling o43.load.
> : java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at 
> java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
> at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
> at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
> at 
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:366)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:365)
> 

[jira] [Resolved] (PHOENIX-3347) Set conformance level to PhoenixSqlConformance

2016-11-08 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue resolved PHOENIX-3347.
--
Resolution: Fixed

> Set conformance level to PhoenixSqlConformance
> --
>
> Key: PHOENIX-3347
> URL: https://issues.apache.org/jira/browse/PHOENIX-3347
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Eric Lomore
>Assignee: Maryann Xue
>
> Current conformance settings do not allow SELECT statements without a FROM 
> clause. Either need to change conformance or stop supporting SELECT without 
> FROM as Phoenix currently does.
> According to the Calcite parser,
> {{FROM is mandatory in standard SQL, optional in dialects such as MySQL, 
> PostgreSQL. The parser allows SELECT without FROM, but the validator fails if 
> conformance is, say, STRICT_2003.}}
> Based on PhoenixCalciteEmbeddedDriver.java, we are using ORACLE_10 
> conformance which does not support this
> {code}setPropertyIfNotSpecified(
> info2,
> CalciteConnectionProperty.CONFORMANCE.camelName(),
> SqlConformance.ORACLE_10.toString()){code}
> Confirming this is the fact that it is specifically the SqlValidator throwing 
> the exception in relevant test cases
> {{Caused by: org.apache.calcite.sql.validate.SqlValidatorException: SELECT 
> must have a FROM clause}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3347) Set conformance level to PhoenixSqlConformance

2016-11-08 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue updated PHOENIX-3347:
-
Summary: Set conformance level to PhoenixSqlConformance  (was: Change 
conformance or remove SELECT statements without FROM clauses)

> Set conformance level to PhoenixSqlConformance
> --
>
> Key: PHOENIX-3347
> URL: https://issues.apache.org/jira/browse/PHOENIX-3347
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Eric Lomore
>Assignee: Maryann Xue
>
> Current conformance settings do not allow SELECT statements without a FROM 
> clause. Either need to change conformance or stop supporting SELECT without 
> FROM as Phoenix currently does.
> According to the Calcite parser,
> {{FROM is mandatory in standard SQL, optional in dialects such as MySQL, 
> PostgreSQL. The parser allows SELECT without FROM, but the validator fails if 
> conformance is, say, STRICT_2003.}}
> Based on PhoenixCalciteEmbeddedDriver.java, we are using ORACLE_10 
> conformance, which does not support this:
> {code}setPropertyIfNotSpecified(
> info2,
> CalciteConnectionProperty.CONFORMANCE.camelName(),
> SqlConformance.ORACLE_10.toString()){code}
> Confirming this, it is specifically the SqlValidator that throws the 
> exception in the relevant test cases:
> {{Caused by: org.apache.calcite.sql.validate.SqlValidatorException: SELECT 
> must have a FROM clause}}





[jira] [Commented] (PHOENIX-3461) Statistics collection broken if name space mapping enabled for SYSTEM tables

2016-11-08 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648905#comment-15648905
 ] 

Samarth Jain commented on PHOENIX-3461:
---

Let me come up with a test case for this. We would need to test stats 
collection with namespace mapping for user and system tables. 

> Statistics collection broken if name space mapping enabled for SYSTEM tables
> 
>
> Key: PHOENIX-3461
> URL: https://issues.apache.org/jira/browse/PHOENIX-3461
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Fix For: 4.9.0
>
> Attachments: PHOENIX-3461_master.patch, PHOENIX-3461_v2.patch
>
>






[jira] [Updated] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

2016-11-08 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3451:
--
Assignee: chenglei

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -
>
> Key: PHOENIX-3451
> URL: https://issues.apache.org/jira/browse/PHOENIX-3451
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected output would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452, if you run the query without the LIMIT the 
> ordering is not correct. However, the first 2 results in that ordering are 
> still not the ones returned by the LIMIT clause, which makes me think there 
> are multiple issues here and is why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix first 
> gets the rows from the first container and stops the scan once it finds 
> enough of them. What it should do is get 2 results for each container, merge 
> them, and then apply the limit again.
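The two-results-per-container-then-merge behavior described above can be sketched outside Phoenix. This is only an illustration of the expected merge semantics, using hypothetical in-memory data taken from the repro, not Phoenix's actual scan implementation:

```java
import java.util.*;
import java.util.stream.*;

public class TopKMerge {

    // Each per-container scan already yields rows in descending score order,
    // so the global top-k can be found by taking at most k rows from each run,
    // merging them, re-sorting, and applying the limit again.
    public static List<Map.Entry<String, Double>> topK(
            List<List<Map.Entry<String, Double>>> sortedRuns, int k) {
        return sortedRuns.stream()
                .flatMap(run -> run.stream().limit(k)) // top-k of each run suffices
                .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Descending-score runs per container, matching the UPSERTs above
        List<Map.Entry<String, Double>> container1 =
                List.of(Map.entry("entityId3", 1.4), Map.entry("entityId5", 1.2));
        List<Map.Entry<String, Double>> container2 =
                List.of(Map.entry("entityId4", 1.3), Map.entry("entityId6", 1.1));
        List<Map.Entry<String, Double>> container3 =
                List.of(Map.entry("entityId8", 1.45), Map.entry("entityId7", 1.35));
        System.out.println(topK(List.of(container1, container2, container3), 2));
        // prints [entityId8=1.45, entityId3=1.4] -- the expected LIMIT 2 result
    }
}
```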





[jira] [Resolved] (PHOENIX-3452) Secondary index and query using distinct: ORDER BY doesn't work correctly

2016-11-08 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor resolved PHOENIX-3452.
---
Resolution: Duplicate

As suspected, this is a duplicate of PHOENIX-3451.

> Secondary index and query using distinct: ORDER BY doesn't work correctly
> -
>
> Key: PHOENIX-3452
> URL: https://issues.apache.org/jira/browse/PHOENIX-3452
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3451 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org1','container1','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org1','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org1','container1','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org1','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org1','container1','entityId2',1.5);
> UPSERT INTO test.test VALUES ('org1','container1','entityId1',1.6);
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org1'
> AND container_id = 'container1'
> ORDER BY score DESC
> Notice that the returned results are not returned in descending score order. 
> Instead they are returned in descending entity_id order. If I remove the 
> DISTINCT or remove the secondary index the result is correct.





[jira] [Updated] (PHOENIX-3452) Secondary index and query using distinct: ORDER BY doesn't work correctly

2016-11-08 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3452:
--
Assignee: chenglei

> Secondary index and query using distinct: ORDER BY doesn't work correctly
> -
>
> Key: PHOENIX-3452
> URL: https://issues.apache.org/jira/browse/PHOENIX-3452
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Joel Palmert
>Assignee: chenglei
>
> This may be related to PHOENIX-3451 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ORGANIZATION_ID CHAR(15) NOT NULL,
> CONTAINER_ID CHAR(15) NOT NULL,
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ORGANIZATION_ID,
> CONTAINER_ID,
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org1','container1','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org1','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org1','container1','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org1','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org1','container1','entityId2',1.5);
> UPSERT INTO test.test VALUES ('org1','container1','entityId1',1.6);
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org1'
> AND container_id = 'container1'
> ORDER BY score DESC
> Notice that the returned results are not returned in descending score order. 
> Instead they are returned in descending entity_id order. If I remove the 
> DISTINCT or remove the secondary index the result is correct.





[jira] [Commented] (PHOENIX-3333) can not load phoenix table by spark 2.0 which is working well under spark 1.6

2016-11-08 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648854#comment-15648854
 ] 

James Taylor commented on PHOENIX-:
---

Awesome, [~jmahonin]. FYI, [~lhofhansl] & [~andrew.purt...@gmail.com].

> can not load phoenix table by spark 2.0 which is working well under spark 1.6
> -
>
> Key: PHOENIX-
> URL: https://issues.apache.org/jira/browse/PHOENIX-
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
> Environment: spark 2.0 ,phoenix 4.8.0 , os is centos 6.7 ,hadoop is 
> hdp 2.5
>Reporter: dalin qin
> Attachments: PHOENIX--interim.patch
>
>
> spark version is  2.0.0.2.5.0.0-1245
> As mentioned by Josh, I believe Spark 2.0 changed its API in a way that 
> breaks Phoenix. Please come up with an updated version that adapts to 
> Spark's change.
> In [1]: df = sqlContext.read \
>...:   .format("org.apache.phoenix.spark") \
>...:   .option("table", "TABLE1") \
>...:   .option("zkUrl", "namenode:2181:/hbase-unsecure") \
>...:   .load()
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 df = sqlContext.read   .format("org.apache.phoenix.spark")   
> .option("table", "TABLE1")   .option("zkUrl", 
> "namenode:2181:/hbase-unsecure")   .load()
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/readwriter.pyc in load(self, 
> path, format, schema, **options)
> 151 return 
> self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
> 152 else:
> --> 153 return self._df(self._jreader.load())
> 154
> 155 @since(1.4)
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
> 931 answer = self.gateway_client.send_command(command)
> 932 return_value = get_return_value(
> --> 933 answer, self.gateway_client, self.target_id, self.name)
> 934
> 935 for temp_arg in temp_args:
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>  61 def deco(*a, **kw):
>  62 try:
> ---> 63 return f(*a, **kw)
>  64 except py4j.protocol.Py4JJavaError as e:
>  65 s = e.java_exception.toString()
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py 
> in get_return_value(answer, gateway_client, target_id, name)
> 310 raise Py4JJavaError(
> 311 "An error occurred while calling {0}{1}{2}.\n".
> --> 312 format(target_id, ".", name), value)
> 313 else:
> 314 raise Py4JError(
> Py4JJavaError: An error occurred while calling o43.load.
> : java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at 
> java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
> at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
> at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
> at 
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
> at org.apache.spa

[jira] [Commented] (PHOENIX-3427) rdd.saveToPhoenix gives table undefined error when attempting to write to a tenant-specific view (TenantId defined in configuration object and passed to saveToPhoenix

2016-11-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648811#comment-15648811
 ] 

ASF GitHub Bot commented on PHOENIX-3427:
-

Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/221#discussion_r87025450
  
--- Diff: 
phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala ---
@@ -60,16 +60,21 @@ class PhoenixSparkIT extends FunSuite with Matchers with BeforeAndAfterAll {
     ConfigurationUtil.getZookeeperURL(hbaseConfiguration).get
   }
 
-  override def beforeAll() {
-    PhoenixSparkITHelper.doSetup
+  // Runs SQL commands located in the file defined in the sqlSource argument
+  // Optional argument tenantId used for running tenant-specific SQL
+  def setupTables(sqlSource: String, tenantId: Option[String]): Unit = {
+    val url = tenantId match {
+      case Some(tenantId) => PhoenixSparkITHelper.getUrl + ";TenantId=" + tenantId
+      case _ => PhoenixSparkITHelper.getUrl
+    }
 
-    conn = DriverManager.getConnection(PhoenixSparkITHelper.getUrl)
+    conn = DriverManager.getConnection(url)
--- End diff --

Better to set TenantId in Properties and use 
DriverManager.getConnection(url, props) than muck with the URL string
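That suggestion can be sketched as follows. This is a minimal illustration, not Phoenix test code; only the Properties construction is executable here, since actually connecting requires a live cluster:

```java
import java.util.Properties;

public class TenantConnection {

    // Carry the tenant id in connection Properties instead of splicing
    // ";TenantId=..." into the JDBC URL string.
    public static Properties tenantProps(String tenantId) {
        Properties props = new Properties();
        if (tenantId != null) {
            props.setProperty("TenantId", tenantId);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(tenantProps("tenant1"));
        // Usage against a real cluster would be:
        // Connection conn = DriverManager.getConnection(url, tenantProps("tenant1"));
    }
}
```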


> rdd.saveToPhoenix gives table undefined error when attempting to write to a 
> tenant-specific view (TenantId defined in configuration object and passed to 
> saveToPhoenix)
> ---
>
> Key: PHOENIX-3427
> URL: https://issues.apache.org/jira/browse/PHOENIX-3427
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Nico Pappagianis
>
> Although we can read from a tenant-specific view by passing TenantId in the 
> conf object when calling sc.phoenixTableAsRDD the same does not hold for 
> rdd.saveToPhoenix. Calling saveToPhoenix with a tenant-specific view as the 
> table name gives a table undefined error, even when passing in the TenantId 
> with the conf object.
> It appears that TenantId is lost during the execution path of saveToPhoenix.





[jira] [Commented] (PHOENIX-3427) rdd.saveToPhoenix gives table undefined error when attempting to write to a tenant-specific view (TenantId defined in configuration object and passed to saveToPhoenix

2016-11-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648812#comment-15648812
 ] 

ASF GitHub Bot commented on PHOENIX-3427:
-

Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/221#discussion_r87029205
  
--- Diff: 
phoenix-spark/src/main/scala/org/apache/phoenix/spark/ProductRDDFunctions.scala 
---
@@ -16,19 +16,20 @@ package org.apache.phoenix.spark
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.io.NullWritable
 import org.apache.phoenix.mapreduce.PhoenixOutputFormat
-import org.apache.phoenix.mapreduce.util.{ColumnInfoToStringEncoderDecoder, PhoenixConfigurationUtil}
+import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil
 import org.apache.spark.Logging
 import org.apache.spark.rdd.RDD
+
 import scala.collection.JavaConversions._
 
 class ProductRDDFunctions[A <: Product](data: RDD[A]) extends Logging with Serializable {
 
   def saveToPhoenix(tableName: String, cols: Seq[String],
-                    conf: Configuration = new Configuration, zkUrl: Option[String] = None)
+                    conf: Configuration = new Configuration, zkUrl: Option[String] = None, tenantId: Option[String] = None)
--- End diff --

Can we set the TenantId on the props of the PhoenixConnection instead of on the 
Configuration, as the latter may be shared among multiple users?


> rdd.saveToPhoenix gives table undefined error when attempting to write to a 
> tenant-specific view (TenantId defined in configuration object and passed to 
> saveToPhoenix)
> ---
>
> Key: PHOENIX-3427
> URL: https://issues.apache.org/jira/browse/PHOENIX-3427
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Nico Pappagianis
>
> Although we can read from a tenant-specific view by passing TenantId in the 
> conf object when calling sc.phoenixTableAsRDD the same does not hold for 
> rdd.saveToPhoenix. Calling saveToPhoenix with a tenant-specific view as the 
> table name gives a table undefined error, even when passing in the TenantId 
> with the conf object.
> It appears that TenantId is lost during the execution path of saveToPhoenix.





[GitHub] phoenix pull request #221: PHOENIX-3427 fix saveToRdd for tenant-specific co...

2016-11-08 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/221#discussion_r87025450
  
--- Diff: 
phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala ---
@@ -60,16 +60,21 @@ class PhoenixSparkIT extends FunSuite with Matchers with BeforeAndAfterAll {
     ConfigurationUtil.getZookeeperURL(hbaseConfiguration).get
   }
 
-  override def beforeAll() {
-    PhoenixSparkITHelper.doSetup
+  // Runs SQL commands located in the file defined in the sqlSource argument
+  // Optional argument tenantId used for running tenant-specific SQL
+  def setupTables(sqlSource: String, tenantId: Option[String]): Unit = {
+    val url = tenantId match {
+      case Some(tenantId) => PhoenixSparkITHelper.getUrl + ";TenantId=" + tenantId
+      case _ => PhoenixSparkITHelper.getUrl
+    }
 
-    conn = DriverManager.getConnection(PhoenixSparkITHelper.getUrl)
+    conn = DriverManager.getConnection(url)
--- End diff --

Better to set TenantId in Properties and use 
DriverManager.getConnection(url, props) than muck with the URL string


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] phoenix pull request #221: PHOENIX-3427 fix saveToRdd for tenant-specific co...

2016-11-08 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/221#discussion_r87029205
  
--- Diff: 
phoenix-spark/src/main/scala/org/apache/phoenix/spark/ProductRDDFunctions.scala 
---
@@ -16,19 +16,20 @@ package org.apache.phoenix.spark
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.io.NullWritable
 import org.apache.phoenix.mapreduce.PhoenixOutputFormat
-import org.apache.phoenix.mapreduce.util.{ColumnInfoToStringEncoderDecoder, PhoenixConfigurationUtil}
+import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil
 import org.apache.spark.Logging
 import org.apache.spark.rdd.RDD
+
 import scala.collection.JavaConversions._
 
 class ProductRDDFunctions[A <: Product](data: RDD[A]) extends Logging with Serializable {
 
   def saveToPhoenix(tableName: String, cols: Seq[String],
-                    conf: Configuration = new Configuration, zkUrl: Option[String] = None)
+                    conf: Configuration = new Configuration, zkUrl: Option[String] = None, tenantId: Option[String] = None)
--- End diff --

Can we set the TenantId on the props of the PhoenixConnection instead of on the 
Configuration, as the latter may be shared among multiple users?




[jira] [Commented] (PHOENIX-3427) rdd.saveToPhoenix gives table undefined error when attempting to write to a tenant-specific view (TenantId defined in configuration object and passed to saveToPhoenix

2016-11-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648761#comment-15648761
 ] 

ASF GitHub Bot commented on PHOENIX-3427:
-

Github user nico-pappagianis commented on the issue:

https://github.com/apache/phoenix/pull/221
  
@jmahonin I've added a couple tests for DataFrames and refactored the test 
code a bit. Thanks for looking at this again.


> rdd.saveToPhoenix gives table undefined error when attempting to write to a 
> tenant-specific view (TenantId defined in configuration object and passed to 
> saveToPhoenix)
> ---
>
> Key: PHOENIX-3427
> URL: https://issues.apache.org/jira/browse/PHOENIX-3427
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Nico Pappagianis
>
> Although we can read from a tenant-specific view by passing TenantId in the 
> conf object when calling sc.phoenixTableAsRDD the same does not hold for 
> rdd.saveToPhoenix. Calling saveToPhoenix with a tenant-specific view as the 
> table name gives a table undefined error, even when passing in the TenantId 
> with the conf object.
> It appears that TenantId is lost during the execution path of saveToPhoenix.





[GitHub] phoenix issue #221: PHOENIX-3427 fix saveToRdd for tenant-specific connectio...

2016-11-08 Thread nico-pappagianis
Github user nico-pappagianis commented on the issue:

https://github.com/apache/phoenix/pull/221
  
@jmahonin I've added a couple tests for DataFrames and refactored the test 
code a bit. Thanks for looking at this again.




[jira] [Commented] (PHOENIX-3461) Statistics collection broken if name space mapping enabled for SYSTEM tables

2016-11-08 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648675#comment-15648675
 ] 

James Taylor commented on PHOENIX-3461:
---

We need a variant of this function that takes a Configuration argument instead 
of a ReadOnlyProps argument as it's expensive to go from one to the other:
{code}
public static boolean isNamespaceMappingEnabled(PTableType type,
        ReadOnlyProps readOnlyProps) {
    return readOnlyProps.getBoolean(QueryServices.IS_NAMESPACE_MAPPING_ENABLED,
            QueryServicesOptions.DEFAULT_IS_NAMESPACE_MAPPING_ENABLED)
        && (type == null || !PTableType.SYSTEM.equals(type)
            || readOnlyProps.getBoolean(QueryServices.IS_SYSTEM_TABLE_MAPPED_TO_NAMESPACE,
                QueryServicesOptions.DEFAULT_IS_SYSTEM_TABLE_MAPPED_TO_NAMESPACE));
}
{code}
Then change this to call the new one instead of creating a new ReadOnlyProps 
here:
{code}
public static TableName getPhysicalTableName(byte[] fullTableName,
        Configuration conf) {
    return getPhysicalTableName(fullTableName, isNamespaceMappingEnabled(
        isSystemTable(fullTableName) ? PTableType.SYSTEM : null,
        new ReadOnlyProps(conf.iterator())));
}
{code}
Also, can we get a test in place for stats collection when namespaces are 
enabled so we don't regress it again?
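The shape of the proposed Configuration-based variant can be sketched as below. This is a self-contained illustration, with java.util.Properties standing in for Hadoop's Configuration, a minimal PTableType stand-in rather than the Phoenix class, and the property keys and defaults assumed from the snippet above (the real ones live in QueryServices/QueryServicesOptions):

```java
import java.util.Properties;

public class NamespaceMappingSketch {

    enum PTableType { SYSTEM, TABLE }

    // Assumed keys/defaults for illustration only.
    static final String NAMESPACE_MAPPING_KEY = "phoenix.schema.isNamespaceMappingEnabled";
    static final String SYSTEM_MAPPED_KEY = "phoenix.schema.mapSystemTablesToNamespace";

    // Same boolean logic as the ReadOnlyProps version, but reading the
    // configuration source directly, with no intermediate ReadOnlyProps copy.
    static boolean isNamespaceMappingEnabled(PTableType type, Properties conf) {
        boolean mappingEnabled =
                Boolean.parseBoolean(conf.getProperty(NAMESPACE_MAPPING_KEY, "false"));
        boolean systemMapped =
                Boolean.parseBoolean(conf.getProperty(SYSTEM_MAPPED_KEY, "true"));
        return mappingEnabled
                && (type == null || type != PTableType.SYSTEM || systemMapped);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(NAMESPACE_MAPPING_KEY, "true");
        System.out.println(isNamespaceMappingEnabled(PTableType.SYSTEM, conf));
        // prints true: mapping is enabled and system-table mapping defaults on
    }
}
```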

> Statistics collection broken if name space mapping enabled for SYSTEM tables
> 
>
> Key: PHOENIX-3461
> URL: https://issues.apache.org/jira/browse/PHOENIX-3461
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Fix For: 4.9.0
>
> Attachments: PHOENIX-3461_master.patch, PHOENIX-3461_v2.patch
>
>






[jira] [Commented] (PHOENIX-3427) rdd.saveToPhoenix gives table undefined error when attempting to write to a tenant-specific view (TenantId defined in configuration object and passed to saveToPhoenix

2016-11-08 Thread Nico Pappagianis (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648637#comment-15648637
 ] 

Nico Pappagianis commented on PHOENIX-3427:
---

Thanks Josh. I implemented that change and wrote a couple tests around it.

I've refactored the IT test classes some, as the single class was starting to 
get unwieldy. Let me know if it looks OK; I'll push to my branch shortly. 
Thanks again! Hope all is well with your new baby :)

> rdd.saveToPhoenix gives table undefined error when attempting to write to a 
> tenant-specific view (TenantId defined in configuration object and passed to 
> saveToPhoenix)
> ---
>
> Key: PHOENIX-3427
> URL: https://issues.apache.org/jira/browse/PHOENIX-3427
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Nico Pappagianis
>
> Although we can read from a tenant-specific view by passing TenantId in the 
> conf object when calling sc.phoenixTableAsRDD the same does not hold for 
> rdd.saveToPhoenix. Calling saveToPhoenix with a tenant-specific view as the 
> table name gives a table undefined error, even when passing in the TenantId 
> with the conf object.
> It appears that TenantId is lost during the execution path of saveToPhoenix.





[jira] [Commented] (PHOENIX-3355) Register Phoenix built-in functions as Calcite functions

2016-11-08 Thread Maryann Xue (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648607#comment-15648607
 ] 

Maryann Xue commented on PHOENIX-3355:
--

[~lomoree], PHOENIX-3466 is in now. You can use 
PhoenixImplementor.getStatementContext() in CalciteUtils for function 
expression instantiation. Still, I found that CurrentDateFunction does not have 
any arguments but does need a statement context to get the current timestamp, 
so the default value solution may simply not be enough. Please do follow 
[~jamestaylor]'s suggestion about checking for all {{nodeClass}} parameters in 
the {{@BuiltInFunction}} annotations. That way you can see all places where 
StatementContext is needed.

> Register Phoenix built-in functions as Calcite functions
> 
>
> Key: PHOENIX-3355
> URL: https://issues.apache.org/jira/browse/PHOENIX-3355
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: Eric Lomore
>  Labels: calcite
> Attachments: PHOENIX-3355.function_constructor.patch, 
> PHOENIX-3355.wip, PHOENIX-3355.wip2
>
>
> We should register all Phoenix built-in functions that don't exist in Calcite.





[jira] [Commented] (PHOENIX-3394) Handle SequenceResolving through ConnectionQueryServices interface

2016-11-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648601#comment-15648601
 ] 

ASF GitHub Bot commented on PHOENIX-3394:
-

Github user lomoree closed the pull request at:

https://github.com/apache/phoenix/pull/220


> Handle SequenceResolving through ConnectionQueryServices interface
> --
>
> Key: PHOENIX-3394
> URL: https://issues.apache.org/jira/browse/PHOENIX-3394
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Eric Lomore
>Assignee: Eric Lomore
>
> Tons of unit tests have this same stack trace. It appears that this call 
> shouldn't reach ConnectionlessQueryServicesImpl.getTable?
> {code}
> Caused by: java.lang.UnsupportedOperationException
>   at 
> org.apache.phoenix.query.ConnectionlessQueryServicesImpl.getTable(ConnectionlessQueryServicesImpl.java:157)
>   at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getTable(DelegateConnectionQueryServices.java:70)
>   at 
> org.apache.phoenix.execute.MutationState.getHTable(MutationState.java:360)
>   at 
> org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:101)
>   at 
> org.apache.phoenix.iterate.DefaultTableResultIteratorFactory.newIterator(DefaultTableResultIteratorFactory.java:33)
>   at 
> org.apache.phoenix.iterate.ParallelIterators.submitWork(ParallelIterators.java:104)
>   at 
> org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:871)
>   ... 71 more
> {code}





[GitHub] phoenix pull request #220: PHOENIX-3394 Handle SequenceResolving through Con...

2016-11-08 Thread lomoree
Github user lomoree closed the pull request at:

https://github.com/apache/phoenix/pull/220




[jira] [Resolved] (PHOENIX-3466) Add StatementContext instance in PhoenixImplementorImpl

2016-11-08 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue resolved PHOENIX-3466.
--
Resolution: Fixed

> Add StatementContext instance in PhoenixImplementorImpl
> ---
>
> Key: PHOENIX-3466
> URL: https://issues.apache.org/jira/browse/PHOENIX-3466
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
>  Labels: calcite
>






[jira] [Updated] (PHOENIX-3333) can not load phoenix table by spark 2.0 which is working well under spark 1.6

2016-11-08 Thread Josh Mahonin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Mahonin updated PHOENIX-:
--
Attachment: PHOENIX--interim.patch

Patchfile of spark_2.0 branch on github:jmahonin/phoenix

Does not yet include maven / assembly changes

> can not load phoenix table by spark 2.0 which is working well under spark 1.6
> -
>
> Key: PHOENIX-
> URL: https://issues.apache.org/jira/browse/PHOENIX-
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
> Environment: spark 2.0 ,phoenix 4.8.0 , os is centos 6.7 ,hadoop is 
> hdp 2.5
>Reporter: dalin qin
> Attachments: PHOENIX--interim.patch
>
>
> spark version is 2.0.0.2.5.0.0-1245
> As mentioned by Josh, I believe Spark 2.0 changed its API in a way that broke
> Phoenix. Please come up with an updated version that adapts to Spark's change.
> In [1]: df = sqlContext.read \
>...:   .format("org.apache.phoenix.spark") \
>...:   .option("table", "TABLE1") \
>...:   .option("zkUrl", "namenode:2181:/hbase-unsecure") \
>...:   .load()
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 df = sqlContext.read   .format("org.apache.phoenix.spark")   
> .option("table", "TABLE1")   .option("zkUrl", 
> "namenode:2181:/hbase-unsecure")   .load()
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/readwriter.pyc in load(self, 
> path, format, schema, **options)
> 151 return 
> self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
> 152 else:
> --> 153 return self._df(self._jreader.load())
> 154
> 155 @since(1.4)
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
> 931 answer = self.gateway_client.send_command(command)
> 932 return_value = get_return_value(
> --> 933 answer, self.gateway_client, self.target_id, self.name)
> 934
> 935 for temp_arg in temp_args:
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>  61 def deco(*a, **kw):
>  62 try:
> ---> 63 return f(*a, **kw)
>  64 except py4j.protocol.Py4JJavaError as e:
>  65 s = e.java_exception.toString()
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py 
> in get_return_value(answer, gateway_client, target_id, name)
> 310 raise Py4JJavaError(
> 311 "An error occurred while calling {0}{1}{2}.\n".
> --> 312 format(target_id, ".", name), value)
> 313 else:
> 314 raise Py4JError(
> Py4JJavaError: An error occurred while calling o43.load.
> : java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at 
> java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.ObjectStreamClass.(ObjectStreamClass.java:472)
> at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
> at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
> at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
> at 
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
>   

[jira] [Commented] (PHOENIX-3333) can not load phoenix table by spark 2.0 which is working well under spark 1.6

2016-11-08 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648565#comment-15648565
 ] 

Josh Mahonin commented on PHOENIX-3333:
---

I've got a proof-of-concept version that works with Spark 2.0 here:
https://github.com/jmahonin/phoenix/tree/spark_2.0

Although this code compiles against either Spark 1.6 or Spark 2.0, the
resultant JAR unfortunately isn't binary compatible with Spark versions < 2.0,
due to Spark changing the DataFrame API as well as a Scala version change.

Other projects have wrestled with this in a variety of ways, e.g. HBase 
[1|https://issues.apache.org/jira/browse/HBASE-16179], Cassandra 
[2|https://github.com/datastax/spark-cassandra-connector/pull/996] and 
ElasticSearch 
[3|https://github.com/elastic/elasticsearch-hadoop/commit/43017a2566f7b50ebca1e20e96820f0d037655ff]

In terms of simplicity, dropping all support for Spark 1.6 and below would be
easiest, but least user-friendly. Another option is to use Maven profiles to
switch which Spark version Phoenix gets compiled against; the downside there is
that it isn't plainly obvious to users of the client JAR which version of Spark
it is compatible with. Yet another option is to create two client JARs, each
compatible with a specific Spark version, but that adds more bloat and
complexity to the existing assembly process.

I'm leaning towards using a Maven profile that defaults to Spark 2.0+, but I'd 
be curious if other users (vendors?) have any opinions here.

cc [~jamestaylor] [~sergey.soldatov] [~kalyanhadoop] [~ankit.singhal] [~devaraj]
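The API shift described above shows up directly in how a Phoenix table gets loaded: the reader options stay the same, only the entry point moves from SQLContext (Spark 1.6) to SparkSession (Spark 2.0+). A minimal sketch, reusing the table name and zkUrl from the report; the helper function names here are illustrative, not Phoenix or Spark API:

```python
# Sketch of the Spark 1.6 -> 2.0 entry-point change discussed above.
# The org.apache.phoenix.spark reader options are unchanged; only how
# you obtain the DataFrameReader differs. Helper names are mine.

def phoenix_reader_options(table, zk_url):
    """Options passed to the org.apache.phoenix.spark data source."""
    return {"table": table, "zkUrl": zk_url}

def load_phoenix_table(spark, table, zk_url):
    # Spark 2.0+: `spark` is a pyspark.sql.SparkSession, e.g.
    #   spark = SparkSession.builder.appName("phoenix").getOrCreate()
    # Spark 1.6: the same chain hangs off sqlContext.read instead.
    return (spark.read
                 .format("org.apache.phoenix.spark")
                 .options(**phoenix_reader_options(table, zk_url))
                 .load())

# e.g. load_phoenix_table(spark, "TABLE1", "namenode:2181:/hbase-unsecure")
```

The binary-compatibility problem is that the compiled connector JAR references classes (like org.apache.spark.sql.DataFrame) that were removed or re-organized in 2.0, so the same user-facing call fails at class-loading time on a mismatched cluster.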

> can not load phoenix table by spark 2.0 which is working well under spark 1.6
> -
>
> Key: PHOENIX-3333
> URL: https://issues.apache.org/jira/browse/PHOENIX-3333
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
> Environment: Spark 2.0, Phoenix 4.8.0, OS is CentOS 6.7, Hadoop is
> HDP 2.5
>Reporter: dalin qin
>
> spark version is 2.0.0.2.5.0.0-1245
> As mentioned by Josh, I believe Spark 2.0 changed its API in a way that broke
> Phoenix. Please come up with an updated version that adapts to Spark's change.
> In [1]: df = sqlContext.read \
>...:   .format("org.apache.phoenix.spark") \
>...:   .option("table", "TABLE1") \
>...:   .option("zkUrl", "namenode:2181:/hbase-unsecure") \
>...:   .load()
> ---
> Py4JJavaError Traceback (most recent call last)
>  in ()
> > 1 df = sqlContext.read   .format("org.apache.phoenix.spark")   
> .option("table", "TABLE1")   .option("zkUrl", 
> "namenode:2181:/hbase-unsecure")   .load()
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/readwriter.pyc in load(self, 
> path, format, schema, **options)
> 151 return 
> self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
> 152 else:
> --> 153 return self._df(self._jreader.load())
> 154
> 155 @since(1.4)
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
> 931 answer = self.gateway_client.send_command(command)
> 932 return_value = get_return_value(
> --> 933 answer, self.gateway_client, self.target_id, self.name)
> 934
> 935 for temp_arg in temp_args:
> /usr/hdp/2.5.0.0-1245/spark2/python/pyspark/sql/utils.pyc in deco(*a, **kw)
>  61 def deco(*a, **kw):
>  62 try:
> ---> 63 return f(*a, **kw)
>  64 except py4j.protocol.Py4JJavaError as e:
>  65 s = e.java_exception.toString()
> /usr/hdp/2.5.0.0-1245/spark2/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py 
> in get_return_value(answer, gateway_client, target_id, name)
> 310 raise Py4JJavaError(
> 311 "An error occurred while calling {0}{1}{2}.\n".
> --> 312 format(target_id, ".", name), value)
> 313 else:
> 314 raise Py4JError(
> Py4JJavaError: An error occurred while calling o43.load.
> : java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
> at java.lang.Class.getDeclaredMethods0(Native Method)
> at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
> at java.lang.Class.getDeclaredMethod(Class.java:2128)
> at 
> java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
> at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
> at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
> at java.security.AccessController.doPrivileged(Native Method)

[jira] [Issue Comment Deleted] (PHOENIX-3355) Register Phoenix built-in functions as Calcite functions

2016-11-08 Thread Eric Lomore (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lomore updated PHOENIX-3355:
-
Comment: was deleted

(was: I actually think 1 is more intuitive, but what I'm possibly missing is 
how we are going to alias without if statements (if (name == CEIL)) or an extra 
map that maps CEIL, etc. onto its other function(s). To me, both of those 
solutions are bulky to maintain, and likely to break in the future. If we do 
all instantiations in a Factory, there's a single standard way to instantiate 
builtin functions. What do you think, is there another way to do aliasing in 
case 1?)

> Register Phoenix built-in functions as Calcite functions
> 
>
> Key: PHOENIX-3355
> URL: https://issues.apache.org/jira/browse/PHOENIX-3355
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: Eric Lomore
>  Labels: calcite
> Attachments: PHOENIX-3355.function_constructor.patch, 
> PHOENIX-3355.wip, PHOENIX-3355.wip2
>
>
> We should register all Phoenix built-in functions that don't exist in Calcite.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3466) Add StatementContext instance in PhoenixImplementorImpl

2016-11-08 Thread Maryann Xue (JIRA)
Maryann Xue created PHOENIX-3466:


 Summary: Add StatementContext instance in PhoenixImplementorImpl
 Key: PHOENIX-3466
 URL: https://issues.apache.org/jira/browse/PHOENIX-3466
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Maryann Xue
Assignee: Maryann Xue
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3465) order by incorrect when array column be used

2016-11-08 Thread Yuan Kang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuan Kang updated PHOENIX-3465:
---
Description: 
When I create a table like this:
create table "TABLE_A"
(
task_id varchar not null,
date varchar not null,
dim varchar not null,
valueArray double array,
dimNameArray varchar array,
constraint pk primary key (task_id, date, dim)
) SALT_BUCKETS = 4, COMPRESSION='SNAPPY';

Upsert some data, then run the query below:

select date, sum(valueArray[16]) as val1
from TABLE_A
where date = '2016-11-01' and task_id = '4692' order by val1 desc limit 50;

The result is incorrect. A similar issue was announced as fixed in 4.5.0; this
issue still happens in 4.8.0.

  was:
when I create a table like that:
create table "TABLE_A"
(
task_id varchar not null,
date varchar not null,
dim varchar not null,
valueArray double array,
dimNameArray varchar array,
constraint pk primary key (task_id, date, dim)
) SALT_BUCKETS = 4, COMPRESSION='SNAPPY';

upsert some data ,when I query a sql blew :

select date, sum(valueArray[16]) as val1
from TABLE_A 
where date = '2016-11-01' and task_id = '4692' order by val1 desc limit 50;"

the result is incorrert.the similer issus was announced be fix in 4.5.0,this 
issus is happened in 4.8.0


> order by incorrect when array column be used
> 
>
> Key: PHOENIX-3465
> URL: https://issues.apache.org/jira/browse/PHOENIX-3465
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Yuan Kang
>  Labels: bug
>
> When I create a table like this:
> create table "TABLE_A"
> (
> task_id varchar not null,
> date varchar not null,
> dim varchar not null,
> valueArray double array,
> dimNameArray varchar array,
> constraint pk primary key (task_id, date, dim)
> ) SALT_BUCKETS = 4, COMPRESSION='SNAPPY';
> Upsert some data, then run the query below:
> select date, sum(valueArray[16]) as val1
> from TABLE_A
> where date = '2016-11-01' and task_id = '4692' order by val1 desc limit 50;
> The result is incorrect. A similar issue was announced as fixed in 4.5.0; this
> issue still happens in 4.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3465) order by incorrect when array column be used

2016-11-08 Thread Yuan Kang (JIRA)
Yuan Kang created PHOENIX-3465:
--

 Summary: order by incorrect when array column be used
 Key: PHOENIX-3465
 URL: https://issues.apache.org/jira/browse/PHOENIX-3465
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.8.0
Reporter: Yuan Kang


When I create a table like this:
create table "TABLE_A"
(
task_id varchar not null,
date varchar not null,
dim varchar not null,
valueArray double array,
dimNameArray varchar array,
constraint pk primary key (task_id, date, dim)
) SALT_BUCKETS = 4, COMPRESSION='SNAPPY';

Upsert some data, then run the query below:

select date, sum(valueArray[16]) as val1
from TABLE_A
where date = '2016-11-01' and task_id = '4692' order by val1 desc limit 50;

The result is incorrect. A similar issue was announced as fixed in 4.5.0; this
issue still happens in 4.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3464) Pig doesn't handle hbase://query/ properly

2016-11-08 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646925#comment-15646925
 ] 

Adam Szita commented on PHOENIX-3464:
-

This will be handled in Pig by creating a marker interface for such
non-filesystem-based LoadFunc implementations. On the Phoenix end, we will have
to wait until a new Pig release containing this feature (0.17.0) comes out, and
then update the Pig version dependency as well.

> Pig doesn't handle hbase://query/ properly
> --
>
> Key: PHOENIX-3464
> URL: https://issues.apache.org/jira/browse/PHOENIX-3464
> Project: Phoenix
>  Issue Type: Bug
> Environment: Pig
>Reporter: Adam Szita
>  Labels: Pig
> Attachments: PHOENIX-3464.patch
>
>
> {code}
> A = load 'hbase://query/SELECT ID,NAME,DATE FROM HIRES WHERE DATE > 
> TO_DATE('1990-12-21 05:55:00.000');
> STORE A into 'output';
> {code}
> This will throw an exception in pig:
> Caused by: Failed to parse: Pig script failed to parse: 
>  pig script failed to validate: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI...
> Reason is that setHdfsServers method is called in pig, see PIG-4939



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3464) Pig doesn't handle hbase://query/ properly

2016-11-08 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PHOENIX-3464:

Attachment: PHOENIX-3464.patch

> Pig doesn't handle hbase://query/ properly
> --
>
> Key: PHOENIX-3464
> URL: https://issues.apache.org/jira/browse/PHOENIX-3464
> Project: Phoenix
>  Issue Type: Bug
> Environment: Pig
>Reporter: Adam Szita
>  Labels: Pig
> Attachments: PHOENIX-3464.patch
>
>
> {code}
> A = load 'hbase://query/SELECT ID,NAME,DATE FROM HIRES WHERE DATE > 
> TO_DATE('1990-12-21 05:55:00.000');
> STORE A into 'output';
> {code}
> This will throw an exception in pig:
> Caused by: Failed to parse: Pig script failed to parse: 
>  pig script failed to validate: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI...
> Reason is that setHdfsServers method is called in pig, see PIG-4939



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3464) Pig doesn't handle hbase://query/ properly

2016-11-08 Thread Adam Szita (JIRA)
Adam Szita created PHOENIX-3464:
---

 Summary: Pig doesn't handle hbase://query/ properly
 Key: PHOENIX-3464
 URL: https://issues.apache.org/jira/browse/PHOENIX-3464
 Project: Phoenix
  Issue Type: Bug
 Environment: Pig
Reporter: Adam Szita


{code}
A = load 'hbase://query/SELECT ID,NAME,DATE FROM HIRES WHERE DATE > 
TO_DATE('1990-12-21 05:55:00.000');
STORE A into 'output';
{code}

This will throw an exception in pig:
Caused by: Failed to parse: Pig script failed to parse: 
 pig script failed to validate: 
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
path in absolute URI...

Reason is that setHdfsServers method is called in pig, see PIG-4939



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3464) Pig doesn't handle hbase://query/ properly

2016-11-08 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PHOENIX-3464:

Labels: Pig  (was: )

> Pig doesn't handle hbase://query/ properly
> --
>
> Key: PHOENIX-3464
> URL: https://issues.apache.org/jira/browse/PHOENIX-3464
> Project: Phoenix
>  Issue Type: Bug
> Environment: Pig
>Reporter: Adam Szita
>  Labels: Pig
>
> {code}
> A = load 'hbase://query/SELECT ID,NAME,DATE FROM HIRES WHERE DATE > 
> TO_DATE('1990-12-21 05:55:00.000');
> STORE A into 'output';
> {code}
> This will throw an exception in pig:
> Caused by: Failed to parse: Pig script failed to parse: 
>  pig script failed to validate: 
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
> path in absolute URI...
> Reason is that setHdfsServers method is called in pig, see PIG-4939



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)