[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

chenglei (JIRA) Fri, 11 Nov 2016 05:15:17 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656799#comment-15656799
 ]


chenglei edited comment on PHOENIX-3451 at 11/11/16 1:14 PM:
-------------------------------------------------------------

[~jamestaylor], In order to remove the effect of MULTI_TENANT, I add  the 
"ORGANIZATION_ID" column as the first column of the index TEST_SCORE , so this 
problem can be reproduced for a ordinary table,just as following:
{code:borderStyle=solid} 
     CREATE TABLE IF NOT EXISTS TEST.TEST (
            ORGANIZATION_ID CHAR(15) NOT NULL,
            CONTAINER_ID CHAR(15) NOT NULL,
            ENTITY_ID CHAR(15) NOT NULL,
            SCORE DOUBLE,
            CONSTRAINT TEST_PK PRIMARY KEY (
               ORGANIZATION_ID,
               CONTAINER_ID,
               ENTITY_ID
             )
         )

    CREATE INDEX IF NOT EXISTS TEST_SCORE ON  
TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC);

    UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
    UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
    UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
    UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
    UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
    UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);

    SELECT DISTINCT entity_id, score
    FROM test.test
    WHERE organization_id = 'org2'
    AND container_id IN ( 'container1','container2','container3' )
    ORDER BY score DESC
    LIMIT 2
{code} 

and I also wrote a simple unit case to reproduce this problem without index:

{code:borderStyle=solid}
    @Test
    public void testBug3451() throws Exception
    {
        Connection conn = DriverManager.getConnection(getUrl());
        try {
                String tableName="GROUPBY3451_TEST";        
                conn.createStatement().execute("DROP TABLE if exists 
"+tableName);
                String sql="CREATE TABLE "+tableName+" ( "+
                                        "ORGANIZATION_ID CHAR(15) NOT NULL,"+
                                        "CONTAINER_ID CHAR(15) NOT NULL,"+
                                        "SCORE DOUBLE NOT NULL,"+
                                        "ENTITY_ID CHAR(15) NOT NULL,"+         
                                        "CONSTRAINT TEST_PK PRIMARY KEY ( "+
                                        "ORGANIZATION_ID,"+
                                        "CONTAINER_ID,"+
                                        "SCORE DESC,"+
                                        "ENTITY_ID DESC"+
                                        "))";
                  conn.createStatement().execute(sql);
                  sql="SELECT DISTINCT entity_id, score FROM "+tableName+" 
WHERE organization_id = 'org1' AND container_id IN ( 
'container1','container2','container3' ) ORDER BY score DESC LIMIT 2";
                  QueryPlan queryPlan=getQueryPlan(conn, sql);
                  
assertFalse(queryPlan.getOrderBy()==OrderBy.FWD_ROW_KEY_ORDER_BY);
        }
        finally{
              if(conn!=null)
              {
                  conn.close();
              }
        }
    }
{code}  

[~jamestaylor], this problem can only be reproduced with GroupBy,so in order 
not to interfere with IT tests, maybe we should fix PHOENIX-3452 first.I will 
upload my first patch soon.


was (Author: comnetwork):
[~jamestaylor], In order to remove the effect of MULTI_TENANT, I create a new 
index , this problem can be reproduced with a ordinary table as following:
{code:borderStyle=solid} 
     CREATE TABLE IF NOT EXISTS TEST.TEST (
            ORGANIZATION_ID CHAR(15) NOT NULL,
            CONTAINER_ID CHAR(15) NOT NULL,
            ENTITY_ID CHAR(15) NOT NULL,
            SCORE DOUBLE,
            CONSTRAINT TEST_PK PRIMARY KEY (
               ORGANIZATION_ID,
               CONTAINER_ID,
               ENTITY_ID
             )
         )

    CREATE INDEX IF NOT EXISTS TEST_SCORE ON  
TEST.TEST(ORGANIZATION_ID,CONTAINER_ID, SCORE DESC, ENTITY_ID DESC);

    UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
    UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
    UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
    UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
    UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
    UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);

    SELECT DISTINCT entity_id, score
    FROM test.test
    WHERE organization_id = 'org2'
    AND container_id IN ( 'container1','container2','container3' )
    ORDER BY score DESC
    LIMIT 2
{code} 

and I also wrote a simple unit case to reproduce this problem without index:

{code:borderStyle=solid}
    @Test
    public void testBug3451() throws Exception
    {
        Connection conn = DriverManager.getConnection(getUrl());
        try {
                String tableName="GROUPBY3451_TEST";        
                conn.createStatement().execute("DROP TABLE if exists 
"+tableName);
                String sql="CREATE TABLE "+tableName+" ( "+
                                        "ORGANIZATION_ID CHAR(15) NOT NULL,"+
                                        "CONTAINER_ID CHAR(15) NOT NULL,"+
                                        "SCORE DOUBLE NOT NULL,"+
                                        "ENTITY_ID CHAR(15) NOT NULL,"+         
                                        "CONSTRAINT TEST_PK PRIMARY KEY ( "+
                                        "ORGANIZATION_ID,"+
                                        "CONTAINER_ID,"+
                                        "SCORE DESC,"+
                                        "ENTITY_ID DESC"+
                                        "))";
                  conn.createStatement().execute(sql);
                  sql="SELECT DISTINCT entity_id, score FROM "+tableName+" 
WHERE organization_id = 'org1' AND container_id IN ( 
'container1','container2','container3' ) ORDER BY score DESC LIMIT 2";
                  QueryPlan queryPlan=getQueryPlan(conn, sql);
                  
assertFalse(queryPlan.getOrderBy()==OrderBy.FWD_ROW_KEY_ORDER_BY);
        }
        finally{
              if(conn!=null)
              {
                  conn.close();
              }
        }
    }
{code}  

[~jamestaylor], this problem can only be reproduced with GroupBy,so in order 
not to interfere with IT tests, maybe we should fix PHOENIX-3452 first.

> Secondary index and query using distinct: LIMIT doesn't return the first rows
> -----------------------------------------------------------------------------
>
>                 Key: PHOENIX-3451
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3451
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.0
>            Reporter: Joel Palmert
>            Assignee: chenglei
>
> This may be related to PHOENIX-3452 but the behavior is different so filing 
> it separately.
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
>     ORGANIZATION_ID CHAR(15) NOT NULL,
>     CONTAINER_ID CHAR(15) NOT NULL,
>     ENTITY_ID CHAR(15) NOT NULL,
>     SCORE DOUBLE,
>     CONSTRAINT TEST_PK PRIMARY KEY (
>         ORGANIZATION_ID,
>         CONTAINER_ID,
>         ENTITY_ID
>     )
> ) VERSIONS=1, MULTI_TENANT=TRUE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (CONTAINER_ID, SCORE DESC, 
> ENTITY_ID DESC);
> UPSERT INTO test.test VALUES ('org2','container2','entityId6',1.1);
> UPSERT INTO test.test VALUES ('org2','container1','entityId5',1.2);
> UPSERT INTO test.test VALUES ('org2','container2','entityId4',1.3);
> UPSERT INTO test.test VALUES ('org2','container1','entityId3',1.4);
> UPSERT INTO test.test VALUES ('org2','container3','entityId7',1.35);
> UPSERT INTO test.test VALUES ('org2','container3','entityId8',1.45);
> EXPLAIN
> SELECT DISTINCT entity_id, score
> FROM test.test
> WHERE organization_id = 'org2'
> AND container_id IN ( 'container1','container2','container3' )
> ORDER BY score DESC
> LIMIT 2
> OUTPUT
> entityId5    1.2
> entityId3    1.4
> The expected out out would be
> entityId8    1.45
> entityId3    1.4
> You will get the expected output if you remove the secondary index from the 
> table or remove distinct from the query.
> As described in PHOENIX-3452 if you run the query without the LIMIT the 
> ordering is not correct. However, the 2first results in that ordering is 
> still not the onces returned by the limit clause, which makes me think there 
> are multiple issues here and why I filed both separately. The rows being 
> returned are the ones assigned to container1. It looks like Phoenix is first 
> getting the rows from the first container and when it finds that to be enough 
> it stops the scan. What it should be doing is getting 2 results for each 
> container and then merge then and then limit again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (PHOENIX-3451) Secondary index and query using distinct: LIMIT doesn't return the first rows

Reply via email to