[ 
https://issues.apache.org/jira/browse/PHOENIX-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054708#comment-15054708
 ] 

James Taylor commented on PHOENIX-2514:
---------------------------------------

I tried unsuccessfully to reproduce this in Phoenix 4.6.0 by doing the 
following:
- executed the following DDL (as using ':' in the table name is problematic as 
HBase wants to treat this as a namespace):
{code}
CREATE TABLE exDocStore (CURRENT_TIMESTAMP BIGINT NOT NULL, ID VARCHAR(96), 
CURR_EXDOC VARCHAR, CURR_CHECKSUM VARCHAR(32), PREV_EXDOC VARCHAR, 
PREV_CHECKSUM VARCHAR(32), PREV_TIMESTAMP BIGINT, SUMMARY VARCHAR, 
OBJ_SUMMARY VARCHAR, PARAM_SAMPLES VARCHAR, BULK_PUBLISH_UUID VARCHAR, 
CONSTRAINT PK PRIMARY KEY(CURRENT_TIMESTAMP, ID))
SALT_BUCKETS=8, BLOCKCACHE=FALSE;

CREATE INDEX IF NOT EXISTS indx_exdoc ON exDocStore(ID) INCLUDE (SUMMARY, 
OBJ_SUMMARY, PARAM_SAMPLES, BULK_PUBLISH_UUID);
{code}
- cleaned up the data.csv by removing the extra lines on the top and the bottom 
and stripping out the quotes (which Phoenix does not want to be able to 
re-import it), removing extraneous columns which aren't relevant for this issue:
{code}
1448620592695,f066e30a746c9ddb6d6bbfa46e3c079771f6a912cc4d0b8c6cc85aeb4a91cf3cd65ac128a9832ecef4cb7d377ee09619
1448622730912,f066e30a746c9ddb6d6bbfa46e3c0797f2d1176f37751aa7b8ad85aaeb0e259b32becedc6cf7768560c1faadcd578404
1448620592817,f066e30a746c9ddb6d6bbfa46e3c079771f6a912cc4d0b8c6cc85aeb4a91cf3cc100f8b54fb5ef8f11f6c69f9235d143
1448623567634,71b6694ee9f60d591634ff7d39afc11c71b6694ee9f60d591634ff7d39afc11c71b6694ee9f60d591634ff7d39afc11c
1448623922082,71b6694ee9f60d591634ff7d39afc11c71f6a912cc4d0b8c6cc85aeb4a91cf3c87da25579939399f91028e70bac7f5f1
1448622731562,c1945b4a17b4f84c320a2e25d1bc4396160839998ffd47ff1c2be917019b332c3f287250ac6e1864df9ca293ac2982f7
1448620592597,f066e30a746c9ddb6d6bbfa46e3c079771f6a912cc4d0b8c6cc85aeb4a91cf3cd4e83c046a0fddf1966ca0889b8dd4fb
1448623921643,71b6694ee9f60d591634ff7d39afc11c71f6a912cc4d0b8c6cc85aeb4a91cf3c9c8bc93d6b93be305777eb5bbdad8f92
1448622731909,c1945b4a17b4f84c320a2e25d1bc4396160839998ffd47ff1c2be917019b332c208abf4193acede7fac6249d9b1854d8
1448622731498,c1945b4a17b4f84c320a2e25d1bc4396160839998ffd47ff1c2be917019b332c49e3413cfa81767074f36286e33ebbc0
{code}
- imported using this command
{code}
./psql.py -t EXDOCSTORE -h CURRENT_TIMESTAMP,ID localhost data-cleansed.csv
{code}
- ran the following SQL queries which produced the correct result:
{code}
0: jdbc:phoenix:localhost> select current_timestamp from exdocstore order by 
current_timestamp desc limit 1;
+------------------------------------------+
|            CURRENT_TIMESTAMP             |
+------------------------------------------+
| 1448623922082                            |
+------------------------------------------+
1 row selected (0.12 seconds)
0: jdbc:phoenix:localhost> select max(current_timestamp) from exdocstore;
+------------------------------------------+
|         MAX("CURRENT_TIMESTAMP")         |
+------------------------------------------+
| 1448623922082                            |
+------------------------------------------+
1 row selected (0.061 seconds)
{code}

I'd recommend you create an end-to-end unit test that reproduces this issue, 
confirming first whether or not it occurs in 4.6.0. 

> Even with ORDER BY clause the LIMIT does not work correctly with salted 
> tables containing many records.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-2514
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2514
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.1
>         Environment: HBase-0.98.14
>            Reporter: Sumit Nigam
>            Priority: Critical
>              Labels: LIMIT, hbase, phoenix, salted
>             Fix For: 4.5.1
>
>         Attachments: data.zip
>
>
> A query such as SELECT CURRENT_TIMESTAMP FROM TBL ORDER BY CURRENT_TIMESTAMP 
> DESC LIMIT 1 does not really return the MAX(CURRENT_TIMESTAMP). The table is 
> salted and has 200272 records.
> select current_timestamp from TBL order by current_timestamp desc limit 1;
> +------------------------------------------+
> |            CURRENT_TIMESTAMP    |
> +------------------------------------------+
> | 1448815328556                            |
> +------------------------------------------+
> select max(current_timestamp) from TBL;
> +------------------------------------------+
> |         MAX("CURRENT_TIMESTAMP")         |
> +------------------------------------------+
> | 1449732792090                            |
> +------------------------------------------+
> The results are different. MAX is of course, returning the right record.
> The above query is one example. There are other queries which also seem to be 
> returning incorrect record with ORDER BY and LIMIT.
> Is this also correct that when there is a WHERE clause limiting the number of 
> projected records, then LIMIT seems to work fine? I seem to be noticing that 
> also.
> The table DDL is:
> CREATE TABLE IF NOT EXISTS TBL 
> (CURRENT_TIMESTAMP BIGINT NOT NULL, ID VARCHAR(96), CURR_EXDOC VARCHAR,  
> CURR_CHECKSUM VARCHAR(32), SUMMARY VARCHAR, 
> CONSTRAINT PK PRIMARY KEY(CURRENT_TIMESTAMP, ID)) 
> BLOCKCACHE=FALSE, COMPRESSION=SNAPPY, SALT_BUCKETS=8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to