[
https://issues.apache.org/jira/browse/PHOENIX-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054708#comment-15054708
]
James Taylor commented on PHOENIX-2514:
---------------------------------------
I tried unsuccessfully to reproduce this in Phoenix 4.6.0 by doing the
following:
- executed the following DDL (as using ':' in the table name is problematic as
HBase wants to treat this as a namespace):
{code}
CREATE TABLE exDocStore (CURRENT_TIMESTAMP BIGINT NOT NULL, ID VARCHAR(96),
CURR_EXDOC VARCHAR, CURR_CHECKSUM VARCHAR(32), PREV_EXDOC VARCHAR,
PREV_CHECKSUM VARCHAR(32), PREV_TIMESTAMP BIGINT, SUMMARY VARCHAR,
OBJ_SUMMARY VARCHAR, PARAM_SAMPLES VARCHAR, BULK_PUBLISH_UUID VARCHAR,
CONSTRAINT PK PRIMARY KEY(CURRENT_TIMESTAMP, ID))
SALT_BUCKETS=8, BLOCKCACHE=FALSE;
CREATE INDEX IF NOT EXISTS indx_exdoc ON exDocStore(ID) INCLUDE (SUMMARY,
OBJ_SUMMARY, PARAM_SAMPLES, BULK_PUBLISH_UUID);
{code}
- cleaned up the data.csv by removing the extra lines on the top and the bottom
and stripping out the quotes (which Phoenix does not want to be able to
re-import it), removing extraneous columns which aren't relevant for this issue:
{code}
1448620592695,f066e30a746c9ddb6d6bbfa46e3c079771f6a912cc4d0b8c6cc85aeb4a91cf3cd65ac128a9832ecef4cb7d377ee09619
1448622730912,f066e30a746c9ddb6d6bbfa46e3c0797f2d1176f37751aa7b8ad85aaeb0e259b32becedc6cf7768560c1faadcd578404
1448620592817,f066e30a746c9ddb6d6bbfa46e3c079771f6a912cc4d0b8c6cc85aeb4a91cf3cc100f8b54fb5ef8f11f6c69f9235d143
1448623567634,71b6694ee9f60d591634ff7d39afc11c71b6694ee9f60d591634ff7d39afc11c71b6694ee9f60d591634ff7d39afc11c
1448623922082,71b6694ee9f60d591634ff7d39afc11c71f6a912cc4d0b8c6cc85aeb4a91cf3c87da25579939399f91028e70bac7f5f1
1448622731562,c1945b4a17b4f84c320a2e25d1bc4396160839998ffd47ff1c2be917019b332c3f287250ac6e1864df9ca293ac2982f7
1448620592597,f066e30a746c9ddb6d6bbfa46e3c079771f6a912cc4d0b8c6cc85aeb4a91cf3cd4e83c046a0fddf1966ca0889b8dd4fb
1448623921643,71b6694ee9f60d591634ff7d39afc11c71f6a912cc4d0b8c6cc85aeb4a91cf3c9c8bc93d6b93be305777eb5bbdad8f92
1448622731909,c1945b4a17b4f84c320a2e25d1bc4396160839998ffd47ff1c2be917019b332c208abf4193acede7fac6249d9b1854d8
1448622731498,c1945b4a17b4f84c320a2e25d1bc4396160839998ffd47ff1c2be917019b332c49e3413cfa81767074f36286e33ebbc0
{code}
- imported using this command
{code}
./psql.py -t EXDOCSTORE -h CURRENT_TIMESTAMP,ID localhost data-cleansed.csv
{code}
- ran the following SQL queries which produced the correct result:
{code}
0: jdbc:phoenix:localhost> select current_timestamp from exdocstore order by
current_timestamp desc limit 1;
+------------------------------------------+
| CURRENT_TIMESTAMP |
+------------------------------------------+
| 1448623922082 |
+------------------------------------------+
1 row selected (0.12 seconds)
0: jdbc:phoenix:localhost> select max(current_timestamp) from exdocstore;
+------------------------------------------+
| MAX("CURRENT_TIMESTAMP") |
+------------------------------------------+
| 1448623922082 |
+------------------------------------------+
1 row selected (0.061 seconds)
{code}
I'd recommend you create an end-to-end unit test that reproduces this issue,
confirming first whether or not it occurs in 4.6.0.
> Even with ORDER BY clause the LIMIT does not work correctly with salted
> tables containing many records.
> -------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-2514
> URL: https://issues.apache.org/jira/browse/PHOENIX-2514
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.5.1
> Environment: HBase-0.98.14
> Reporter: Sumit Nigam
> Priority: Critical
> Labels: LIMIT, hbase, phoenix, salted
> Fix For: 4.5.1
>
> Attachments: data.zip
>
>
> A query such as SELECT CURRENT_TIMESTAMP FROM TBL ORDER BY CURRENT_TIMESTAMP
> DESC LIMIT 1 does not really return the MAX(CURRENT_TIMESTAMP). The table is
> salted and has 200272 records.
> select current_timestamp from TBL order by current_timestamp desc limit 1;
> +------------------------------------------+
> | CURRENT_TIMESTAMP |
> +------------------------------------------+
> | 1448815328556 |
> +------------------------------------------+
> select max(current_timestamp) from TBL;
> +------------------------------------------+
> | MAX("CURRENT_TIMESTAMP") |
> +------------------------------------------+
> | 1449732792090 |
> +------------------------------------------+
> The results are different. MAX is of course, returning the right record.
> The above query is one example. There are other queries which also seem to be
> returning incorrect record with ORDER BY and LIMIT.
> Is this also correct that when there is a WHERE clause limiting the number of
> projected records, then LIMIT seems to work fine? I seem to be noticing that
> also.
> The table DDL is:
> CREATE TABLE IF NOT EXISTS TBL
> (CURRENT_TIMESTAMP BIGINT NOT NULL, ID VARCHAR(96), CURR_EXDOC VARCHAR,
> CURR_CHECKSUM VARCHAR(32), SUMMARY VARCHAR,
> CONSTRAINT PK PRIMARY KEY(CURRENT_TIMESTAMP, ID))
> BLOCKCACHE=FALSE, COMPRESSION=SNAPPY, SALT_BUCKETS=8
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)