[ https://issues.apache.org/jira/browse/PHOENIX-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188864#comment-16188864 ]
Sergey Soldatov commented on PHOENIX-3112: ------------------------------------------ [~lhofhansl] as I wrote earlier, Scan.allowPartialResults only defines whether application receive the partial result from HBase client or not. On the server side HBase produces partial results if client support it (and it's hardcoded) and client does all stitching stuff. Number of rows client may specify by JDBC setFetchSize(). Another point - we are using internal scanners for aggregation/join stuff and at the moment they are using the same ScannerContext that was instantiated during the original scan. So all limits applied to the internal scanner results that have nothing with the result we need to return to the client. So, the patch has a simple change - for all delegates it calls nextRaw without scanner context. To track the progress it updates the original scanner context with the values that it received from the delegate. For regular scan it just a simple prevention from getting a partial result. For aggregations/joins it updates the progress basing on the result of operation, but not on the data that was processed by internal scanner. > Partial row scan not handled correctly > -------------------------------------- > > Key: PHOENIX-3112 > URL: https://issues.apache.org/jira/browse/PHOENIX-3112 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.7.0 > Reporter: Pierre Lacave > Assignee: Sergey Soldatov > Priority: Blocker > Fix For: 4.12.0 > > Attachments: PHOENIX-3112-1.patch, PHOENIX-3112-ssa-v4.patch, > PHOENIX-3112_v3.patch, PHOENIX-3112_wip2.patch > > > When doing a select of a relatively large table (a few touthands rows) some > rows return partially missing. > When increasing the fitler to return those specific rows, the values appear > as expected > {noformat} > CREATE TABLE IF NOT EXISTS TEST ( > BUCKET VARCHAR, > TIMESTAMP_DATE TIMESTAMP, > TIMESTAMP UNSIGNED_LONG NOT NULL, > SRC VARCHAR, > DST VARCHAR, > ID VARCHAR, > ION VARCHAR, > IC BOOLEAN NOT NULL, > MI UNSIGNED_LONG, > AV UNSIGNED_LONG, > MA UNSIGNED_LONG, > CNT UNSIGNED_LONG, > DUMMY VARCHAR > CONSTRAINT pk PRIMARY KEY (BUCKET, TIMESTAMP DESC, SRC, DST, ID, ION, IC) > );{noformat} > using a python script to generate a CSV with 5000 rows > {noformat} > for i in xrange(5000): > print "5SEC,2016-07-21 > 07:25:35.{i},146908593500{i},WWWWWWWW,AAA,BBBB,CCCCCCCC,false,{i}1181000,1788000{i},2497001{i},{i},aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa{i}".format(i=i) > {noformat} > bulk inserting the csv in the table > {noformat} > phoenix/bin/psql.py localhost -t TEST large.csv > {noformat} > here we can see one row that contains no TIMESTAMP_DATE and null values in MI > and MA > {noformat} > 0: jdbc:phoenix:localhost:2181> select * from TEST > .... > +---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+ > | BUCKET | TIMESTAMP_DATE | TIMESTAMP | SRC | DST | > ID | ION | IC | MI | AV | MA | > CNT | DUMMY > | > +---------+--------------------------+-------------------+-----------+------+-------+-----------+--------+--------------+--------------+--------------+-------+----------------------------------------------------------------------------+ > | 5SEC | 2016-07-21 07:25:35.100 | 1469085935001000 | WWWWWWWW | AAA | > BBBB | CCCCCCCC | false | 10001181000 | 17880001000 | 24970011000 | > 1000 | > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1000 | > | 5SEC | 2016-07-21 07:25:35.999 | 146908593500999 | WWWWWWWW | AAA | > BBBB | CCCCCCCC | false | 9991181000 | 1788000999 | 2497001999 | 999 > | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa999 > | > | 5SEC | 2016-07-21 07:25:35.998 | 146908593500998 | WWWWWWWW | AAA | > BBBB | CCCCCCCC | false | 9981181000 | 1788000998 | 2497001998 | 998 > | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa998 > | > | 5SEC | | 146908593500997 | WWWWWWWW | AAA | > BBBB | CCCCCCCC | false | null | 1788000997 | null | 997 > | > | > | 5SEC | 2016-07-21 07:25:35.996 | 146908593500996 | WWWWWWWW | AAA | > BBBB | CCCCCCCC | false | 9961181000 | 1788000996 | 2497001996 | 996 > | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa996 > | > | 5SEC | 2016-07-21 07:25:35.995 | 146908593500995 | WWWWWWWW | AAA | > BBBB | CCCCCCCC | false | 9951181000 | 1788000995 | 2497001995 | 995 > | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa995 > | > | 5SEC | 2016-07-21 07:25:35.994 | 146908593500994 | WWWWWWWW | AAA | > BBBB | CCCCCCCC | false | 9941181000 | 1788000994 | 2497001994 | 994 > | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa994 > | > .... > {noformat} > but when selecting that row specifically the values are correct > {noformat} > 0: jdbc:phoenix:localhost:2181> select * from TEST where timestamp = > 146908593500997; > +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+ > | BUCKET | TIMESTAMP_DATE | TIMESTAMP | SRC | DST | > ID | ION | IC | MI | AV | MA | CNT | > DUMMY | > +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+ > | 5SEC | 2016-07-21 07:25:35.997 | 146908593500997 | WWWWWWWW | AAA | > BBBB | CCCCCCCC | false | 9971181000 | 1788000997 | 2497001997 | 997 | > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa997 | > +---------+--------------------------+------------------+-----------+------+-------+-----------+--------+-------------+-------------+-------------+------+---------------------------------------------------------------------------+ > 1 row selected (0.159 seconds){noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)