[ https://issues.apache.org/jira/browse/PHOENIX-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011051#comment-15011051 ]
Josh Mahonin commented on PHOENIX-2169:
---------------------------------------

Hi [~Nilansg] are you able to post your schema DDL and query, if possible?

> Illegal data error on UPSERT SELECT and JOIN with salted tables
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-2169
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2169
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.5.0
>            Reporter: Josh Mahonin
>         Attachments: PHOENIX-2169-bug.patch
>
>
> I have an issue where I get periodic failures (~50%) for an UPSERT SELECT
> query involving a JOIN on salted tables. Unfortunately I haven't been able to
> create a reproducible test case yet, though I'll keep trying. I believe this
> same behaviour existed in 4.3.1 as well, so I don't think it's a regression.
> The upsert query itself looks something like this:
> {code}
> UPSERT INTO a(tid, ds, etp, eid, ts, atp, rel, tp, tpid, dt, pro)
> SELECT c.tid,
>        c.ds,
>        c.etp,
>        c.eid,
>        c.dh,
>        0,
>        c.rel,
>        c.tp,
>        c.tpid,
>        current_time(),
>        1.0 / s.th
> FROM   e_c c
>        join e_s s
>          ON s.tid = c.tid
>             AND s.ds = c.ds
>             AND s.etp = c.etp
>             AND s.eid = c.eid
> WHERE  c.tid = 'FOO';
> {code}
> Without the upsert, the query always returns the right data, but with the
> upsert, it ends up with failures like:
> Error: ERROR 201 (22000): Illegal data. ERROR 201 (22000): Illegal data.
> Expected length of at least 109 bytes, but had 19 (state=22000,code=201)
> The explain plan looks like:
> {code}
> UPSERT SELECT
> CLIENT 16-CHUNK PARALLEL 16-WAY RANGE SCAN OVER E_C [0,'FOO']
>     SERVER FILTER BY FIRST KEY ONLY
>     PARALLEL INNER-JOIN TABLE 0
>         CLIENT 16-CHUNK PARALLEL 16-WAY FULL SCAN OVER E_S
>     DYNAMIC SERVER FILTER BY (C.TID, C.DS, C.ETP, C.EID) IN ((S.TID, S.DS, S.ETP, S.EID))
> {code}
> I'm using SALT_BUCKETS=16 for both tables in the join, and this is a dev
> environment, so only 1 region server. Note that without salted tables, I have
> no issue with this query.
> The number of rows in E_C is around 23K, and the number of rows in E_S is 62.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)