Hi Qifan, I explained the CAST business poorly.
Actually, in the compiler we are generating a Narrow node. This is a non-standard node that casts a value from a larger data type to a smaller one. It is necessary for MDAM in order to properly populate key buffers. Key buffers have the data type of the key, which may be smaller than the value the key is being compared against. It is perfectly fine in ANSI SQL to issue a query such as SELECT * FROM T WHERE X > 100000, given a SMALLINT column X. Such a query should always return zero rows as the predicate can never be true for a SMALLINT. If we choose a non-key access path, the compiler internally will convert X > 100000 to CAST(X TO INTEGER) > 100000. And the predicate will evaluate just fine. For a key access path however we need to make 100000 smaller, rather than making X bigger, for the aforementioned reason. Hence the Narrow node. The Narrow node has the property that it has two outputs, one being the value being converted and the other being a "data conversion error" flag. That flag can then be used to manipulate inclusion/exclusion on a key predicate. So, for example, X < 100000 can be changed at run time to X <= 32767 if X is SMALLINT SIGNED. So, no, we do not want to raise a SQL error in this case. Ironically, to do so would make us non-standard. Dave -----Original Message----- From: Qifan Chen <[email protected]> Sent: Wednesday, August 15, 2018 2:13 PM To: [email protected] Subject: Re: problems in ExHbaseAccessTcb and MdamColumn Hi Dave, Thanks a lot for the good explanation on the MDAM code. On the data conversion error you gave as an example, it seems the best treatment is to raise it as a SQL error, per SQL standard quoted below. Perhaps we are already doing it? Thanks --Qifan ISO/IEC 9075-2:2003 (E) 6.12 <cast specification> yielding value TVEi. d) If TD is an array type, then let TC be the maximum cardinality of TD. Case: i) If SC is greater than TC, then an exception condition is raised: data exception - array data, right truncation. ii) Otherwise, TV is the array with elements TVEi, 1 (one) ≤ i ≤ SC. e) If TD is a multiset type, then TV is the multiset with elements TVEi, 1 (one) ≤ i ≤ SC. If SD is a row type, then: a) For i varying from 1 (one) to DSD, the <cast specification> is applied: CAST ( FSDi AS TFTDi ) yielding a value TVEi. b) TV is ROW ( TVE1, TVE2, ..., TVEDSD ). If TD is exact numeric, then Case: a) If SD is exact numeric or approximate numeric, then Case: i) If there is a representation of SV in the data type TD that does not lose any leading significant digits after rounding or truncating if necessary, then TV is that representation. The choice of whether to round or truncate is implementation-defined. ii) Otherwise, an exception condition is raised: data exception - numeric value out of range. ________________________________ From: Dave Birdsall <[email protected]> Sent: Wednesday, August 15, 2018 1:03:43 PM To: [email protected] Subject: RE: problems in ExHbaseAccessTcb and MdamColumn Hi, My previous answer was incorrect. The code at hand is execution time code, not compile time code. When a predicate of the form X > 100000 for a SMALLINT column X is executed, predPtr->getValue(atp0,workAtp) returns ex_expr::EXPR_OK; the data conversion error of converting 100000 to a SMALLINT is reflected instead in dataConvErrorFlag. So, I am guessing that the problem you are trying to solve is that the expression evaluator is returning a genuine error, one that does not involve data conversions. However I just tried such an example but observed that the error was returned (see below). So the error is returned in spite of MdamColumn::buildDisjunct() not reporting it via a return code. I will guess that instead higher level code looks for the presence of errors in ComDiagsArea. Perhaps the issue is we don't respond to the error quickly enough? Thanks, Dave >>showddl tmdam; CREATE TABLE TRAFODION.SEABASE.TMDAM ( A INT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED , B INT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED , C INT DEFAULT NULL NOT SERIALIZED , PRIMARY KEY (A ASC, B ASC) ) ATTRIBUTES ALIGNED FORMAT ; -- GRANT SELECT, INSERT, DELETE, UPDATE, REFERENCES ON TRAFODION.SEABASE.TMDAM TO DB__ROOT WITH GRANT OPTION; --- SQL operation complete. >>log example.txt; >>showddl tmdam; CREATE TABLE TRAFODION.SEABASE.TMDAM ( A INT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED , B INT NO DEFAULT NOT NULL NOT DROPPABLE NOT SERIALIZED , C INT DEFAULT NULL NOT SERIALIZED , PRIMARY KEY (A ASC, B ASC) ) ATTRIBUTES ALIGNED FORMAT ; -- GRANT SELECT, INSERT, DELETE, UPDATE, REFERENCES ON TRAFODION.SEABASE.TMDAM TO DB__ROOT WITH GRANT OPTION; --- SQL operation complete. >>showstats for table tmdam on existing columns; Histogram data for Table TRAFODION.SEABASE.TMDAM Table ID: 8581215122595296219 Hist ID # Ints Rowcount UEC Colname(s) ========== ====== =========== =========== =========================== 1638572183 24 16384 24 C 1638572188 48 16384 8192 B 1638572193 2 16384 2 A 1638572198 1 16384 16384 A, B --- SQL operation complete. >>prepare s2 from select * from tmdam where b = 'abc'; --- SQL command prepared. >>showshape select * from tmdam where b = 'abc'; control query shape scan(path 'TRAFODION.SEABASE.TMDAM', forward , blocks_per_access 4 , mdam forced, mdam_columns all(dense, sparse)); --- SQL operation complete. >>execute s2; Breakpoint 2, MdamPred::getValue_ (this=0x7ffff7eec520, value=..., atp0=0x7ffff7eb7390, atp1=0x7ffff7eb6cf8) at ../comexe/ComKeyMDAM.cpp:600 600 ex_expr::exp_return_type returnExpReturnType = ex_expr::EXPR_OK; (gdb) c Continuing. *** ERROR[8413] The string argument contains characters that cannot be converted. Source data(in hex): 616263 --- 0 row(s) selected. >> -----Original Message----- From: Dave Birdsall Sent: Wednesday, August 15, 2018 9:37 AM To: [email protected] Subject: RE: problems in ExHbaseAccessTcb and MdamColumn Hi, Could you elaborate on what problem you are trying to solve? I think I might have written this code, but it was over 20 years ago. I'll try to refresh my memory to explain what is going on in this code. Looking at the code: 587 ex_expr::exp_return_type errorCode = predPtr->getValue(atp0,workAtp); 588 589 Int32 dcErrFlag1 = dataConvErrorFlag; 590 Int32 dcErrFlag2 = 0; 591 if (errorCode == ex_expr::EXPR_OK && 592 predPtr->getPredType() == MdamPred::MDAM_BETWEEN) 593 { 594 dataConvErrorFlag = 0; 595 errorCode = predPtr->getValue2(atp0,workAtp); 596 dcErrFlag2 = dataConvErrorFlag; 597 } 598 599 MdamPred::MdamPredType predType = MdamPred::MDAM_RETURN_FALSE; 600 // Next 2 used only for MDAM_BETWEEN. 601 MdamEnums::MdamInclusion startInclusion = predPtr->getStartInclusion(); 602 MdamEnums::MdamInclusion endInclusion = predPtr->getEndInclusion(); 603 if (errorCode == ex_expr::EXPR_OK) 604 predType = predPtr->getTransformedPredType(dcErrFlag1, dcErrFlag2, 605 startInclusion, endInclusion); errorCode is set at line 587. It is checked at line 591; if everything is OK we go ahead and process the second value if it is a BETWEEN. Then errCode is again set at line 595. At line 599, we are pessimistic; we set predType to FALSE. Then if there were no errors at line 603, we then compute the predType at line 604. If I remember correctly, the point is that when processing certain key predicates (MDAM predicates are key predicates) we expect to get certain errors. For example, if we have a SMALLINT column X, and we have a predicate X > 100000, the compiler will down-cast the 100000 for the comparison. So when we get here the predicate is transformed to X < CAST(100000 AS SMALLINT). This CAST results in an error, because 100000 is greater than any possible SMALLINT value. So in this case, we want the predicate to always evaluate as FALSE. And that is precisely what this code does. The handling of errorCode locally in this procedure is quite intentional. If we change this code to instead pass the error up to the caller, we may get compile time errors for a predicate that is perfectly OK. Thanks, Dave -----Original Message----- From: Zhu, Wen-Jun <[email protected]> Sent: Tuesday, August 14, 2018 9:10 PM To: [email protected] Subject: problems in ExHbaseAccessTcb and MdamColumn Hi, in MdamColumn::buildDisjunct() at ../executor/ex_mdam.cpp: 587 ex_expr::exp_return_type errorCode = predPtr->getValue(atp0,workAtp); 588 589 Int32 dcErrFlag1 = dataConvErrorFlag; 590 Int32 dcErrFlag2 = 0; 591 if (errorCode == ex_expr::EXPR_OK && 592 predPtr->getPredType() == MdamPred::MDAM_BETWEEN) 593 { 594 dataConvErrorFlag = 0; 595 errorCode = predPtr->getValue2(atp0,workAtp); 596 dcErrFlag2 = dataConvErrorFlag; 597 } when errorCode is not OK on line 587, it is not checked immediately and is then reused. So I suggest to change the interface of function MdamColumn::buildDisjunct(), making the return value type from NABoolean to int, to distingush the normal value and error code. And in function keyMdamEx::buildNetwork(), which calls buildDisjunct(), the current logic requires the returned value must be true, why? If there is an error occurred, we can ignore that ex_assert() and just return the errorCode, right? Another problem, ExHbaseAccessTcb::initNextKeyRange() in core/sql/executor/ExHbaseAccess.cpp: 2430 Lng32 ExHbaseAccessTcb::initNextKeyRange(sql_buffer_pool *pool, 2431 atp_struct * atp) 2432 { 2433 if (keyExeExpr()) 2434 keyExeExpr()->initNextKeyRange(pool, atp); 2435 else 2436 return -1; 2437 2438 return 0; 2439 } which does not check the returned value of initNextKeyRange(), so if there is something wrong in it, but keyExeExpr() is fine, this function returns 0, indicating success. Regards, Wenjun Zhu
