[GitHub] drill issue #517: DRILL-4704 fix

daveoshinsky Thu, 04 Aug 2016 19:45:01 -0700

Github user daveoshinsky commented on the issue:

    https://github.com/apache/drill/pull/517
  
    Thanks for looking at this, Paul. Â If there are problems with comparing an 
input decimal (from the "where clause", for example) against the data, it is 
because of the algorithms for comparing decimal values of different precisions, 
not because the proper precision was computed for the input decimal (e.g., 
based on the actual numerical value, as my PR-517 fix is doing when no 
precision is provided initially). Â I was going to comment about the idea of 
using a fixed value of precision=10, but you beat me to it. Â It does not 
suffice for larger integers, like those that require a "long" instead of an 
"int" to represent. Â So, I suggest that we take a "divide and conquer" 
approach with this problem. Â First, compute a proper precision for the value 
casted from int or BigInt. Â Next, make sure the comparison with other decimal 
values works for any other decimal value precision (a separate issue).
    Decimal support in Drill is disabled by default for good reasons. Â The 
(decimal) design itself is fundamentally flawed, and it has all sorts of issues 
related to code maintainability as well as runtime performance. Â I have been 
experimenting in a Drill "play build node" with having a single, 
one-size-fits-all variable width vector class that represents any decimal value 
efficiently (I call it "VarDecimal"). Â It's difficult to even get this to 
compile (with all of the FreeMarker use, and the numerous conversions between 
all of the decimal and non-decimal types), but eventually I hope to experiment 
with this to deal with the DRILL-4184 problem I encountered a number of months 
ago (for which I have a pull request PR-372 with a short-term, but not a clean, 
fix).
    As food for thought, while looking at this experimental "VarDecimal" thing, 
I noticed that there are (numerous) casts/conversions between all of the 
pairwise combinations of different decimal precisions (DECIMAL9, DECIMAL18, 
DECIMAL28, DECIMAL38, both dense and sparse, from what I recall), and between 
all of these and all other types that have numeric interpretation. Â Eventually 
replacing all of these different DECIMAL* fixed width representations with a 
single VarDecimal variable width representation (one-size-fits-all) would be 
much more efficient (memory-wise) at runtime (for typical scenarios where most 
actual numeric values don't require the full precision to represent), would 
greatly simplify parts of the Drill code, and would fix DRILL-4184 cleanly. 
    Until some major re-work can be done for Drill decimal support, we will 
probably have to settle for small, incremental improvements. Â I would say that 
PR-517 represents one such improvement.
    
        On Thursday, August 4, 2016 9:31 PM, Paul Rogers 
<[email protected]> wrote:
     
    
     The plot thickens. I tried the fix of setting the precision to a constant 
of 10. This uncovered a larger issue. The template in question generates cast 
functions for (INTEGER, BIGINT) x (DECIMAL9, DECIMAL18, DECIMAL28) and perhaps 
others. The constant of 10 does not, of course, work for BIGINT (long) 
values.The trick is that precision=10 won't work for DECIMAL9 either. Dave's 
solution has a similar problem. Dave sets the precision to whatever is right 
for the input value, which seems great. But, that value could be too large for 
the output DECIMAL type.What we need is to set the precision to the min( max 
int precision, max decimal precision ). Or, if we use Dave's proposed solution, 
max( input arg precision, max decimal precision ).In either case, the code must 
handle overflow. Passing a Long.MAX_VALUE or even Integer.MAX_VALUE to 
CastBigIntDecimal9( ) should cause an overflow error or data truncation. I'll 
research how that worked previously to see if we've uncovered a new issu
 e, or if a solution already exists.â
    You are receiving this because you authored the thread.
    Reply to this email directly, view it on GitHub, or mute the thread.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #517: DRILL-4704 fix

Reply via email to