[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330182#comment-15330182 ] Sergey Shelukhin commented on HIVE-13957: - Committed there too. Thanks! > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.2.0, 2.1.1, 2.0.2 > > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329225#comment-15329225 ] Jesus Camacho Rodriguez commented on HIVE-13957: [~sershe], fix can be pushed to branch-2.1 and fix version set to 2.1.1. About 2.1.0, it is waiting for you vote! :p Thanks > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.2.0, 2.0.2 > > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328211#comment-15328211 ] Matt McCline commented on HIVE-13957: - (Patch #3) LGTM +1 > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328079#comment-15328079 ] Sergey Shelukhin commented on HIVE-13957: - [~mmccline] [~gopalv] ping? > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325434#comment-15325434 ] Sergey Shelukhin commented on HIVE-13957: - Test failures are unrelated. > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322645#comment-15322645 ] Hive QA commented on HIVE-13957: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12809044/HIVE-13957.03.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10224 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/61/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/61/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-61/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12809044 - PreCommit-HIVE-MASTER-Build > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, > HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list. > This can cause queries to produce incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319216#comment-15319216 ] Sergey Shelukhin commented on HIVE-13957: - Well, then we should do this first for correctness, then do other things as a follow-up. > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319192#comment-15319192 ] Gopal V commented on HIVE-13957: It is not handled for concat and concat fails vectorizing entire vertices because it doesn't wrap a UDFToString() around all args - see HIVE-7160 > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319187#comment-15319187 ] Sergey Shelukhin commented on HIVE-13957: - Yeah, that's what I was alluding to above. It seems like it would be a complex fix. Where is that handled for concat? > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319177#comment-15319177 ] Gopal V commented on HIVE-13957: bq. derives the common type for the column and constants, and casts both columns and constants to that whenever needed. The cast is implicit right now - the UDF does the conversion behind the scenes. Making the cast explicit actually would do the trick instead. {{id in ('1', '2')}} could be written in 2 ways. {{cast(id as string) in (...)}} or {{id in (cast('1' as decimal(17,2)), ...)}} Vectorization should have no trouble with either disambiguated forms, but cannot deal with the actual lack of type conversions in the plan. > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319171#comment-15319171 ] Sergey Shelukhin commented on HIVE-13957: - Can you elaborate? The problem is the difference between the approaches, not the type of the cast per se. Normal IN, at some point that doesn't really matter, derives the common type for the column and constants, and casts both columns and constants to that whenever needed. Whereas vectorization always tries to convert the constants to the column type, the reason being (I supposed) that the specializations for IN all have a particular column type in mind. I am not actually very familiar with these and whether it would be easy to incorporate a cast; I assume the cast of the column would need to come earlier than the specialized IN (i.e. specialized IN should already be able to utilize values of the correct type straight out of the VRB), which would require the vectorizer to modify the plan above the IN. Or something like that. We could do that, however, as far as I see, it's not the solution we want, because of the following. First, in case of decimal-string, this issue can produce incorrect results, so we want a simple fix for that, which the above isn't. >From the long term perspective, I'd say we need to prohibit implicit casts in >this case (I opened a separate JIRA) AND/OR change non-vectorized pipeline >rather than vectorized, because casting decimal column to string in this case >(what the non-vectorized IN does) is not the intuitively logical thing for the >user and may produce unexpected result. With the latter in mind, we /could/ fix the proximate issue in vectorized code (cast to decimal(38,38) that ends up converting all reasonable values to null), e.g. constrain the precision and scale to the column type (potentially +2/+1 for NOT, although the enforcement will probably convert the values that don't fit to NULL), assuming the values are trimmed, since more should never be needed. But that's still inconsistent with normal IN, and we should probably do it later. Actually, come think of it, this might also be broken for other UDFs, where constraining is not as easy or at least is different (e.g. between needs more than strict equality, and with arithmetic ops, if this problem applies, the only way would be to derive the maximum values from the value list). I can also file a separate JIRA for that... > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318725#comment-15318725 ] Hive QA commented on HIVE-13957: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12808537/HIVE-13957.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10223 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_func1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_between_in org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_between_in org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testPermFunc org.apache.hive.jdbc.TestJdbcWithMiniMr.testPermFunc {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/30/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/30/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-30/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12808537 - PreCommit-HIVE-MASTER-Build > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))
[ https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317698#comment-15317698 ] Gopal V commented on HIVE-13957: Would it be better to convert the implicit conversion (as done in GenericUDF) into an explicit one if the type conversion is legal? Similar type issues exist for concat() for instance, where the UDF internally type-casts all args to String, but the actual plan doesn't. > vectorized IN is inconsistent with non-vectorized (at least for decimal in > (string)) > > > Key: HIVE-13957 > URL: https://issues.apache.org/jira/browse/HIVE-13957 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13957.patch, HIVE-13957.patch > > > The cast is applied to the column in regular IN, but vectorized IN applies it > to the IN() list -- This message was sent by Atlassian JIRA (v6.3.4#6332)