[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330182#comment-15330182
 ] 

Sergey Shelukhin commented on HIVE-13957:
-

Committed there too. Thanks!

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.2.0, 2.1.1, 2.0.2
>
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-14 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329225#comment-15329225
 ] 

Jesus Camacho Rodriguez commented on HIVE-13957:


[~sershe], fix can be pushed to branch-2.1 and fix version set to 2.1.1. About 
2.1.0, it is waiting for you vote! :p Thanks

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 1.3.0, 2.2.0, 2.0.2
>
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-13 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328211#comment-15328211
 ] 

Matt McCline commented on HIVE-13957:
-

(Patch #3) LGTM +1

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328079#comment-15328079
 ] 

Sergey Shelukhin commented on HIVE-13957:
-

[~mmccline] [~gopalv] ping?

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325434#comment-15325434
 ] 

Sergey Shelukhin commented on HIVE-13957:
-

Test failures are unrelated.

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322645#comment-15322645
 ] 

Hive QA commented on HIVE-13957:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12809044/HIVE-13957.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10224 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/61/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/61/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-61/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12809044 - PreCommit-HIVE-MASTER-Build

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.01.patch, HIVE-13957.02.patch, 
> HIVE-13957.03.patch, HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list.
> This can cause queries to produce incorrect results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319216#comment-15319216
 ] 

Sergey Shelukhin commented on HIVE-13957:
-

Well, then we should do this first for correctness, then do other things as a 
follow-up.

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319192#comment-15319192
 ] 

Gopal V commented on HIVE-13957:


It is not handled for concat and concat fails vectorizing entire vertices 
because it doesn't wrap a UDFToString() around all args  - see HIVE-7160

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319187#comment-15319187
 ] 

Sergey Shelukhin commented on HIVE-13957:
-

Yeah, that's what I was alluding to above. It seems like it would be a complex 
fix. Where is that handled for concat?

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-07 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319177#comment-15319177
 ] 

Gopal V commented on HIVE-13957:


bq. derives the common type for the column and constants, and casts both 
columns and constants to that whenever needed.

The cast is implicit right now - the UDF does the conversion behind the scenes. 
Making the cast explicit actually would do the trick instead.

{{id in ('1', '2')}}

could be written in 2 ways.

{{cast(id as string) in (...)}} or {{id in (cast('1' as decimal(17,2)), 
...)}}

Vectorization should have no trouble with either disambiguated forms, but 
cannot deal with the actual lack of type conversions in the plan.

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15319171#comment-15319171
 ] 

Sergey Shelukhin commented on HIVE-13957:
-

Can you elaborate? The problem is the difference between the approaches, not 
the type of the cast per se.

Normal IN, at some point that doesn't really matter, derives the common type 
for the column and constants, and casts both columns and constants to that 
whenever needed.
Whereas vectorization always tries to convert the constants to the column type, 
the reason being (I supposed) that the specializations for IN all have a 
particular column type in mind. I am not actually very familiar with these and 
whether it would be easy to incorporate a cast; I assume the cast of the column 
would need to come earlier than the specialized IN (i.e. specialized IN should 
already be able to utilize values of the correct type straight out of the VRB), 
which would require the vectorizer to modify the plan above the IN. Or 
something like that.

We could do that, however, as far as I see, it's not the solution we want, 
because of the following.
First, in case of decimal-string, this issue can produce incorrect results, so 
we want a simple fix for that, which the above isn't.
>From the long term perspective, I'd say we need to prohibit implicit casts in 
>this case (I opened a separate JIRA) AND/OR change non-vectorized pipeline 
>rather than vectorized, because casting decimal column to string in this case 
>(what the non-vectorized IN does) is not the intuitively logical thing for the 
>user and may produce unexpected result.

With the latter in mind, we /could/ fix the proximate issue in vectorized code 
(cast to decimal(38,38) that ends up converting all reasonable values to null), 
e.g. constrain the precision and scale to the column type (potentially +2/+1 
for NOT, although the enforcement will probably convert the values that don't 
fit to NULL), assuming the values are trimmed, since more should never be 
needed. But that's still inconsistent with normal IN, and we should probably do 
it later. 
Actually, come think of it, this might also be broken for other UDFs, where 
constraining is not as easy or at least is different (e.g. between needs more 
than strict equality, and with arithmetic ops, if this problem applies, the 
only way would be to derive the maximum values from the value list). I can also 
file a separate JIRA for that...


> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15318725#comment-15318725
 ] 

Hive QA commented on HIVE-13957:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12808537/HIVE-13957.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10223 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_table_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_func1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_constprog_partitioner
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_between_in
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_vector_between_in
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testPermFunc
org.apache.hive.jdbc.TestJdbcWithMiniMr.testPermFunc
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/30/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/30/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-30/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12808537 - PreCommit-HIVE-MASTER-Build

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13957) vectorized IN is inconsistent with non-vectorized (at least for decimal in (string))

2016-06-06 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317698#comment-15317698
 ] 

Gopal V commented on HIVE-13957:


Would it be better to convert the implicit conversion (as done in GenericUDF) 
into an explicit one if the type conversion is legal?

Similar type issues exist for concat() for instance, where the UDF internally 
type-casts all args to String, but the actual plan doesn't.

> vectorized IN is inconsistent with non-vectorized (at least for decimal in 
> (string))
> 
>
> Key: HIVE-13957
> URL: https://issues.apache.org/jira/browse/HIVE-13957
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-13957.patch, HIVE-13957.patch
>
>
> The cast is applied to the column in regular IN, but vectorized IN applies it 
> to the IN() list



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)