[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-04-04 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809694#comment-16809694
 ] 

Adam Szita commented on HIVE-21509:
---

Thanks Slim!

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch, HIVE-21509.3.patch, HIVE-21509.4.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the issue showing itself, so it is worth to 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-04-03 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809294#comment-16809294
 ] 

slim bouguerra commented on HIVE-21509:
---

+1

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch, HIVE-21509.3.patch, HIVE-21509.4.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the issue showing itself, so it is worth to 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-04-01 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807209#comment-16807209
 ] 

Hive QA commented on HIVE-21509:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12964448/HIVE-21509.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15890 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16800/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16800/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16800/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12964448 - PreCommit-HIVE-Build

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch, HIVE-21509.3.patch, HIVE-21509.4.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the issue showing itself, so it is worth to 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-04-01 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807173#comment-16807173
 ] 

Hive QA commented on HIVE-21509:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
10s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
33s{color} | {color:blue} storage-api in master has 48 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
53s{color} | {color:blue} llap-server in master has 81 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
23s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
23s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 23s{color} 
| {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
15s{color} | {color:red} llap-server: The patch generated 1 new + 29 unchanged 
- 1 fixed = 30 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
21s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 38s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16800/dev-support/hive-personality.sh
 |
| git revision | master / 34d2bda |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| mvninstall | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16800/yetus/patch-mvninstall-llap-server.txt
 |
| compile | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16800/yetus/patch-compile-llap-server.txt
 |
| javac | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16800/yetus/patch-compile-llap-server.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16800/yetus/diff-checkstyle-llap-server.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16800/yetus/patch-findbugs-llap-server.txt
 |
| modules | C: storage-api llap-server U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16800/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch, HIVE-21509.3.patch, HIVE-21509.4.patch
>
>
> In some scenarios, LLAP 

[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-04-01 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806649#comment-16806649
 ] 

Hive QA commented on HIVE-21509:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12964421/HIVE-21509.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15886 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16793/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16793/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16793/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12964421 - PreCommit-HIVE-Build

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch, HIVE-21509.3.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the issue showing itself, so it is worth to 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-04-01 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806601#comment-16806601
 ] 

Hive QA commented on HIVE-21509:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
57s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
33s{color} | {color:blue} storage-api in master has 48 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
49s{color} | {color:blue} llap-server in master has 81 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
23s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
23s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 23s{color} 
| {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
15s{color} | {color:red} llap-server: The patch generated 4 new + 29 unchanged 
- 1 fixed = 33 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 15s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16793/dev-support/hive-personality.sh
 |
| git revision | master / 7bbd93f |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| mvninstall | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16793/yetus/patch-mvninstall-llap-server.txt
 |
| compile | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16793/yetus/patch-compile-llap-server.txt
 |
| javac | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16793/yetus/patch-compile-llap-server.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16793/yetus/diff-checkstyle-llap-server.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16793/yetus/patch-findbugs-llap-server.txt
 |
| modules | C: storage-api llap-server U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16793/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch, HIVE-21509.3.patch
>
>
> In some scenarios, LLAP might store column 

[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-29 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805527#comment-16805527
 ] 

Hive QA commented on HIVE-21509:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12964206/HIVE-21509.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 15883 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16762/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16762/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16762/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12964206 - PreCommit-HIVE-Build

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the issue showing itself, so it is worth to 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-29 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805481#comment-16805481
 ] 

Hive QA commented on HIVE-21509:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
47s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} storage-api in master has 48 extant Findbugs 
warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
49s{color} | {color:blue} llap-server in master has 81 extant Findbugs 
warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
22s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
22s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 22s{color} 
| {color:red} llap-server in the patch failed. {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
15s{color} | {color:red} llap-server: The patch generated 4 new + 29 unchanged 
- 1 fixed = 33 total (was 30) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
22s{color} | {color:red} llap-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16762/dev-support/hive-personality.sh
 |
| git revision | master / 23ab7f2 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| mvninstall | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16762/yetus/patch-mvninstall-llap-server.txt
 |
| compile | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16762/yetus/patch-compile-llap-server.txt
 |
| javac | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16762/yetus/patch-compile-llap-server.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16762/yetus/diff-checkstyle-llap-server.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16762/yetus/patch-findbugs-llap-server.txt
 |
| modules | C: storage-api llap-server U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16762/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that 

[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-29 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805103#comment-16805103
 ] 

Adam Szita commented on HIVE-21509:
---

Also attached new patch [^HIVE-21509.2.patch] that has some additional null 
checks and added a test that can deterministically reproduce the issue.

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch, 
> HIVE-21509.2.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the issue showing itself, so it is worth to 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-29 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804828#comment-16804828
 ] 

Adam Szita commented on HIVE-21509:
---

[~kgyrtkirk] I think what you're suggestion is to shallow copy the refcount 
part too. I agree, this way any of the shallow-copied instances of a CV will 
track the refcount correctly which is what we want. I updated my patch 
accordingly.

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch, HIVE-21509.1.wip.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the issue showing itself, so it is worth to 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801734#comment-16801734
 ] 

Zoltan Haindrich commented on HIVE-21509:
-

{code}
@@ -258,5 +279,6 @@ public void shallowCopyTo(ColumnVector otherCv) {
 otherCv.isRepeating = isRepeating;
 otherCv.preFlattenIsRepeating = preFlattenIsRepeating;
 otherCv.preFlattenNoNulls = preFlattenNoNulls;
+otherCv.refCount.set(refCount.get());
{code}
this seems to "shallowcopy" the "refCount" as well - I feel that something is 
not right with this...I would instead expect to that "other" should refer to 
the same reference counter; or should retain it's own counter...


> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the issue showing itself, so it is worth to 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801716#comment-16801716
 ] 

Adam Szita commented on HIVE-21509:
---

I've attached a proposed solution (work in progress) that would fix this 
problem, see [^HIVE-21509.0.wip.patch].

It adds a reference counter to ColumnVectors which is increased before 
beginning to write and decreased once the CV has been passed to the actual 
writer. EncodedDataConsumer will check this counter and will not return CVs 
into the object pool whose ref count is greater than zero.

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-21509.0.wip.patch
>
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, have at least a couple of 
> 100k's of rows, and use text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * having more splits increases the issue showing itself, so it is worth to 
> _set tez.grouping.min-size=1024; set tez.grouping.max-size=1024;_
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Adam Szita (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801561#comment-16801561
 ] 

Adam Szita commented on HIVE-21509:
---

Looks like a quite serious issue. The root cause is as follows:
 # At the first execution of the query the LLAP IO thread has to read from the 
input text file and produce (Long)ColumnVectors (CVs) from the data, wrapped 
into VectorizedRowBatches (VRBs).
 # These VRBs are
 ## passed by VectorDeserializeOrcWriter to a newly created async ORC writer 
thread for ORC encoding and cache persistence. 
 ## also propagated back to consumers, namely to OrcEncodedDataConsumer and 
then finally to LLAPRecordReader.
 # The ORC writer thread may get to writing out the VRB (and therefore the CV) 
only after that the IO thread has:
 ## Created a CVB in OrcEncodedDataConsumer#decodeBatch to wrap the CV coming 
from VRB and passed the batch to LLAPRecordReader
 ## LLAPRecordReader used this batch and is receiving a new one. This time (on 
Tez thread) it will return the previous CVB and offer it back to an object pool 
so that the next decodeBatch may reuse it.
 ## The next decodeBatch call polls this reused CVB from the pool and will call 
CV.reset() on the CVs wrapped inside, and finally it will also overwrite the 
existing data in there
 ## and now is the time that the ORC writer thread got to writing the VRB 
and therefore the very same CVs into cache, that have just been modified in the 
meantime due to this re-using logic of LLAPRecordReader and 
OrcEncodedDataConsumer
 ## (I guess this is why high executor count vs low IO thread count helps 
surfacing this issue: the 32 Tez threads are very fast returning the used CVBs, 
but the one IO thread and its one ORC writer thread is outnumbered when trying 
to write it out in time, before it'd get corrupted)
 # Because of this the correct query result will be displayed at first 
(LLAPRecordReader does get all the correct CVs), but the content written in 
cache is corrupted
 # The second run of this query will go directly to cache and use the corrupted 
data there to produce a wrong result this time.

 

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, which is in text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Ivan Suller (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801560#comment-16801560
 ] 

Ivan Suller commented on HIVE-21509:


[~kgyrtkirk] it is possible. I already closed the ticket tracking that issue, 
because I couldn't reproduce it anymore. But if it is a cache issue this is 
expected.

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, which is in text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21509) LLAP may cache corrupted column vectors and return wrong query result

2019-03-26 Thread Zoltan Haindrich (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801542#comment-16801542
 ] 

Zoltan Haindrich commented on HIVE-21509:
-

cc: [~isuller] this could be the same issue you have been running into?

> LLAP may cache corrupted column vectors and return wrong query result
> -
>
> Key: HIVE-21509
> URL: https://issues.apache.org/jira/browse/HIVE-21509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
>
> In some scenarios, LLAP might store column vectors in cache that are getting 
> reused and reset just before their original content would be written.
> The issue is a concurrency issue and is thereby flaky. It is not easy to 
> reproduce, but the odds of surfacing this issue can by improved by setting 
> LLAP executor and IO thread counts this way:
>  * set hive.llap.daemon.num.executors=32;
>  * set hive.llap.io.threadpool.size=1;
>  * using TPCDS input data of store_sales table, which is in text format:
> {code:java}
> ROW FORMAT SERDE    'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'  
> WITH SERDEPROPERTIES (    'field.delim'='|',    'serialization.format'='|')  
> STORED AS INPUTFORMAT    'org.apache.hadoop.mapred.TextInputFormat'  
> OUTPUTFORMAT    
> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'{code}
>  * run query on this this table: select min(ss_sold_date_sk) from store_sales;
> The first query result is correct (2450816 in my case). Repeating the query 
> will trigger reading from LLAP cache and produce a wrong result: 0.
> If one wants to make sure of running into this issue, place a 
> Thread.sleep(250) at the beginning of VectorDeserializeOrcWriter#run().
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)