[jira] [Commented] (SPARK-25648) Spark 2.3.1 reads orc format files with native and hive, and return different results

Jun Zheng (JIRA) Fri, 05 Oct 2018 07:25:17 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-25648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639881#comment-16639881
 ]


Jun Zheng commented on SPARK-25648:
-----------------------------------

Hi [~hyukjin.kwon] 

Here are the brief steps:
 # Generate the data by following the README in 
[https://github.com/BigData-Lab-Frankfurt/Big-Data-Benchmark-for-Big-Bench].
 # Run VALIDATE_POWER_TEST, with workload=ENGINE_VALIDATION_POWER_TEST set 
in the file 
https://github.com/BigData-Lab-Frankfurt/Big-Data-Benchmark-for-Big-Bench/blob/master/conf/bigBench.properties
 # When q22 executes, the validation fails. The SQL is listed in 
[https://github.com/BigData-Lab-Frankfurt/Big-Data-Benchmark-for-Big-Bench/blob/master/engines/spark/queries/q22/q22.sql].
 When I execute the same SQL in Hive, the validation passes. 
With the parameter spark.sql.orc.impl set to native, some results are lost: 
the returned row count is lower than the row count returned by Hive.
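
To narrow down which rows are lost, the comparison described above could be 
scripted roughly as follows. This is only a minimal sketch and not part of 
the benchmark: the helper names spark_sql_command and missing_rows are 
hypothetical, the q22.sql path is a placeholder (the query is normally run 
through the benchmark driver and may need its variables supplied), and the 
sample rows are made up purely to show the call.

```python
from collections import Counter

# Setting named in this report; "native" and "hive" are the two values compared.
ORC_IMPL_KEY = "spark.sql.orc.impl"


def spark_sql_command(orc_impl, sql_file):
    """Build a spark-sql command line that runs sql_file with one ORC reader.

    Hypothetical helper: it only assembles the argument list, it does not
    launch Spark.
    """
    if orc_impl not in ("native", "hive"):
        raise ValueError("orc_impl must be 'native' or 'hive'")
    return ["spark-sql", "--conf", f"{ORC_IMPL_KEY}={orc_impl}", "-f", sql_file]


def missing_rows(reference_rows, candidate_rows):
    """Return rows present in reference_rows but absent from candidate_rows.

    Rows are compared as tuples and duplicates are counted, so a row that
    appears twice in the reference and once in the candidate is reported once.
    """
    diff = Counter(map(tuple, reference_rows)) - Counter(map(tuple, candidate_rows))
    return list(diff.elements())


if __name__ == "__main__":
    # Placeholder path; adjust to wherever q22.sql lives locally.
    for impl in ("hive", "native"):
        print(" ".join(spark_sql_command(impl, "q22.sql")))

    # Made-up rows standing in for the two q22 outputs.
    hive_rows = [(1, "a"), (2, "b"), (3, "c")]
    native_rows = [(1, "a"), (3, "c")]
    print(missing_rows(hive_rows, native_rows))
```

Feeding the two real q22 outputs into missing_rows would show exactly which 
rows the native reader drops, which may help to isolate the affected ORC 
files or columns.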

Thanks ALL.

> Spark 2.3.1 reads orc format  files with native and hive, and return 
> different results
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-25648
>                 URL: https://issues.apache.org/jira/browse/SPARK-25648
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Jun Zheng
>            Priority: Major
>
> Hi All,
> I am testing TPCx-BB (www.tpc.org/tpcx-bb/default.asp) with the 
> code from 
> [https://github.com/BigData-Lab-Frankfurt/Big-Data-Benchmark-for-Big-Bench].
>  # The test data is loaded by spark-sql with the parameter 
> spark.sql.orc.impl set to native.
>  # During the engine validation power test, q22 returns different results 
> depending on the read engine, that is, on whether spark.sql.orc.impl = hive 
> or spark.sql.orc.impl = native. When set to hive, the result is correct, 
> but when set to native, fewer rows are returned. Can someone help find out 
> why this happens?
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

