[jira] [Comment Edited] (SPARK-25648) Spark 2.3.1 reads orc format files with native and hive, and return different results

Dongjoon Hyun (JIRA) Fri, 05 Oct 2018 09:05:09 -0700


    [https://issues.apache.org/jira/browse/SPARK-25648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639998#comment-16639998]


Dongjoon Hyun edited comment on SPARK-25648 at 10/5/18 4:04 PM:
----------------------------------------------------------------

Thank you for reporting, [~justinnju].

How about the Parquet result? Since Spark's default data source is Parquet, we had 
better compare with Parquet. Spark `hive` ORC behaves more like Apache Hive, while 
Spark `native` ORC behaves more like Spark's default data source.
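One way to act on this suggestion is to run q22 three times (with `spark.sql.orc.impl` set to `hive`, with it set to `native`, and against a Parquet copy of the data), collect each result with `DataFrame.collect()`, and diff the row multisets against the Parquet baseline. The sketch below covers only the diff step and assumes the rows have already been collected; PySpark `Row` objects are tuples, so they can be counted directly as long as their values are hashable. The function name `diff_results` and the sample rows are hypothetical, not part of Spark or the benchmark kit.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

# A collected result row; pyspark.sql.Row is a tuple subclass, so it fits here.
Row = Tuple[Hashable, ...]


def diff_results(results: Dict[str, Iterable[Row]],
                 baseline: str) -> Dict[str, Dict[str, List[Row]]]:
    """Compare every reader's rows with the baseline reader's rows.

    Rows are treated as multisets, so duplicated rows are counted.
    For each non-baseline reader, "missing" lists rows the baseline has
    but that reader lacks, and "extra" lists rows only that reader has.
    """
    base = Counter(results[baseline])
    report: Dict[str, Dict[str, List[Row]]] = {}
    for name, rows in results.items():
        if name == baseline:
            continue
        other = Counter(rows)
        report[name] = {
            # Counter subtraction keeps only positive counts.
            "missing": list((base - other).elements()),
            "extra": list((other - base).elements()),
        }
    return report


if __name__ == "__main__":
    # Hypothetical rows standing in for collected q22 output.
    results = {
        "parquet": [("a", 1), ("b", 2), ("c", 3)],
        "hive_orc": [("a", 1), ("b", 2), ("c", 3)],
        "native_orc": [("a", 1), ("c", 3)],
    }
    for reader, diff in diff_results(results, baseline="parquet").items():
        print(reader, "missing:", diff["missing"], "extra:", diff["extra"])
```

Treating the results as multisets rather than sets keeps duplicate rows visible, which matters when one reader silently drops only some copies of a row.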


was (Author: dongjoon):
Thank you for reporting, [~justinnju].

How about the Parquet result? Since Spark's default data source is Parquet, we had 
better compare with Parquet.

> Spark 2.3.1 reads orc format  files with native and hive, and return 
> different results
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-25648
>                 URL: https://issues.apache.org/jira/browse/SPARK-25648
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Jun Zheng
>            Priority: Major
>
> Hi all,
> I am testing TPCx-BB (www.tpc.org/tpcx-bb/default.asp) with the code from 
> https://github.com/BigData-Lab-Frankfurt/Big-Data-Benchmark-for-Big-Bench:
>  # The test data are loaded by spark-sql, with `spark.sql.orc.impl` set to 
> native.
>  # During the engine validation power test, q22 returns different results 
> depending on the read engine, `spark.sql.orc.impl = hive` or 
> `spark.sql.orc.impl = native`. With hive the result is correct, but with 
> native fewer rows are returned. Can someone help find out why this happens?
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

