[ https://issues.apache.org/jira/browse/SPARK-25648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639881#comment-16639881 ]
Jun Zheng commented on SPARK-25648:
-----------------------------------

Hi [~hyukjin.kwon]

Here are the brief steps:
# Generate the data by following the README in [https://github.com/BigData-Lab-Frankfurt/Big-Data-Benchmark-for-Big-Bench].
# Run VALIDATE_POWER_TEST by setting workload=ENGINE_VALIDATION_POWER_TEST in the file https://github.com/BigData-Lab-Frankfurt/Big-Data-Benchmark-for-Big-Bench/blob/master/conf/bigBench.properties
# When q22 executes, the validation fails. The SQL is listed in [https://github.com/BigData-Lab-Frankfurt/Big-Data-Benchmark-for-Big-Bench/blob/master/engines/spark/queries/q22/q22.sql]. When I execute the same SQL in Hive, the validation passes.

Some results are lost when the parameter spark.sql.orc.impl is set to native: the returned row count is lower than the row count returned by Hive.

Thanks all.

> Spark 2.3.1 reads orc format files with native and hive, and return different results
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-25648
>                 URL: https://issues.apache.org/jira/browse/SPARK-25648
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Jun Zheng
>            Priority: Major
>
> Hi All
> I am testing TPCx-BB (www.tpc.org/tpcx-bb/default.asp) with the code from
> [https://github.com/BigData-Lab-Frankfurt/Big-Data-Benchmark-for-Big-Bench].
> # The test data is loaded by spark-sql with the parameter spark.sql.orc.impl set to native.
> # During the engine validation power test, q22 returns different results depending on the read engine, that is, spark.sql.orc.impl = hive versus spark.sql.orc.impl = native. With hive the result is correct, but with native fewer rows are returned. Can someone help find out why this happens?
> Thanks in advance

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
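
To narrow down which rows go missing with the native reader, one option is to run the same query under both settings and diff the results. The following is only a minimal sketch under stated assumptions: `diff_rows` and `collect_with_impl` are hypothetical helper names, `sql_text` is assumed to be a single SELECT statement extracted by hand from q22.sql, and it assumes that spark.sql.orc.impl can be switched at session level in the deployment under test.

```python
from collections import Counter


def diff_rows(hive_rows, native_rows):
    """Compare two query results as multisets of rows.

    Returns (missing_in_native, extra_in_native), each a Counter that maps
    a row tuple to how many more times it occurs on one side than the other.
    """
    hive_counts = Counter(tuple(r) for r in hive_rows)
    native_counts = Counter(tuple(r) for r in native_rows)
    # Counter subtraction keeps only rows with a positive remaining count.
    return hive_counts - native_counts, native_counts - hive_counts


def collect_with_impl(spark, impl, sql_text):
    """Hypothetical helper: run one SELECT with the given ORC reader.

    `spark` is an existing SparkSession; `impl` is "hive" or "native".
    Assumes the setting takes effect for the tables the query reads.
    """
    spark.conf.set("spark.sql.orc.impl", impl)
    return [tuple(row) for row in spark.sql(sql_text).collect()]
```

With a running session, calling `diff_rows(collect_with_impl(spark, "hive", sql_text), collect_with_impl(spark, "native", sql_text))` would list the rows that the hive reader returns and the native reader does not, which may be more useful in the bug report than the row counts alone.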