kaiqingli created SPARK-40741:
---------------------------------

Summary: Spark's bin/beeline does not handle distribute by ... sort by statements correctly and returns wrong results
Key: SPARK-40741
URL: https://issues.apache.org/jira/browse/SPARK-40741
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.1.0
Environment: spark 3.1
hive 3.0
Reporter: kaiqingli


When a SQL statement uses distribute by ... sort by ..., the result returned through spark/bin/beeline is wrong, while hive/beeline returns the correct result. The specific scenario: the array data is first split with posexplode, the rows are then sorted by the resulting index with sort by, and finally the values are gathered with collect_list. The rebuilt array does not match the original array. The SQL is as follows:

    select id, samplingtimesec, array_data = new_array_data flag, array_data, new_array_data
    from (
        select id, samplingtimesec, array_data,
               concat('[', concat_ws(',', collect_list(cell_voltage)), ']') new_array_data
        from (
            select id, samplingtimesec, array_data, cell_index, cell_voltage
            from (
                select id, samplingtimesec,
                       array_data, -- format: [1,2,3,4,5]
                       row_number() over (partition by id, samplingtimesec order by samplingtimesec) r -- deduplicate
                from table
                WHERE dt = '20221007' and samplingtimesec <= 1665079200000
            ) tmp
            lateral view posexplode(split(replace(replace(array_data, '[', ''), ']', ''), ',')) v0 as cell_index, cell_voltage
            where r = 1
            distribute by id, samplingtimesec
            sort by cell_index
        ) tmp
        group by id, samplingtimesec, array_data
    ) tmp
    where array_data != new_array_data;

For the SQL above, hive/beeline returns 0 rows, whereas spark/beeline returns a non-zero number of rows.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org