Abhinav Chawade created HIVE-3935:
-------------------------------------

             Summary: Extra new line character in output when sequence file is used for storage of a table
                 Key: HIVE-3935
                 URL: https://issues.apache.org/jira/browse/HIVE-3935
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.9.0, 0.10.0
         Environment: Centos 6.3
            Reporter: Abhinav Chawade

When a "select distinct" command is issued on an empty table that is stored as a sequence file, an extra new line character (0x0a) appears in the result set even though the table has no data. This output is inconsistent with the result of the same command on Hive 0.7.1 and can cause workflows to fail because of a wrong record count.

Execution on Hive 0.9 and 0.10

hive> create table hoge2(col1 string,col2 string) partitioned by (p_part string) stored as sequencefile;
hive> describe hoge2;
OK
col1 string
col2 string
p_part string
Time taken: 0.24 seconds
hive> select distinct p_part from hoge2;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201301230112_0001, Tracking URL = http://testcluster2-1:50030/jobdetails.jsp?jobid=job_201301230112_0001
Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201301230112_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-01-23 02:50:16,843 Stage-1 map = 0%, reduce = 0%
2013-01-23 02:50:26,897 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:27,905 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:28,911 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:29,919 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:30,925 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:31,933 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:32,939 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:33,945 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.8 sec
MapReduce Total cumulative CPU time: 1 seconds 800 msec
Ended Job = job_201301230112_0001
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.8 sec MAPRFS Read: 327 MAPRFS Write: 71 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 800 msec
OK
Time taken: 21.94 seconds

Result on Hive 0.7.1

hive> select count(distinct p_part) from hoge3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201210261659_0019, Tracking URL = http://testcluster1-1:50030/jobdetails.jsp?jobid=job_201210261659_0019
Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201210261659_0019
2013-01-23 21:42:01,787 Stage-1 map = 0%, reduce = 0%
2013-01-23 21:42:07,815 Stage-1 map = 100%, reduce = 0%
2013-01-23 21:42:12,835 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201210261659_0019
OK
0
Time taken: 16.637 seconds

The underlying Hadoop version is 1.0.3 for Hive 0.9 and 0.20.203 for Hive 0.7.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira