[jira] [Created] (HIVE-3935) Extra new line character in output when sequence file is used for storage of a table

Abhinav Chawade (JIRA) Thu, 24 Jan 2013 00:09:23 -0800

Abhinav Chawade created HIVE-3935:
-------------------------------------

             Summary: Extra new line character in output when sequence file is used for storage of a table
                 Key: HIVE-3935
                 URL: https://issues.apache.org/jira/browse/HIVE-3935
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.9.0, 0.10.0
         Environment: Centos 6.3
            Reporter: Abhinav Chawade



When a "select distinct" query is issued against an empty table that uses 
sequence file storage, the result set contains an extra newline (0x0a) even 
though the table has no data. This output is inconsistent with the result of 
the same command on Hive 0.7.1 and can cause workflows to fail because of a 
wrong record count.
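
To illustrate why a single stray 0x0a can break a downstream record count, 
here is a minimal sketch. It assumes the workflow counts result rows by 
counting newline bytes (as `wc -l` does). The function names and the byte 
strings are hypothetical and are not taken from Hive or from the affected 
workflow.

```python
def count_records_naive(output: bytes) -> int:
    """Count records the way `wc -l` would: one per newline byte."""
    return output.count(b"\n")


def count_records_skip_blank(output: bytes) -> int:
    """Count only non-empty lines, so a lone 0x0a is not treated as a record."""
    return sum(1 for line in output.splitlines() if line.strip())


# Hypothetical captured result sets for the empty table (assumptions).
empty_result_expected = b""    # no rows, no bytes
empty_result_observed = b"\n"  # the single extra 0x0a described in this report

print(count_records_naive(empty_result_expected))
print(count_records_naive(empty_result_observed))
print(count_records_skip_blank(empty_result_observed))
```

Skipping blank lines is only a possible workaround on the consumer side: it 
would also drop a legitimate row whose only column is an empty string, so it 
does not replace a fix in Hive itself.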

Execution on Hive 0.9 and 0.10
hive> create table hoge2(col1 string,col2 string) partitioned by (p_part string) stored as sequencefile;
hive> describe hoge2;
OK
col1    string
col2    string
p_part  string
Time taken: 0.24 seconds
hive> select distinct p_part from hoge2;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201301230112_0001, Tracking URL = http://testcluster2-1:50030/jobdetails.jsp?jobid=job_201301230112_0001
Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201301230112_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-01-23 02:50:16,843 Stage-1 map = 0%,  reduce = 0%
2013-01-23 02:50:26,897 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:27,905 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:28,911 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:29,919 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:30,925 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:31,933 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:32,939 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.13 sec
2013-01-23 02:50:33,945 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.8 sec
MapReduce Total cumulative CPU time: 1 seconds 800 msec
Ended Job = job_201301230112_0001
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 1.8 sec   MAPRFS Read: 327 MAPRFS Write: 71 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 800 msec
OK

Time taken: 21.94 seconds

Result on Hive 0.7.1
hive> select count(distinct p_part) from hoge3;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201210261659_0019, Tracking URL = http://testcluster1-1:50030/jobdetails.jsp?jobid=job_201210261659_0019
Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201210261659_0019
2013-01-23 21:42:01,787 Stage-1 map = 0%,  reduce = 0%
2013-01-23 21:42:07,815 Stage-1 map = 100%,  reduce = 0%
2013-01-23 21:42:12,835 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201210261659_0019
OK
0
Time taken: 16.637 seconds

The underlying Hadoop version is 1.0.3 for Hive 0.9 and 0.20.203 for Hive 0.7.

