[ 
https://issues.apache.org/jira/browse/HIVE-19943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542841#comment-16542841
 ] 

Zoltan Haindrich commented on HIVE-19943:
-----------------------------------------

I'm not sure how this is supposed to be fixed. Adding the header/footer 
counts as inputformat args looks like a dead end, because the actual reader is 
some kind of "linereader" from hadoop...
I feel that this "HiveRecordReader" should somehow be pushed under the 
llaprecordreader...but that seems like a hard thing to do (and probably 
not the right move)...
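
For illustration only, here is a toy Python model of what I suspect is going on. It is not Hive code, and the names raw_line_reader and skipping_reader are hypothetical: the skip logic sits in a wrapper around a plain line reader, so any path that talks to the line reader directly sees the header again. The sample rows are the ones from the issue description.

```python
from collections import deque
from typing import Iterable, Iterator


def raw_line_reader(lines: Iterable[str]) -> Iterator[str]:
    """Stand-in for a plain line reader: yields every line, header included."""
    yield from lines


def skipping_reader(lines: Iterable[str], header: int = 0, footer: int = 0) -> Iterator[str]:
    """Hypothetical wrapper that drops the first `header` and last `footer` lines."""
    buffered = deque()
    for index, line in enumerate(lines):
        if index < header:
            continue  # still inside the header
        buffered.append(line)
        # Release a line only once `footer` newer lines are buffered behind it,
        # so the last `footer` lines are never emitted.
        if len(buffered) > footer:
            yield buffered.popleft()


data = ["_TESTING_TYPE", "adf", "hyg", "abc"]  # header plus three rows

wrapped = list(skipping_reader(raw_line_reader(data), header=1))  # wrapper in place
bypassed = list(raw_line_reader(data))  # wrapper bypassed, header leaks through
```

In this model, wrapped holds only the three data rows, while bypassed also contains the header, which would match the count(1) and distinct symptoms in the report if the LLAP path skips the wrapper.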

[~sershe] do you have any suggestion?

To reproduce, patch an existing test which by mistake only ran in local 
mode, so it missed this issue all along, and run it with 
TestMiniLlapCliDriver:
{code}
diff --git ql/src/test/queries/clientpositive/file_with_header_footer.q ql/src/test/queries/clientpositive/file_with_header_footer.q
index 8913e54ad0..5dddcaba2a 100644
--- ql/src/test/queries/clientpositive/file_with_header_footer.q
+++ ql/src/test/queries/clientpositive/file_with_header_footer.q
@@ -11,6 +11,10 @@ CREATE EXTERNAL TABLE header_footer_table_1 (name string, message string, id int
 
 SELECT * FROM header_footer_table_1;
 
+explain
+SELECT count(distinct name) FROM header_footer_table_1;
+SELECT assert_true(count(distinct name)=11) FROM header_footer_table_1;
+
 SELECT * FROM header_footer_table_1 WHERE id < 50;
 
 CREATE EXTERNAL TABLE header_footer_table_2 (name string, message string, id int) PARTITIONED BY (year int, month int, day int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2");
{code}



> Header values keep showing up in result sets
> --------------------------------------------
>
>                 Key: HIVE-19943
>                 URL: https://issues.apache.org/jira/browse/HIVE-19943
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 2.1.0
>         Environment: HDInsight Hive Interactive Query
> [Components|https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning#hadoop-components-available-with-different-hdinsight-versions]
>            Reporter: Liam De Lee
>            Priority: Major
>
> We are using tblproperties ("skip.header.line.count"="1") when creating 
> an external table.
> When we run select * from table, the result set comes back as expected, 
> without the header.
> However, when we run, for instance, count(1), the header is included in the 
> count (verified by pasting the output of select * from table into Notepad and 
> counting the rows).
> If we run select distinct(column) from table, the header also shows up as a 
> distinct value.
> File structure:
> ||_TESTING_TYPE||
> |adf|
> |hyg|
> |abc|
>  
> *Update: 26/06/2018*
> Create statement:
> {code:java}
> -----------------------------------
> --test_type--
> -----------------------------------
> CREATE EXTERNAL TABLE IF NOT EXISTS ext.test_type_in
>   (
>     test_type      string
>     )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\073'
> STORED AS TEXTFILE
> LOCATION 'adl://{adlslocation}data/data2/test'
> tblproperties ("skip.header.line.count"="1")
> {code}
>  Select statement:
> {code:java}
> select * from test_type_in;
> {code}
> Distinct statement:
> {code:java}
> select distinct test_type from test_type_in ORDER BY test_type;
> {code}
> I cannot show the exact statement because of an NDA, so I changed those 
> values to test.
>  
> I can also tell you this happens not just on our HDInsight cluster but also 
> at another company we are working for. It also does not matter what is in 
> the data. So, for testing purposes:
> {code:java}
> test_type,abcg,gjeiza,aze,grriajj,gd,rrjri,vdju{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
