[jira] [Commented] (HIVE-10790) orc file sql excute fail

2015-05-21 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555485#comment-14555485
 ] 

xiaowei wang commented on HIVE-10790:
-

I think the ORC file WriterImpl invokes a deprecated method of ViewFileSystem,
getDefaultReplication().

> orc file sql excute fail 
> -
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>
> Inserting from a text table into an ORC table, for example
> insert overwrite table custom.rank_less_orc_none 
> partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text 
> where logdate='2015051500';
> throws an error: Error: java.lang.RuntimeException: Hive Runtime Error 
> while closing operators
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
> getDefaultReplication on empty path is invalid
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
> ... 8 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10790) orc file sql excute fail

2015-05-21 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555490#comment-14555490
 ] 

xiaowei wang commented on HIVE-10790:
-


 FSDataOutputStream getStream() throws IOException {
 if (rawWriter == null) {
   rawWriter = fs.create(path, false, HDFS_BUFFER_SIZE,
-fs.getDefaultReplication(), blockSize);
+fs.getDefaultReplication(path), blockSize);
   rawWriter.writeBytes(OrcFile.MAGIC);
   headerLength = rawWriter.getPos();
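
The one-line change above matters because a mount-table file system has no single backing file system: without a path there is nothing to resolve, which is what the NotInMountpointException in the trace reports. A minimal sketch of that idea follows. The classes `MountTableFs` and `ReplicationSketch`, the mount points, and the replication values are hypothetical stand-ins for illustration, not Hadoop's ViewFileSystem implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a mount-table file system; NOT Hadoop's ViewFileSystem.
class MountTableFs {
    // Mount point prefix -> default replication of the file system mounted there.
    private final Map<String, Short> mounts = new LinkedHashMap<>();

    void mount(String prefix, short replication) {
        mounts.put(prefix, replication);
    }

    // No path: nothing to resolve against the mount table, so the call fails.
    short getDefaultReplication() {
        throw new IllegalStateException("getDefaultReplication on empty path is invalid");
    }

    // With a path: find the mount point containing it and use that target's value.
    // Plain prefix matching is a simplification made for this sketch.
    short getDefaultReplication(String path) {
        for (Map.Entry<String, Short> e : mounts.entrySet()) {
            if (path.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        throw new IllegalStateException("not in mount point: " + path);
    }
}

class ReplicationSketch {
    // Builds a small mount table (assumed values) and resolves one path.
    static short resolve(String path) {
        MountTableFs fs = new MountTableFs();
        fs.mount("/user", (short) 3);
        fs.mount("/tmp", (short) 2);
        return fs.getDefaultReplication(path);
    }

    public static void main(String[] args) {
        System.out.println(resolve("/user/hive/warehouse/t/000000_0"));
    }
}
```

In this sketch the path-aware overload succeeds for any path under a mount point, while the no-argument call always fails, which mirrors the behaviour the stack trace shows.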

> orc file sql excute fail 
> -
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>


[jira] [Updated] (HIVE-10983) Lazysimpleserde bug when Text is reused

2015-06-11 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Component/s: CLI

> Lazysimpleserde bug  when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>
> When I query data from an LZO table, I find that in the results the length of the 
> current row is always larger than that of the previous row, and sometimes the current 
> row contains the contents of the previous row. For example, I execute the SQL 
> "select * from web_searchhub where logdate=2015061003"; the result is shown 
> below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below, just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by the Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;
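
The symptom described above (a shorter row followed by the tail of an earlier, longer row) is what one would expect when a reused buffer is decoded over its whole backing array instead of only its valid length. The sketch below illustrates that mechanism only. `ReusedLine` and `StaleBytesSketch` are hypothetical stand-ins, not Hadoop's Text or Hive's LazySimpleSerDe, and ASCII is used instead of GBK so the example runs on any JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a reusable record buffer; NOT Hadoop's Text class.
class ReusedLine {
    private byte[] bytes = new byte[0];
    private int length;

    // Overwrite the contents, keeping the old backing array when it is large enough.
    void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    byte[] getBytes() { return bytes; }  // whole backing array, may hold stale bytes
    int getLength() { return length; }   // number of valid bytes
}

class StaleBytesSketch {
    // Decodes the whole backing array, so stale bytes from earlier rows leak in.
    static String decodeAll(ReusedLine line, Charset cs) {
        return new String(line.getBytes(), cs);
    }

    // Decodes only the valid prefix [0, getLength()).
    static String decodeValid(ReusedLine line, Charset cs) {
        return new String(line.getBytes(), 0, line.getLength(), cs);
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.US_ASCII;
        ReusedLine line = new ReusedLine();
        line.set("session=3151,thread=254".getBytes(cs)); // first, longer row
        line.set("session=901".getBytes(cs));             // second, shorter row
        System.out.println(decodeAll(line, cs));   // second row plus leftover bytes
        System.out.println(decodeValid(line, cs)); // second row only
    }
}
```

Bounding the decode by the valid length is the general remedy for this class of bug; whether it matches the attached patch is not stated in this thread.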





[jira] [Updated] (HIVE-10983) Lazysimpleserde bug when Text is reused

2015-06-11 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Attachment: HIVE-10983.1.patch.txt

> Lazysimpleserde bug  when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt
>
>


[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug ,when Text is reused

2015-06-11 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Summary: LazySimpleSerDe bug  ,when Text is reused   (was: LazySimpleSerDe 
bug  when Text is reused )

> LazySimpleSerDe bug  ,when Text is reused 
> --
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt
>
>


[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug when Text is reused

2015-06-11 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Summary: LazySimpleSerDe bug  when Text is reused   (was: Lazysimpleserde 
bug  when Text is reused )

> LazySimpleSerDe bug  when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt
>
>


[jira] [Updated] (HIVE-10790) orc file sql excute fail

2015-06-11 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10790:

Attachment: HIVE-10790.0.patch.txt

> orc file sql excute fail 
> -
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
> Attachments: HIVE-10790.0.patch.txt
>
>


[jira] [Commented] (HIVE-10790) orc file sql excute fail

2015-06-11 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581724#comment-14581724
 ] 

xiaowei wang commented on HIVE-10790:
-

OK, I have put up a patch.

> orc file sql excute fail 
> -
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
> Attachments: HIVE-10790.0.patch.txt
>
>


[jira] [Commented] (HIVE-10790) orc file sql excute fail

2015-06-11 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581723#comment-14581723
 ] 

xiaowei wang commented on HIVE-10790:
-

OK, I have put up a patch.

> orc file sql excute fail 
> -
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
> Attachments: HIVE-10790.0.patch.txt
>
>


[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug ,when Text is reused

2015-06-11 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Fix Version/s: (was: 1.2.0)
   0.14.1

> LazySimpleSerDe bug  ,when Text is reused 
> --
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1
>
> Attachments: HIVE-10983.1.patch.txt
>
>


[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug ,when Text is reused

2015-06-11 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Attachment: HIVE-10983.2.patch.txt

> LazySimpleSerDe bug  ,when Text is reused 
> --
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>


[jira] [Commented] (HIVE-10983) LazySimpleSerDe bug ,when Text is reused

2015-06-11 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581882#comment-14581882
 ] 

xiaowei wang commented on HIVE-10983:
-

In the first patch, I invoked a method of Text, copyBytes(). This method was added 
after Hadoop 1.0, so the compile failed.
I then put up the second patch.
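
For readers who need the same effect without copyBytes(), a length-bounded copy can be written with the JDK alone. This is a sketch under assumptions, not the content of the second patch: `ValidBytesCopy` and `copyValidBytes` are hypothetical names, and the caller is assumed to supply the backing array and valid length (with Hadoop's Text these would come from getBytes() and getLength()).

```java
import java.util.Arrays;

// Hypothetical helper; not part of Hadoop or Hive.
class ValidBytesCopy {
    // Returns a fresh array holding only the first `length` bytes of `backing`.
    static byte[] copyValidBytes(byte[] backing, int length) {
        return Arrays.copyOf(backing, length);
    }

    public static void main(String[] args) {
        byte[] backing = {'a', 'b', 'c', 'x', 'y'}; // last two bytes are stale
        byte[] valid = copyValidBytes(backing, 3);
        System.out.println(new String(valid));
    }
}
```

Because the result is a new array, later reuse of the original buffer does not change it.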

> LazySimpleSerDe bug  ,when Text is reused 
> --
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>


[jira] [Updated] (HIVE-10790) orc file sql excute fail

2015-06-11 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10790:

Fix Version/s: 0.14.1

> orc file sql excute fail 
> -
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
> Fix For: 0.14.1
>
> Attachments: HIVE-10790.0.patch.txt
>
>
> from a text table insert into a orc table,like as 
> insert overwrite table custom.rank_less_orc_none 
> partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text 
> where logdate='2015051500';
> will throws a error ,Error: java.lang.RuntimeException: Hive Runtime Error 
> while closing operators
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
> getDefaultReplication on empty path is invalid
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
> ... 8 more
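
The trace above ends in the no-argument getDefaultReplication() call on a ViewFileSystem, which has no path from which to choose a mount point. The following is a minimal sketch of that failure mode and of a path-aware alternative. MountTableDemo, addMount and the String-based paths are hypothetical stand-ins written for illustration, not Hadoop classes. Hadoop's FileSystem does provide a getDefaultReplication(Path) overload, but whether the attached patch uses it is not shown in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a mount-table file system; NOT the Hadoop API. */
public class MountTableDemo {
    // Mount prefix -> default replication of the file system mounted there.
    private final Map<String, Short> replicationByMount = new LinkedHashMap<>();

    public void addMount(String prefix, short replication) {
        replicationByMount.put(prefix, replication);
    }

    /** No-path variant: there is no mount point to resolve, so it cannot answer. */
    public short getDefaultReplication() {
        throw new IllegalStateException("getDefaultReplication on empty path is invalid");
    }

    /** Path-aware variant: resolve the mount point first, then return its setting. */
    public short getDefaultReplication(String path) {
        for (Map.Entry<String, Short> e : replicationByMount.entrySet()) {
            String prefix = e.getKey();
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                return e.getValue();
            }
        }
        throw new IllegalStateException("not in a mount point: " + path);
    }

    public static void main(String[] args) {
        MountTableDemo fs = new MountTableDemo();
        fs.addMount("/user", (short) 3);
        // The path-aware call succeeds because the path selects a mount point.
        System.out.println(fs.getDefaultReplication("/user/hive/warehouse/t/part-0"));
        try {
            fs.getDefaultReplication(); // mirrors the failing call in the trace
        } catch (IllegalStateException ex) {
            System.out.println("failed: " + ex.getMessage());
        }
    }
}
```

In this sketch the writer would pass the path of the file it is creating, so the lookup never happens without a mount point.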



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10790) orc file sql excute fail

2015-06-11 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10790:

 Flags: Patch,Important
Labels: patch  (was: )

> orc file sql excute fail 
> -
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>  Labels: patch
> Fix For: 0.14.1
>
> Attachments: HIVE-10790.0.patch.txt
>
>
> Inserting from a text table into an ORC table, for example
> insert overwrite table custom.rank_less_orc_none 
> partition(logdate='2015051500') select ur,rf,it,dt from custom.rank_text 
> where logdate='2015051500';
> fails with this error: java.lang.RuntimeException: Hive Runtime Error 
> while closing operators
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: 
> getDefaultReplication on empty path is invalid
> at 
> org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
> at 
> org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
> ... 8 more





[jira] [Updated] (HIVE-10983) LazySimpleSerDe bug ,when Text is reused

2015-06-18 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Fix Version/s: 1.2.0

> LazySimpleSerDe bug  ,when Text is reused 
> --
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> When I query data from an LZO table, I find that the length of the current
> row is always larger than that of the previous row, and sometimes the current
> row contains the contents of the previous row. For example, I execute the
> query "select * from web_searchhub where logdate=2015061003"; the result is
> shown below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below, just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;
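
A reused Text keeps its old backing array when the next row is shorter, and Hadoop's documentation for Text.getBytes() states that only the first getLength() bytes are valid. The sketch below imitates that behaviour to show how stale bytes from a previous row can leak into the current one. ReusedBufferDemo and its methods are hypothetical stand-ins, not Hadoop's Text, and the row contents are made up for the demonstration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a reusable byte buffer such as Hadoop's Text. */
public class ReusedBufferDemo {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Overwrites the content but keeps the old array when it is large enough. */
    public void set(String s) {
        byte[] src = s.getBytes(StandardCharsets.UTF_8);
        if (src.length > bytes.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    /** Returns the whole backing array, which may be longer than the content. */
    public byte[] getBytes() { return bytes; }

    /** Returns the number of valid bytes. */
    public int getLength() { return length; }

    /** Unbounded decode: reads the whole backing array, stale tail included. */
    public String decodeWholeArray() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Bounded decode: reads only the valid prefix. */
    public String decodeValidPrefix() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusedBufferDemo row = new ReusedBufferDemo();
        row.set("first-row-is-longer");
        row.set("second"); // shorter row reuses the same array
        System.out.println(row.decodeWholeArray());  // secondrow-is-longer
        System.out.println(row.decodeValidPrefix()); // second
    }
}
```

The unbounded decode reproduces the symptom in the report, a row that carries the tail of the previous row, while the bounded decode returns only the current row.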





[jira] [Reopened] (HIVE-10983) LazySimpleSerDe bug ,when Text is reused

2015-06-23 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang reopened HIVE-10983:
-

> LazySimpleSerDe bug  ,when Text is reused 
> --
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> When I query data from an LZO table, I find that the length of the current
> row is always larger than that of the previous row, and sometimes the current
> row contains the contents of the previous row. For example, I execute the
> query "select * from web_searchhub where logdate=2015061003"; the result is
> shown below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below, just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-24 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Summary: SerDeUtils bug  ,when Text is reused   (was: LazySimpleSerDe bug  
,when Text is reused )

> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> When I query data from an LZO table, I find that the length of the current
> row is always larger than that of the previous row, and sometimes the current
> row contains the contents of the previous row. For example, I execute the
> query "select * from web_searchhub where logdate=2015061003"; the result is
> shown below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below, just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;
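
Because the table above declares 'serialization.encoding'='GBK', row bytes have to be converted between GBK and UTF-8. One defensive way to write such a conversion is to pass the valid length explicitly, so that unused capacity in a reused buffer is never decoded. This is a sketch under that assumption only: BoundedTranscode, toUtf8 and fromUtf8 are hypothetical names, not the SerDeUtils methods, and the sample string is invented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper; illustrates length-bounded charset conversion. */
public class BoundedTranscode {

    /** Decodes the first validLength bytes from the source charset and re-encodes them as UTF-8. */
    public static byte[] toUtf8(byte[] raw, int validLength, Charset source) {
        return new String(raw, 0, validLength, source).getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes the first validLength bytes as UTF-8 and re-encodes them in the target charset. */
    public static byte[] fromUtf8(byte[] raw, int validLength, Charset target) {
        return new String(raw, 0, validLength, StandardCharsets.UTF_8).getBytes(target);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String sample = "\u65e5\u5fd7"; // two CJK characters, written as escapes
        byte[] valid = sample.getBytes(gbk);
        // Simulate a reused buffer: valid bytes followed by unused capacity.
        byte[] backing = Arrays.copyOf(valid, valid.length + 4);

        byte[] utf8 = toUtf8(backing, valid.length, gbk);
        byte[] roundTrip = fromUtf8(utf8, utf8.length, gbk);
        System.out.println(Arrays.equals(roundTrip, valid)); // true
    }
}
```

With a Hadoop Text the two arguments would be getBytes() and getLength(); decoding the array without the length is what lets bytes from an earlier row through.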



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-24 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Affects Version/s: 1.0.0
   1.2.0

> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> When I query data from an LZO table, I find that the length of the current
> row is always larger than that of the previous row, and sometimes the current
> row contains the contents of the previous row. For example, I execute the
> query "select * from web_searchhub where logdate=2015061003"; the result is
> shown below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below, just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;





[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-24 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
The method transformTextToUTF8 has a bug!
When I query data from an LZO table, I find that the length of the current
row is always larger than that of the previous row, and sometimes the current
row contains the contents of the previous row. For example, I execute the
query "select * from web_searchhub where logdate=2015061003"; the result is
shown below. Notice that the second row contains content from the first row.

INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003

The content of the original LZO file is shown below, just 2 rows.

INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285


I think this error is caused by Text reuse, and I have found a solution.

Additionally, the table creation SQL is:
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT 
"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;


  was:
When I query data from an LZO table, I find that the length of the current
row is always larger than that of the previous row, and sometimes the current
row contains the contents of the previous row. For example, I execute the
query "select * from web_searchhub where logdate=2015061003"; the result is
shown below. Notice that the second row contains content from the first row.

INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003

The content of the original LZO file is shown below, just 2 rows.

INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285


I think this error is caused by Text reuse, and I have found a solution.

Additionally, the table creation SQL is:
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT 
"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;



> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> The method transformTextToUTF8 has a bug!
> When I query data from an LZO table, I find that the length of the current
> row is always larger than that of the previous row, and sometimes the current
> row contains the contents of the previous row. For example, I execute the
> query "select * from web_searchhub where logdate=2015061003"; the result is
> shown below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below, just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputForm

[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-24 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-11095:

Description: 
The method transformTextFromUTF8 has a bug.
When I query data from an LZO table, I find that the length of the current
row is always larger than that of the previous row, and sometimes the current
row contains the contents of the previous row. For example, I execute the
query "select * from web_searchhub where logdate=2015061003"; the result is
shown below. Notice that the second row contains content from the first row.
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
The content of the original LZO file is shown below, just 2 rows.
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
I think this error is caused by Text reuse, and I have found a solution.
Additionally, the table creation SQL is:
CREATE EXTERNAL TABLE `web_searchhub`(
`line` string)
PARTITIONED BY (
`logdate` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '
U'
WITH SERDEPROPERTIES (
'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
LOCATION
'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;

  was:
the method transformTextFromUTF8 has a bug,
When I query data from an LZO table, I find that the length of the current
row is always larger than that of the previous row, and sometimes the current
row contains the contents of the previous row. For example, I execute the
query "select * from web_searchhub where logdate=2015061003"; the result is
shown below. Notice that the second row contains content from the first row.
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
The content of the original LZO file is shown below, just 2 rows.
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
I think this error is caused by Text reuse, and I have found a solution.
Additionally, the table creation SQL is:
CREATE EXTERNAL TABLE `web_searchhub`(
`line` string)
PARTITIONED BY (
`logdate` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '
U'
WITH SERDEPROPERTIES (
'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
LOCATION
'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;


> SerDeUtils  another bug ,when Text is reused
> 
>
> Key: HIVE-11095
> URL: https://issues.apache.org/jira/browse/HIVE-11095
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
> Fix For: 1.2.0
>
>
> The method transformTextFromUTF8 has a bug.
> When I query data from an LZO table, I find that the length of the current
> row is always larger than that of the previous row, and sometimes the current
> row contains the contents of the previous row. For example, I execute the
> query "select * from web_searchhub where logdate=2015061003"; the result is
> shown below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below, just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
> `line` string)
> PARTITIONED BY (
> `logdate` string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '
> U'
> WITH SERDEPROPERTIES (
> 'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
> LOCATION
> 'viewfs://nsX/user/hive/warehouse/r

[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-24 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
The method transformTextToUTF8 has a bug!
It invokes a problematic method of Text, getBytes()!
When I query data from an LZO table, I find that the length of the current
row is always larger than that of the previous row, and sometimes the current
row contains the contents of the previous row. For example, I execute the
query "select * from web_searchhub where logdate=2015061003"; the result is
shown below. Notice that the second row contains content from the first row.

INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003

The content of the original LZO file is shown below, just 2 rows.

INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285


I think this error is caused by Text reuse, and I have found a solution.

Additionally, the table creation SQL is:
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT 
"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;


  was:
The method transformTextToUTF8 has a bug!
When I query data from an LZO table, I find that the length of the current
row is always larger than that of the previous row, and sometimes the current
row contains the contents of the previous row. For example, I execute the
query "select * from web_searchhub where logdate=2015061003"; the result is
shown below. Notice that the second row contains content from the first row.

INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003

The content of the original LZO file is shown below, just 2 rows.

INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285


I think this error is caused by Text reuse, and I have found a solution.

Additionally, the table creation SQL is:
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT 
"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;



> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> The method transformTextToUTF8 has a bug!
> It invokes a problematic method of Text, getBytes()!
> When I query data from an LZO table, I find that the length of the current
> row is always larger than that of the previous row, and sometimes the current
> row contains the contents of the previous row. For example, I execute the
> query "select * from web_searchhub where logdate=2015061003"; the result is
> shown below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below, just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U

[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-24 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-11095:

Description: 
The method transformTextFromUTF8 has a bug.
It invokes a problematic method of Text, getBytes()!
When I query data from an LZO table, I find that the length of the current
row is always larger than that of the previous row, and sometimes the current
row contains the contents of the previous row. For example, I execute the
query "select * from web_searchhub where logdate=2015061003"; the result is
shown below. Notice that the second row contains content from the first row.
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
The content of the original LZO file is shown below, just 2 rows.
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
I think this error is caused by Text reuse, and I have found a solution.
Additionally, the table creation SQL is:
CREATE EXTERNAL TABLE `web_searchhub`(
`line` string)
PARTITIONED BY (
`logdate` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '
U'
WITH SERDEPROPERTIES (
'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
LOCATION
'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;

  was:
The method transformTextFromUTF8 has a bug.
When I query data from an LZO table, I find that the length of the current
row is always larger than that of the previous row, and sometimes the current
row contains the contents of the previous row. For example, I execute the
query "select * from web_searchhub where logdate=2015061003"; the result is
shown below. Notice that the second row contains content from the first row.
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
The content of the original LZO file is shown below, just 2 rows.
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
I think this error is caused by Text reuse, and I have found a solution.
Additionally, the table creation SQL is:
CREATE EXTERNAL TABLE `web_searchhub`(
`line` string)
PARTITIONED BY (
`logdate` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '
U'
WITH SERDEPROPERTIES (
'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
LOCATION
'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;


> SerDeUtils  another bug ,when Text is reused
> 
>
> Key: HIVE-11095
> URL: https://issues.apache.org/jira/browse/HIVE-11095
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
> Fix For: 1.2.0
>
>
> The method transformTextFromUTF8 has a bug.
> It invokes a problematic method of Text, getBytes()!
> When I query data from an LZO table, I find that the length of the current
> row is always larger than that of the previous row, and sometimes the current
> row contains the contents of the previous row. For example, I execute the
> query "select * from web_searchhub where logdate=2015061003"; the result is
> shown below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below, just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> CREATE EXTERNAL TABLE `web_searchhub`(
> `line` string)
> PARTITIONED BY (
> `logdate` string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '
> U'
> WITH SERDEPROPERTIES (
> 'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.had

[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-24 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-11095:

Attachment: HIVE-11095.1.patch.txt

> SerDeUtils  another bug ,when Text is reused
> 
>
> Key: HIVE-11095
> URL: https://issues.apache.org/jira/browse/HIVE-11095
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
> Fix For: 1.2.0
>
> Attachments: HIVE-11095.1.patch.txt
>
>
> The method transformTextFromUTF8 has a bug: 
> it calls the wrong method of Text, getBytes().
> When I query data from an LZO table, I find that each row in the results is 
> always longer than the previous row, and sometimes the current row contains 
> content from the previous row. For example, I execute the query 
> "select * from web_searchhub where logdate=2015061003"; the result is shown 
> below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below; it has just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the SQL that creates the table is:
> CREATE EXTERNAL TABLE `web_searchhub`(
> `line` string)
> PARTITIONED BY (
> `logdate` string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
> 'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
> 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-24 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599430#comment-14599430
 ] 

xiaowei wang commented on HIVE-11095:
-

SerDeUtils calls the wrong method of Text, getBytes()!
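
A minimal sketch of why getBytes() is risky on a reused buffer. The class ReusableText below is a hypothetical stand-in, written only for illustration, that mimics the documented behavior of Hadoop's Text: the backing array is kept across set() calls, and only the bytes up to getLength() are valid. It is not the Hive or Hadoop code, and the method names decodeWholeArray and decodeValidRange are assumptions, not part of SerDeUtils.

```java
import java.nio.charset.StandardCharsets;

public class TextReuseDemo {

    /** Hypothetical stand-in for a reusable byte buffer such as Hadoop's Text. */
    static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Overwrites the buffer; the backing array grows but never shrinks. */
        void set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < src.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the whole backing array, including stale bytes past getLength(). */
        byte[] getBytes() {
            return bytes;
        }

        /** Returns the number of valid bytes. */
        int getLength() {
            return length;
        }
    }

    /** Problematic pattern: decodes the entire backing array. */
    static String decodeWholeArray(ReusableText t) {
        return new String(t.getBytes(), StandardCharsets.UTF_8);
    }

    /** Safer pattern: decodes only the valid range [0, getLength()). */
    static String decodeValidRange(ReusableText t) {
        return new String(t.getBytes(), 0, t.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableText t = new ReusableText();
        t.set("session=3151,thread=254"); // first, longer row
        t.set("session=901");             // second, shorter row reuses the buffer

        System.out.println(decodeWholeArray(t)); // session=9011,thread=254
        System.out.println(decodeValidRange(t)); // session=901
    }
}
```

In this sketch the second, shorter row picks up the tail of the first row when the whole array is decoded, which resembles the symptom reported in the issue. Bounding the decode by getLength() avoids it.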

> SerDeUtils  another bug ,when Text is reused
> 
>
> Key: HIVE-11095
> URL: https://issues.apache.org/jira/browse/HIVE-11095
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
> Fix For: 1.2.0
>
> Attachments: HIVE-11095.1.patch.txt
>
>
> The method transformTextFromUTF8 has a bug: 
> it calls the wrong method of Text, getBytes().
> When I query data from an LZO table, I find that each row in the results is 
> always longer than the previous row, and sometimes the current row contains 
> content from the previous row. For example, I execute the query 
> "select * from web_searchhub where logdate=2015061003"; the result is shown 
> below. Notice that the second row contains content from the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below; it has just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the SQL that creates the table is:
> CREATE EXTERNAL TABLE `web_searchhub`(
> `line` string)
> PARTITIONED BY (
> `logdate` string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
> 'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
> 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;





[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
The method transformTextToUTF8 has a bug!
It calls the wrong method of Text, getBytes()!
When I query data from an LZO table, I find that the current row in the results 
is always longer than the previous row, and sometimes the current row contains 
content from the previous row. For example, I execute this query:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains content from the 
first row.

INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003

The content of the original LZO file is shown below; it has just 2 rows.

INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285


I think this error is caused by Text reuse, and I have found a solution.

Additionally, the SQL that creates the table is:
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;




> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>

[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
The method transformTextToUTF8 has a bug!
It calls the wrong method of Text, getBytes()!
When I query data from an LZO table, I find that the current row in the results 
is always longer than the previous row, and sometimes the current row contains 
content from the previous row. For example, I execute this query:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains content from the 
first row.

INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003

The content of the original LZO file is shown below; it has just 2 rows.

INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285


I think this error is caused by Text reuse, and I have found a solution.

Additionally, the SQL that creates the table is:
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' ;





> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>

[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
The method transformTextToUTF8 has a bug!
It calls the wrong method of Text, getBytes()!
When I query data from an LZO table, I find that the current row in the results 
is always longer than the previous row, and sometimes the current row contains 
content from the previous row. For example, I execute this query:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains content from the 
first row.

INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003

The content of the original LZO file is shown below; it has just 2 rows.

INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285


I think this error is caused by Text reuse, and I have found a solution.

Additionally, the SQL that creates the table is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
{code}





> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>

[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
The method transformTextToUTF8 has a bug!
It calls the wrong method of Text, getBytes()!
When I query data from an LZO table, I find that the current row in the results 
is always longer than the previous row, and sometimes the current row contains 
content from the previous row. For example, I execute this query:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains content from the 
first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}

The content of the original LZO file is shown below; it has just 2 rows.

INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285


I think this error is caused by Text reuse, and I have found a solution.

Additionally, the SQL that creates the table is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
{code}





> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>

[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
The method transformTextToUTF8 has a bug!
It calls the wrong method of Text, getBytes()!
When I query data from an LZO table, I find that the current row in the results 
is always longer than the previous row, and sometimes the current row contains 
content from the previous row. For example, I execute this query:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains content from the 
first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}

The content of the original LZO file is shown below; it has just 2 rows.
{noformat}
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}

I think this error is caused by Text reuse, and I have found a solution.

Additionally, the SQL that creates the table is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
{code}





> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>

[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
The method transformTextToUTF8 has a bug: it calls the wrong method of Text, 
getBytes()!

When I query data from an LZO table, I find that the current row in the results 
is always longer than the previous row, and sometimes the current row contains 
content from the previous row. For example, I execute this query:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result is shown below. Notice that the second row contains content from the 
first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}

The content of the original LZO file is shown below; it has just 2 rows.
{noformat}
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}

I think this error is caused by Text reuse, and I have found a solution.

Additionally, the SQL that creates the table is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
{code}





> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>Priority: Critical
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> The method transformTextToUTF8 has a bug: it calls Text.getBytes(), which is 
> the wrong method to use here.
> When I query data from an LZO table, I find that the length of the current 
> row is always larger than that of the previous row, and sometimes the current 
> row contains the contents of the previous row. For example, I execute this query:
> {code:sql}
> select *   from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains 
> the content of the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> {noformat}
> The content of the original LZO file is shown below; it has just 2 rows.
> {noformat}
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> {noformat}
> I th

[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Priority: Major  (was: Critical)

> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> The method transformTextToUTF8 has a bug: it calls Text.getBytes(), which is 
> the wrong method to use here.
> When I query data from an LZO table, I find that the length of the current 
> row is always larger than that of the previous row, and sometimes the current 
> row contains the contents of the previous row. For example, I execute this query:
> {code:sql}
> select *   from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains 
> the content of the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> {noformat}
> The content of the original LZO file is shown below; it has just 2 rows.
> {noformat}
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> {noformat}
> I think this error is caused by reuse of the Text object, and I have found a 
> solution.
> Additionally, the CREATE TABLE statement is:
> {code:sql}
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-11095:

Priority: Major  (was: Critical)

> SerDeUtils  another bug ,when Text is reused
> 
>
> Key: HIVE-11095
> URL: https://issues.apache.org/jira/browse/HIVE-11095
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
> Fix For: 1.2.0
>
> Attachments: HIVE-11095.1.patch.txt
>
>
> The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which 
> is the wrong method to use here.
> When I query data from an LZO table, I find that the length of the current 
> row is always larger than that of the previous row, and sometimes the current 
> row contains the contents of the previous row. For example, I execute the 
> query "select * from web_searchhub where logdate=2015061003"; the result is 
> shown below. Notice that the second row contains the content of the first row.
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> The content of the original LZO file is shown below; it has just 2 rows.
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> I think this error is caused by reuse of the Text object, and I have found a 
> solution.
> Additionally, the CREATE TABLE statement is:
> CREATE EXTERNAL TABLE `web_searchhub`(
> `line` string)
> PARTITIONED BY (
> `logdate` string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
> 'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
> 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';





[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-11095:

Description: 
The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which is 
the wrong method to use here.
When I query data from an LZO table, I find that the length of the current row 
is always larger than that of the previous row, and sometimes the current row 
contains the contents of the previous row. For example, I execute this query:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result of the query is shown below. Notice that the second row contains 
the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is shown below; it has just 2 rows.
{noformat}
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by reuse of the Text object, and I have found a 
solution.
Additionally, the CREATE TABLE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
`line` string)
PARTITIONED BY (
`logdate` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}

  was:
The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which is 
the wrong method to use here.
When I query data from an LZO table, I find that the length of the current row 
is always larger than that of the previous row, and sometimes the current row 
contains the contents of the previous row. For example, I execute the query 
"select * from web_searchhub where logdate=2015061003"; the result is shown 
below. Notice that the second row contains the content of the first row.
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
The content of the original LZO file is shown below; it has just 2 rows.
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
I think this error is caused by reuse of the Text object, and I have found a 
solution.
Additionally, the CREATE TABLE statement is:
CREATE EXTERNAL TABLE `web_searchhub`(
`line` string)
PARTITIONED BY (
`logdate` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';


> SerDeUtils  another bug ,when Text is reused
> 
>
> Key: HIVE-11095
> URL: https://issues.apache.org/jira/browse/HIVE-11095
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
> Fix For: 1.2.0
>
> Attachments: HIVE-11095.1.patch.txt
>
>
> The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which 
> is the wrong method to use here.
> When I query data from an LZO table, I find that the length of the current 
> row is always larger than that of the previous row, and sometimes the current 
> row contains the contents of the previous row. For example, I execute this query:
> {code:sql}
> select * from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains 
> the content of the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> {noformat}
> The content of the original LZO file is shown below; it has just 2 rows.
> {noformat}
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> {noformat}
> I think this error is caused by reuse of the Text object, and I have found a 
> solution.
> Additionally, the CREATE TABLE statement is:
> {code:sql}
> CREATE EXTERNAL TABLE `web_searchhub`(
> `line` string)
> PARTITIONED BY (
> `logdate` string

[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-11095:

Description: 
{noformat}
The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which is 
the wrong method to use here.
Text.getBytes() returns the raw backing array; however, only the data up to 
Text.getLength() is valid. A better way is to use copyBytes() if you need the 
returned array to be precisely the length of the data.
However, copyBytes() was only added after Hadoop 1.
{noformat}
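To illustrate the getBytes() pitfall, here is a minimal sketch. It assumes a hypothetical ReusableBuffer class that only mimics how a reused Text keeps its backing array between rows; it is not Hadoop's Text and not the code in SerDeUtils.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for a reusable, Text-like buffer; not Hadoop's class. */
final class ReusableBuffer {
    private byte[] bytes = new byte[0];
    private int length = 0;

    /** Overwrites the content but keeps the backing array when it is large enough. */
    void set(String s) {
        byte[] src = s.getBytes(StandardCharsets.UTF_8);
        if (src.length > bytes.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    /** Raw backing array; only the first getLength() bytes are valid. */
    byte[] getBytes() { return bytes; }

    int getLength() { return length; }

    /** Copy trimmed to exactly the valid length. */
    byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
}

final class ReusedBufferDemo {
    /** Buggy decoding: treats the whole backing array as valid data. */
    static String decodeWholeArray(ReusableBuffer b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    /** Safe decoding: reads only the valid range of the backing array. */
    static String decodeValidRange(ReusableBuffer b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableBuffer row = new ReusableBuffer();
        row.set("session=3151,thread=254"); // first, longer row
        row.set("thread=223");              // second, shorter row reuses the array
        System.out.println(decodeWholeArray(row)); // second row plus a stale tail
        System.out.println(decodeValidRange(row)); // second row only
    }
}
```

With the two set() calls in main, the whole-array decode still carries the tail of the first value, while the length-bounded decode returns only the second value. This matches the symptom reported here, where a row ends with leftover content from the previous row.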
How did I find this bug?
When I query data from an LZO table, I find that the length of the current row 
is always larger than that of the previous row, and sometimes the current row 
contains the contents of the previous row. For example, I execute this query:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result of the query is shown below. Notice that the second row contains 
the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is shown below; it has just 2 rows.
{noformat}
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by reuse of the Text object, and I have found a 
solution.
Additionally, the CREATE TABLE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
`line` string)
PARTITIONED BY (
`logdate` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}

  was:
The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which is 
the wrong method to use here.
When I query data from an LZO table, I find that the length of the current row 
is always larger than that of the previous row, and sometimes the current row 
contains the contents of the previous row. For example, I execute this query:
{code:sql}
select * from web_searchhub where logdate=2015061003
{code}
The result of the query is shown below. Notice that the second row contains 
the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}
The content of the original LZO file is shown below; it has just 2 rows.
{noformat}
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}
I think this error is caused by reuse of the Text object, and I have found a 
solution.
Additionally, the CREATE TABLE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
`line` string)
PARTITIONED BY (
`logdate` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}


> SerDeUtils  another bug ,when Text is reused
> 
>
> Key: HIVE-11095
> URL: https://issues.apache.org/jira/browse/HIVE-11095
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
> Fix For: 1.2.0
>
> Attachments: HIVE-11095.1.patch.txt
>
>
> {noformat}
> The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which 
> is the wrong method to use here.
> Text.getBytes() returns the raw backing array; however, only the data up to 
> Text.getLength() is valid. A better way is to use copyBytes() if you need the 
> returned array to be precisely the length of the data.
> However, copyBytes() was only added after Hadoop 1.
> {noformat}
> How did I find this bug?
> When I query data from an LZO table, I find that the length of the current 
> row is always larger than that of the previous row, and sometimes the current 
> row contains the contents of the previous row. For example, I execute this query:
> {code:sql}
> select * from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains 
> the content of the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msg

[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-25 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
{noformat}
The method transformTextToUTF8 has a bug: it calls Text.getBytes(), which is 
the wrong method to use here.
Text.getBytes() returns the raw backing array; however, only the data up to 
Text.getLength() is valid. A better way is to use copyBytes() if you need the 
returned array to be precisely the length of the data.
However, copyBytes() was only added after Hadoop 1.
{noformat}
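
A minimal sketch of a length-bounded conversion to UTF-8, assuming the remedy is simply to decode only the valid range of the backing array. The names BoundedTranscode and toUtf8 are hypothetical, and this is not the attached patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; a sketch of the idea, not Hive's SerDeUtils. */
final class BoundedTranscode {
    /**
     * Decodes only raw[0, validLength) with the source charset and
     * re-encodes the result as UTF-8. Bytes past validLength are ignored.
     */
    static byte[] toUtf8(byte[] raw, int validLength, Charset source) {
        String decoded = new String(raw, 0, validLength, source);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // One valid byte followed by stale bytes left over from an earlier, longer row.
        byte[] backing = {(byte) 0xE9, 'o', 'l', 'd'};
        byte[] utf8 = toUtf8(backing, 1, StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length); // number of UTF-8 bytes for the single valid character
    }
}
```

With a Text value, the call would look like toUtf8(text.getBytes(), text.getLength(), Charset.forName("GBK")), where GBK is the serialization.encoding from the table definition below. Whether the GBK charset is available depends on the JDK in use.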

When I query data from an LZO table, I find that the length of the current row 
is always larger than that of the previous row, and sometimes the current row 
contains the contents of the previous row. For example, I execute this query:
{code:sql}
select *   from web_searchhub where logdate=2015061003
{code}
The result of the query is shown below. Notice that the second row contains 
the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}

The content of the original LZO file is shown below; it has just 2 rows.
{noformat}
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}

I think this error is caused by reuse of the Text object, and I have found a 
solution.

Additionally, the CREATE TABLE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}


  was:
The method transformTextToUTF8 has a bug: it calls Text.getBytes(), which is 
the wrong method to use here.

When I query data from an LZO table, I find that the length of the current row 
is always larger than that of the previous row, and sometimes the current row 
contains the contents of the previous row. For example, I execute this query:
{code:sql}
select *   from web_searchhub where logdate=2015061003
{code}
The result of the query is shown below. Notice that the second row contains 
the content of the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}

The content of the original LZO file is shown below; it has just 2 rows.
{noformat}
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}

I think this error is caused by reuse of the Text object, and I have found a 
solution.

Additionally, the CREATE TABLE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
{code}



> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
>  Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> {noformat}
> The method transformTextToUTF8 has a bug: it calls Text.getBytes(), which is 
> the wrong method to use here.
> Text.getBytes() returns the raw backing array; however, only the data up to 
> Text.getLength() is valid. A better way is to use copyBytes() if you need the 
> returned array to be precisely the length of the data.
> However, copyBytes() was only added after Hadoop 1.
> {noformat}
> When I query data from an LZO table, I find that the length of the current 
> row is always larger than that of the previous row, and sometimes the current 
> row contains the contents of the previous row. For example, I execute this query:
> {code:sql}
> select *   from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains 
> the content of the first row.
> 

[jira] [Updated] (HIVE-14918) Function concat_ws get a wrong value

2016-10-08 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-14918:

Attachment: HIVE-14918.0.patch

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.





[jira] [Updated] (HIVE-14918) Function concat_ws get a wrong value

2016-10-08 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-14918:

Status: Patch Available  (was: Open)

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 2.0.1, 2.1.0, 2.0.0, 1.1.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.





[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15559471#comment-15559471
 ] 

Xiaowei Wang commented on HIVE-14918:
-

Is this a problem?
[~pxiong] [~speleato] [~ashutoshc] [~prasanth_j] [~thejas]

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.





[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560739#comment-15560739
 ] 

Xiaowei Wang commented on HIVE-14918:
-

I mean, concat_ws('.',NULL) should return NULL, not an empty string "". What do 
you think?



> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.





[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561056#comment-15561056
 ] 

Xiaowei Wang commented on HIVE-14918:
-

It is true that concat_ws('.',NULL) in MySQL returns an empty string. 
https://bugs.mysql.com/bug.php?id=6719  
But most of my colleagues and I don't understand why.
Setting MySQL aside, which result do you think is more reasonable?
Thanks for your explanation.
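
For reference, the MySQL rule under discussion can be sketched in a few lines: NULL values after the separator are skipped, and only a NULL separator makes the whole result NULL. The class below is a hypothetical illustration of that rule, not Hive's concat_ws UDF.

```java
import java.util.StringJoiner;

/** Hypothetical sketch of MySQL-style CONCAT_WS semantics; not Hive's implementation. */
final class ConcatWsSketch {
    /**
     * Joins the non-null values with the separator.
     * Returns null only when the separator itself is null.
     */
    static String concatWs(String separator, String... values) {
        if (separator == null) {
            return null;
        }
        StringJoiner joiner = new StringJoiner(separator);
        if (values != null) {
            for (String v : values) {
                if (v != null) {
                    joiner.add(v); // null arguments are skipped, empty strings are kept
                }
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + concatWs(".", (String) null) + "]");
        System.out.println(concatWs(".", "a", null, "b"));
        System.out.println(concatWs(null, "a"));
    }
}
```

Under this rule, a call whose only value is NULL has nothing left to join, so the result is an empty string rather than NULL.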

> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.





[jira] [Commented] (HIVE-14918) Function concat_ws get a wrong value

2016-10-09 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15561156#comment-15561156
 ] 

Xiaowei Wang commented on HIVE-14918:
-

Yes, it is not a bug in MySQL. I will close this issue. Thanks!


> Function concat_ws get a wrong value  
> --
>
> Key: HIVE-14918
> URL: https://issues.apache.org/jira/browse/HIVE-14918
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 2.0.0, 2.1.0, 2.0.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>Priority: Critical
> Fix For: 2.1.0
>
> Attachments: HIVE-14918.0.patch
>
>
> FROM src INSERT OVERWRITE TABLE dest1 SELECT 'abc', 'xyz', '8675309'  WHERE 
> src.key = 86; 
> SELECT concat_ws('.',NULL)  FROM dest1 ;
> The result is an empty string "", but I think it should return NULL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12303) HCatRecordSerDe throw a IndexOutOfBoundsException

2015-10-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12303:

Attachment: HIVE-12303.0.patch

>  HCatRecordSerDe  throw a IndexOutOfBoundsException 
> 
>
> Key: HIVE-12303
> URL: https://issues.apache.org/jira/browse/HIVE-12303
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Sushanth Sowmyan
> Fix For: 1.2.1
>
> Attachments: HIVE-12303.0.patch
>
>
> When accessing a Hive table using HCatalog in Pig, it sometimes throws an exception.
> Exception
> {noformat}
> 2015-10-30 06:44:35,219 WARN [Thread-4] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 24
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeStruct(HCatRecordSerDe.java:175)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeList(HCatRecordSerDe.java:244)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:196)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> at 
> org.apache.hive.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:204)
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> ... 13 more
> {noformat}





[jira] [Commented] (HIVE-12303) HCatRecordSerDe throw a IndexOutOfBoundsException

2015-10-30 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982458#comment-14982458
 ] 

Xiaowei Wang commented on HIVE-12303:
-

Is this a bug?
[~ashutoshc] [~xuefuz]  [~gopalv]

>  HCatRecordSerDe  throw a IndexOutOfBoundsException 
> 
>
> Key: HIVE-12303
> URL: https://issues.apache.org/jira/browse/HIVE-12303
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Sushanth Sowmyan
> Fix For: 1.2.1
>
> Attachments: HIVE-12303.0.patch
>
>
> When accessing a Hive table using HCatalog in Pig, it sometimes throws an exception.
> Exception
> {noformat}
> 2015-10-30 06:44:35,219 WARN [Thread-4] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 24
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeStruct(HCatRecordSerDe.java:175)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeList(HCatRecordSerDe.java:244)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:196)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> at 
> org.apache.hive.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:204)
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> ... 13 more
> {noformat}
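
The trace above ends in `ArrayList.get` with `Index: 24, Size: 24` inside `serializeStruct`. One plausible reading, which this thread does not confirm, is that a loop walks one list (for example the declared struct fields) while indexing into a second, shorter list (the row values). The sketch below reproduces that failure mode with plain Java lists and shows a defensive variant. `StructIndexSketch` and its methods are hypothetical names for illustration; this is not the HCatRecordSerDe source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not the HCatRecordSerDe source.
final class StructIndexSketch {

    // Walks the declared field names and reads the value at the same index.
    // If the row holds fewer values than the schema declares, values.get(i)
    // throws IndexOutOfBoundsException.
    static List<String> serializeUnchecked(List<String> fieldNames, List<Object> values) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            out.add(fieldNames.get(i) + "=" + values.get(i));
        }
        return out;
    }

    // Defensive variant: a field without a matching value is emitted as null.
    static List<String> serializePadded(List<String> fieldNames, List<Object> values) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            Object value = i < values.size() ? values.get(i) : null;
            out.add(fieldNames.get(i) + "=" + value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("a", "b", "c");
        // The row is one value short of the declared schema.
        List<Object> row = new ArrayList<Object>(Arrays.asList("x", "y"));

        System.out.println(serializePadded(schema, row));
        try {
            serializeUnchecked(schema, row);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("unchecked access failed: " + e.getMessage());
        }
    }
}
```

Whether padding with nulls is acceptable depends on the SerDe contract; failing fast with a message that names the mismatched sizes may be the better choice when a short row indicates corrupt input.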



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12303) HCatRecordSerDe throw a IndexOutOfBoundsException

2015-11-02 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985077#comment-14985077
 ] 

Xiaowei Wang commented on HIVE-12303:
-

The schema is:
{noformat}
# col_name              data_type       comment

ip                      string          from deserializer
manualtime              string          from deserializer
timezone                string          from deserializer
pbparams                map             from deserializer
pageurl                 string          from deserializer
useragent               string          from deserializer
yyid                    string          from deserializer
suv                     string          from deserializer
line                    string          from deserializer
applogs                 array>          from deserializer

# Partition Information
# col_name              data_type       comment

logdate                 string

# Detailed Table Information
Database:               default
Owner:                  hive
CreateTime:             Fri Nov 08 11:38:00 CST 2013
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               viewfs://nsX/user/hive/warehouse/default.db/web/uigs/web_uigs_wapsearch
Table Type:             EXTERNAL_TABLE
Table Parameters:
        EXTERNAL                TRUE
        last_modified_by        slave
        last_modified_time      1414463853
        transient_lastDdlTime   1414463853

# Storage Information
SerDe Library:          com.custom.datacat.hive.DataCatSerde
InputFormat:            com.custom.datadir.plugin.SymlinkLzoTextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        datacat.fieldInspector  applogs:com.custom.datacat.hive.DataCatListObjectInspector:\t&pbparams:com.custom.datacat.hive.DataCatMapObjectInspector
        datacat.lineInspector   com.custom.datacat.wapapp.WapAppSearchInspector:
        serialization.format    1
{noformat}

>  HCatRecordSerDe  throw a IndexOutOfBoundsException 
> 
>
> Key: HIVE-12303
> URL: https://issues.apache.org/jira/browse/HIVE-12303
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Sushanth Sowmyan
> Fix For: 1.2.1
>
> Attachments: HIVE-12303.0.patch
>
>
> When accessing a Hive table through HCatalog in Pig, the job sometimes throws
> the following exception:
> {noformat}
> 2015-10-30 06:44:35,219 WARN [Thread-4] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 24
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at 

[jira] [Updated] (HIVE-12482) When execution.engine=tez,set mapreduce.job.name does not work.

2015-11-20 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12482:

Attachment: HIVE-12482.0.patch

> When execution.engine=tez,set mapreduce.job.name does not work.
> ---
>
> Key: HIVE-12482
> URL: https://issues.apache.org/jira/browse/HIVE-12482
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 1.0.0, 1.0.1, 1.2.1
>Reporter: Xiaowei Wang
> Fix For: 0.14.1
>
> Attachments: HIVE-12482.0.patch
>
>
> When execution.engine=tez, setting mapreduce.job.name has no effect.
> In Tez mode, the default job name is "Hive_" + session ID, for example
> HIVE-ce5784d0-320c-4fb9-8b0b-2d92539dfd9e. This makes it difficult to tell
> jobs apart when many jobs are running.
> A better approach would be to set mapreduce.job.name, but setting it
> does not change the job name.
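
The behaviour requested above can be sketched as a simple fallback: use `mapreduce.job.name` when it is set, otherwise derive the name from the session ID. This is a minimal sketch under stated assumptions; `JobNameSketch` and `resolveJobName` are hypothetical names, the configuration is modelled as a plain `Map`, and nothing here is taken from the Hive or Tez code base or from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the class and method names are illustrative, not Hive APIs.
final class JobNameSketch {

    static final String JOB_NAME_KEY = "mapreduce.job.name";

    // Prefer a user-supplied job name; otherwise fall back to a name derived
    // from the session ID, mirroring the default shown in the report.
    static String resolveJobName(Map<String, String> conf, String sessionId) {
        String configured = conf.get(JOB_NAME_KEY);
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        return "HIVE-" + sessionId;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String sessionId = "ce5784d0-320c-4fb9-8b0b-2d92539dfd9e";

        // No job name configured: the session-based default is used.
        System.out.println(resolveJobName(conf, sessionId));

        // Job name configured: it takes precedence over the default.
        conf.put(JOB_NAME_KEY, "nightly-rank-report");
        System.out.println(resolveJobName(conf, sessionId));
    }
}
```

Treating a blank value as unset keeps an accidental `set mapreduce.job.name=;` from producing an empty job name.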





[jira] [Commented] (HIVE-12482) When execution.engine=tez,set mapreduce.job.name does not work.

2015-11-20 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018148#comment-15018148
 ] 

Xiaowei Wang commented on HIVE-12482:
-



[~ashutoshc] [~xuefuz] [~gopalv]

> When execution.engine=tez,set mapreduce.job.name does not work.
> ---
>
> Key: HIVE-12482
> URL: https://issues.apache.org/jira/browse/HIVE-12482
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 1.0.0, 1.0.1, 1.2.1
>Reporter: Xiaowei Wang
> Fix For: 0.14.1
>
> Attachments: HIVE-12482.0.patch
>
>
> When execution.engine=tez, setting mapreduce.job.name has no effect.
> In Tez mode, the default job name is "Hive_" + session ID, for example
> HIVE-ce5784d0-320c-4fb9-8b0b-2d92539dfd9e. This makes it difficult to tell
> jobs apart when many jobs are running.
> A better approach would be to set mapreduce.job.name, but setting it
> does not change the job name.





[jira] [Commented] (HIVE-12482) When execution.engine=tez,set mapreduce.job.name does not work.

2015-11-20 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019183#comment-15019183
 ] 

Xiaowei Wang commented on HIVE-12482:
-

Thanks very much! I am closing this JIRA.

> When execution.engine=tez,set mapreduce.job.name does not work.
> ---
>
> Key: HIVE-12482
> URL: https://issues.apache.org/jira/browse/HIVE-12482
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 0.14.0, 1.0.0, 1.0.1, 1.2.1
>Reporter: Xiaowei Wang
> Fix For: 0.14.1
>
> Attachments: HIVE-12482.0.patch
>
>
> When execution.engine=tez, setting mapreduce.job.name has no effect.
> In Tez mode, the default job name is "Hive_" + session ID, for example
> HIVE-ce5784d0-320c-4fb9-8b0b-2d92539dfd9e. This makes it difficult to tell
> jobs apart when many jobs are running.
> A better approach would be to set mapreduce.job.name, but setting it
> does not change the job name.





[jira] [Commented] (HIVE-12303) HCatRecordSerDe throw a IndexOutOfBoundsException

2015-11-20 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019244#comment-15019244
 ] 

Xiaowei Wang commented on HIVE-12303:
-

Could you give me some advice? Thanks.

>  HCatRecordSerDe  throw a IndexOutOfBoundsException 
> 
>
> Key: HIVE-12303
> URL: https://issues.apache.org/jira/browse/HIVE-12303
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Sushanth Sowmyan
> Fix For: 1.2.1
>
> Attachments: HIVE-12303.0.patch
>
>
> When accessing a Hive table through HCatalog in Pig, the job sometimes throws
> the following exception:
> {noformat}
> 2015-10-30 06:44:35,219 WARN [Thread-4] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 24
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeStruct(HCatRecordSerDe.java:175)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeList(HCatRecordSerDe.java:244)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:196)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> at 
> org.apache.hive.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:204)
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> ... 13 more
> {noformat}





[jira] [Commented] (HIVE-12303) HCatRecordSerDe throw a IndexOutOfBoundsException

2015-11-20 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019245#comment-15019245
 ] 

Xiaowei Wang commented on HIVE-12303:
-

Could you give me some advice? Thanks.

>  HCatRecordSerDe  throw a IndexOutOfBoundsException 
> 
>
> Key: HIVE-12303
> URL: https://issues.apache.org/jira/browse/HIVE-12303
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Sushanth Sowmyan
> Fix For: 1.2.1
>
> Attachments: HIVE-12303.0.patch
>
>
> When accessing a Hive table through HCatalog in Pig, the job sometimes throws
> the following exception:
> {noformat}
> 2015-10-30 06:44:35,219 WARN [Thread-4] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 24
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeStruct(HCatRecordSerDe.java:175)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeList(HCatRecordSerDe.java:244)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:196)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> at 
> org.apache.hive.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:204)
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> ... 13 more
> {noformat}





[jira] [Commented] (HIVE-12303) HCatRecordSerDe throw a IndexOutOfBoundsException

2015-11-20 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019246#comment-15019246
 ] 

Xiaowei Wang commented on HIVE-12303:
-

[~sushanth]

>  HCatRecordSerDe  throw a IndexOutOfBoundsException 
> 
>
> Key: HIVE-12303
> URL: https://issues.apache.org/jira/browse/HIVE-12303
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Sushanth Sowmyan
> Fix For: 1.2.1
>
> Attachments: HIVE-12303.0.patch
>
>
> When accessing a Hive table through HCatalog in Pig, the job sometimes throws
> the following exception:
> {noformat}
> 2015-10-30 06:44:35,219 WARN [Thread-4] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 24
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeStruct(HCatRecordSerDe.java:175)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeList(HCatRecordSerDe.java:244)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:196)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> at 
> org.apache.hive.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:204)
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> ... 13 more
> {noformat}





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: 12541.0.patch.txt

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: 12541.0.patch.txt
>
>
> Table definition:
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the directory
> '/user/hive/warehouse/symlink_text_input_format'. The content of the link
> file is:
> {noformat}
>  "viewfs://nsx/tmp/symlink* " 
> {noformat}
> It contains a single path, and that path contains a wildcard pattern.
> Execute the following SQL:
> {noformat}
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> The query returns a wrong result: 0.
> I have also added a test case in the patch.
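
The link file described above holds a pattern (`symlink*`), not a literal file name. Assuming the wrong count comes from the pattern being looked up as a literal path (the thread does not confirm this), the sketch below shows the difference: the line is trimmed, its path part is extracted, and the pattern is expanded against a list of candidate paths. `SymlinkGlobSketch` and `expand` are hypothetical names, the candidate list stands in for a directory listing, and Unix-style paths are assumed. This is not the SymlinkTextInputFormat or CombineHiveInputFormat code; on a Hadoop file system the analogous call would likely be `FileSystem.globStatus`.

```java
import java.net.URI;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the SymlinkTextInputFormat or CombineHiveInputFormat code.
final class SymlinkGlobSketch {

    // Treats one line of a link file as a glob and returns the candidate
    // paths that it matches. A literal lookup of "/tmp/symlink*" would
    // match none of the candidates.
    static List<String> expand(String linkLine, List<String> candidatePaths) {
        // Surrounding whitespace is dropped before parsing the URI.
        String pattern = URI.create(linkLine.trim()).getPath();
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);

        List<String> matches = new ArrayList<>();
        for (String candidate : candidatePaths) {
            if (matcher.matches(Paths.get(candidate))) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for a directory listing; the file names are made up.
        List<String> listing = Arrays.asList("/tmp/symlink_1", "/tmp/symlink_2", "/tmp/other");
        System.out.println(expand(" viewfs://nsx/tmp/symlink* ", listing));
    }
}
```

The `trim()` call matters here because the link-file content quoted in the report has leading and trailing spaces, which would otherwise make the URI invalid.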





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Fix Version/s: 1.2.1

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: 12541.0.patch.txt
>
>
> Table definition:
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the directory
> '/user/hive/warehouse/symlink_text_input_format'. The content of the link
> file is:
> {noformat}
>  "viewfs://nsx/tmp/symlink* " 
> {noformat}
> It contains a single path, and that path contains a wildcard pattern.
> Execute the following SQL:
> {noformat}
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> The query returns a wrong result: 0.
> I have also added a test case in the patch.





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Description: 
Table definition:
{noformat}
CREATE External TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
{noformat}
There is a link file in the directory
'/user/hive/warehouse/symlink_text_input_format'. The content of the link
file is:
{noformat}
 viewfs://nsx/tmp/symlink* 
{noformat}
It contains a single path, and that path contains a wildcard pattern.


Execute the following SQL:
{noformat}
set hive.rework.mapredwork = true ;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack= 0 ;
set mapred.min.split.size.per.node= 0 ;
set mapred.max.split.size= 0 ;
select count(*) from  symlink_text_input_format ;

{noformat}
The query returns a wrong result: 0.

I have also added a test case in the patch.


  was:
Table definition:
{noformat}
CREATE External TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
{noformat}
There is a link file in the directory
'/user/hive/warehouse/symlink_text_input_format'. The content of the link
file is:
{noformat}
 viewfs://nsx/tmp/symlink* 
{noformat}
It contains a single path, and that path contains a wildcard pattern.


Execute the following SQL:
{noformat}
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack= 0 ;
set mapred.min.split.size.per.node= 0 ;
set mapred.max.split.size= 0 ;
select count(*) from  symlink_text_input_format ;

{noformat}
The query returns a wrong result: 0.

I have also added a test case in the patch.



> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: 12541.0.patch.txt
>
>
> Table definition:
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the directory
> '/user/hive/warehouse/symlink_text_input_format'. The content of the link
> file is:
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains a single path, and that path contains a wildcard pattern.
> Execute the following SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> The query returns a wrong result: 0.
> I have also added a test case in the patch.





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Description: 
Table definition:
{noformat}
CREATE External TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
{noformat}
There is a link file in the directory
'/user/hive/warehouse/symlink_text_input_format'. The content of the link
file is:
{noformat}
 viewfs://nsx/tmp/symlink* 
{noformat}
It contains a single path, and that path contains a wildcard pattern.


Execute the following SQL:
{noformat}
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack= 0 ;
set mapred.min.split.size.per.node= 0 ;
set mapred.max.split.size= 0 ;
select count(*) from  symlink_text_input_format ;

{noformat}
The query returns a wrong result: 0.

I have also added a test case in the patch.


  was:
Table definition:
{noformat}
CREATE External TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
{noformat}
There is a link file in the directory
'/user/hive/warehouse/symlink_text_input_format'. The content of the link
file is:
{noformat}
 "viewfs://nsx/tmp/symlink* " 
{noformat}
It contains a single path, and that path contains a wildcard pattern.


Execute the following SQL:
{noformat}
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack= 0 ;
set mapred.min.split.size.per.node= 0 ;
set mapred.max.split.size= 0 ;
select count(*) from  symlink_text_input_format ;

{noformat}
The query returns a wrong result: 0.

I have also added a test case in the patch.



> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: 12541.0.patch.txt
>
>
> Table definition:
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the directory
> '/user/hive/warehouse/symlink_text_input_format'. The content of the link
> file is:
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains a single path, and that path contains a wildcard pattern.
> Execute the following SQL:
> {noformat}
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> The query returns a wrong result: 0.
> I have also added a test case in the patch.





[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031963#comment-15031963
 ] 

Xiaowei Wang commented on HIVE-12541:
-

I need a review. Thanks!

[~brocknoland] [~sushanth] [~ashutoshc] [~xuefuz] [~gopalv]

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: 12541.0.patch.txt
>
>
> Table definition:
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the directory
> '/user/hive/warehouse/symlink_text_input_format'. The content of the link
> file is:
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains a single path, and that path contains a wildcard pattern.
> Execute the following SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> The query returns a wrong result: 0.
> I have also added a test case in the patch.





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: HIVE-12541.0.patch.txt

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.0.patch.txt
>
>
> Table definition:
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the directory
> '/user/hive/warehouse/symlink_text_input_format'. The content of the link
> file is:
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: (was: 12541.0.patch.txt)

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.0.patch.txt
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: HIVE-12541.1.patch

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.0.patch.txt, HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-11-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: (was: HIVE-12541.0.patch.txt)

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Assigned] (HIVE-12303) HCatRecordSerDe throw a IndexOutOfBoundsException

2015-11-30 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang reassigned HIVE-12303:
---

Assignee: Xiaowei Wang  (was: Sushanth Sowmyan)

>  HCatRecordSerDe  throw a IndexOutOfBoundsException 
> 
>
> Key: HIVE-12303
> URL: https://issues.apache.org/jira/browse/HIVE-12303
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12303.0.patch
>
>
> When accessing a Hive table using HCatalog in Pig, it sometimes throws an exception.
> Exception
> {noformat}
> 2015-10-30 06:44:35,219 WARN [Thread-4] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 24
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeStruct(HCatRecordSerDe.java:175)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeList(HCatRecordSerDe.java:244)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:196)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> at 
> org.apache.hive.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:204)
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> ... 13 more
> {noformat}





[jira] [Commented] (HIVE-12303) HCatRecordSerDe throw a IndexOutOfBoundsException

2015-12-01 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033341#comment-15033341
 ] 

Xiaowei Wang commented on HIVE-12303:
-

[~sushanth] 

The table uses a custom SerDe. It has a column "applogs", which is a list of
structs. Usually "applogs" yields the right struct elements, but sometimes there
is dirty data and the right elements cannot be obtained. A little dirty data
should not cause the job to fail.
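
To illustrate the idea, here is a minimal Java sketch of tolerating a dirty
row. It assumes that "Index: 24, Size: 24" means a struct held fewer values
than the reader expected, which is only one plausible reading of the trace.
It is not the code in HIVE-12303.0.patch; LenientStructReader and normalize
are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive or HCatalog.
final class LenientStructReader {

    /**
     * Returns exactly expectedFields values. Missing trailing fields are
     * filled with null and surplus fields are dropped, so a short (dirty)
     * row never triggers an IndexOutOfBoundsException.
     */
    static List<Object> normalize(List<?> raw, int expectedFields) {
        List<Object> out = new ArrayList<>(expectedFields);
        for (int i = 0; i < expectedFields; i++) {
            out.add(i < raw.size() ? raw.get(i) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        // A dirty row: only 2 of the 3 expected struct fields are present.
        List<Object> dirty = Arrays.asList("a", "b");
        // The third field is padded with null instead of failing.
        System.out.println(normalize(dirty, 3));
    }
}
```

Padding with null keeps the job running at the cost of silently accepting
malformed rows, so counting or logging such rows would be a sensible addition.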

>  HCatRecordSerDe  throw a IndexOutOfBoundsException 
> 
>
> Key: HIVE-12303
> URL: https://issues.apache.org/jira/browse/HIVE-12303
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12303.0.patch
>
>
> When accessing a Hive table using HCatalog in Pig, it sometimes throws an exception.
> Exception
> {noformat}
> 2015-10-30 06:44:35,219 WARN [Thread-4] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.pig.backend.executionengine.ExecException: ERROR 6018: Error 
> converting read value to tuple
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:76)
> at org.apache.hive.hcatalog.pig.HCatLoader.getNext(HCatLoader.java:59)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
> at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:553)
> at 
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: java.lang.IndexOutOfBoundsException: Index: 24, Size: 24
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeStruct(HCatRecordSerDe.java:175)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeList(HCatRecordSerDe.java:244)
> at 
> org.apache.hive.hcatalog.data.HCatRecordSerDe.serializeField(HCatRecordSerDe.java:196)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:53)
> at 
> org.apache.hive.hcatalog.data.LazyHCatRecord.get(LazyHCatRecord.java:97)
> at 
> org.apache.hive.hcatalog.mapreduce.HCatRecordReader.nextKeyValue(HCatRecordReader.java:204)
> at 
> org.apache.hive.hcatalog.pig.HCatBaseLoader.getNext(HCatBaseLoader.java:63)
> ... 13 more
> {noformat}





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-03 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Description: 
Table desc :
{noformat}
CREATE External TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
{noformat}
There is a link file in the dir 
'/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
file is 
{noformat}
 viewfs://nsx/tmp/symlink* 
{noformat}
It contains one path, and the path contains a regex.


Execute the SQL:
{noformat}
set hive.rework.mapredwork = true ;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack= 0 ;
set mapred.min.split.size.per.node= 0 ;
set mapred.max.split.size= 0 ;
select count(*) from  symlink_text_input_format ;

{noformat}
It returns a wrong result: 0

I also added a test case in the patch.


  was:
Table desc :
{noformat}
CREATE External TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
{noformat}
There is a link file in the dir 
'/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
file is 
{noformat}
 viewfs://nsx/tmp/symlink* 
{noformat}
it contains one path ,and the path contains a regex!


Execute the sql : 
{noformat}
set hive.rework.mapredwork = true ;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack= 0 ;
set mapred.min.split.size.per.node= 0 ;
set mapred.max.split.size= 0 ;
select count(*) from  symlink_text_input_format ;

{noformat}
It will result a wrong result :0

At the same time ,I add a test case in the patch.



> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-03 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037826#comment-15037826
 ] 

Xiaowei Wang commented on HIVE-12541:
-

[~brocknoland] [~sushanth] [~ashutoshc] [~xuefuz] [~gopalv] I need a
review. Thanks!

Dear all, I am begging on my knees for a review!

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-10 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051993#comment-15051993
 ] 

Xiaowei Wang commented on HIVE-12541:
-


I am talking about the regex in the symbolic path. I found that if a path in the
symlink file contains a regex, a select on the table works by default. So I
mistakenly thought that SymlinkTextInputFormat supports regex.

I also think that paths containing a regex should be supported.

Thank you for your attention.

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-10 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052110#comment-15052110
 ] 

Xiaowei Wang commented on HIVE-12541:
-

../data/files/T* means the files whose names start with T.
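
A pattern such as T* that means "files starting with T" follows shell-style
glob semantics rather than regular-expression semantics. The sketch below
shows the difference using only JDK classes. It does not use Hive or Hadoop
code, and GlobVersusRegex and globMatches are hypothetical names chosen for
this example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical demo class, not part of Hive or Hadoop.
final class GlobVersusRegex {

    /** Returns true if fileName matches the shell-style glob pattern. */
    static boolean globMatches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        // As a glob, T* means "T followed by anything".
        System.out.println(globMatches("T*", "T1.txt"));   // matches
        System.out.println(globMatches("T*", "kv1.txt"));  // does not match

        // As a regular expression, T* means "zero or more T characters",
        // so the same file name is not a full match.
        System.out.println("T1.txt".matches("T*"));        // does not match
        System.out.println("TTT".matches("T*"));           // matches
    }
}
```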

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Updated] (HIVE-12652) SymbolicTextInputFormat should supports the path with regex ,especially using CombineHiveInputFormat .Add test sql .

2015-12-10 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12652:

Attachment: HIVE-12652.0.patch

> SymbolicTextInputFormat should supports the  path with regex  ,especially 
> using CombineHiveInputFormat .Add test sql .
> --
>
> Key: HIVE-12652
> URL: https://issues.apache.org/jira/browse/HIVE-12652
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12652.0.patch
>
>
> 1. In fact, SymlinkTextInputFormat supports paths with a regex. I added some
> test SQL.
> 2. However, when CombineHiveInputFormat is used to merge small files, it cannot
> resolve a path with a regex, so it returns a wrong result.
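
The behaviour requested here is to resolve a wildcard entry from the link
file into concrete files before the inputs are combined. It can be sketched
with the JDK alone. This is only an illustration on the local file system
and not the HIVE-12652 patch. On Hadoop the analogous call is
FileSystem.globStatus, which the patch may or may not use.
SymlinkGlobExpander, expand and demo are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of Hive or Hadoop.
final class SymlinkGlobExpander {

    /** Expands a file-name glob such as "symlink*" against dir into sorted file names. */
    static List<String> expand(Path dir, String glob) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);  // directory order is unspecified, so sort for stable output
        return names;
    }

    /** Creates a temporary directory with three files and expands "symlink*" in it. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("symlink-demo");
            Files.createFile(dir.resolve("symlink1.txt"));
            Files.createFile(dir.resolve("symlink2.txt"));
            Files.createFile(dir.resolve("other.txt"));
            return expand(dir, "symlink*");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Only the two files whose names start with "symlink" are listed.
        System.out.println(demo());
    }
}
```

If the pattern is passed on unexpanded, a component that treats it as a
literal path would find no such file, which would be consistent with the
count of 0 reported in HIVE-12541.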





[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-10 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052326#comment-15052326
 ] 

Xiaowei Wang commented on HIVE-12541:
-

[~aihuaxu] [~ctang.ma] [~ychena] 

Maybe it is better to discuss this in another JIRA:
https://issues.apache.org/jira/browse/HIVE-12652



> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-10 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052327#comment-15052327
 ] 

Xiaowei Wang commented on HIVE-12541:
-

I added some test cases in another JIRA:
https://issues.apache.org/jira/browse/HIVE-12652

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch
>
>
> Table desc :
> {noformat}
> CREATE External TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'  
> {noformat}
> There is a link file in the dir 
> '/user/hive/warehouse/symlink_text_input_format' ,   the content of the link 
> file is 
> {noformat}
>  viewfs://nsx/tmp/symlink* 
> {noformat}
> It contains one path, and the path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork = true ;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack= 0 ;
> set mapred.min.split.size.per.node= 0 ;
> set mapred.max.split.size= 0 ;
> select count(*) from  symlink_text_input_format ;
> {noformat}
> It returns a wrong result: 0
> I also added a test case in the patch.





[jira] [Updated] (HIVE-12652) SymbolicTextInputFormat should supports the path with regex ,especially used in CombineHiveInputFormat .

2015-12-11 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12652:

Summary: SymbolicTextInputFormat should supports the  path with regex  
,especially used in  CombineHiveInputFormat .  (was: SymbolicTextInputFormat 
should supports the  path with regex  ,especially using CombineHiveInputFormat 
.Add test sql .)

> SymbolicTextInputFormat should supports the  path with regex  ,especially 
> used in  CombineHiveInputFormat .
> ---
>
> Key: HIVE-12652
> URL: https://issues.apache.org/jira/browse/HIVE-12652
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12652.0.patch
>
>
> 1. In fact, SymlinkTextInputFormat supports paths with a regex. I added some
> test SQL.
> 2. However, when CombineHiveInputFormat is used to merge small files, it cannot
> resolve a path with a regex, so it returns a wrong result.





[jira] [Updated] (HIVE-12652) SymbolicTextInputFormat should supports the path with regex

2015-12-11 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12652:

Summary: SymbolicTextInputFormat should supports the  path with regex   
(was: SymbolicTextInputFormat should supports the  path with regex  ,especially 
used in  CombineHiveInputFormat .)

> SymbolicTextInputFormat should supports the  path with regex 
> -
>
> Key: HIVE-12652
> URL: https://issues.apache.org/jira/browse/HIVE-12652
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12652.0.patch
>
>
> 1. In fact, SymlinkTextInputFormat supports paths with a regex. I added some
> test SQL.
> 2. However, when CombineHiveInputFormat is used to merge small files, it cannot
> resolve a path with a regex, so it returns a wrong result.





[jira] [Updated] (HIVE-12652) SymbolicTextInputFormat should supports the path with regex

2015-12-11 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12652:

Description: 
1. In fact, SymlinkTextInputFormat supports paths with a regex. I added some
test SQL.
2. However, when CombineHiveInputFormat is used to merge small files, it cannot
resolve a path with a regex, so it returns a wrong result. I fixed the problem.



  was:
1, In fact,SybolicTextInputFormat supports the path with regex  .I add some  
test sql . 
2, But ,when using  CombineHiveInputFormat  to merge small file  , It cannot 
resolve the path with regex ,so it will get a wrong result.



> SymbolicTextInputFormat should supports the  path with regex 
> -
>
> Key: HIVE-12652
> URL: https://issues.apache.org/jira/browse/HIVE-12652
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12652.0.patch
>
>
> 1. In fact, SymlinkTextInputFormat supports paths with a regex. I added some
> test SQL.
> 2. However, when CombineHiveInputFormat is used to merge small files, it cannot
> resolve a path with a regex, so it returns a wrong result. I fixed the problem.





[jira] [Commented] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it will get a wrong result

2015-12-11 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052906#comment-15052906
 ] 

Xiaowei Wang commented on HIVE-12541:
-

Maybe https://issues.apache.org/jira/browse/HIVE-12652 is clearer. Thanks for 
your attention!

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it will get a wrong result
> --
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: HIVE-12541.1.patch
>
>
> Table description:
> {noformat}
> CREATE EXTERNAL TABLE `symlink_text_input_format`(
>   `key` string,
>   `value` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'
> {noformat}
> There is a link file in the directory 
> '/user/hive/warehouse/symlink_text_input_format'. The content of the link 
> file is:
> {noformat}
> viewfs://nsx/tmp/symlink*
> {noformat}
> It contains a single path, and that path contains a regex.
> Execute the SQL:
> {noformat}
> set hive.rework.mapredwork=true;
> set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
> set mapred.min.split.size.per.rack=0;
> set mapred.min.split.size.per.node=0;
> set mapred.max.split.size=0;
> select count(*) from symlink_text_input_format;
> {noformat}
> The query returns a wrong result: 0.
> I have also added a test case in the patch.
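
To illustrate the report above, here is a minimal Python sketch, not Hive's actual code. It assumes the wrong count comes from the symlink target being looked up as a literal path instead of being expanded first, and it treats `symlink*` as a shell-style glob (via `fnmatchcase`), which is also an assumption. The file names, row counts, and helper names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing: path -> number of rows in that file.
FILES = {
    "/tmp/symlink1.txt": 3,
    "/tmp/symlink2.txt": 2,
    "/tmp/other.txt": 7,
}


def expand_targets(targets, listing):
    """Expand each symlink target into the concrete paths it matches."""
    expanded = []
    for target in targets:
        matches = sorted(p for p in listing if fnmatchcase(p, target))
        expanded.extend(matches)
    return expanded


def count_rows(paths, listing):
    """Count rows, treating every path literally (unknown paths add nothing)."""
    return sum(listing.get(p, 0) for p in paths)


# Content of the link file, with the viewfs scheme omitted for brevity.
targets = ["/tmp/symlink*"]

# Literal lookup: the pattern itself is not a file, so nothing is counted.
literal_count = count_rows(targets, FILES)

# Expanded lookup: the pattern is resolved to real files before counting.
expanded_count = count_rows(expand_targets(targets, FILES), FILES)

print(literal_count, expanded_count)
```

With this toy listing the literal lookup counts 0 rows while the expanded lookup counts 5, which mirrors the `count(*)` result of 0 described in the report.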





[jira] [Updated] (HIVE-12541) Using CombineHiveInputFormat with the origin inputformat SymbolicTextInputFormat ,it cannot resolve the path with regex

2015-12-11 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Summary: Using CombineHiveInputFormat with the origin inputformat  
SymbolicTextInputFormat  ,it cannot resolve the path with regex  (was: Using 
CombineHiveInputFormat with the origin inputformat  SymbolicTextInputFormat  
,it will get a wrong result)

> Using CombineHiveInputFormat with the origin inputformat  
> SymbolicTextInputFormat  ,it cannot resolve the path with regex
> -
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: HIVE-12541.1.patch


[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-11 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Summary: SymbolicTextInputFormat should supports the path with regex  (was: 
Using CombineHiveInputFormat with the origin inputformat  
SymbolicTextInputFormat  ,it cannot resolve the path with regex)

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: HIVE-12541.1.patch


[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-11 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Description: 
1. In fact, SymlinkTextInputFormat supports paths that contain a regex. I added 
some test SQL.
2. However, when CombineHiveInputFormat is used to combine input files, it cannot 
resolve a path that contains a regex, so the query returns a wrong result. I give 
an example and fix the problem.

Table description:
{noformat}
CREATE EXTERNAL TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'
{noformat}
There is a link file in the directory 
'/user/hive/warehouse/symlink_text_input_format'. The content of the link 
file is:
{noformat}
viewfs://nsx/tmp/symlink*
{noformat}
It contains a single path, and that path contains a regex.

Execute the SQL:
{noformat}
set hive.rework.mapredwork=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack=0;
set mapred.min.split.size.per.node=0;
set mapred.max.split.size=0;
select count(*) from symlink_text_input_format;
{noformat}
The query returns a wrong result: 0.




  was:
Table description:
{noformat}
CREATE EXTERNAL TABLE `symlink_text_input_format`(
  `key` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'viewfs://nsX/user/hive/warehouse/symlink_text_input_format'
{noformat}
There is a link file in the directory 
'/user/hive/warehouse/symlink_text_input_format'. The content of the link 
file is:
{noformat}
viewfs://nsx/tmp/symlink*
{noformat}
It contains a single path, and that path contains a regex.

Execute the SQL:
{noformat}
set hive.rework.mapredwork=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set mapred.min.split.size.per.rack=0;
set mapred.min.split.size.per.node=0;
set mapred.max.split.size=0;
select count(*) from symlink_text_input_format;
{noformat}
The query returns a wrong result: 0.

I have also added a test case in the patch.



> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: HIVE-12541.1.patch


[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-11 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: HIVE-12541.2.patch

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch


[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-11 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: HIVE-12541.2.patch

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch


[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-11 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: (was: HIVE-12541.2.patch)

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch


[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-11 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053913#comment-15053913
 ] 

Xiaowei Wang commented on HIVE-12541:
-

OK, I have renamed the JIRA and uploaded a new patch that contains more tests.

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch


[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-14 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056959#comment-15056959
 ] 

Xiaowei Wang commented on HIVE-12541:
-

[~aihuaxu]

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch


[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-15 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Fix Version/s: 1.2.1

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch


[jira] [Commented] (HIVE-10790) orc write on viewFS throws exception

2015-12-15 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057972#comment-15057972
 ] 

Xiaowei Wang commented on HIVE-10790:
-

[~wisgood]

> orc write on viewFS throws exception
> 
>
> Key: HIVE-10790
> URL: https://issues.apache.org/jira/browse/HIVE-10790
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Affects Versions: 0.13.0, 0.14.0
> Environment: Hadoop 2.5.0-cdh5.3.2 
> hive 0.14
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
>  Labels: patch
> Fix For: 2.0.0
>
> Attachments: HIVE-10790.0.patch.txt
>
>
> Inserting from a text table into an ORC table, for example:
> {code:sql}
> insert overwrite table custom.rank_less_orc_none 
> partition(logdate='2015051500') 
> select ur,rf,it,dt from custom.rank_text where logdate='2015051500';
> {code}
> throws the following error:
> {noformat}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:260)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> Caused by: org.apache.hadoop.fs.viewfs.NotInMountpointException: getDefaultReplication on empty path is invalid
> at org.apache.hadoop.fs.viewfs.ViewFileSystem.getDefaultReplication(ViewFileSystem.java:593)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.getStream(WriterImpl.java:1750)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1767)
> at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2040)
> at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:105)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:164)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:842)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:577)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:588)
> at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:227)
> ... 8 more
> {noformat}
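
The trace shows `ViewFileSystem.getDefaultReplication` being called without a path and failing with `NotInMountpointException`. Hadoop's `FileSystem` also offers a `getDefaultReplication(Path)` overload. The following is a toy Python model, not Hadoop code, of why a mount-table file system can only answer this question for a concrete path. The class name, mount points, and replication values are hypothetical.

```python
class NotInMountpointError(Exception):
    """Stand-in for Hadoop's NotInMountpointException."""


class MountTableFs:
    """Toy view file system: every mount point has its own default replication."""

    def __init__(self, mounts):
        # mounts maps a mount-point prefix to the target's default replication.
        self.mounts = mounts

    def default_replication(self, path=None):
        if path is None:
            # Without a path there is no mount point to consult.
            raise NotInMountpointError(
                "getDefaultReplication on empty path is invalid")
        candidates = [
            m for m in self.mounts
            if path == m or path.startswith(m.rstrip("/") + "/")
        ]
        if not candidates:
            raise NotInMountpointError(path)
        # The longest matching mount-point prefix wins.
        return self.mounts[max(candidates, key=len)]


fs = MountTableFs({"/user": 3, "/tmp": 2})

# Asking with a concrete output path resolves through the mount table.
print(fs.default_replication("/user/hive/warehouse/t/part-0"))

# Asking without a path fails, as in the stack trace.
try:
    fs.default_replication()
except NotInMountpointError as err:
    print("failed:", err)
```

In this model the caller has to pass the path of the file being written, which is the kind of change the trace points toward for the ORC writer.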





[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-15 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059381#comment-15059381
 ] 

Xiaowei Wang commented on HIVE-12541:
-

OK, I will check.

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch


[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-15 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: HIVE-12541.3.patch

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch, 
> HIVE-12541.3.patch


[jira] [Updated] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-16 Thread Xiaowei Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaowei Wang updated HIVE-12541:

Attachment: HIVE-12541.4.patch

> SymbolicTextInputFormat should supports the path with regex
> ---
>
> Key: HIVE-12541
> URL: https://issues.apache.org/jira/browse/HIVE-12541
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.14.0, 1.2.0, 1.2.1
>Reporter: Xiaowei Wang
>Assignee: Xiaowei Wang
> Fix For: 1.2.1
>
> Attachments: HIVE-12541.1.patch, HIVE-12541.2.patch, 
> HIVE-12541.3.patch, HIVE-12541.4.patch


[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-16 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060190#comment-15060190
 ] 

Xiaowei Wang commented on HIVE-12541:
-

The symlink_text_input_format test case has been updated in version 2.1.0. 
Other test cases still fail, but they do not seem to be related to my patch.



[jira] [Commented] (HIVE-12541) SymbolicTextInputFormat should supports the path with regex

2015-12-17 Thread Xiaowei Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061928#comment-15061928
 ] 

Xiaowei Wang commented on HIVE-12541:
-

I checked several times: none of these failed test cases can be reproduced in 
my local test environment. I do not really understand why.




[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602583#comment-14602583
 ] 

xiaowei wang commented on HIVE-10983:
-

Your method is better and more concise.
Following your suggestions, I will put up another patch.
Thanks very much!

> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
> Issue Type: Bug
> Components: API, CLI
> Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
> Reporter: xiaowei wang
> Assignee: xiaowei wang
> Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt
>
>
> {noformat}
> The method transformTextToUTF8 has a bug: it calls Text.getBytes(), which is 
> unsafe when the Text object is reused.
> Text.getBytes() returns the raw backing bytes; however, only data up to 
> Text.getLength() is valid. A better way is to use copyBytes() if you need the 
> returned array to be precisely the length of the data.
> However, copyBytes() was only added after Hadoop 1.
> {noformat}
> When I query data from an LZO table, I find that in the results the length of 
> the current row is always larger than that of the previous row, and sometimes 
> the current row contains content from the previous row. For example, I 
> execute this SQL:
> {code:sql}
> select *   from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains 
> content from the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> {noformat}
> The content of the original LZO file is shown below; it has just 2 rows.
> {noformat}
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> {noformat}
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the CREATE TABLE statement is:
> {code:sql}
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT 
> "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
> {code}
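
The stale-bytes effect described above can be reproduced without Hadoop. The following is a minimal sketch in plain Java: `ReusableBuffer` is a hypothetical stand-in that only mimics the relevant behaviour of Hadoop's `Text` (a backing array that grows but is not cleared on reuse); it is not the Hadoop class, and `decodeWholeArray` / `decodeValidRange` are illustrative names, not Hive methods. Decoding the whole backing array leaks the tail of the previous row, while decoding only the first `getLength()` bytes does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TextReuseDemo {

    // Hypothetical stand-in for a reusable text buffer; NOT org.apache.hadoop.io.Text.
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        // Overwrites the buffer; the backing array grows when needed but never shrinks.
        void set(String value) {
            byte[] src = value.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        // Returns the raw backing array; only the first getLength() bytes are valid.
        byte[] getBytes() { return bytes; }

        int getLength() { return length; }

        // Returns a copy that is exactly getLength() bytes long.
        byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
    }

    // Buggy pattern: decodes the whole backing array, including stale bytes.
    static String decodeWholeArray(ReusableBuffer b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    // Safe pattern: decodes only the valid range.
    static String decodeValidRange(ReusableBuffer b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableBuffer row = new ReusableBuffer();
        row.set("session=3151,thread=254"); // first row, 23 bytes
        row.set("session=901");             // second row, 11 bytes, same buffer

        // prints "session=9011,thread=254" (tail of the first row leaks in)
        System.out.println(decodeWholeArray(row));
        // prints "session=901"
        System.out.println(decodeValidRange(row));
    }
}
```

Decoding the valid range avoids both the stale bytes and the extra array copy that `copyBytes()` would make, which may matter on Hadoop versions where `copyBytes()` is not available.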





[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Attachment: HIVE-10983.3.patch.txt



[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Attachment: HIVE-10983.4.patch.txt



[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602596#comment-14602596
 ] 

xiaowei wang commented on HIVE-10983:
-

Following Chengxiang Li's suggestion, I have put up a new patch, 
HIVE-10983.4.patch.txt.



[jira] [Updated] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-11095:

Attachment: HIVE-11095.2.patch.txt

> SerDeUtils  another bug ,when Text is reused
> 
>
> Key: HIVE-11095
> URL: https://issues.apache.org/jira/browse/HIVE-11095
> Project: Hive
> Issue Type: Bug
> Components: API, CLI
> Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
> Reporter: xiaowei wang
> Assignee: xiaowei wang
> Fix For: 1.2.0
>
> Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt
>
>
> {noformat}
> The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which 
> is unsafe when the Text object is reused.
> Text.getBytes() returns the raw backing bytes; however, only data up to 
> Text.getLength() is valid. A better way is to use copyBytes() if you need the 
> returned array to be precisely the length of the data.
> However, copyBytes() was only added after Hadoop 1.
> {noformat}
> How did I find this bug?
> When I query data from an LZO table, I find that in the results the length of 
> the current row is always larger than that of the previous row, and sometimes 
> the current row contains content from the previous row. For example, I 
> execute this SQL:
> {code:sql}
> select * from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains 
> content from the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> {noformat}
> The content of the original LZO file is shown below; it has just 2 rows.
> {noformat}
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> {noformat}
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the CREATE TABLE statement is:
> {code:sql}
> CREATE EXTERNAL TABLE `web_searchhub`(
> `line` string)
> PARTITIONED BY (
> `logdate` string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
> 'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
> 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
> {code}
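
For the conversion direction named in this issue (from UTF-8 to a table encoding such as GBK), the same rule applies: only the valid prefix of the buffer should be decoded before re-encoding. The following is a minimal sketch in plain Java, not the Hive implementation; the class `TranscodeValidRange` and the method `fromUtf8` are hypothetical names. ISO-8859-1 is used as the target only because it is guaranteed to be present in every JVM; on a JVM that ships GBK, `Charset.forName("GBK")` could be passed instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeValidRange {

    // Decodes only the first `length` bytes of `buffer` as UTF-8 and
    // re-encodes the result in `target`; bytes past `length` are ignored.
    static byte[] fromUtf8(byte[] buffer, int length, Charset target) {
        String decoded = new String(buffer, 0, length, StandardCharsets.UTF_8);
        return decoded.getBytes(target);
    }

    public static void main(String[] args) {
        // Simulated reused buffer: 3 valid bytes (U+00E9 takes 2 bytes, 'x' takes 1)
        // followed by stale bytes left over from an earlier, longer value.
        byte[] buffer = "\u00e9x-stale".getBytes(StandardCharsets.UTF_8);
        byte[] out = fromUtf8(buffer, 3, StandardCharsets.ISO_8859_1);
        // prints "2": one byte for U+00E9 and one for 'x' in ISO-8859-1
        System.out.println(out.length);
    }
}
```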





[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602617#comment-14602617
 ] 

xiaowei wang commented on HIVE-11095:
-

Following Chengxiang Li's suggestion, I have put up a new patch, 
HIVE-11095.2.patch.txt.



[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602683#comment-14602683
 ] 

xiaowei wang commented on HIVE-10983:
-

I have tested it, and it works. Thanks for your suggestions!



[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602742#comment-14602742
 ] 

xiaowei wang commented on HIVE-11095:
-

OK, I will merge this patch into HIVE-10983.
Thanks for your suggestions!



[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Description: 
{noformat}
The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they 
call Text.getBytes(), which is unsafe when the Text object is reused.
Text.getBytes() returns the raw backing bytes; however, only data up to 
Text.getLength() is valid. A better way is to use copyBytes() if you need the 
returned array to be precisely the length of the data.
However, copyBytes() was only added after Hadoop 1.
{noformat}

When I query data from an LZO table, I find that in the results the length of 
the current row is always larger than that of the previous row, and sometimes 
the current row contains content from the previous row. For example, I 
execute this SQL:
{code:sql}
select *   from web_searchhub where logdate=2015061003
{code}
The result of the query is shown below. Notice that the second row contains 
content from the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}

The content of the original LZO file is shown below; it has just 2 rows.
{noformat}
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}

I think this error is caused by Text reuse, and I have found a solution.

Additionally, the CREATE TABLE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT 
"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
{code}


  was:
{noformat}
The method transformTextToUTF8 has a bug: it calls Text.getBytes(), which is 
unsafe when the Text object is reused.
Text.getBytes() returns the raw backing bytes; however, only data up to 
Text.getLength() is valid. A better way is to use copyBytes() if you need the 
returned array to be precisely the length of the data.
However, copyBytes() was only added after Hadoop 1.
{noformat}

When I query data from an LZO table, I find that in the results the length of 
the current row is always larger than that of the previous row, and sometimes 
the current row contains content from the previous row. For example, I 
execute this SQL:
{code:sql}
select *   from web_searchhub where logdate=2015061003
{code}
The result of the query is shown below. Notice that the second row contains 
content from the first row.
{noformat}
INFO [03:00:05.589] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
session=901,thread=223ession=3151,thread=254 2015061003
{noformat}

The content of the original LZO file is shown below; it has just 2 rows.
{noformat}
INFO [03:00:05.635]  
session=3148,thread=285
INFO [03:00:05.635] HttpFrontServer::FrontSH 
msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
{noformat}

I think this error is caused by Text reuse, and I have found a solution.

Additionally, the CREATE TABLE statement is:
{code:sql}
CREATE EXTERNAL TABLE `web_searchhub`(
  `line` string)
PARTITIONED BY (
  `logdate` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\\U'
WITH SERDEPROPERTIES (
  'serialization.encoding'='GBK')
STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
  OUTPUTFORMAT 
"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

LOCATION
  'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub' 
{code}




[jira] [Updated] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaowei wang updated HIVE-10983:

Attachment: HIVE-10983.5.patch.txt

> SerDeUtils bug  ,when Text is reused 
> -
>
> Key: HIVE-10983
> URL: https://issues.apache.org/jira/browse/HIVE-10983
> Project: Hive
> Issue Type: Bug
> Components: API, CLI
> Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
> Reporter: xiaowei wang
> Assignee: xiaowei wang
> Labels: patch
> Fix For: 0.14.1, 1.2.0
>
> Attachments: HIVE-10983.1.patch.txt, HIVE-10983.2.patch.txt, 
> HIVE-10983.3.patch.txt, HIVE-10983.4.patch.txt, HIVE-10983.5.patch.txt
>
>
> {noformat}
> The methods transformTextToUTF8 and transformTextFromUTF8 have a bug: they
> call Text.getBytes(), which is the wrong method here.
> Text.getBytes() returns the raw bytes; however, only data up to
> Text.getLength() is valid. A better way is to use copyBytes() if you need the
> returned array to be precisely the length of the data.
> However, copyBytes() was added after Hadoop 1.
> {noformat}
> When I queried data from an LZO table, I found in the results that the length
> of the current row is always larger than that of the previous row, and
> sometimes the current row contains the contents of the previous row. For
> example, I executed this SQL:
> {code:sql}
> select *   from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains
> content from the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> {noformat}
> The content of the original LZO file is shown below; it has just 2 rows.
> {noformat}
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> {noformat}
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> {code:sql}
> CREATE EXTERNAL TABLE `web_searchhub`(
>   `line` string)
> PARTITIONED BY (
>   `logdate` string)
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
>   'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT  "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
>   OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
>   'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-26 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602792#comment-14602792
 ] 

xiaowei wang commented on HIVE-10983:
-

I have uploaded a new patch that modifies both transformTextToUTF8 and 
transformTextFromUTF8. In the previous patch, I modified only 
transformTextToUTF8; I originally intended to fix transformTextFromUTF8 in a 
separate JIRA, https://issues.apache.org/jira/browse/HIVE-11095.
Thanks to Chengxiang Li for the suggestions.
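
To illustrate the failure mode described in the issue, here is a minimal sketch. It is not the patch itself. ReusableText is a hypothetical stand-in for org.apache.hadoop.io.Text, written so the example runs without Hadoop on the classpath, and decodeWholeArray and decodeValidRange are illustrative names rather than SerDeUtils methods. It uses UTF-8 instead of the GBK encoding from the table definition, only so that it runs on any JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextReuseSketch {

    /** Hypothetical stand-in for a reusable Text object (assumption, not Hadoop code). */
    static final class ReusableText {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies src into the backing array, growing it only when it is too small. */
        void set(byte[] src) {
            if (bytes.length < src.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length; // bytes past this index still hold the previous row
        }

        byte[] getBytes() { return bytes; }      // raw backing array, may be longer than length
        int getLength() { return length; }       // number of valid bytes
        byte[] copyBytes() { return Arrays.copyOf(bytes, length); }
    }

    /** Buggy pattern: decodes the whole backing array and ignores the valid length. */
    static String decodeWholeArray(ReusableText t, Charset cs) {
        return new String(t.getBytes(), cs);
    }

    /** Safer pattern: decodes only the first getLength() bytes. */
    static String decodeValidRange(ReusableText t, Charset cs) {
        return new String(t.getBytes(), 0, t.getLength(), cs);
    }

    public static void main(String[] args) {
        Charset cs = StandardCharsets.UTF_8;
        ReusableText row = new ReusableText();
        row.set("previous row".getBytes(cs));
        row.set("next".getBytes(cs)); // same object reused for a shorter row

        System.out.println(decodeWholeArray(row, cs)); // shorter row followed by stale bytes
        System.out.println(decodeValidRange(row, cs)); // only the current row
        System.out.println(new String(row.copyBytes(), cs)); // same result via a trimmed copy
    }
}
```

Bounding the decode by getLength() avoids the extra array copy that copyBytes() makes, and it does not depend on copyBytes() being available.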






[jira] [Commented] (HIVE-11095) SerDeUtils another bug ,when Text is reused

2015-06-27 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604411#comment-14604411
 ] 

xiaowei wang commented on HIVE-11095:
-

This issue is not the same as HIVE-2. The patch in 2 is for the 
transformTextToUTF8 method, whereas my patch is for the transformTextFromUTF8 
method.


> SerDeUtils  another bug ,when Text is reused
> 
>
> Key: HIVE-11095
> URL: https://issues.apache.org/jira/browse/HIVE-11095
> Project: Hive
>  Issue Type: Bug
>  Components: API, CLI
>Affects Versions: 0.14.0, 1.0.0, 1.2.0
> Environment: Hadoop 2.3.0-cdh5.0.0
> Hive 0.14
>Reporter: xiaowei wang
>Assignee: xiaowei wang
> Fix For: 1.2.0
>
> Attachments: HIVE-11095.1.patch.txt, HIVE-11095.2.patch.txt
>
>
> {noformat}
> The method transformTextFromUTF8 has a bug: it calls Text.getBytes(), which
> is the wrong method here.
> Text.getBytes() returns the raw bytes; however, only data up to
> Text.getLength() is valid. A better way is to use copyBytes() if you need the
> returned array to be precisely the length of the data.
> However, copyBytes() was added after Hadoop 1.
> {noformat}
> How did I find this bug?
> When I queried data from an LZO table, I found in the results that the length
> of the current row is always larger than that of the previous row, and
> sometimes the current row contains the contents of the previous row. For
> example, I executed this SQL:
> {code:sql}
> select * from web_searchhub where logdate=2015061003
> {code}
> The result of the query is shown below. Notice that the second row contains
> content from the first row.
> {noformat}
> INFO [03:00:05.589] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42098,session=3151,thread=254 2015061003
> INFO [03:00:05.594] <18941e66-9962-44ad-81bc-3519f47ba274> 
> session=901,thread=223ession=3151,thread=254 2015061003
> {noformat}
> The content of the original LZO file is shown below; it has just 2 rows.
> {noformat}
> INFO [03:00:05.635]  
> session=3148,thread=285
> INFO [03:00:05.635] HttpFrontServer::FrontSH 
> msgRecv:Remote=/10.13.193.68:42095,session=3148,thread=285
> {noformat}
> I think this error is caused by Text reuse, and I have found a solution.
> Additionally, the table creation SQL is:
> {code:sql}
> CREATE EXTERNAL TABLE `web_searchhub`(
> `line` string)
> PARTITIONED BY (
> `logdate` string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\\U'
> WITH SERDEPROPERTIES (
> 'serialization.encoding'='GBK')
> STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
> OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
> LOCATION
> 'viewfs://nsX/user/hive/warehouse/raw.db/web/web_searchhub';
> {code}





[jira] [Commented] (HIVE-10983) SerDeUtils bug ,when Text is reused

2015-06-27 Thread xiaowei wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604425#comment-14604425
 ] 

xiaowei wang commented on HIVE-10983:
-

The unit test has passed.
[~chengxiang li], [~xuefuz], [~ctang.ma]





