Hive .13 issue with hive.map.aggr=true
Hi guys,

In this query:

select min(YYY), max(YYY) from where trim(YYY) is not null and trim(YYY)<>'';

we expect the following result:

#42684 ZYP7250455

Column 'YYY' is of type 'string', and the file format is ORC with Snappy compression. However, we get only ZYP7250455 (the minimum comes back empty), and the only workaround is to set:

hive.map.aggr=false;

Any idea what is going on here? I also made sure to update the table stats using:

analyze table PARTITION(monthly_capture) compute statistics;

Thanks,
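A minimal sketch of the workaround described above, applied to a single session rather than globally. The table name `my_table` is a placeholder, since the original query omits the table name, and the `ORDER BY ... LIMIT 1` cross-check is only a suggested sanity test that is not part of the original report.

```sql
-- Assumption: my_table stands in for the table name omitted in the post.
-- Turn off map-side aggregation for this session only.
SET hive.map.aggr=false;

SELECT min(YYY), max(YYY)
FROM my_table
WHERE trim(YYY) IS NOT NULL AND trim(YYY) <> '';

-- Cross-check the minimum without using an aggregate at all:
-- sort the filtered values and take the first one.
SELECT YYY
FROM my_table
WHERE trim(YYY) IS NOT NULL AND trim(YYY) <> ''
ORDER BY YYY
LIMIT 1;

-- Restore the default once the comparison is done.
SET hive.map.aggr=true;
```

If the sorted value matches the result obtained with `hive.map.aggr=false` but not the one obtained with `hive.map.aggr=true`, that would point at the map-side aggregation path rather than at the data.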
RE: hive 13: dynamic partition inserts
Hi Prasanth,

Thanks a lot for your quick response.

From: Gajendran, Vishnu
Sent: Tuesday, July 22, 2014 11:47 AM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: RE: hive 13: dynamic partition inserts

Hi Prasanth,

Thanks a lot for your quick response.

From: Prasanth Jayachandran [pjayachand...@hortonworks.com]
Sent: Tuesday, July 22, 2014 11:28 AM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: Re: hive 13: dynamic partition inserts

Hi Vishnu,

Yes. There is a change in the way dynamic partitions are inserted in Hive 13. The new dynamic partitioning is highly scalable and uses much less memory. Here is the related JIRA: https://issues.apache.org/jira/browse/HIVE-6455. Setting "hive.optimize.sort.dynamic.partition" to false will fall back to the old way of insertion.

If your destination table uses a columnar format like ORC or Parquet, then it makes sense to leave the optimization ON, as columnar formats need some buffer space for each column before flushing to disk. Buffer space (runtime memory) will quickly shoot up when there are lots of partition column values and columns. HIVE-6455 addresses this issue.

Thanks,
Prasanth Jayachandran

On Jul 22, 2014, at 10:51 AM, Gajendran, Vishnu <vis...@amazon.com> wrote:

adding user@hive.apache.org for wider audience

From: Gajendran, Vishnu
Sent: Tuesday, July 22, 2014 10:42 AM
To: d...@hive.apache.org
Subject: hive 13: dynamic partition inserts

Hello,

I am seeing a difference between Hive 11 and Hive 13 when inserting to a table with dynamic partitions.

In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic partition insert, I see a number of files (generated by each mapper) in the specified HDFS location, as expected. But in Hive 13, when I set hive.merge.mapfiles=false, I see just one file in the specified HDFS location for the same query. I think Hive is not honoring the hive.merge.mapfiles parameter and has merged all the mapper outputs into a single file.

In Hive 11, 19 mappers were executed for the dynamic partition insert task. But in Hive 13, 19 mappers and 2 reducers were executed.

When I checked the query plan for Hive 11, there was only a map operator task for the dynamic partition insert. But in Hive 13, I see both a map operator and a reduce operator task.

Are there any changes in Hive 13 regarding dynamic partition inserts? Any comments on this issue are greatly appreciated.

Thanks,
vishnu

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: hive 13: dynamic partition inserts
Hi Vishnu,

Yes. There is a change in the way dynamic partitions are inserted in Hive 13. The new dynamic partitioning is highly scalable and uses much less memory. Here is the related JIRA: https://issues.apache.org/jira/browse/HIVE-6455. Setting "hive.optimize.sort.dynamic.partition" to false will fall back to the old way of insertion.

If your destination table uses a columnar format like ORC or Parquet, then it makes sense to leave the optimization ON, as columnar formats need some buffer space for each column before flushing to disk. Buffer space (runtime memory) will quickly shoot up when there are lots of partition column values and columns. HIVE-6455 addresses this issue.

Thanks,
Prasanth Jayachandran

On Jul 22, 2014, at 10:51 AM, Gajendran, Vishnu wrote:

> adding user@hive.apache.org for wider audience
>
> From: Gajendran, Vishnu
> Sent: Tuesday, July 22, 2014 10:42 AM
> To: d...@hive.apache.org
> Subject: hive 13: dynamic partition inserts
>
> Hello,
>
> I am seeing a difference between Hive 11 and Hive 13 when inserting to a
> table with dynamic partitions.
>
> In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic
> partition insert, I see a number of files (generated by each mapper) in the
> specified HDFS location, as expected. But in Hive 13, when I set
> hive.merge.mapfiles=false, I see just one file in the specified HDFS location
> for the same query. I think Hive is not honoring the hive.merge.mapfiles
> parameter and has merged all the mapper outputs into a single file.
>
> In Hive 11, 19 mappers were executed for the dynamic partition insert task.
> But in Hive 13, 19 mappers and 2 reducers were executed.
>
> When I checked the query plan for Hive 11, there was only a map operator task
> for the dynamic partition insert. But in Hive 13, I see both a map operator
> and a reduce operator task.
>
> Are there any changes in Hive 13 regarding dynamic partition inserts? Any
> comments on this issue are greatly appreciated.
>
> Thanks,
> vishnu
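The fallback described in this reply can be sketched as a session-level setting placed ahead of the insert. The table and column names (`target_table`, `source_table`, `col_a`, `col_b`, `dt`) are hypothetical; only `hive.optimize.sort.dynamic.partition` and `hive.merge.mapfiles` come from the thread, and the two `hive.exec.dynamic.partition*` settings are the usual prerequisites for a dynamic partition insert.

```sql
-- Usual prerequisites for dynamic partition inserts.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- From the thread: fall back to the pre-HIVE-6455 insert path.
SET hive.optimize.sort.dynamic.partition=false;
-- From the original question: do not merge the per-mapper output files.
SET hive.merge.mapfiles=false;

-- Hypothetical tables. Check the plan first; per the thread, the old path
-- is expected to show a map-only insert, the new one a map and a reduce phase.
EXPLAIN
INSERT OVERWRITE TABLE target_table PARTITION (dt)
SELECT col_a, col_b, dt
FROM source_table;

-- Then run the insert itself with the same settings in effect.
INSERT OVERWRITE TABLE target_table PARTITION (dt)
SELECT col_a, col_b, dt
FROM source_table;
```

Comparing the `EXPLAIN` output with the setting on and off is a quick way to confirm which insert path a given session is using before writing any data.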
RE: hive 13: dynamic partition inserts
adding user@hive.apache.org for wider audience

From: Gajendran, Vishnu
Sent: Tuesday, July 22, 2014 10:42 AM
To: d...@hive.apache.org
Subject: hive 13: dynamic partition inserts

Hello,

I am seeing a difference between Hive 11 and Hive 13 when inserting to a table with dynamic partitions.

In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic partition insert, I see a number of files (generated by each mapper) in the specified HDFS location, as expected. But in Hive 13, when I set hive.merge.mapfiles=false, I see just one file in the specified HDFS location for the same query. I think Hive is not honoring the hive.merge.mapfiles parameter and has merged all the mapper outputs into a single file.

In Hive 11, 19 mappers were executed for the dynamic partition insert task. But in Hive 13, 19 mappers and 2 reducers were executed.

When I checked the query plan for Hive 11, there was only a map operator task for the dynamic partition insert. But in Hive 13, I see both a map operator and a reduce operator task.

Are there any changes in Hive 13 regarding dynamic partition inserts? Any comments on this issue are greatly appreciated.

Thanks,
vishnu
dynamic partition inserts in hive 13
Hello,

I am seeing a difference between Hive 11 and Hive 13 when inserting to a table with dynamic partitions.

In Hive 11, when I set hive.merge.mapfiles=false before doing a dynamic partition insert, I see a number of files (generated by each mapper) in the specified HDFS location, as expected. But in Hive 13, when I set hive.merge.mapfiles=false, I see just one file in the specified HDFS location for the same query. I think Hive is not honoring the hive.merge.mapfiles parameter and has merged all the mapper outputs into a single file.

In Hive 11, 19 mappers were executed for the dynamic partition insert task. But in Hive 13, 19 mappers and 2 reducers were executed.

When I checked the query plan for Hive 11, there was only a map operator task for the dynamic partition insert. But in Hive 13, I see both a map operator and a reduce operator task.

Are there any changes in Hive 13 regarding dynamic partition inserts? Any comments on this issue are greatly appreciated.

Thanks,
vishnu
HIVE 13 : Simple Join Throwing java.io.IOException
Hi,

I am trying to run a simple join query on Hive 13. Both tables are in text format. Both tables are read in the mappers, and the error is thrown in the reducer. I don't understand why a reducer is reading a table when the mappers have already read it, or why it assumes that the video file is in SequenceFile format.

Below, you can find the query, the query plan, and the error. Any help is greatly appreciated.

Thanks,
Sid

Hadoop Version: 2.0.0-mr1

Query:

SELECT computerguid
FROM revenue_start_adeffx_v2
JOIN video ON revenue_start_adeffx_v2.video_id = video.video_id
WHERE hourid = '389567';

Query Plan:

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: revenue_start_adeffx_v2
            Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: video_id (type: int)
              sort order: +
              Map-reduce partition columns: video_id (type: int)
              Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE
              value expressions: computerguid (type: string)
          TableScan
            alias: video
            Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE
            Reduce Output Operator
              key expressions: video_id (type: int)
              sort order: +
              Map-reduce partition columns: video_id (type: int)
              Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE._col0}
            1
          outputColumnNames: _col0
          Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1

Error:

2014-06-11 10:18:34,818 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs:///video/video_20140611051139 not a SequenceFile
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException: hdfs://hive/warehouse/video/video_20140611051139 not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1714)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1728)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:43)
    at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59)
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226)
    ... 12 more
2014-06-11 10:18:34,822 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-06-11 10:18:34,824 WARN org.apache.hadoop.mapred.Child:
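Since the post states that both tables are text while the reducer reports a SequenceFile read, one possible first step is to confirm what storage format the metastore has recorded. This is only a diagnostic sketch and does not by itself explain or fix the error. The partition spec in the last statement assumes that `hourid` is a partition column of `revenue_start_adeffx_v2`, which the post does not state.

```sql
-- Check the InputFormat, OutputFormat and SerDe recorded for each table.
DESCRIBE FORMATTED video;
DESCRIBE FORMATTED revenue_start_adeffx_v2;

-- Assumption: hourid is a partition column. A partition can carry its own
-- storage descriptor, so inspect the one the query actually reads.
DESCRIBE FORMATTED revenue_start_adeffx_v2 PARTITION (hourid='389567');
```

If every descriptor reports `org.apache.hadoop.mapred.TextInputFormat`, the SequenceFile read in the stack trace is unlikely to come from the table definitions themselves.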
Re: Hive 13
We do not have a firm release date yet. The branch has been cut. I think Harish said he’d like to have a first RC early next week. It usually takes 1 to 2 weeks after the first RC, depending on any show stoppers found in it, etc.

Alan.

On Mar 19, 2014, at 6:50 AM, Bryan Jeffrey wrote:

> Hello.
>
> Is there a firm release date for Hive 13? I know there was talk several
> weeks ago about cutting a branch and looking at stability.
>
> Regards,
>
> Bryan Jeffrey
Hive 13
Hello. Is there a firm release date for Hive 13? I know there was talk several weeks ago about cutting a branch and looking at stability. Regards, Bryan Jeffrey