Hi Eugene,

One more observation: when I run select count(*) on the individual tables, I still see the same error as before in the namenode logs,

org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.120.133:47114 Call#4 Retry#0: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_0000002_0000002_0000/bucket_00094_flush_length

but I do get the table counts, and they match the source data well. So I believe the problem is with joining these tables together. Are there any specific logs you want me to debug?

Regards,
Satyajit.

On Wed, Oct 26, 2016 at 4:16 PM, satyajit vegesna <satyajit.apas...@gmail.com> wrote:

> Hi Eugene,
>
> select count(*) from mls_public_record_association_snapshot_orc pra left
> outer join mls_listing_snapshot_orc ml on pra.primary_listing_id = ml.id left
> outer join attribute a on a.id = ml.standard_status
>
> ran till the end and threw the exception below.
>
> MapReduce Total cumulative CPU time: 0 days 1 hours 0 minutes 53 seconds 760 msec
> Ended Job = job_1477494091659_0024
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> 2016-10-26 16:09:01 Starting to launch local task to process map join; maximum memory = 514850816
> Execution failed with exit status: 2
> Obtaining error information
>
> Task failed!
> Task ID:
>   Stage-9
>
> Logs:
>
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 300  Reduce: 121  Cumulative CPU: 3654.02 sec  HDFS Read: 1771032233  HDFS Write: 1917532703  SUCCESS
> Total MapReduce CPU Time Spent: 0 days 1 hours 0 minutes 54 seconds 20 msec
>
> Explain Plan:
>
> STAGE DEPENDENCIES:
>   Stage-8 is a root stage, consists of Stage-1
>   Stage-1
>   Stage-9 depends on stages: Stage-1
>   Stage-3 depends on stages: Stage-9
>   Stage-0 depends on stages: Stage-3
>
> STAGE PLANS:
>   Stage: Stage-8
>     Conditional Operator
>
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: pra
>             Statistics: Num rows: 99241216 Data size: 9924121600 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: primary_listing_id (type: string)
>               outputColumnNames: _col0
>               Statistics: Num rows: 99241216 Data size: 9924121600 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: _col0 (type: string)
>                 sort order: +
>                 Map-reduce partition columns: _col0 (type: string)
>                 Statistics: Num rows: 99241216 Data size: 9924121600 Basic stats: COMPLETE Column stats: NONE
>           TableScan
>             alias: ml
>             Statistics: Num rows: 201432950 Data size: 20949026816 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: id (type: string), standard_status (type: int)
>               outputColumnNames: _col0, _col1
>               Statistics: Num rows: 201432950 Data size: 20949026816 Basic stats: COMPLETE Column stats: NONE
>               Reduce Output Operator
>                 key expressions: _col0 (type: string)
>                 sort order: +
>                 Map-reduce partition columns: _col0 (type: string)
>                 Statistics: Num rows: 201432950 Data size: 20949026816 Basic stats: COMPLETE Column stats: NONE
>                 value expressions: _col1 (type: int)
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Left Outer Join0 to 1
>           keys:
>             0 _col0 (type: string)
>             1 _col0 (type: string)
>           outputColumnNames: _col2
>           Statistics: Num rows: 221576249 Data size: 23043929997 Basic stats: COMPLETE Column stats: NONE
>           File Output Operator
>             compressed: false
>             table:
>                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe
>
>   Stage: Stage-9  -- it is failing in this MapReduce local work.
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         $hdt$_2:a
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         $hdt$_2:a
>           TableScan
>             alias: a
>             Statistics: Num rows: 12830 Data size: 51322 Basic stats: COMPLETE Column stats: NONE
>             Select Operator
>               expressions: id (type: int)
>               outputColumnNames: _col0
>               Statistics: Num rows: 12830 Data size: 51322 Basic stats: COMPLETE Column stats: NONE
>               HashTable Sink Operator
>                 keys:
>                   0 _col2 (type: int)
>                   1 _col0 (type: int)
>
>   Stage: Stage-3
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             Map Join Operator
>               condition map:
>                    Left Outer Join0 to 1
>               keys:
>                 0 _col2 (type: int)
>                 1 _col0 (type: int)
>               Statistics: Num rows: 243733879 Data size: 25348323546 Basic stats: COMPLETE Column stats: NONE
>               Group By Operator
>                 aggregations: count()
>                 mode: hash
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   sort order:
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col0 (type: bigint)
>       Local Work:
>         Map Reduce Local Work
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations: count(VALUE._col0)
>           mode: mergepartial
>           outputColumnNames: _col0
>           Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>           File Output Operator
>             compressed: false
>             Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
>             table:
>                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
>
> Any suggestion in debugging this issue is appreciated.
>
> Regards,
> Satyajit.
>
> On Wed, Oct 26, 2016 at 3:34 PM, Eugene Koifman <ekoif...@hortonworks.com> wrote:
>
>> If you can run this, then it’s safe to ignore the “00094_flush_length” messages
>> and the issue is somewhere else.
>>
>> select count(*) from mls_public_record_association_snapshot_orc pra
>> left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id =
>> ml.id left outer join attribute a on a.id = ml.standard_status
>>
>> Eugene
>>
>> From: satyajit vegesna <satyajit.apas...@gmail.com>
>> Date: Wednesday, October 26, 2016 at 2:14 PM
>> To: "u...@hive.apache.org" <u...@hive.apache.org>, Eugene Koifman <ekoif...@hortonworks.com>
>> Cc: "dev@hive.apache.org" <dev@hive.apache.org>
>> Subject: Re: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.
>>
>> Hi Eugene,
>>
>> PFB the transactional table in green and the parquet tables in yellow:
>>
>> INSERT INTO access_logs.crawlstats_dpp PARTITION(day="2016-10-23")
>> select pra.url as prUrl, pra.url_type as urlType,
>> CAST(pra.created_at AS timestamp) as prCreated, CAST(pra.updated_at AS timestamp) as prUpdated,
>> CAST(ml.created_at AS timestamp) as mlCreated, CAST(ml.updated_at AS timestamp) as mlUpdated,
>> a.name as status, pra.public_record_id as prId, acl.accesstime as crawledon,
>> pra.id as propId, pra.primary_listing_id as listingId,
>> datediff(CAST(acl.accesstime AS timestamp), CAST(ml.created_at AS timestamp)) as mlcreateage,
>> datediff(CAST(acl.accesstime AS timestamp), CAST(ml.updated_at AS timestamp)) as mlupdateage,
>> datediff(CAST(acl.accesstime AS timestamp), CAST(pra.created_at AS timestamp)) as prcreateage,
>> datediff(CAST(acl.accesstime AS timestamp), CAST(pra.updated_at AS timestamp)) as prupdateage,
>> (case when (pra.public_record_id is not null and TRIM(pra.public_record_id) <> '')
>>       then (case when (pra.primary_listing_id is null or TRIM(pra.primary_listing_id) = '') then 'PR' else 'PRMLS' END)
>>       else (case when (pra.primary_listing_id is not null and TRIM(pra.primary_listing_id) <> '') then 'MLS' else 'UNKNOWN' END) END) as listingType,
>> acl.httpstatuscode, acl.httpverb, acl.requesttime, acl.upstreamheadertime, acl.upstreamresponsetime,
>> acl.page_id, useragent AS user_agent,
>> substring(split(pra.url,'/')[0], 0, length(split(pra.url,'/')[0])-3) as city,
>> substring(split(pra.url,'/')[0], length(split(pra.url,'/')[0])-1, 2) as state,
>> ml.mls_id
>> FROM access_logs.loadbalancer_accesslogs acl
>> inner join mls_public_record_association_snapshot_orc pra on acl.listing_url = pra.url
>> left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id = ml.id
>> left outer join attribute a on a.id = ml.standard_status
>> WHERE acl.accesstimedate="2016-10-23";
>>
>> Any clue, or anything that you would want me to focus on, to debug the issue?
>>
>> Regards,
>> Satyajit.
>>
>> On Tue, Oct 25, 2016 at 8:49 PM, Eugene Koifman <ekoif...@hortonworks.com> wrote:
>>
>>> Which of your tables are transactional? Can you provide the DDL?
>>>
>>> I don’t think the “File does not exist” error is causing your queries to fail. It’s an INFO level msg.
>>> There should be some other error.
>>>
>>> Eugene
>>>
>>> From: satyajit vegesna <satyajit.apas...@gmail.com>
>>> Reply-To: "u...@hive.apache.org" <u...@hive.apache.org>
>>> Date: Tuesday, October 25, 2016 at 5:46 PM
>>> To: "u...@hive.apache.org" <u...@hive.apache.org>, "dev@hive.apache.org" <dev@hive.apache.org>
>>> Subject: Error with flush_length File in Orc, in hive 2.1.0 and mr execution engine.
>>>
>>> Hi All,
>>>
>>> I am using hive 2.1.0 and hadoop 2.7.2, but when I try running queries like a simple insert,
>>>
>>> set mapreduce.job.queuename=default;
>>> set hive.exec.dynamic.partition=true;
>>> set hive.exec.dynamic.partition.mode=nonstrict;
>>> set hive.exec.max.dynamic.partitions.pernode=400;
>>> set hive.exec.max.dynamic.partitions=2000;
>>> set mapreduce.map.memory.mb=5120;
>>> set mapreduce.reduce.memory.mb=5120;
>>> set mapred.tasktracker.map.tasks.maximum=30;
>>> set mapred.tasktracker.reduce.tasks.maximum=20;
>>> set mapred.reduce.child.java.opts=-Xmx2048m;
>>> set mapred.map.child.java.opts=-Xmx2048m;
>>> set hive.support.concurrency=true;
>>> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
>>> set hive.compactor.initiator.on=false;
>>> set hive.compactor.worker.threads=1;
>>> set mapreduce.job.queuename=default;
>>> set hive.exec.dynamic.partition=true;
>>> set hive.exec.dynamic.partition.mode=nonstrict;
>>> INSERT INTO access_logs.crawlstats_dpp PARTITION(day="2016-10-23")
>>> select pra.url as prUrl, pra.url_type as urlType,
>>> CAST(pra.created_at AS timestamp) as prCreated, CAST(pra.updated_at AS timestamp) as prUpdated,
>>> CAST(ml.created_at AS timestamp) as mlCreated, CAST(ml.updated_at AS timestamp) as mlUpdated,
>>> a.name as status, pra.public_record_id as prId, acl.accesstime as crawledon,
>>> pra.id as propId, pra.primary_listing_id as listingId,
>>> datediff(CAST(acl.accesstime AS timestamp), CAST(ml.created_at AS timestamp)) as mlcreateage,
>>> datediff(CAST(acl.accesstime AS timestamp), CAST(ml.updated_at AS timestamp)) as mlupdateage,
>>> datediff(CAST(acl.accesstime AS timestamp), CAST(pra.created_at AS timestamp)) as prcreateage,
>>> datediff(CAST(acl.accesstime AS timestamp), CAST(pra.updated_at AS timestamp)) as prupdateage,
>>> (case when (pra.public_record_id is not null and TRIM(pra.public_record_id) <> '')
>>>       then (case when (pra.primary_listing_id is null or TRIM(pra.primary_listing_id) = '') then 'PR' else 'PRMLS' END)
>>>       else (case when (pra.primary_listing_id is not null and TRIM(pra.primary_listing_id) <> '') then 'MLS' else 'UNKNOWN' END) END) as listingType,
>>> acl.httpstatuscode, acl.httpverb, acl.requesttime, acl.upstreamheadertime, acl.upstreamresponsetime,
>>> acl.page_id, useragent AS user_agent,
>>> substring(split(pra.url,'/')[0], 0, length(split(pra.url,'/')[0])-3) as city,
>>> substring(split(pra.url,'/')[0], length(split(pra.url,'/')[0])-1, 2) as state,
>>> ml.mls_id
>>> FROM access_logs.loadbalancer_accesslogs acl
>>> inner join mls_public_record_association_snapshot_orc pra on acl.listing_url = pra.url
>>> left outer join mls_listing_snapshot_orc ml on pra.primary_listing_id = ml.id
>>> left outer join attribute a on a.id = ml.standard_status
>>> WHERE acl.accesstimedate="2016-10-23";
>>>
>>> I finally end up getting the error below,
>>>
>>> 2016-10-25 17:40:18,725 Stage-2 map = 100%, reduce = 52%, Cumulative CPU 1478.96 sec
>>> 2016-10-25 17:40:19,761 Stage-2 map = 100%, reduce = 62%, Cumulative CPU 1636.58 sec
>>> 2016-10-25 17:40:20,794 Stage-2 map = 100%, reduce = 64%, Cumulative CPU 1764.97 sec
>>> 2016-10-25 17:40:21,820 Stage-2 map = 100%, reduce = 69%, Cumulative CPU 1879.61 sec
>>> 2016-10-25 17:40:22,842 Stage-2 map = 100%, reduce = 80%, Cumulative CPU 2051.38 sec
>>> 2016-10-25 17:40:23,872 Stage-2 map = 100%, reduce = 90%, Cumulative CPU 2151.49 sec
>>> 2016-10-25 17:40:24,907 Stage-2 map = 100%, reduce = 93%, Cumulative CPU 2179.67 sec
>>> 2016-10-25 17:40:25,944 Stage-2 map = 100%, reduce = 94%, Cumulative CPU 2187.86 sec
>>> 2016-10-25 17:40:29,062 Stage-2 map = 100%, reduce = 95%, Cumulative CPU 2205.22 sec
>>> 2016-10-25 17:40:30,107 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2241.25 sec
>>> MapReduce Total cumulative CPU time: 37 minutes 21 seconds 250 msec
>>> Ended Job = job_1477437520637_0009
>>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in [jar:file:/opt/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
>>> 2016-10-25 17:40:35 Starting to launch local task to process map join; maximum memory = 514850816
>>> Execution failed with exit status: 2
>>> Obtaining error information
>>>
>>> Task failed!
>>> Task ID:
>>>   Stage-14
>>>
>>> Logs:
>>>
>>> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
>>> MapReduce Jobs Launched:
>>> Stage-Stage-1: Map: 106  Reduce: 45  Cumulative CPU: 3390.11 sec  HDFS Read: 8060555201  HDFS Write: 757253756  SUCCESS
>>> Stage-Stage-2: Map: 204  Reduce: 85  Cumulative CPU: 2241.25 sec  HDFS Read: 2407914653  HDFS Write: 805874953  SUCCESS
>>> Total MapReduce CPU Time Spent: 0 days 1 hours 33 minutes 51 seconds 360 msec
>>>
>>> Could not find any errors in the logs, but when I check the namenode logs, I get the following error,
>>>
>>> 2016-10-25 17:01:51,923 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.120.133:47114 Call#4 Retry#0: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_0000002_0000002_0000/bucket_00094_flush_length
>>> 2016-10-25 17:01:52,779 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.120.132:43008 Call#4 Retry#0: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_0000002_0000002_0000/bucket_00095_flush_length
>>> 2016-10-25 17:01:52,984 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.120.133:47260 Call#4 Retry#0: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_0000002_0000002_0000/bucket_00096_flush_length
>>> 2016-10-25 17:01:53,381 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.120.132:43090 Call#4 Retry#0: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_0000002_0000002_0000/bucket_00097_flush_length
>>> 2016-10-25 17:01:53,971 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.120.134:37444 Call#4 Retry#0: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_0000002_0000002_0000/bucket_00098_flush_length
>>> 2016-10-25 17:01:54,092 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.120.133:47300 Call#4 Retry#0: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_0000002_0000002_0000/bucket_00099_flush_length
>>> 2016-10-25 17:01:55,094 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.120.134:37540 Call#4 Retry#0: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_0000002_0000002_0000/bucket_00012_flush_length
>>> 2016-10-25 17:02:11,269 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.120.133:47378 Call#4 Retry#0: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/mls_public_record_association_snapshot_orc/delta_0000002_0000002_0000/bucket_00075_flush_length
>>>
>>> I also searched for the flush_length files in the above-mentioned location, but I only see buckets and no files ending with flush_length.
>>>
>>> Any clue or help would be highly appreciated.
>>>
>>> Regards,
>>> Satyajit.
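[Editor's note] For readers hitting the same failure: both explain plans show the query dying in a MapredLocalTask (Stage-9 / Stage-14), i.e. the local hash-table build for a map join, not in the ACID delta reads that produce the flush_length noise. A minimal sketch of two standard Hive workarounds, assuming Hive 2.1 with the MR engine; these are generic suggestions, not steps confirmed by the thread:

```sql
-- 1) The failing stage is a map-join local task that exits with status 2
--    (often an OOM or environment problem in the spawned local JVM).
--    Forcing a plain shuffle join skips the local task entirely:
set hive.auto.convert.join=false;

-- 2) The *_flush_length side files belong to open/uncompacted ACID deltas.
--    A major compaction rewrites the delta_* directories into a base file,
--    which removes the FileNotFoundException messages in the namenode log:
ALTER TABLE mls_public_record_association_snapshot_orc COMPACT 'major';
SHOW COMPACTIONS;  -- watch for the request to reach "succeeded"
```

Note that compaction requires the metastore compactor to be running; the thread's settings explicitly disable it (hive.compactor.initiator.on=false), so it would need to be re-enabled first.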