Hi,

One of my jobs keeps hitting "FSError: java.io.IOException: No space left on device", and some of its tasks fail with either

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/file.out
    at ....
on host node72-142.prod-aws.eadpdata.ea.com

or

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for attempt_201405211957_566618_m_000001_0/intermediate.34
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at ...

The nodes where the tasks failed don't look that full; the stats for this job are included below. The job does an inner self-join in a subquery and then some aggregation.

Does anybody know why the job fails for lack of space while we still have some space left? And is there any way to optimize the query itself, besides cleaning up space?

Thanks a lot!

SET mapred.max.split.size=134217728;
SET mapred.min.split.size.per.node=100000000;
SET mapred.min.split.size.per.rack=100000000;

CREATE EXTERNAL TABLE IF NOT EXISTS mpst.score_per_min_v2 (
    game_name STRING,
    hosted_platform STRING,
    s_kit STRING,
    vehicle STRING,
    score_amt FLOAT,
    min_spent FLOAT,
    score_per_min FLOAT
)
PARTITIONED BY (load_datetime STRING)
STORED AS RCFILE
LOCATION '/hive/warehouse/mpst/score_per_min_v2';

INSERT OVERWRITE TABLE score_per_min_v2 PARTITION(load_datetime='2014-06-09 23-58-00')
SELECT
    game_name,
    hosted_platform,
    CASE WHEN s_kit IS NOT NULL THEN s_kit ELSE "NA" END AS s_kit,
    vehicle,
    SUM(score_amt),
    SUM(time_duration/60) AS min_spent,
    CASE WHEN SUM(time_duration/60)=0 THEN 0.0
         ELSE round(SUM(score_amt)/SUM(time_duration/60),2)
    END AS score_per_min
FROM (
    SELECT
        c.round_guid AS round_guid,
        c.persona_id AS persona_id,
        c.player_id AS player_id,
        c.round_start_datetime AS round_start_datetime,
        c.s_kit AS s_kit,
        c.vehicle AS vehicle,
        a.round_time AS start_time,
        c.round_time AS end_time,
        (c.round_time - a.round_time) AS time_duration,
        c.score_amt,
        c.hosted_platform,
        c.game_name
    FROM mpst.spm_stg_v2 c
    INNER JOIN mpst.spm_stg_v2 a
        ON a.dt= '2014-06-10'
        AND c.dt = '2014-06-10'
        AND a.dt = c.dt
        AND a.service = c.service
        AND a.hour = c.hour
        AND a.round_guid = c.round_guid
        AND a.player_id = c.player_id
        AND a.hosted_platform = c.hosted_platform
        AND a.persona_id = c.persona_id
        AND a.player_id = c.player_id
        AND a.round_start_datetime = c.round_start_datetime
        AND a.rank = (c.rank - 1)
) x
GROUP BY game_name, hosted_platform, s_kit, vehicle;

Map-Reduce Framework counters:

Counter                                           Map           Reduce            Total
Map output materialized bytes         173,033,990,918                0  173,033,990,918
Map input records                         555,343,308                0      555,343,308
Reduce shuffle bytes                                0  173,033,990,918  173,033,990,918
Spilled Records                         4,188,988,304    1,350,009,594    5,538,997,898
Map output bytes                      169,705,718,344                0  169,705,718,344
Total committed heap usage (bytes)      3,002,007,552      553,385,984    3,555,393,536
CPU time spent (ms)                        26,347,260       10,932,050       37,279,310
Map input bytes                         1,275,536,063                0    1,275,536,063
SPLIT_RAW_BYTES                                13,493                0           13,493
Combine input records                               0                0                0
Reduce input records                                0    1,110,686,616    1,110,686,616
Reduce input groups                                 0    1,110,686,616    1,110,686,616
Combine output records                              0                0                0
Physical memory (bytes) snapshot        3,628,310,528      493,240,320    4,121,550,848
Reduce output records                               0                0                0
Virtual memory (bytes) snapshot        21,354,807,296    4,420,263,936   25,775,071,232
Map output records                      1,110,686,616                0    1,110,686,616

Regards,
Y. Chen
--- Perspiration never betray you ---
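
P.S. To make the counters above easier to read, here is a rough back-of-the-envelope sketch in Python. It only divides numbers copied from the job stats in this post. The variable names are my own, and the interpretation in the comments is an assumption on my part, not something I have confirmed on the cluster.

```python
# Counter values copied from the job stats above (Map column).
map_input_records = 555_343_308
map_output_records = 1_110_686_616
map_input_bytes = 1_275_536_063
map_output_bytes = 169_705_718_344
map_materialized_bytes = 173_033_990_918
map_spilled_records = 4_188_988_304

# How many intermediate records each input row produces.
records_per_input = map_output_records / map_input_records

# Average size of one intermediate record, in bytes.
bytes_per_output_record = map_output_bytes / map_output_records

# Growth from input bytes to intermediate bytes. Assumption: the input
# figure may be the compressed on-disk size of the RCFile data.
expansion = map_output_bytes / map_input_bytes

# A ratio above 1 would mean map-side records were written to local disk
# more than once (assumption: multi-pass spill/merge).
spill_ratio = map_spilled_records / map_output_records

# Intermediate map output that has to sit on local disks, in GiB.
materialized_gib = map_materialized_bytes / 2**30

print(f"records per input row : {records_per_input:.2f}")
print(f"bytes per map record  : {bytes_per_output_record:.1f}")
print(f"byte expansion        : {expansion:.1f}x")
print(f"map spill ratio       : {spill_ratio:.2f}")
print(f"materialized output   : {materialized_gib:.1f} GiB")
```

By this reading, every input row is emitted twice (presumably once per side of the self-join), about 1.2 GB of input turns into roughly 160 GiB of intermediate map output, and map-side records are spilled between three and four times each. If that intermediate data lands on the tasks' local directories, the disks could fill up while the job runs even though the nodes do not look full afterwards, but that is only my guess.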