We're trying to insert into a table using a dynamic partition, but the query runs for a while and then dies with a LeaseExpiredException. The Hadoop details and some discussion are at https://issues.apache.org/jira/browse/HDFS-198. Is there a way to configure Hive, or our query, to work around this? If we adjust the query to handle less data at once, it completes in under 10 minutes, but then we have to run it many more times to process all the data.
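(For context, a minimal sketch of what "less data at once" means for us, assuming we slice the day's data by project -- the abs(hash(project)) % 10 bucketing below is only illustrative, not our exact predicate:)

-- illustrative: process one of 10 project slices per run,
-- then rerun with % 10 = 1, = 2, ... up to = 9
SELECT file, os, country, dt, project
FROM downloads
WHERE dt='2010-10-01'
  AND abs(hash(project)) % 10 = 0;

Adding that kind of predicate to the inner SELECT of the full query below is roughly what the smaller runs amount to; each one finishes, but it takes many passes over downloads to cover the whole day.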
The full query is:

FROM (
  FROM (
    SELECT file, os, country, dt, project
    FROM downloads
    WHERE dt='2010-10-01'
    DISTRIBUTE BY project
    SORT BY project asc, file asc
  ) a
  SELECT TRANSFORM(file, os, country, dt, project)
  USING 'transformwrap reduce.py'
  AS (file, downloads, os, country, project)
) b
INSERT OVERWRITE TABLE dl_day PARTITION (dt='2010-10-01', project)
SELECT file, downloads, os, country, FALSE, project

The project partition has roughly 100,000 values. We're using Hive trunk from about a month ago, on Hadoop 0.18.3-14.cloudera.CH0_3.

--
Dave Brondsema
Software Engineer
Geeknet
www.geek.net