(always helpful to call out a version, I'm going to assume 1.2)

> select * from (select count(1) from T union all select count(1) from T2) x;
> I have to admit that I don't quite understand that. Would it mean that we'd
> only get a single row if we left out this empty path?

AFAIK, this is a bit of historical stuff from MR, where a 0 task job is not valid (in Tez, it is).

I know of at least one fix to the metadata optimizations for partitioned tables which does this faster (but it is not in Apache, AFAIK):

https://issues.apache.org/jira/browse/HIVE-10596

> I do not understand the internals of query planning and execution well
> enough but if someone has time to explain it to me I'd be very grateful.

There's a DEBUG level log named <PERFLOG> that would be useful in debugging why this is slow.

--hiveconf hive.root.logger=DEBUG,DRFA should get you the split of times within a query.

> For simple queries like SELECT * FROM T LIMIT 10 I'm seeing 5-10min runtimes
> just because of this overhead.

There are 2 optimizers you can disable to try this out.

set hive.optimize.metadataonly=false;
set hive.fetch.task.conversion=minimal; (or none)

The first one prevents the creation of dummy files for a simple query like the count(1). The second one prevents an optimizer check which sums up the sizes of all files until it reaches 1 GB before disabling the fetch codepath.

Cheers,
Gopal
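
P.S. Put together, a session to try the two settings might look roughly like the sketch below. It assumes T is the table from your example and that the settings are applied per session, so they need to be set again in each new session. I have not run this against your setup, so treat it as a starting point.

```sql
-- Sketch only: T is the table from the example query in this thread.

-- Disable the metadata-only optimizer, which (as described above)
-- creates dummy files for simple queries like count(1).
set hive.optimize.metadataonly=false;

-- Restrict fetch task conversion, which (as described above) avoids the
-- optimizer check that sums up file sizes. Use none instead of minimal
-- to turn the conversion off entirely.
set hive.fetch.task.conversion=minimal;

-- Re-run the slow query and compare the runtime against the earlier one.
select * from T limit 10;
```

Changing one setting at a time should show which of the two optimizers accounts for the overhead.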
