Hi guys,

I recently started investigating Hive and already have a couple of questions.

1). I have a table A partitioned by column x INT. The table has several partitions, say x=1, x=2, x=10. When I run "select max(x) from A", Hive runs a map-reduce job, apparently trying to find column x inside the data files. Would it be feasible to make Hive smarter, so that it simply queries the metastore for the highest partition value when I'm interested only in one or more "meta" columns, i.e. columns whose values are available from the metastore alone?

The next case is, I guess, correct behavior, but it could also be more sophisticated. When I run "select x from A", Hive runs a map-reduce job, which is correct. As I understand it, in this case Hive returns the value of x once for every row in each partition. But "select distinct x from A" fires a map-reduce job as well, even though the answer is again just the list of partition values.

2). It would be very useful if Hive could use a UDF or the value of a column in the data set when creating a new partition. In particular:

    insert overwrite table B partition(ds=A.foo) select A.foo, A.bar from A;
    insert overwrite table B partition(x=unix_timestamp()) select A.foo, A.bar from A;

So far I haven't found a way to do this.

Thanks guys!

-- 
Andrey Pankov
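
P.S. To make question 1 more concrete, here is a rough Python sketch of the answer I would expect Hive to derive from partition names alone, without touching any data files. The partition list is hard-coded in the "x=1" form that "show partitions A" prints, and max_partition_value is a hypothetical helper written for this mail, not a Hive API. It also assumes that every partition holds at least one row; an empty partition would make this answer differ from a real scan.

```python
def max_partition_value(partition_names, column):
    """Return the largest integer value of `column` found in partition names.

    partition_names: strings such as "x=10" or "ds=2009-01-01/x=10".
    Returns None when no partition carries the column.
    """
    values = []
    for name in partition_names:
        # A multi-level partition name separates its levels with "/".
        for part in name.split("/"):
            key, _, value = part.partition("=")
            if key == column:
                # x is INT, so compare numerically rather than as text.
                values.append(int(value))
    return max(values) if values else None


# Hard-coded stand-in for the partition names of table A.
partitions = ["x=1", "x=2", "x=10"]
print(max_partition_value(partitions, "x"))
```

Note the numeric comparison: comparing the names as strings would rank "x=2" above "x=10".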