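Currently the only way to do this is to run the script in a reducer and force a single reduce task. Because the outer TRANSFORM reads from a subquery that uses CLUSTER BY, the script runs in the reduce phase. With mapred.reduce.tasks=1, every row then goes through one instance of the script:

  set mapred.reduce.tasks=1;
  SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id, percentile, count)
  FROM (SELECT actor_id FROM activities CLUSTER BY actor_id) a;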
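For illustration, here is a minimal sketch of what /my/script might do. It assumes the script receives one tab-separated row per line on stdin, with actor_id as the only column. It also assumes "percentile" means the share of actors whose activity count is at or below this actor's count. The function name compute_percentiles and that percentile definition are hypothetical. The script must read every row before it can emit anything, which is why the query above forces a single reducer.

```python
#!/usr/bin/env python3
"""Hypothetical /my/script: percentile rank of each actor's activity count.

Assumes Hive streams one row per line on stdin with tab-separated columns,
and that the only input column is actor_id.
"""
import sys
from bisect import bisect_right
from collections import Counter


def compute_percentiles(lines):
    """Return (actor_id, percentile, count) tuples, one per actor."""
    # Global pass over all input: count activities per actor. This needs
    # every row, so a single script instance must see the whole table.
    counts = Counter(
        line.rstrip("\n").split("\t")[0] for line in lines if line.strip()
    )
    sorted_counts = sorted(counts.values())
    total = len(sorted_counts)
    rows = []
    for actor_id, count in sorted(counts.items()):
        # Assumed definition: percentage of actors whose count is <= this one.
        at_or_below = bisect_right(sorted_counts, count)
        rows.append((actor_id, 100.0 * at_or_below / total, count))
    return rows


if __name__ == "__main__":
    for actor_id, percentile, count in compute_percentiles(sys.stdin):
        # Emit tab-separated columns matching AS (actor_id, percentile, count).
        print(f"{actor_id}\t{percentile:.2f}\t{count}")
```

If the script ran once per map task instead, each instance would see only part of the table. Its counts and percentiles would then be computed over that fragment alone.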
On Sun, Jan 11, 2009 at 8:45 PM, Josh Ferguson wrote:
> If I'm running a query like this:
>
> hive> SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id,
> percentile, count) FROM activities;
>
> It creates a map job for each file. I need every row that is in the table
> to be run through a single instance of the script since certain parts
> require global list information. Do I need to rework this query to use a
> reducer or can I change some configuration variable to load in all of my
> data from this table and run it through /my/script all at once?
>
> Josh F.

--
Yours,
Zheng
