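Currently the only way to do this is to use a single reducer. Setting
mapred.reduce.tasks=1 forces one reduce task, and the CLUSTER BY in the
subquery moves the TRANSFORM to the reduce side, so one instance of the
script sees every row.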

set mapred.reduce.tasks=1;
SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id, percentile, count)
FROM (SELECT actor_id FROM activities CLUSTER BY actor_id) a;
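
In case it helps, here is a minimal sketch of what a script on the receiving
end of that query could look like. I don't know what your /my/script actually
computes, so the logic is an assumption: it counts rows per actor_id and
reports, as the percentile, the share of actors whose count is less than or
equal to that actor's count. The names transform and main are hypothetical.
The only part taken from Hive itself is the I/O convention: TRANSFORM feeds
rows to the script on stdin as tab-separated columns and reads tab-separated
rows back from stdout.

```python
import bisect
import sys
from collections import Counter


def transform(lines):
    """Yield tab-separated (actor_id, percentile, count) rows.

    Assumption: each input line carries actor_id as its first
    tab-separated column, and "percentile" means the percentage of
    actors whose row count is <= this actor's row count.
    """
    # Global pass: this is why every row must reach a single script instance.
    counts = Counter(
        line.rstrip("\n").split("\t")[0] for line in lines if line.strip()
    )
    total_actors = len(counts)
    sorted_counts = sorted(counts.values())

    for actor_id, count in sorted(counts.items()):
        # Number of actors with a count <= this actor's count.
        rank = bisect.bisect_right(sorted_counts, count)
        percentile = 100.0 * rank / total_actors
        yield f"{actor_id}\t{percentile:.1f}\t{count}"


def main():
    # Hive streams the reducer's input on stdin and collects stdout.
    for row in transform(sys.stdin):
        print(row)
```

To use it as the script, end the file with a call to main() and make it
executable. Because the counts are only complete once all input has been
read, the script holds one counter per distinct actor_id in memory, which is
the price of needing global information in a single process.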

On Sun, Jan 11, 2009 at 8:45 PM, Josh Ferguson <[email protected]> wrote:

> If I'm running a query like this:
>
> hive> SELECT TRANSFORM(actor_id) USING '/my/script' AS (actor_id,
> percentile, count) FROM activities;
>
> It creates a map job for each file. I need every row that is in the table
> to be run through a single instance of the script since certain parts
> require global list information. Do I need to rework this query to use a
> reducer or can I change some configuration variable to load in all of my
> data from this table and run it through /my/script all at once?
>
> Josh F.
>



-- 
Yours,
Zheng
