[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

jameszhouyi Thu, 21 May 2015 18:03:00 -0700

Github user jameszhouyi commented on the pull request:

    https://github.com/apache/spark/pull/5688#issuecomment-104464124
  
    @viirya, please see the query details below, which use script transform:
    
    ADD FILE ${env:BIG_BENCH_QUERIES_DIR}/Resources/bigbenchqueriesmr.jar;
    
    --CREATE RESULT TABLE. Store query result externally in output_dir/qXXresult/
    DROP TABLE IF EXISTS ${hiveconf:RESULT_TABLE};
    CREATE TABLE ${hiveconf:RESULT_TABLE} (
      pid1 BIGINT,
      pid2 BIGINT,
      cnt  BIGINT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
    STORED AS ${env:BIG_BENCH_hive_default_fileformat_result_table} LOCATION '${hiveconf:RESULT_DIR}';
    
    -- the real query part
    --Find the most frequent ones
    INSERT INTO TABLE ${hiveconf:RESULT_TABLE}
    SELECT pid1, pid2, COUNT (*) AS cnt
    FROM (
      --Make items basket
      FROM (
        -- Joining two tables
        SELECT s.ss_ticket_number AS oid , s.ss_item_sk AS pid
        FROM store_sales s
        INNER JOIN item i ON (s.ss_item_sk = i.i_item_sk)
        WHERE i.i_category_id in (${hiveconf:q01_i_category_id_IN})
        AND s.ss_store_sk in (${hiveconf:q01_ss_store_sk_IN})
        CLUSTER BY oid
      ) q01_map_output
      REDUCE q01_map_output.oid, q01_map_output.pid
      USING '${env:BIG_BENCH_JAVA} ${env:BIG_BENCH_java_child_process_xmx} -cp bigbenchqueriesmr.jar de.bankmark.bigbench.queries.q01.Red -ITEM_SET_MAX ${hiveconf:q01_NPATH_ITEM_SET_MAX} '
      AS (pid1 BIGINT, pid2 BIGINT)
    ) q01_temp_basket
    GROUP BY pid1, pid2
    HAVING COUNT (pid1) > ${hiveconf:q01_COUNT_pid1_greater}
    CLUSTER BY pid1, cnt, pid2
    ;
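The data flow through the REDUCE ... USING step can be illustrated with a small sketch. The script receives the (oid, pid) rows as tab-separated lines. Because of CLUSTER BY oid, the rows of each order arrive contiguously. The script must write tab-separated rows that match AS (pid1 BIGINT, pid2 BIGINT).

The real reducer is the Java class de.bankmark.bigbench.queries.q01.Red, and its code is not shown here. This Python stand-in is hypothetical and rests on these assumptions:
- parse_rows, basket_pairs, frequent_pairs and transform are illustrative names, not part of BigBench or Spark.
- Each unordered pair is emitted once with pid1 < pid2.
- The -ITEM_SET_MAX option is not modeled.

```python
from collections import Counter
from itertools import combinations, groupby
from typing import Dict, Iterable, Iterator, Tuple


def parse_rows(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Parse 'oid<TAB>pid' lines, as a REDUCE/TRANSFORM script reads them from stdin."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        oid, pid = line.split("\t")
        yield int(oid), int(pid)


def basket_pairs(rows: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Emit every unordered item pair (pid1 < pid2) within each order.

    Assumes CLUSTER BY oid: all rows of one order are contiguous,
    so groupby() sees each basket exactly once.
    Hypothetical: the real Java reducer's ITEM_SET_MAX handling is not modeled.
    """
    for _oid, group in groupby(rows, key=lambda r: r[0]):
        items = sorted({pid for _, pid in group})  # distinct items in this basket
        yield from combinations(items, 2)


def frequent_pairs(pairs: Iterable[Tuple[int, int]], min_count: int) -> Dict[Tuple[int, int], int]:
    """Mimic the outer GROUP BY pid1, pid2 HAVING COUNT(pid1) > min_count."""
    counts = Counter(pairs)
    return {pair: cnt for pair, cnt in counts.items() if cnt > min_count}


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Turn reducer input lines into output lines 'pid1<TAB>pid2'."""
    for pid1, pid2 in basket_pairs(parse_rows(lines)):
        yield f"{pid1}\t{pid2}\n"
```

To use this as an actual script transform, a wrapper along the lines of sys.stdout.writelines(transform(sys.stdin)) would be needed. With input rows (1,10), (1,20), (1,30), (2,10), (2,20), (3,30), basket_pairs yields (10,20), (10,30), (20,30), (10,20). A threshold of 1 then keeps only (10,20) with count 2.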



