Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by CorinneC:
http://wiki.apache.org/pig/PigTutorial
------------------------------------------------------------------------------
  1. Install Java.
  1. Install Pig (using the Pig JAR file).
- 1. Install and run the Pig scripts (using the Pig tutorial file).
+ 1. Install and run the Pig scripts (using the Pig tutorial file) - in either local mode or on a Hadoop cluster.
  == Java Installation ==
  Make sure your run-time environment includes the following:
- 1. Java 1.5.x (perferably from Sun)
+ 1. Java 1.5.x (preferably from Sun)
  1. The JAVA_HOME environment variable is set to the root of your Java installation.
@@ -30, +30 @@
  1. Define an environment variable with the location of the Pig JAR file. For example: export PIGDIR=/home/me/pig (bash, sh) or setenv PIGDIR /home/me/pig (tcsh, csh).
- == Pig Script Installation: Local Mode ==
+ == Pig Scripts: Local Mode ==
  To install and run the Pig scripts in local mode, do the following:
  1. Create a temporary directory. For example: /home/me/tmp
  1. Download and unzip the Pig tutorial file in the temporary directory (''... not available yet'').
  1. Review the contents of the [#Pig_Tutorial_File Pig Tutorial File].
- 1. Review the [#Tutorial_Pig_Script Tutorial Pig Script] and the[#Tutorial_Join_Pig_Script Tutorial-Join Pig Script].
+ 1. Review the scripts: [#Pig_Script_1 Pig Script 1] and [#Pig_Script_2 Pig Script 2].
- 1. Execute the following command (using either tutorial-local.pig or tutorial-join-local.pig).
+ 1. Execute the following command (using either script1-local.pig or script2-local.pig).
  {{{
- $ java -cp $PIGDIR/pig.jar org.apache.pig.Main -x local tutorial-local.pig
+ $ java -cp $PIGDIR/pig.jar org.apache.pig.Main -x local script1-local.pig
  }}}
- 1.#6 Review the results (either the tutorial-local-results.txt or tutorial-join-local-results.txt file in your local directory):
+ 1.#6 Review the result file (either script1-local-results.txt or script2-local-results.txt):
  {{{
- $ ls -l tutorial-local-results.txt
+ $ ls -l script1-local-results.txt
  }}}
- == Pig Script Installation: Hadoop Cluster ==
+ == Pig Scripts: Hadoop Cluster ==
  To install and run the Pig scripts on a Hadoop cluster, do the following:
  1. Create a temporary directory. For example: /home/me/tmp
  1. Download and unzip the Pig tutorial file in the temporary directory (''... not available yet'').
  1. Review the contents of the [#Pig_Tutorial_File Pig Tutorial File].
- 1. Review the [#Tutorial_Pig_Script Tutorial Pig Script] and the[#Tutorial_Join_Pig_Script Tutorial-Join Pig Script].
+ 1. Review the scripts: [#Pig_Script_1 Pig Script 1] and [#Pig_Script_2 Pig Script 2].
- 1. Copy the exite.log file from your local directory to your DFS directory. View the file in your DFS directory.
+ 1. Copy the excite.log file from the temporary directory to the DFS directory. View the file in your DFS directory.
  {{{
  $ hadoop dfs -copyFromLocal excite.log .
  $ hadoop dfs -ls
  }}}
  1.#6 Set the HADOOPSITEPATH environment variable to the location of your hadoop-site.xml file.
- 1. Execute the following command (using either tutorial.pig or tutorial-join.pig):
+ 1. Execute the following command (using either script1-hadoop.pig or script2-hadoop.pig):
  {{{
- $ java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH org.apache.pig.Main tutorial.pig
+ $ java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH org.apache.pig.Main script1-hadoop.pig
  }}}
- 1.#8 Review the results (the files are located in either your tutorial-results or tutorial-join-results DFS directory):
+ 1.#8 Review the result files (located in either the script1-hadoop-results or script2-hadoop-results DFS directory):
  {{{
  $ hadoop dfs -ls tutorial-results
  }}}
@@ -75, +75 @@
  The contents of the Pig tutorial file (*.gz) are described here.
  || '''File''' || '''Description'''||
  || tutorial.jar|| User-defined functions (UDFs) ||
+ || script1-local.pig || Query Phrase Popularity Pig script (local mode) ||
+ || script1-hadoop.pig || Query Phrase Popularity Pig script (Hadoop cluster) ||
+ || script2-local.pig || Temporal Query Phrase Popularity (local mode) ||
+ || script2-hadoop.pig || Temporal Query Phrase Popularity (Hadoop cluster) ||
- || tutorial.pig || Tutorial pig script (Hadoop) > creates tutorial-results ||
- || tutorial-local.pig ||Tutorail pig script (local mode) > creates tutorial-local-results.txt ||
- || tutorial-join.pig || Tutorial-join pig script (Hadoop) > creates tutorial-join-results ||
- || tutorial-join-local.pig || Tutorial-join pig script (local mode) > creates tutorial-join-local-results.txt ||
- || excite.log || Data file (Hadoop) ||
  || excite-small.log || Data file (local mode) ||
+ || excite.log || Data file (Hadoop cluster) ||
  || pornwords || Data file (porn keywords) ||
  The user-defined functions (UDFs) are described here.
@@ -95, +95 @@
  || !TutorialUtil || Divides the query string into a set of words.||
- [[Anchor(Tutorial_Pig_Script)]]
+ [[Anchor(Pig_Script_1)]]
- == Tutorial Pig Script ==
+ == Pig Script 1: Query Phrase Popularity ==
- The tutorial pig script (tutorial.pig or tutorial-local.pig) does the following:
+ The Query Phrase Popularity script (script1-local.pig or script1-hadoop.pig) processes a search query log file from the Excite search engine and finds search phrases that occur with particularly high frequency during certain times of the day.
+
+
+ The script is shown here:
  * Register the tutorial JAR file so that the included UDFs can be called in the script.
  {{{
@@ -180, +183 @@
  STORE ordered_uniq_frequency INTO '/tmp/tutorial-results' USING PigStorage();
  }}}
- [[Anchor(Tutorial_Join_Pig_Script)]]
- == Tutorial-Join Pig Script ==
+ [[Anchor(Pig_Script_2)]]
+ == Pig Script 2: Temporal Query Phrase Popularity ==
+ The Temporal Query Phrase Popularity script (script2-local.pig or script2-hadoop.pig) processes a search query log file from the Excite search engine and compares the frequency of occurrence of search phrases across two time periods separated by twelve hours.
- The tutorial-join pig script (tutorial-join.pig or tutorial-join-local.pig) does the following:
+ The script is shown here:
  * Register the tutorial JAR file so that the user-defined functions (UDFs) can be called in the script.
  {{{