Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by CorinneC:
http://wiki.apache.org/pig/PigTutorial

------------------------------------------------------------------------------
  
   1. Install Java.
   1. Install Pig (using the Pig JAR file).
-  1. Install and run the Pig scripts (using the Pig tutorial file).
+  1. Install and run the Pig scripts (using the Pig tutorial file) - in either 
local mode or on a Hadoop cluster.
  
  == Java Installation ==
  Make sure your run-time environment includes the following:
-  1. Java 1.5.x (perferably from Sun)
+  1. Java 1.5.x (preferably from Sun)
   1. The JAVA_HOME environment variable is set to the root of your Java 
installation. 
  
  
@@ -30, +30 @@

   1. Define an environment variable with the location of the Pig JAR file. For 
example: export PIGDIR=/home/me/pig (bash, sh) or setenv PIGDIR /home/me/pig 
(tcsh, csh).
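
A minimal sketch (bash, sh) for confirming that the environment variables named in the steps above are set before running Pig. The helper name check_env is hypothetical and is not part of Pig or the tutorial file; it only reports variables that are unset or empty.

```shell
# check_env: hypothetical helper, not part of Pig.
# Prints "missing: NAME" for each variable that is unset or empty,
# and returns non-zero if at least one variable is missing.
check_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return $missing
}

# Check the two variables this tutorial relies on so far.
check_env JAVA_HOME PIGDIR || echo "Set the variables listed above before running Pig."
```

On a Hadoop cluster the same helper could also be passed HADOOPSITEPATH.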
  
  
- == Pig Script Installation: Local Mode ==
+ == Pig Scripts: Local Mode ==
  To install and run the Pig scripts in local mode, do the following:
  
   1. Create a temporary directory. For example: /home/me/tmp
   1. Download and unzip the Pig tutorial file in the temporary directory 
(''... not available yet'').
   1. Review the contents of the [#Pig_Tutorial_File Pig Tutorial File].
-  1. Review the [#Tutorial_Pig_Script Tutorial Pig Script] and 
the[#Tutorial_Join_Pig_Script Tutorial-Join Pig Script].
+  1. Review the scripts: [#Pig_Script_1 Pig Script 1] and [#Pig_Script_2 Pig 
Script 2].
-  1. Execute the following command (using either tutorial-local.pig or 
tutorial-join-local.pig).
+  1. Execute the following command (using either script1-local.pig or 
script2-local.pig).
  {{{
- $ java -cp $PIGDIR/pig.jar org.apache.pig.Main -x local tutorial-local.pig
+ $ java -cp $PIGDIR/pig.jar org.apache.pig.Main -x local script1-local.pig
  }}}
  
-  1.#6 Review the results (either the tutorial-local-results.txt or 
tutorial-join-local-results.txt file in your local directory):
+  1.#6 Review the result file (either script1-local-results.txt or 
script2-local-results.txt):
  {{{
- $ ls -l tutorial-local-results.txt
+ $ ls -l script1-local-results.txt
  }}}
  
  
- == Pig Script Installation: Hadoop Cluster ==
+ == Pig Scripts: Hadoop Cluster ==
  To install and run the Pig scripts on a Hadoop cluster, do the following:
  
   1. Create a temporary directory. For example: /home/me/tmp
   1. Download and unzip the Pig tutorial file in the temporary directory 
(''... not available yet'').
   1. Review the contents of the [#Pig_Tutorial_File Pig Tutorial File].
-  1. Review the [#Tutorial_Pig_Script Tutorial Pig Script] and 
the[#Tutorial_Join_Pig_Script Tutorial-Join Pig Script].
+  1. Review the scripts: [#Pig_Script_1 Pig Script 1] and [#Pig_Script_2 Pig 
Script 2].
-  1. Copy the exite.log file from your local directory to your DFS directory. 
View the file in your DFS directory.
+  1. Copy the excite.log file from the temporary directory to the DFS 
directory. View the file in your DFS directory.
  {{{
  $ hadoop dfs -copyFromLocal excite.log .
  $ hadoop dfs -ls
  }}}
   1.#6 Set the HADOOPSITEPATH environment variable to the location of your 
hadoop-site.xml file.
-  1. Execute the following command (using either tutorial.pig or 
tutorial-join.pig):
+  1. Execute the following command (using either script1-hadoop.pig or 
script2-hadoop.pig):
  {{{
- $ java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH org.apache.pig.Main tutorial.pig
+ $ java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH org.apache.pig.Main 
script1-hadoop.pig
  }}}
-  1.#8 Review the results (the files are located in either your 
tutorial-results or tutorial-join-results DFS directory):
+  1.#8 Review the result files (located in either the script1-hadoop-results 
or script2-hadoop-results DFS directory):
  {{{
  $ hadoop dfs -ls script1-hadoop-results
  }}}
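
The local-mode and Hadoop-cluster invocations shown above differ only in the classpath and the -x local flag. The following sketch (bash, sh) builds either command line without running it, so the result can be inspected first. The function name pig_cmd and the path /home/me/hadoop/conf are assumptions made for illustration; neither comes from the tutorial file.

```shell
# pig_cmd: hypothetical helper that prints (but does not run) the Pig
# command line for a tutorial script.
# $1: mode, "local" or "hadoop"; $2: script file name.
pig_cmd() {
  case "$1" in
    local)
      printf 'java -cp %s/pig.jar org.apache.pig.Main -x local %s\n' "$PIGDIR" "$2"
      ;;
    hadoop)
      printf 'java -cp %s/pig.jar:%s org.apache.pig.Main %s\n' "$PIGDIR" "$HADOOPSITEPATH" "$2"
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

# Example values only; /home/me/hadoop/conf is a hypothetical location
# for hadoop-site.xml.
PIGDIR=/home/me/pig
HADOOPSITEPATH=/home/me/hadoop/conf

pig_cmd local script1-local.pig
pig_cmd hadoop script1-hadoop.pig
```

To execute the printed command rather than just display it, its output could be passed to sh once it looks correct.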
@@ -75, +75 @@

  The contents of the Pig tutorial file (*.gz) are described here.
  || '''File''' || '''Description'''||
  || tutorial.jar|| User-defined functions (UDFs) ||
+ || script1-local.pig || Query Phrase Popularity Pig script (local mode) ||
+ || script1-hadoop.pig || Query Phrase Popularity Pig script (Hadoop cluster) ||
+ || script2-local.pig || Temporal Query Phrase Popularity Pig script (local mode) ||
+ || script2-hadoop.pig || Temporal Query Phrase Popularity Pig script (Hadoop cluster) ||
- || tutorial.pig || Tutorial pig script (Hadoop) > creates tutorial-results ||
- || tutorial-local.pig ||Tutorail pig script (local mode) > creates 
tutorial-local-results.txt ||
- || tutorial-join.pig || Tutorial-join pig script (Hadoop) > creates 
tutorial-join-results ||
- || tutorial-join-local.pig || Tutorial-join pig script (local mode) > creates 
tutorial-join-local-results.txt ||
- || excite.log || Data file (Hadoop) ||
  || excite-small.log || Data file (local mode) ||
+ || excite.log || Data file (Hadoop cluster) ||
  || pornwords || Data file (porn keywords) ||
  
  The user-defined functions (UDFs) are described here.
@@ -95, +95 @@

  || !TutorialUtil || Divides the query string into a set of words.||
  
  
- [[Anchor(Tutorial_Pig_Script)]]
+ [[Anchor(Pig_Script_1)]]
- == Tutorial Pig Script ==
+ == Pig Script 1: Query Phrase Popularity ==
  
- The tutorial pig script (tutorial.pig or tutorial-local.pig) does the 
following:
+ The Query Phrase Popularity script (script1-local.pig or script1-hadoop.pig) 
processes a search query log file from the Excite search engine and finds 
search phrases that occur with particularly high frequency during certain times 
of the day.
+ 
+ 
+ The script is shown here:
  
   * Register the tutorial JAR file so that the included UDFs can be called in 
the script.
  {{{ 
@@ -180, +183 @@

  STORE ordered_uniq_frequency INTO '/tmp/tutorial-results' USING PigStorage(); 
  }}}
  
- [[Anchor(Tutorial_Join_Pig_Script)]]
- == Tutorial-Join Pig Script ==
+ [[Anchor(Pig_Script_2)]]
+ == Pig Script 2: Temporal Query Phrase Popularity ==
+ The Temporal Query Phrase Popularity script (script2-local.pig or 
script2-hadoop.pig) processes a search query log file from the Excite search 
engine and compares the frequency of occurrence of search phrases across two 
time periods separated by twelve hours.
  
- The tutorial-join pig script (tutorial-join.pig or tutorial-join-local.pig) 
does the following:
+ The script is shown here:
  
   * Register the tutorial JAR file so that the user-defined functions (UDFs) 
can be called in the script.
  {{{
