svn commit: r896134 - in /hadoop/pig/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/piglatin_reference.xml src/docs/src/documentation/content/xdocs/piglatin_users.xml
Author: olga
Date: Tue Jan  5 17:18:51 2010
New Revision: 896134

URL: http://svn.apache.org/viewvc?rev=896134&view=rev
Log:
PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan)

Modified:
    hadoop/pig/trunk/CHANGES.txt
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml

Modified: hadoop/pig/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/CHANGES.txt?rev=896134&r1=896133&r2=896134&view=diff
==============================================================================
--- hadoop/pig/trunk/CHANGES.txt (original)
+++ hadoop/pig/trunk/CHANGES.txt Tue Jan  5 17:18:51 2010
@@ -24,6 +24,8 @@
 
 IMPROVEMENTS
 
+PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan)
+
 PIG-1102: Collect number of spills per job (sriranjan via olgan)
 
 PIG-1149: Allow instantiation of SampleLoaders with parametrized LoadFuncs

Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml?rev=896134&r1=896133&r2=896134&view=diff
==============================================================================
--- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml (original)
+++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml Tue Jan  5 17:18:51 2010
@@ -4919,58 +4919,7 @@
 </section></section>
 
-   <section>
-   <title>DUMP</title>
-   <para>Displays the contents of a relation.</para>
-   
-   <section>
-   <title>Syntax</title>
-   <informaltable frame="all">
-   <tgroup cols="1"><tbody><row>
-<entry>
-   <para>DUMP alias;</para>
-</entry>
-   </row></tbody></tgroup>
-   </informaltable></section>
-   
-   <section>
-   <title>Terms</title>
-   <informaltable frame="all">
-   <tgroup cols="2"><tbody><row>
-<entry>
-   <para>alias</para>
-</entry>
-<entry>
-   <para>The name of a relation.</para>
-</entry>
-   </row></tbody></tgroup>
-   </informaltable></section>
-   
-   <section>
-   <title>Usage</title>
-   <para>Use the DUMP operator to run (execute) a Pig Latin statement and to display the contents of an alias. 
-   You can use DUMP as a debugging device to make sure the results you are expecting are being generated.</para></section>
-   
-   <section>
-   <title>Example</title>
-   <para>In this example a dump is performed after each statement.</para>
-<programlisting>
-A = LOAD 'student' AS (name:chararray, age:int, gpa:float);
-
-DUMP A;
-(John,18,4.0F)
-(Mary,19,3.7F)
-(Bill,20,3.9F)
-(Joe,22,3.8F)
-(Jill,20,4.0F)
-
-B = FILTER A BY name matches 'J.+';
-
-DUMP B;
-(John,18,4.0F)
-(Joe,22,3.8F)
-(Jill,20,4.0F)
-</programlisting>
-</section></section>
+
   <section>
   <title>FILTER </title>
@@ -6521,7 +6470,7 @@
   <section>
   <title>STORE </title>
-   <para>Stores data to the file system.</para>
+   <para>Stores or saves results to the file system.</para>
   
   <section>
   <title>Syntax</title>
@@ -6591,7 +6540,10 @@
   
   <section>
   <title>Usage</title>
-   <para>Use the STORE operator to run (execute) Pig Latin statements and to store data on the file system. </para></section>
+   <para>Use the STORE operator to run (execute) Pig Latin statements and save (persist) results to the file system. Use STORE for production scripts and batch mode processing.</para>
+
+   <para>Note: To debug scripts during development, you can use <ulink url="piglatin_reference.html#DUMP">DUMP</ulink> to check intermediate results.</para>
+</section>
   
   <section>
   <title>Examples</title>
@@ -6962,6 +6914,68 @@
 </section></section>
+
+   <section>
+   <title>DUMP</title>
+   <para>Dumps or displays results to screen.</para>
+   
+   <section>
+   <title>Syntax</title>
+   <informaltable frame="all">
+   <tgroup cols="1"><tbody><row>
+<entry>
+   <para>DUMP alias;</para>
+</entry>
+   </row></tbody></tgroup>
+   </informaltable></section>
+   
+   <section>
+   <title>Terms</title>
+   <informaltable frame="all">
+   <tgroup cols="2"><tbody><row>
+<entry>
+   <para>alias</para>
+</entry>
+<entry>
+   <para>The name of a relation.</para>
+</entry>
+   </row></tbody></tgroup>
+   </informaltable></section>
+   
+   <section>
+   <title>Usage</title>
+   <para>Use the DUMP operator to run (execute) Pig Latin statements and display the results to your screen. 
+   DUMP is meant for interactive mode; statements are executed immediately and the results are not saved (persisted). You can use DUMP as a debugging device to make sure that the results you are expecting are actually generated.</para>
+
+   <para>
+   Note that production scripts <emphasis>should not</emphasis> use DUMP as it will disable multi-query optimizations and is likely to slow down execution
+   (see <ulink url="piglatin_users.html#Store+vs.+Dump">Store vs. Dump</ulink>).
+   </para>
+
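The Store-vs.-Dump guidance in this doc change can be sketched as a short Pig Latin script. This is an illustrative example only, not part of the committed docs; the 'student' input and the output path are hypothetical names chosen to echo the example style used in the Pig reference pages:

```pig
-- Hypothetical script; 'student' input and output path are illustrative.
A = LOAD 'student' AS (name:chararray, age:int, gpa:float);
B = FILTER A BY name matches 'J.+';

-- Interactive/debug use: executes immediately and prints B to the screen;
-- nothing is persisted, and multi-query optimization is disabled.
DUMP B;

-- Production/batch use: persists B to the file system and keeps
-- multi-query optimization intact.
STORE B INTO 'output/j_students';
```

In batch mode, replacing DUMP with STORE lets Pig plan all statements together before execution, which is the multi-query optimization the updated DUMP note warns about losing.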
[Pig Wiki] Update of PigMix by AlanGates
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The "PigMix" page has been changed by AlanGates.
http://wiki.apache.org/pig/PigMix?action=diff&rev1=13&rev2=14

--------------------------------------------------

  || PigMix_12 || 55.33 || 95.33 || 0.58 ||
  || Total || 1352.33 || 1357 || 1.00 ||
  || Weighted avg || || || 1.04 ||
+ 
+ Run date: January 4, 2010, run against 0.6 branch as of that day
+ || Test || Pig run time || Java run time || Multiplier ||
+ || PigMix_1 || 138.33 || 112.67 || 1.23 ||
+ || PigMix_2 || 66.33 || 39.33 || 1.69 ||
+ || PigMix_3 || 199 || 83.33 || 2.39 ||
+ || PigMix_4 || 59 || 60.67 || 0.97 ||
+ || PigMix_5 || 80.33 || 113.67 || 0.71 ||
+ || PigMix_6 || 65 || 77.67 || 0.84 ||
+ || PigMix_7 || 63.33 || 61 || 1.04 ||
+ || PigMix_8 || 40 || 47.67 || 0.84 ||
+ || PigMix_9 || 214 || 215.67 || 0.99 ||
+ || PigMix_10 || 284.67 || 284.33 || 1.00 ||
+ || PigMix_11 || 141.33 || 151.33 || 0.93 ||
+ || PigMix_12 || 55.67 || 115 || 0.48 ||
+ || Total || 1407 || 1362.33 || 1.03 ||
+ || Weighted Avg || || || 1.09 ||
+ 
  == Features Tested ==
svn commit: r896212 - in /hadoop/pig/branches/branch-0.6: CHANGES.txt src/docs/src/documentation/content/xdocs/zebra_mapreduce.xml src/docs/src/documentation/content/xdocs/zebra_pig.xml src/docs/src/d
Author: olga
Date: Tue Jan  5 20:37:14 2010
New Revision: 896212

URL: http://svn.apache.org/viewvc?rev=896212&view=rev
Log:
PIG-1177: Pig 0.6 Docs - Zebra docs (chandec via olgan)

Modified:
    hadoop/pig/branches/branch-0.6/CHANGES.txt
    hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_mapreduce.xml
    hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_pig.xml
    hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_reference.xml

Modified: hadoop/pig/branches/branch-0.6/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.6/CHANGES.txt?rev=896212&r1=896211&r2=896212&view=diff
==============================================================================
--- hadoop/pig/branches/branch-0.6/CHANGES.txt (original)
+++ hadoop/pig/branches/branch-0.6/CHANGES.txt Tue Jan  5 20:37:14 2010
@@ -26,6 +26,8 @@
 
 IMPROVEMENTS
 
+PIG-1177: Pig 0.6 Docs - Zebra docs (chandec via olgan)
+
 PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan)
 
 PIG-1162: Pig 0.6.0 - UDF doc (chandec via olgan)

Modified: hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_mapreduce.xml
URL: http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_mapreduce.xml?rev=896212&r1=896211&r2=896212&view=diff
==============================================================================
--- hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_mapreduce.xml (original)
+++ hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_mapreduce.xml Tue Jan  5 20:37:14 2010
@@ -45,14 +45,215 @@
 </section>
 <!-- END HADOOP M/R API-->
 
+ <!-- ZEBRA API-->
+ <section>
+ <title>Zebra MapReduce APIs</title>
+<p>Zebra includes several classes for use in MapReduce programs. The main entry points into Zebra are the two classes for reading and writing tables, namely TableInputFormat and BasicTableOutputFormat.
+</p>
+
+ <section>
+ <title>BasicTableOutputFormat </title>
+ <table>
+ <tr><th>Static</th><th>Method</th><th>Description</th></tr>
+ <tr>
+ <td>yes</td>
+ <td>void setOutputPath(JobConf, Path) </td>
+ <td>Set the output path of the BasicTable in JobConf </td>
+ </tr>
+ <tr>
+ <td>yes</td>
+ <td>Path[] getOutputPaths(JobConf) </td>
+ <td>Get the output paths of the BasicTable from JobConf </td>
+ </tr>
+ <tr>
+ <td>yes</td>
+ <td>void setStorageInfo(JobConf, ZebraSchema, ZebraStorageHint, ZebraSortInfo) </td>
+ <td>Set the table storage information (schema, storagehint, sortinfo) in JobConf</td>
+ </tr>
+ <tr>
+ <td>yes</td>
+ <td>Schema getSchema(JobConf) </td>
+ <td>Get the table schema in JobConf </td>
+ </tr>
+ <tr>
+ <td>yes</td>
+ <td>BytesWritable generateSortKey(JobConf, Tuple) </td>
+ <td>Generates a BytesWritable key for the input key </td>
+ </tr>
+ <tr>
+ <td>yes</td>
+ <td>String getStorageHint(JobConf) </td>
+ <td>Get the table storage hint in JobConf </td>
+ </tr>
+ <tr>
+ <td>yes</td>
+ <td>SortInfo getSortInfo(JobConf) </td>
+ <td>Get the SortInfo object </td>
+ </tr>
+ <tr>
+ <td>yes</td>
+ <td>void close(JobConf) </td>
+ <td>Close the output BasicTable. No more rows can be added into the table </td>
+ </tr>
+ <tr>
+ <td>yes</td>
+ <td>void setMultipleOutputs(JobConf, String commaSeparatedLocs, Class &lt;? extends ZebraOutputPartition&gt; theClass) </td>
+ <td>Enables data to be written to multiple zebra tables based on the ZebraOutputPartition class.
+ See <a href="zebra_mapreduce.html#Multiple+Table+Outputs">Multiple Table Outputs.</a></td>
+ </tr>
+ </table>
+</section>
+
+ <section>
+ <title>TableInputFormat </title>
+ <table>
+ <tr><th>Static</th><th>Method</th><th>Description</th></tr>
+ <tr>
+ <td>yes</td>
+ <td>void setInputPaths(JobConf, Path... paths) </td>
+ <td>Set the paths to the input table </td>
+
+ </tr>
+ <tr>
+ <td>yes</td>
+ <td>Path[] getInputPaths(JobConf) </td>
+ <td>Get the comma-separated paths to the input table or table union </td>
+
/p + + section + titleBasicTableOutputFormat /title + table + trthStatic/ththMethod/ththDescription/th/tr + tr + tdyes/td + tdvoid setOutputPath(JobConf, Path) /td + tdSet the output path of the BasicTable in JobConf /td + /tr + tr + tdyes/td + tdPath[] getOutputPaths(JobConf) /td + tdGet the output paths of the BasicTable from JobConf /td + /tr + tr + tdyes/td + tdvoid setStorageInfo(JobConf, ZebraSchema, ZebraStorageHint, ZebraSortInfo) /td + tdSet the table storage information (schema, storagehint, sortinfo) in JobConf/td + /tr + tr + tdyes/td + tdSchema getSchema(JobConf) /td + tdGet the table schema in JobConf /td + /tr + tr + tdyes/td + tdBytesWritable generateSortKey(JobConf, Tuple) /td + tdGenerates a BytesWritable key for the input key /td + /tr + tr + tdyes/td + tdString getStorageHint(JobConf) /td + tdGet the table storage hint in JobConf /td + /tr + tr + tdyes/td + tdSortInfo getSortInfo(JobConf) /td + tdGet the SortInfo object /td + /tr + tr + tdyes/td + tdvoid close(JobConf) /td + tdClose the output BasicTable, No more rows can be added into the table /td + /tr + tr + tdyes/td + tdvoid setMultipleOutputs(JobConf, String commaSeparatedLocs, Class lt; extends ZebraOutputPartitiongt; theClass) /td + tdEnables data to be written to multiple zebra tables based on the ZebraOutputPartition class. + See a href=zebra_mapreduce.html#Multiple+Table+OutputsMultiple Table Outputs./a/td + /tr + /table +/section + + section + titleTableInputFormat /title + table + trthStatic/ththMethod/ththDescription/th/tr + tr + tdyes/td + tdvoid setInputPaths(JobConf, Path... paths) /td + tdSet the paths to the input table /td + + /tr + tr + tdyes/td + tdPath[] getInputPaths(JobConf) /td + tdGet the comma-separated paths to the input table or table union /td +