Updated Branches:
  refs/heads/master 5d841a4a5 -> 204aeb1dd

CRUNCH-12: Fix package names in documentation.

Rename packages from com.cloudera.crunch to org.apache.crunch.
Quote apostrophes because Markdown is very picky.
Fix other minor errors.

Signed-off-by: jwills <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/incubator-crunch/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-crunch/commit/204aeb1d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-crunch/tree/204aeb1d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-crunch/diff/204aeb1d

Branch: refs/heads/master
Commit: 204aeb1ddf3b44cdc00237a8b0be9e7e07888d40
Parents: 15e24a2
Author: Matthias Friedrich <[email protected]>
Authored: Wed Jul 11 19:42:33 2012 +0200
Committer: jwills <[email protected]>
Committed: Wed Jul 11 11:35:25 2012 -0700

----------------------------------------------------------------------
 src/site/markdown/index.md |   69 +++++++++++++++++++--------------------
 1 files changed, 34 insertions(+), 35 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-crunch/blob/204aeb1d/src/site/markdown/index.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/index.md b/src/site/markdown/index.md
index f5bae20..924dfc6 100644
--- a/src/site/markdown/index.md
+++ b/src/site/markdown/index.md
@@ -3,7 +3,7 @@
 ## Introduction
 
 Crunch is a Java library for writing, testing, and running MapReduce pipelines, based on
-Google's FlumeJava. Its goal is to make pipelines that are composed of many user-defined
+Google\'s FlumeJava. Its goal is to make pipelines that are composed of many user-defined
 functions simple to write, easy to test, and efficient to run.
 
 ## Build and Installation
@@ -11,10 +11,8 @@ functions simple to write, easy to test, and efficient to run.
 Crunch uses Maven for dependency management. The code in the examples/ subdirectory relies
 on the top-level crunch libraries. In order to execute the included WordCount application, run:
 
-    mvn install
-    cd examples/
-    mvn package
-    hadoop jar target/crunch-examples-0.2.0-job.jar com.cloudera.crunch.examples.WordCount <inputfile> <outputdir>
+    mvn clean install
+    hadoop jar examples/target/crunch-examples-*-job.jar org.apache.crunch.examples.WordCount <inputfile> <outputdir>
 
 ## High Level Concepts
 
@@ -43,20 +41,20 @@ joins.
 
 Every Crunch pipeline starts with a `Pipeline` object that is used to coordinate building the pipeline and executing the underlying MapReduce
 jobs. For efficiency, Crunch uses lazy evaluation, so it will only construct MapReduce jobs from the different stages of the pipelines when
-the Pipeline object's `run` or `done` methods are called.
+the Pipeline object\'s `run` or `done` methods are called.
 
 ## A Detailed Example
 
 Here is the classic WordCount application using Crunch:
 
-    import com.cloudera.crunch.DoFn;
-    import com.cloudera.crunch.Emitter;
-    import com.cloudera.crunch.PCollection;
-    import com.cloudera.crunch.PTable;
-    import com.cloudera.crunch.Pipeline;
-    import com.cloudera.crunch.impl.mr.MRPipeline;
-    import com.cloudera.crunch.lib.Aggregate;
-    import com.cloudera.crunch.type.writable.Writables;
+    import org.apache.crunch.DoFn;
+    import org.apache.crunch.Emitter;
+    import org.apache.crunch.PCollection;
+    import org.apache.crunch.PTable;
+    import org.apache.crunch.Pipeline;
+    import org.apache.crunch.impl.mr.MRPipeline;
+    import org.apache.crunch.lib.Aggregate;
+    import org.apache.crunch.types.writable.Writables;
 
     public class WordCount {
       public static void main(String[] args) throws Exception {
@@ -78,7 +76,7 @@ Here is the classic WordCount application using Crunch:
       }
     }
 
-Let's walk through the example line by line.
+Let\'s walk through the example line by line.
 
 ### Step 1: Creating a Pipeline and referencing a text file
 
@@ -89,7 +87,7 @@ that is used to tell Hadoop where to find the code that is used in the pipeline
 We now need to tell the Pipeline about the inputs it will be consuming. The Pipeline interface
 defines a `readTextFile` method that takes in a String and returns a PCollection of Strings.
 In addition to text files, Crunch supports reading data from SequenceFiles and Avro container files,
-via the `SequenceFileSource` and `AvroFileSource` classes defined in the com.cloudera.crunch.io package.
+via the `SequenceFileSource` and `AvroFileSource` classes defined in the org.apache.crunch.io package.
 
 Note that each PCollection is a _reference_ to a source of data- no data is actually loaded into a
 PCollection on the client machine.
@@ -116,25 +114,26 @@ in the output PCollection is serialized. While Crunch takes advantage of Java Ge
 compile-time type safety, the generic type information is not available at runtime. Crunch needs to know
 how to map the records stored in each PCollection into a Hadoop-supported serialization format in order
 to read and write data to disk. Two serialization implementations are supported in crunch via the
-`PTypeFamily` interface: a Writable-based system that is defined in the com.cloudera.crunch.type.writable
-package, and an Avro-based system that is defined in the com.cloudera.crunch.type.avro package. Each
+`PTypeFamily` interface: a Writable-based system that is defined in the org.apache.crunch.types.writable
+package, and an Avro-based system that is defined in the org.apache.crunch.types.avro package. Each
 implementation provides convenience methods for working with the common PTypes (Strings, longs, bytes, etc.)
 as well as utility methods for creating PTypes from existing Writable classes or Avro schemas.
 
 ### Step 3: Counting the words
 
-Out of Crunch's simple primitive operations, we can build arbitrarily complex chains of operations in order
+Out of Crunch\'s simple primitive operations, we can build arbitrarily complex chains of operations in order
 to perform higher-level operations, like aggregations and joins, that can work on any type of input data.
-Let's look at the implementation of the `Aggregate.count` function:
+Let\'s look at the implementation of the `Aggregate.count` function:
 
-    package com.cloudera.crunch.lib;
+    package org.apache.crunch.lib;
 
-    import com.cloudera.crunch.CombineFn;
-    import com.cloudera.crunch.MapFn;
-    import com.cloudera.crunch.PCollection;
-    import com.cloudera.crunch.PTable;
-    import com.cloudera.crunch.Pair;
-    import com.cloudera.crunch.type.PTypeFamily;
+    import org.apache.crunch.CombineFn;
+    import org.apache.crunch.MapFn;
+    import org.apache.crunch.PCollection;
+    import org.apache.crunch.PGroupedTable;
+    import org.apache.crunch.PTable;
+    import org.apache.crunch.Pair;
+    import org.apache.crunch.types.PTypeFamily;
     
     public class Aggregate {
     
@@ -170,14 +169,14 @@ and the number one by extending the `MapFn` convenience subclass of DoFn, and us
 PTable instance, with the key being the PType of the PCollection and the value being the Long
 implementation for this PTypeFamily.
 
-The next line features the second of Crunch's four operations, `groupByKey`. The groupByKey
+The next line features the second of Crunch\'s four operations, `groupByKey`. The groupByKey
 operation may only be applied to a PTable, and returns an instance of the `PGroupedTable`
 interface, which references the grouping of all of the values in the PTable that have the same key.
 The groupByKey operation is what triggers the reduce phase of a MapReduce within Crunch.
 
-The last line in the function returns the output of the third of Crunch's four operations,
+The last line in the function returns the output of the third of Crunch\'s four operations,
 `combineValues`. The combineValues operator takes a `CombineFn` as an argument, which is a
-specialized subclass of DoFn that operates on an implementation of Java's Iterable interface. The
+specialized subclass of DoFn that operates on an implementation of Java\'s Iterable interface. The
 use of combineValues (as opposed to parallelDo) signals to Crunch that the CombineFn may be used to
 aggregate values for the same key on the map side of a MapReduce job as well as the reduce side.
 
@@ -185,10 +184,10 @@ aggregate values for the same key on the map side of a MapReduce job as well as
 
 The Pipeline object also provides a `writeTextFile` convenience method for indicating that a
 PCollection should be written to a text file. There are also output targets for SequenceFiles and
-Avro container files, available in the com.cloudera.crunch.io package.
+Avro container files, available in the org.apache.crunch.io package.
 
 After you are finished constructing a pipeline and specifying the output destinations, call the
-pipeline's blocking `run` method in order to compile the pipeline into one or more MapReduce
+pipeline\'s blocking `run` method in order to compile the pipeline into one or more MapReduce
 jobs and execute them.
 
 ## Writing Your Own Pipelines
@@ -198,11 +197,11 @@ This section discusses the different steps of creating your own Crunch pipelines
 ### Writing a DoFn
 
 The DoFn class is designed to keep the complexity of the MapReduce APIs out of your way when you
-don't need them while still keeping them accessible when you do.
+don\'t need them while still keeping them accessible when you do.
 
 #### Serialization
 
-First, all DoFn instances are required to be `java.io.Serializable`. This is a key aspect of Crunch's design:
+First, all DoFn instances are required to be `java.io.Serializable`. This is a key aspect of Crunch\'s design:
 once a particular DoFn is assigned to the Map or Reduce stage of a MapReduce job, all of the state
 of that DoFn is serialized so that it may be distributed to all of the nodes in the Hadoop cluster that
 will be running that task. There are two important implications of this for developers:
@@ -231,7 +230,7 @@ are associated with a MapReduce stage, so that they may be accessed during initi
 ### Performing Cogroups and Joins
 
 In Crunch, cogroups and joins are performed on PTable instances that have the same key type. This section walks through
-the basic flow of a cogroup operation, explaining how this higher-level operation is composed of Crunch's four primitives.
+the basic flow of a cogroup operation, explaining how this higher-level operation is composed of Crunch\'s four primitives.
 In general, these common operations are provided as part of the core Crunch library or in extensions, you do not need
 to write them yourself. But it can be useful to understand how they work under the covers.
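The documentation in the patch walks through Crunch's parallelDo -> groupByKey -> combineValues stages for WordCount. As a rough local illustration of that dataflow only — this is plain Java, not the Crunch API, and the class and method names here (`WordCountSketch`, `countWords`) are hypothetical — a single-process sketch might look like:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative only: a plain-Java, in-memory sketch of the
 * parallelDo -> groupByKey -> combineValues dataflow that the Crunch
 * WordCount in the diff expresses. The real pipeline compiles these
 * stages into MapReduce jobs via org.apache.crunch.impl.mr.MRPipeline.
 */
public class WordCountSketch {
  public static Map<String, Long> countWords(List<String> lines) {
    // parallelDo: emit a (word, 1) pair for every word in every line
    Map<String, List<Long>> grouped = new LinkedHashMap<>();
    for (String line : lines) {
      for (String word : line.split("\\s+")) {
        if (word.isEmpty()) continue;
        // groupByKey: pairs sharing a key land in the same group
        grouped.computeIfAbsent(word, w -> new ArrayList<>()).add(1L);
      }
    }
    // combineValues: reduce each group of ones to a single sum
    Map<String, Long> counts = new LinkedHashMap<>();
    for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
      long sum = 0L;
      for (long one : e.getValue()) sum += one;
      counts.put(e.getKey(), sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // prints {hello=2, world=1, crunch=1}
    System.out.println(countWords(Arrays.asList("hello world", "hello crunch")));
  }
}
```

In the real pipeline the combine step additionally runs on the map side as a combiner, which is exactly the point the combineValues discussion above makes.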
 
