GEODE-121: Document Change: Rename "GemFire" to "Geode", but keep "gemfire" in package names and APIs until the related Geode packages are renamed.
Project: http://git-wip-us.apache.org/repos/asf/incubator-geode/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-geode/commit/60d074cc
Tree: http://git-wip-us.apache.org/repos/asf/incubator-geode/tree/60d074cc
Diff: http://git-wip-us.apache.org/repos/asf/incubator-geode/diff/60d074cc

Branch: refs/heads/develop
Commit: 60d074cc7adbafc4651e4326dc917e6f67f1a19e
Parents: 047ef62
Author: Jianxia Chen <jche...@apache.org>
Authored: Wed Jul 15 14:16:23 2015 -0700
Committer: Jianxia Chen <jche...@apache.org>
Committed: Wed Jul 15 14:16:23 2015 -0700

----------------------------------------------------------------------
 gemfire-spark-connector/README.md             | 47 ++++++++------------
 gemfire-spark-connector/doc/10_demos.md       | 32 +++++++-------
 gemfire-spark-connector/doc/1_building.md     |  3 +-
 gemfire-spark-connector/doc/2_quick.md        | 50 +++++++++++-----------
 gemfire-spark-connector/doc/3_connecting.md   | 18 ++++----
 gemfire-spark-connector/doc/4_loading.md      | 37 +++++++++-------
 gemfire-spark-connector/doc/5_rdd_join.md     | 16 +++----
 gemfire-spark-connector/doc/6_save_rdd.md     | 22 +++++-----
 gemfire-spark-connector/doc/7_save_dstream.md | 10 ++---
 gemfire-spark-connector/doc/8_oql.md          | 14 +++---
 gemfire-spark-connector/doc/9_java_api.md     | 22 +++++-----
 11 files changed, 131 insertions(+), 140 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/README.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/README.md b/gemfire-spark-connector/README.md
index 125cef7..d6e76e8 100644
--- a/gemfire-spark-connector/README.md
+++ b/gemfire-spark-connector/README.md
@@ -1,45 +1,32 @@
-#Spark GemFire Connector
+#Spark Geode Connector
 
-Note: GemFire is now an open source project [Geode](http://projectgeode.org).
-
-Spark GemFire Connector let's you connect Spark to GemFire, expose GemFire regions as Spark
-RDDs, save Spark RDDs to GemFire and execute GemFire OQL queries in your Spark applications
-and expose results as DataFrames.
+Spark Geode Connector lets you connect Spark to Geode, expose Geode regions as Spark
+RDDs, save Spark RDDs to Geode and execute Geode OQL queries in your Spark applications
+and expose the results as DataFrames.
 
 ##Features:
- - Expose GemFire region as Spark RDD with GemFire server-side filtering
- - RDD join and outer join GemFire region
- - Save Spark RDD to GemFire
- - Save DStream to GemFire
- - Execute GemFire OQL and return DataFrame
+
+ - Expose Geode region as Spark RDD with Geode server-side filtering
+ - RDD join and outer join Geode region
+ - Save Spark RDD to Geode
+ - Save DStream to Geode
+ - Execute Geode OQL and return DataFrame
 
 ##Version and Compatibility
-| Connector | Spark | GemFire | Geode |
-|-----------|-------|---------|-------|
-| 0.5       | 1.3   | 9.0     | ?     |
-##Download
-TBD
+Spark Geode Connector supports Spark 1.3.
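At a glance, the features listed above boil down to a few calls; a minimal sketch, assuming a Spark shell with the connector jar on the classpath and the regions created in the Quick Start guide (this snippet is illustrative and is not part of this commit):
```
// Minimal sketch of the connector API surface described above.
// Assumes: spark-shell with the connector jar, Quick Start regions created.
import io.pivotal.gemfire.spark.connector._

val pairs = sc.parallelize(Array(("1", "one"), ("2", "two")))
pairs.saveToGemfire("str_str_region")                  // save a pair RDD to a region

val rdd = sc.gemfireRegion[String, String]("str_str_region")  // region as RDD
val filtered = rdd.where("<where clause>")             // server-side filtering (OQL predicate)
```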
##Documentation
 - [Building and testing](doc/1_building.md)
 - [Quick start](doc/2_quick.md)
- - [Connect to GemFire](doc/3_connecting.md)
- - [Loading data from GemFire](doc/4_loading.md)
- - [RDD Join and Outer Join GemFire Region](doc/5_rdd_join.md)
- - [Saving RDD to GemFire](doc/6_save_rdd.md)
- - [Saving DStream to GemFire](doc/7_save_dstream.md)
- - [GemFire OQL](doc/8_oql.md)
+ - [Connect to Geode](doc/3_connecting.md)
+ - [Loading data from Geode](doc/4_loading.md)
+ - [RDD Join and Outer Join Geode Region](doc/5_rdd_join.md)
+ - [Saving RDD to Geode](doc/6_save_rdd.md)
+ - [Saving DStream to Geode](doc/7_save_dstream.md)
+ - [Geode OQL](doc/8_oql.md)
 - [Using Connector in Java](doc/9_java_api.md)
 - [About the demos](doc/10_demos.md)
- - [Logging](doc/logging.md) ???
- - [Security] (doc/security.md) ???
-
-
-##Community: Reporting bugs, mailing list, contributing
- (TBD)
-
 ##License: Apache License 2.0
- (TBD)

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/10_demos.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/10_demos.md b/gemfire-spark-connector/doc/10_demos.md
index ef4ef86..30da687 100644
--- a/gemfire-spark-connector/doc/10_demos.md
+++ b/gemfire-spark-connector/doc/10_demos.md
@@ -1,29 +1,29 @@
 ## About The Demos
-The Spark GemFire Connector contains basic demos, as samples, in both Scala
+The Spark Geode Connector contains basic demos, as samples, in both Scala
 and Java.
 
- - Read GemFire region to Spark as a RDD (`RegionToRDDJavaDemo.java`)
- - Write Spark pair RDD to GemFire (`PairRDDSaveJavaDemo.java`)
- - Write Spark non-pair RDD to GemFire (`RDDSaveJavaDemo.java`)
+ - Read Geode region to Spark as an RDD (`RegionToRDDJavaDemo.java`)
+ - Write Spark pair RDD to Geode (`PairRDDSaveJavaDemo.java`)
+ - Write Spark non-pair RDD to Geode (`RDDSaveJavaDemo.java`)
 - Read OQL query result as Spark DataFrame (`OQLJavaDemo.java`)
 - Network stateful word count (`NetworkWordCount.scala`)
 
 ### Requirements
-Running the demo requires a GemFire Cluster. This can be a one
+Running the demo requires a Geode cluster. This can be a one
 node or multi-node cluster.
-Here are the commands that start a two-node GemFire cluster on localhost:
+Here are the commands that start a two-node Geode cluster on localhost:
 
 First set up environment variables:
 ```
 export JAVA_HOME=<path to JAVA installation>
-export GEMFIRE=<path to GemFire installation>
+export GEODE=<path to Geode installation>
 export CONNECTOR=<path to Connector project>
-export CLASSPATH=$CLASSPATH:$GEMFIRE/lib/locator-dependencies.jar:$GEMFIRE/lib/server-dependencies.jar:$GEMFIRE/lib/gfsh-dependencies.jar
-export PATH=$PATH:$GEMFIRE/bin
+export CLASSPATH=$CLASSPATH:$GEODE/lib/locator-dependencies.jar:$GEODE/lib/server-dependencies.jar:$GEODE/lib/gfsh-dependencies.jar
+export PATH=$PATH:$GEODE/bin
 export GF_JAVA=$JAVA_HOME/bin/java
 
 Now run gfsh and execute the commands:
-$ cd <path to test GemFire cluster instance location>
+$ cd <path to test Geode cluster instance location>
 $ mkdir locator server1 server2
 $ gfsh
 gfsh> start locator --name=locator
@@ -38,7 +38,7 @@ gfsh> create region --name=str_str_region --type=REPLICATE --key-constraint=java
 gfsh> create region --name=str_int_region --type=PARTITION --key-constraint=java.lang.String --value-constraint=java.lang.Integer
 ```
 
-And deploy GemFire functions required by the Spark GemFire Connector:
+And deploy Geode functions required by the Spark Geode Connector:
 ```
 gfsh> deploy --jar=<path to connector project>/gemfire-functions/target/scala-2.10/gemfire-functions_2.10-0.5.0.jar
 ```
@@ -47,7 +47,7 @@ gfsh> deploy --jar=<path to connector project>/gemfire-functions/target/scala-2.
 This section describes how to run `RDDSaveJavaDemo.java`, `PairRDDSaveJavaDemo.java` and `RegionToRDDJavaDemo.java`:
 ```
-export SPARK_CLASSPATH=$CONNECTOR/gemfire-spark-connector/target/scala-2.10/gemfire-spark-connector_2.10-0.5.0.jar:$GEMFIRE/lib/server-dependencies.jar
+export SPARK_CLASSPATH=$CONNECTOR/gemfire-spark-connector/target/scala-2.10/gemfire-spark-connector_2.10-0.5.0.jar:$GEODE/lib/server-dependencies.jar
 
 cd <spark 1.3 dir>
 bin/spark-submit --master=local[2] --class demo.RDDSaveJavaDemo $CONNECTOR/gemfire-spark-demos/basic-demos/target/scala-2.10/basic-demos_2.10-0.5.0.jar locatorHost[port]
@@ -58,7 +58,7 @@ bin/spark-submit --master=local[2] --class demo.RegionToRDDJavaDemo $CONNECTOR/g
 ```
 
 ### Run stateful network word count
-This demo shows how to save DStream to GemFire. To run the demo, open 3 Terminals:
+This demo shows how to save DStream to Geode.
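In essence, the demo's streaming flow looks like the following condensed sketch, based on the DStream section of these docs; the `streaming` import path, the `ssc` StreamingContext, and `updateFunction` are assumptions here, not the demo's verbatim source:
```
// Condensed sketch of the stateful word count saved to Geode, based on
// doc/7_save_dstream.md. Assumes ssc: StreamingContext and an updateFunction
// that accumulates per-word counts; import path is assumed, not verified.
import io.pivotal.gemfire.spark.connector.streaming._

val lines = ssc.socketTextStream("localhost", 9999)      // fed by `nc -lk 9999`
val runningCounts = lines.flatMap(_.split(" "))
                         .map(word => (word, 1))
                         .updateStateByKey(updateFunction)
runningCounts.saveToGemfire("str_int_region")            // word -> count entries
```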
To run the demo, open 3 terminals:
 
 **Terminal-1**, start a netcat server:
 ```
 $ nc -lk 9999
@@ -70,13 +70,13 @@ $ nc -lk 9999
 bin/spark-submit --master=local[2] demo.NetworkWordCount $CONNECTOR/gemfire-spark-demos/basic-demos/target/scala-2.10/basic-demos_2.10-0.5.0.jar localhost 9999 locatorHost:port
 ```
 
-Switch to Terminal-1, type some words, and hit `enter` or `return` key, then check word count at **Terminal-3**, which has `gfsh` connected to the GemFire cluster:
+Switch to Terminal-1, type some words, and press the `enter` or `return` key, then check the word count in **Terminal-3**, which has `gfsh` connected to the Geode cluster:
 ```
 gfsh> query --query="select key, value from /str_int_region.entrySet"
 ```
 
-### Shutdown GemFire cluster at the end
-Use following command to shutdown the GemFire cluster after playing with
+### Shutdown Geode cluster at the end
+Use the following command to shut down the Geode cluster after playing with
 the demos:
 ```
 gfsh> shutdown --include-locators=true

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/1_building.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/1_building.md b/gemfire-spark-connector/doc/1_building.md
index 007d9f1..fd03277 100644
--- a/gemfire-spark-connector/doc/1_building.md
+++ b/gemfire-spark-connector/doc/1_building.md
@@ -27,9 +27,8 @@ sbt test        // unit tests
 sbt it:test     // integration tests
 ```
 
-Integration tests start up a GemFire cluster and starts up Spark in local mode.
+Integration tests start a Geode cluster and Spark in local mode.
 Please make sure you've done the following before you run `sbt it:test`:
 - run `sbt package`
- - set environment variable `GEMFIRE` to point to a GemFire installation.
 
 Next: [Quick Start](2_quick.md)

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/2_quick.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/2_quick.md b/gemfire-spark-connector/doc/2_quick.md
index b05302b..ec331c3 100644
--- a/gemfire-spark-connector/doc/2_quick.md
+++ b/gemfire-spark-connector/doc/2_quick.md
@@ -1,33 +1,33 @@
 ## 5 Minutes Quick Start Guide
 
 In this quick start guide, you will learn how to use the Spark shell to test Spark
-GemFire Connector functionalities.
+Geode Connector functionality.
 
 ### Prerequisites
-Before you start, you should have basic knowledge of GemFire and Apache Spark.
-Please refer to [GemFire Documentation](http://gemfire.docs.pivotal.io/latest/userguide/index.html)
+Before you start, you should have basic knowledge of Geode and Spark.
+Please refer to [Geode Documentation](http://geode.incubator.apache.org/docs/)
 and [Spark Documentation](https://spark.apache.org/docs/latest/index.html) for
-the details. If you are new to GemFire, this
-[tutorial](http://gemfire.docs.pivotal.io/latest/userguide/index.html#getting_started/gemfire_tutorial/chapter_overview.html)
+details. If you are new to Geode, this
+[Quick Start Guide](http://geode-docs.cfapps.io/docs/getting_started/15_minute_quickstart_gfsh.html)
 is a good starting point.
 
-You need 2 terminals to follow along, one for GemFire `gfsh`, and one for Spark shell. Set up Jdk 1.7 on both of them.
+You need 2 terminals to follow along, one for the Geode shell `gfsh`, and one for the Spark shell. Set up JDK 1.7 on both of them.
 
-### GemFire `gfsh` terminal
-In this terminal, start GemFire cluster, deploy Connector's gemfire-function jar, and create demo regions.
+### Geode `gfsh` terminal
+In this terminal, start the Geode cluster, deploy the Spark Geode Connector's gemfire-function jar, and create demo regions.
 
 Set up environment variables:
 ```
 export JAVA_HOME=<path to JAVA installation>
-export GEMFIRE=<path to GemFire installation>
-export CONNECTOR=<path to Spark GemFire Connector project (parent dir of this file)>
-export CLASSPATH=$CLASSPATH:$GEMFIRE/lib/locator-dependencies.jar:$GEMFIRE/lib/server-dependencies.jar:$GEMFIRE/lib/gfsh-dependencies.jar
-export PATH=$PATH:$GEMFIRE/bin
+export GEODE=<path to Geode installation>
+export CONNECTOR=<path to Spark Geode Connector project (parent dir of this file)>
+export CLASSPATH=$CLASSPATH:$GEODE/lib/locator-dependencies.jar:$GEODE/lib/server-dependencies.jar:$GEODE/lib/gfsh-dependencies.jar
+export PATH=$PATH:$GEODE/bin
 export GF_JAVA=$JAVA_HOME/bin/java
 ```
 
-Start GemFire cluster with 1 locator and 2 servers:
+Start Geode cluster with 1 locator and 2 servers:
 ```
 gfsh
 gfsh>start locator --name=locator1 --port=55221
@@ -41,7 +41,7 @@ gfsh>create region --name=str_str_region --type=PARTITION --key-constraint=java.
 gfsh>create region --name=int_str_region --type=PARTITION --key-constraint=java.lang.Integer --value-constraint=java.lang.String
 ```
 
-Deploy Connector's gemfire-function jar (`gemfire-functions_2.10-0.5.0.jar`):
+Deploy Spark Geode Connector's gemfire-function jar (`gemfire-functions_2.10-0.5.0.jar`):
 ```
 gfsh>deploy --jar=<path to connector project>/gemfire-functions/target/scala-2.10/gemfire-functions_2.10-0.5.0.jar
 ```
@@ -49,7 +49,7 @@ gfsh>deploy --jar=<path to connector project>/gemfire-functions/target/scala-2.1
 
 ### Spark shell terminal
 In this terminal, set up the Spark environment and start the Spark shell.
 
-Set GemFire locator property in Spark configuration: add
+Set the Geode locator property in the Spark configuration: add the
 following to `<spark-dir>/conf/spark-defaults.conf`:
 ```
 spark.gemfire.locators=localhost[55221]
 ```
@@ -69,24 +69,24 @@ under the same directory to `log4j.properties` and update the file.
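The hunk context above refers to copying `log4j.properties.template` to `log4j.properties` to cut console noise; the usual one-line setting for that (a sketch of a typical Spark 1.x value, not text from this diff) is:
```
# Quiet the Spark shell: log only warnings and errors to the console.
# (Typical Spark 1.x log4j.properties tweak; template contents may differ.)
log4j.rootCategory=WARN, console
```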
Start spark-shell:
 ```
-bin/spark-shell --master local[*] --jars $CONNECTOR/gemfire-spark-connector/target/scala-2.10/gemfire-spark-connector_2.10-0.5.0.jar,$GEMFIRE/lib/server-dependencies.jar
+bin/spark-shell --master local[*] --jars $CONNECTOR/gemfire-spark-connector/target/scala-2.10/gemfire-spark-connector_2.10-0.5.0.jar,$GEODE/lib/server-dependencies.jar
 ```
 
-Check GemFire locator property in the Spark shell:
+Check the Geode locator property in the Spark shell:
 ```
 scala> sc.getConf.get("spark.gemfire.locators")
 res0: String = localhost[55221]
 ```
 
-In order to enable GemFire specific functions, you need to import
+In order to enable Geode-specific functions, you need to import
 `io.pivotal.gemfire.spark.connector._`
 ```
 scala> import io.pivotal.gemfire.spark.connector._
 import io.pivotal.gemfire.spark.connector._
 ```
 
-### Save Pair RDD to GemFire
-In the Spark shell, create a simple pair RDD and save it to GemFire:
+### Save Pair RDD to Geode
+In the Spark shell, create a simple pair RDD and save it to Geode:
 ```
 scala> val data = Array(("1", "one"), ("2", "two"), ("3", "three"))
 data: Array[(String, String)] = Array((1,one), (2,two), (3,three))
@@ -98,7 +98,7 @@ scala> distData.saveToGemfire("str_str_region")
 15/02/17 07:11:54 INFO DAGScheduler: Job 0 finished: runJob at GemFireRDDFunctions.scala:29, took 0.341288 s
 ```
 
-Verify the data is saved in GemFile using `gfsh`:
+Verify the data is saved in Geode using `gfsh`:
 ```
 gfsh>query --query="select key,value from /str_str_region.entries"
@@ -116,8 +116,8 @@ key | value
 NEXT_STEP_NAME : END
 ```
 
-### Save Non-Pair RDD to GemFire
-Saving non-pair RDD to GemFire requires an extra function that converts each
+### Save Non-Pair RDD to Geode
+Saving a non-pair RDD to Geode requires an extra function that converts each
 element of the RDD to a key-value pair. Here's a sample session in the Spark shell:
 ```
 scala> val data2 = Array("a","ab","abc")
@@ -150,7 +150,7 @@ key | value
 NEXT_STEP_NAME : END
 ```
 
-### Expose GemFire Region As RDD
+### Expose Geode Region As RDD
 The same API is used to expose both replicated and partitioned regions as RDDs.
 ```
 scala> val rdd = sc.gemfireRegion[String, String]("str_str_region")
@@ -173,6 +173,6 @@ Note: use the right type of region key and value, otherwise you'll get
 ClassCastException.
 
-Next: [Connecting to GemFire](3_connecting.md)
+Next: [Connecting to Geode](3_connecting.md)

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/3_connecting.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/3_connecting.md b/gemfire-spark-connector/doc/3_connecting.md
index bac9785..8428657 100644
--- a/gemfire-spark-connector/doc/3_connecting.md
+++ b/gemfire-spark-connector/doc/3_connecting.md
@@ -1,11 +1,11 @@
-## Connecting to GemFire
+## Connecting to Geode
 
-There are two ways to connect Spark to Gemfire:
- - Specify GemFire connection properties via `SparkConf`.
- - Specify GemFire connection properties via `GemFireConnectionConf`.
+There are two ways to connect Spark to Geode:
+ - Specify Geode connection properties via `SparkConf`.
+ - Specify Geode connection properties via `GemFireConnectionConf`.
 
-### Specify GemFire connection properties via `SparkConf`
-The only required GemFire connection property is `spark.gemfire.locators`.
+### Specify Geode connection properties via `SparkConf`
+The only required Geode connection property is `spark.gemfire.locators`.
This can be specified in `<spark dir>/conf/spark-defaults.conf` or in Spark
 application code. In the following examples, we assume you want to provide
 3 extra properties: `security-client-auth-init`, `security-username`, and
@@ -31,7 +31,7 @@ val sparkConf = new SparkConf()
 
 After this, you can use all connector APIs without providing `GemFireConnectionConf`.
 
-### Specify GemFire connection properties via `GemFireConnectionConf`
+### Specify Geode connection properties via `GemFireConnectionConf`
 Here's the code that creates `GemFireConnectionConf` with the same set of
 properties as the examples above:
 ```
@@ -48,8 +48,8 @@ After this, you can use all connector APIs that require `GemFireConnectionConf`.
 
 ### Notes about locators
 - You can specify a locator in two formats: `host[port]` or `host:port`. For
   example `192.168.1.47[10334]` or `192.168.1.47:10334`
- - If your GemFire cluster has multiple locators, list them all and separated
+ - If your Geode cluster has multiple locators, list them all, separated
   by `,`. For example: `host1:10334,host2:10334`.
 
-Next: [Loading Data from GemFire](4_loading.md)
+Next: [Loading Data from Geode](4_loading.md)

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/4_loading.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/4_loading.md b/gemfire-spark-connector/doc/4_loading.md
index fb03660..b67a96e 100644
--- a/gemfire-spark-connector/doc/4_loading.md
+++ b/gemfire-spark-connector/doc/4_loading.md
@@ -1,6 +1,6 @@
-## Loading Data from GemFire
+## Loading Data from Geode
 
-To expose full data set of a GemFire region as a Spark
+To expose the full data set of a Geode region as a Spark
 RDD, call the `gemfireRegion` method on the SparkContext object.
 ```
 val rdd = sc.gemfireRegion("region path")
 ```
 
 Or with a specific `GemFireConnectionConf` object instance (see
-[Connecting to Gemfire](3_connecting.md) for how to create GemfireConectionConf):
+[Connecting to Geode](3_connecting.md) for how to create GemFireConnectionConf):
 ```
 val rdd = sc.gemfireRegion("region path", connConf)
 ```
 
-## GemFire RDD Partitions
+## Geode RDD Partitions
 
-GemFire has two region types: **replicated**, and
+Geode has two region types: **replicated**, and
 **partitioned** region. A replicated region has the full dataset on
 each server, while a partitioned region has its dataset spanning
 multiple servers, and may have duplicates for high
@@ -28,9 +28,9 @@ represents a replicated region.
 
 For a `GemFireRegionRDD` that represents a partitioned region, there are
 many potential ways to create RDD partitions. So far, we have
 implemented ServerSplitsPartitioner, which will split the bucket set
-on each GemFire server into two RDD partitions by default.
+on each Geode server into two RDD partitions by default.
 The number of splits is configurable; the following shows how to set
-three partitions per GemFire server:
+three partitions per Geode server:
 ```
 import io.pivotal.gemfire.spark.connector._
@@ -43,17 +43,21 @@ val rdd2 = sc.gemfireRegion[String, Int]("str_int_region", connConf, opConf)
 ```
 
-## GemFire Server-Side Filtering
-Server-side filtering allow exposing partial dataset of a GemFire region
-as a RDD, this reduces the amount of data transferred from GemFire to
+## Geode Server-Side Filtering
+Server-side filtering allows exposing a partial dataset of a Geode region
+as an RDD, which reduces the amount of data transferred from Geode to
 Spark to speed up processing.
 ```
 val rdd = sc.gemfireRegion("<region path>").where("<where clause>")
 ```
-The above call is translated to OQL query `select key, value from /<region path>.entries where <where clause>`, then the query is executed for each RDD partition. Note: the RDD partitions are created the same way as described in the section above.
+The above call is translated to the OQL query `select key, value from /<region path>.entries where <where clause>`, then
+the query is executed for each RDD partition. Note: the RDD partitions are created the same way as described in the
+section above.
 
-In the following demo, javabean class `Emp` is used, it has 5 attributes: `id`, `lname`, `fname`, `age`, and `loc`. In order to make `Emp` class available on GemFire servers, we need to deploy a jar file that contains `Emp` class, now build the `emp.jar`, deploy it and create region `emps` in `gfsh`:
+In the following demo, the JavaBean class `Emp` is used; it has 5 attributes: `id`, `lname`, `fname`, `age`, and `loc`.
+In order to make the `Emp` class available on Geode servers, we need to deploy a jar file that contains `Emp` class;
+now build the `emp.jar`, deploy it and create region `emps` in `gfsh`:
 ```
 zip $CONNECTOR/gemfire-spark-demos/basic-demos/target/scala-2.10/basic-demos_2.10-0.5.0.jar \
   -i "demo/Emp.class" --out $CONNECTOR/emp.jar
 gfsh
 gfsh> deploy --jar=<path to connector project>/emp.jar
 gfsh> create region --name=emps --type=PARTITION
 ```
-Note: The `Emp.class` is availble in `basic-demos_2.10-0.5.0.jar`. But that jar file depends on many scala and spark classes that are not available on GemFire servers' classpath. So use the above `zip` command to create a jar file that only contains `Emp.class`.
+Note: The `Emp.class` is available in `basic-demos_2.10-0.5.0.jar`. But that jar file depends on many Scala and Spark
+classes that are not available on the Geode servers' classpath. So use the above `zip` command to create a jar file that
+only contains `Emp.class`.
 
-Now in Spark shell, generate some random `Emp` records, and save them to region `emps` (remember to add `emp.jar` to Spark shell classpath before starting Spark shell):
+Now in the Spark shell, generate some random `Emp` records, and save them to region `emps` (remember to add `emp.jar` to
+the Spark shell classpath before starting the Spark shell):
 ```
 import io.pivotal.gemfire.spark.connector._
 import scala.util.Random
@@ -98,4 +105,4 @@ rdd1s.foreach(println)
 (6,Emp(6, Miller, Jerry, 30, NY))
 ```
 
-Next: [RDD Join and Outer Join GemFire Region](5_rdd_join.md)
+Next: [RDD Join and Outer Join Geode Region](5_rdd_join.md)

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/5_rdd_join.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/5_rdd_join.md b/gemfire-spark-connector/doc/5_rdd_join.md
index edc86e8..40b14ae 100644
--- a/gemfire-spark-connector/doc/5_rdd_join.md
+++ b/gemfire-spark-connector/doc/5_rdd_join.md
@@ -1,12 +1,12 @@
-## RDD Join and Outer Join GemFire Region
+## RDD Join and Outer Join Geode Region
 
-The Spark GemFire Connector suports using any RDD as a source
-of a join and outer join with a GemFire region through APIs
+The Spark Geode Connector supports using any RDD as a source
+of a join and outer join with a Geode region through the APIs
 `joinGemfireRegion[K, V]` and `outerJoinGemfireRegion[K, V]`.
 Those two APIs execute a single `region.getAll` call for every
 partition of the source RDD, so no unnecessary data will be requested
 or transferred. This means a join or outer join between any RDD and
-a GemFire region can be performed without full region scan, and the
+a Geode region can be performed without a full region scan, and the
 source RDD's partitioning and placement for data locality are used.
 
 Please note that the two type parameters `[K, V]` are the types of the
 key/value pair of region entries; they need to be specified to give
 the result RDD the correct type.
 
 The region `emps` that is created and populated in
-[GemFire Server-Side Filtering](4_loading.md) will be used in the
+[Geode Server-Side Filtering](4_loading.md) will be used in the
 following examples.
 
 ### RDD[(K, V1)] join and outer join Region[K, V2]
@@ -98,7 +98,7 @@ In this case, the source RDD is still a pair RDD, but it has different
 key type. Use the APIs `rdd.joinGemfireRegion[K2, V2](regionPath, func)` and
 `rdd.outerJoinGemfireRegion[K2, V2](regionPath, func)` to do the join and
 outer join, where `func` is the function to generate the key from the (k, v)
-pair, the element of source RDD, to join with GemFire region.
+pair, the element of the source RDD, to join with the Geode region.
 
 Prepare a source RDD `d3`:
 ```
@@ -169,7 +169,7 @@ rdd3o.foreach(println)
 
 Use the APIs `rdd.joinGemfireRegion[K, V](regionPath, func)` and
 `rdd.outerJoinGemfireRegion[K, V](regionPath, func)` to do the join and
 outer join, where `func` is the function to generate the key from
-`t`, the element of source RDD, to join with GemFire region.
+`t`, the element of the source RDD, to join with the Geode region.
 
 Prepare a source RDD `d4`:
 ```
@@ -234,4 +234,4 @@ rdd4o.foreach(println)
 ```
 
-Next: [Saving RDD to GemFire](6_save_join.md)
+Next: [Saving RDD to Geode](6_save_rdd.md)

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/6_save_rdd.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/6_save_rdd.md b/gemfire-spark-connector/doc/6_save_rdd.md
index 8516955..1ebc027 100644
--- a/gemfire-spark-connector/doc/6_save_rdd.md
+++ b/gemfire-spark-connector/doc/6_save_rdd.md
@@ -1,16 +1,16 @@
-## Saving RDD to GemFire
+## Saving RDD to Geode
 
-It is possible to save any RDD to a GemFire region. The requirements are:
+It is possible to save any RDD to a Geode region. The requirements are:
 - the object class of the elements contained by the RDD is
-   (1) available on the classpath of GemFire servers
+   (1) available on the classpath of Geode servers
   (2) and serializable.
 - the target region exists.
 
-To save an RDD to an existing GemFire region, import
+To save an RDD to an existing Geode region, import
 `io.pivotal.gemfire.spark.connector._` and call the `saveToGemfire`
 method on the RDD.
 
-### Save RDD[(K, V)] to GemFire
+### Save RDD[(K, V)] to Geode
 For a pair RDD, i.e., RDD[(K, V)], the pair is treated as the key/value pair.
 ```
 val data = Array(("1","one"),("2","two"),("3","three"))
 rdd.saveToGemfire("str_str_region")
 ```
 
 If you create GemFireConnectionConf as described in
-[Connecting to Gemfire](3_connecting.md), the last statement becomes:
+[Connecting to Geode](3_connecting.md), the last statement becomes:
 ```
 rdd.saveToGemfire("str_str_region", connConf)
 ```
@@ -40,14 +40,14 @@ key | value
 2   | two
 ```
 
-Note that GemFire regions require unique keys, so if the pair RDD
+Note that Geode regions require unique keys, so if the pair RDD
 contains duplicated keys, those pairs with the same key overwrite
 each other, and only one of them appears in the final dataset.
 
-### Save RDD[T] to GemFire
-To save non-pair RDD to GemFire, a function (`f: T => K`) that creates keys
+### Save RDD[T] to Geode
+To save a non-pair RDD to Geode, a function (`f: T => K`) that creates keys
 from elements of the RDD is required; it converts each RDD element `T` to the pair `(f(T), T)`,
-then the pair is save to GemFire.
+and the pair is then saved to Geode.
 ```
 val data2 = Array("a","ab","abc")
@@ -58,4 +58,4 @@ rdd2.saveToGemfire("str_int_region", e => (e, e.length))
 ```
 
-Next: [Saving DStream to GemFire](7_save_dstream.md)
+Next: [Saving DStream to Geode](7_save_dstream.md)

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/7_save_dstream.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/7_save_dstream.md b/gemfire-spark-connector/doc/7_save_dstream.md
index 17903fa..ecc793b 100644
--- a/gemfire-spark-connector/doc/7_save_dstream.md
+++ b/gemfire-spark-connector/doc/7_save_dstream.md
@@ -1,8 +1,8 @@
-## Saving DStream to GemFire
+## Saving DStream to Geode
 Spark Streaming extends the core API to allow high-throughput, fault-tolerant
 stream processing of live data streams. Data can be ingested from many sources
 such as Akka, Kafka, Flume, Twitter, ZeroMQ, TCP sockets, etc.
-Results can be stored in GemFire.
+Results can be stored in Geode.
 
 ### A Simple Spark Streaming App: Stateful Network Word Count
@@ -46,8 +46,8 @@ ssc.start()
 ssc.awaitTermination() // Wait for the computation to terminate
 ```
 
-#### Spark Streaming With GemFire
-Now let's save the running word count to GemFire region `str_int_region`, which
+#### Spark Streaming With Geode
+Now let's save the running word count to the Geode region `str_int_region`, which
 simply replaces print() with saveToGemfire():
 ```
@@ -65,4 +65,4 @@ See [Spark Streaming Programming Guide]
 for more details about Spark Streaming programming.
 
-Next: [GemFire OQL](8_oql.md)
+Next: [Geode OQL](8_oql.md)

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/8_oql.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/8_oql.md b/gemfire-spark-connector/doc/8_oql.md
index bad1f3c..f409698 100644
--- a/gemfire-spark-connector/doc/8_oql.md
+++ b/gemfire-spark-connector/doc/8_oql.md
@@ -1,7 +1,7 @@
-## GemFire OQL Query
-Spark GemFire Connector lets us run GemFire OQL queries in Spark applications
-to retrieve data from GemFire. The query result is a Spark DataFrame. Note
-that as of Spark 1.3, SchemaRDD is deprecated. Spark GemFire Connector does
+## Geode OQL Query
+Spark Geode Connector lets us run Geode OQL queries in Spark applications
+to retrieve data from Geode. The query result is a Spark DataFrame. Note
+that as of Spark 1.3, SchemaRDD is deprecated.
Spark Geode Connector does
 not support SchemaRDD.
 
 An instance of `SQLContext` is required to run an OQL query.
@@ -24,7 +24,7 @@ val SQLResult = sqlContext.sql("SELECT * FROM customer WHERE id > 100")
 
 ##Serialization
 If the OQL query involves a User Defined Type (UDT), and the default Java
-serializer is used, then the UDT on GemFire must implement `java.io.Serializable`.
+serializer is used, then the UDT on Geode must implement `java.io.Serializable`.
 
 If KryoSerializer is preferred, as described in [Spark Documentation]
 (https://spark.apache.org/docs/latest/tuning.html), you can configure
@@ -50,9 +50,7 @@ Use the following options to start Spark shell:
 ```
 
 ## References
-[GemFire OQL Documentation](http://gemfire.docs.pivotal.io/latest/userguide/index.html#developing/querying_basics/chapter_overview.html)
-
-[GemFire OQL Examples](http://gemfire.docs.pivotal.io/latest/userguide/index.html#getting_started/quickstart_examples/querying.html)
+[Geode OQL Documentation](http://geode-docs.cfapps.io/docs/developing/querying_basics/chapter_overview.html)
 
 [Spark SQL Documentation](https://spark.apache.org/docs/latest/sql-programming-guide.html)

http://git-wip-us.apache.org/repos/asf/incubator-geode/blob/60d074cc/gemfire-spark-connector/doc/9_java_api.md
----------------------------------------------------------------------
diff --git a/gemfire-spark-connector/doc/9_java_api.md b/gemfire-spark-connector/doc/9_java_api.md
index f13fb9b..b9ac91e 100644
--- a/gemfire-spark-connector/doc/9_java_api.md
+++ b/gemfire-spark-connector/doc/9_java_api.md
@@ -1,13 +1,13 @@
 ## Using Connector in Java
-This section describes how to access the functionality of Spark GemFire
+This section describes how to access the functionality of Spark Geode
 Connector when you write your Spark applications in Java. It is assumed
 that you have already familiarized yourself with the previous sections and
-understand how the Spark GemFire Connector works.
+understand how the Spark Geode Connector works.
 
 ### Prerequisites
-The best way to use the Spark GemFire Connector Java API is to statically
+The best way to use the Spark Geode Connector Java API is to statically
 import all of the methods in `GemFireJavaUtil`. This utility class is
-the main entry point for Spark GemFire Connector Java API.
+the main entry point for the Spark Geode Connector Java API.
 ```
 import static io.pivotal.gemfire.spark.connector.javaapi.GemFireJavaUtil.*;
 ```
@@ -19,8 +19,8 @@ conf.set(GemFireLocatorPropKey, "192.168.1.47[10334]")
 JavaSparkContext jsc = new JavaSparkContext(conf);
 ```
 
-### Accessing GemFire region in Java
-GemFire region is exposed as `GemFireJavaRegionRDD<K,V>`(subclass of
+### Accessing Geode region in Java
+A Geode region is exposed as `GemFireJavaRegionRDD<K,V>` (a subclass of
 `JavaPairRDD<K, V>`):
 ```
 GemFireJavaRegionRDD<Integer, Emp> rdd1 = javaFunctions(jsc).gemfireRegion("emps");
@@ -46,7 +46,7 @@ JavaPairRDD<Tuple2<String, Integer>, Option<Emp>> rdd3o =
 ```
 
-### Saving JavaPairRDD to GemFire
+### Saving JavaPairRDD to Geode
 Saving a JavaPairRDD is straightforward:
 ```
 List<Tuple2<String, String>> data = new ArrayList<>();
 data.add(new Tuple2<>("9", "nine"));
 
 // create JavaPairRDD
 JavaPairRDD<String, String> rdd1 = jsc.parallelizePairs(data);
-// save to GemFire
+// save to Geode
 javaFunctions(rdd1).saveToGemfire("str_str_region");
 ```
@@ -70,11 +70,11 @@ data2.add(new Tuple2<>("13", "thirteen"));
 
 // create JavaRDD<Tuple2<K,V>>
 JavaRDD<Tuple2<String, String>> rdd2 = jsc.parallelize(data2);
-// save to GemFire
+// save to Geode
 javaFunctions(toJavaPairRDD(rdd2)).saveToGemfire("str_str_region");
 ```
 
-### Saving JavaRDD to GemFire
+### Saving JavaRDD to Geode
 Similar to the Scala version, a function is required to generate a key/value
 pair from each RDD element. The following `PairFunction` generates a
 `<String, Integer>` pair from a `<String>`:
@@ -114,7 +114,7 @@ JavaDStream<String> ds2 = ...
 javaFunctions(ds2).saveToGemFire("str_int_region", pairFunc);
 ```
 
-### Using GemFire OQL
+### Using Geode OQL
 There are two gemfireOQL Java APIs, with and without GemFireConnectionConf.
 Here is an example without GemFireConnectionConf; it will use default