Update pig readme
Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2dc27a17
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2dc27a17
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2dc27a17

Branch: refs/heads/trunk
Commit: 2dc27a17567fa448aae335e74cc46ab94339eba4
Parents: db68e03
Author: Brandon Williams <brandonwilli...@apache.org>
Authored: Sat May 26 10:50:00 2012 -0500
Committer: Brandon Williams <brandonwilli...@apache.org>
Committed: Sat May 26 10:50:00 2012 -0500

----------------------------------------------------------------------
 examples/pig/README.txt | 19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/cassandra/blob/2dc27a17/examples/pig/README.txt
----------------------------------------------------------------------
diff --git a/examples/pig/README.txt b/examples/pig/README.txt
index 3bdbf10..57b8f57 100644
--- a/examples/pig/README.txt
+++ b/examples/pig/README.txt
@@ -1,7 +1,8 @@
 A Pig storage class that reads all columns from a given ColumnFamily, or
 writes properly formatted results into a ColumnFamily.
 
-Setup:
+Getting Started
+===============
 
 First build and start a Cassandra server with the default
 configuration and set the PIG_HOME and JAVA_HOME environment
@@ -31,7 +32,6 @@ for input and output:
 * PIG_OUTPUT_RPC_PORT : the port thrift is listening on for writing
 * PIG_OUTPUT_PARTITIONER : cluster partitioner for writing
 
-
 Then you can run it like this:
 
 examples/pig$ bin/pig_cassandra -x local example-script.pig
@@ -70,3 +70,18 @@ Which will copy the ColumnFamily.  Note that the destination ColumnFamily must
 already exist for this to work.
 
 See the example in test/ to see how schema is inferred.
+
+Advanced Options
+================
+
+The following environment variables default to false but can be set to true to enable them:
+
+PIG_WIDEROW_INPUT: this enables loading of rows with many columns without
+  incurring memory pressure.  All columns will be in a bag and indexes are not
+  supported.
+
+PIG_USE_SECONDARY: this allows easy use of secondary indexes within your
+  script, by appending every index to the schema as 'index_$name', allowing
+  filtering of loaded rows with a statement like "FILTER rows BY index_color eq
+  'blue'" if you have an index called 'color' defined.
+
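For illustration, the PIG_USE_SECONDARY option added above might be exercised by a script like the following sketch. The keyspace 'MyKeyspace', the column family 'Users', and the index named 'color' are hypothetical; the LOAD URI form follows the style of the existing examples in examples/pig/, and the FILTER syntax is taken verbatim from the new README text.

```pig
-- Hypothetical script: assumes PIG_USE_SECONDARY=true was exported before
-- running bin/pig_cassandra, and that the Users column family has a
-- secondary index named 'color'.
rows = LOAD 'cassandra://MyKeyspace/Users' USING CassandraStorage();

-- With PIG_USE_SECONDARY enabled, the index is appended to the schema as
-- 'index_color', so loaded rows can be filtered on it directly:
blue = FILTER rows BY index_color eq 'blue';

DUMP blue;
```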