Fixed Documentation

Project: http://git-wip-us.apache.org/repos/asf/vxquery/repo
Commit: http://git-wip-us.apache.org/repos/asf/vxquery/commit/e791fe39
Tree: http://git-wip-us.apache.org/repos/asf/vxquery/tree/e791fe39
Diff: http://git-wip-us.apache.org/repos/asf/vxquery/diff/e791fe39

Branch: refs/heads/steven/hdfs
Commit: e791fe399fae38ef785a34b31e8e63c80b777848
Parents: 66ff50a
Author: Steven Jacobs <[email protected]>
Authored: Wed May 18 14:30:51 2016 -0700
Committer: Steven Jacobs <[email protected]>
Committed: Wed May 18 14:30:51 2016 -0700

----------------------------------------------------------------------
 src/site/apt/user_query.apt      |  1 +
 src/site/apt/user_query_hdfs.apt | 51 +++++++++++++++++------------------
 2 files changed, 26 insertions(+), 26 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/vxquery/blob/e791fe39/src/site/apt/user_query.apt
----------------------------------------------------------------------
diff --git a/src/site/apt/user_query.apt b/src/site/apt/user_query.apt
index a022825..e9447b2 100644
--- a/src/site/apt/user_query.apt
+++ b/src/site/apt/user_query.apt
@@ -48,6 +48,7 @@ vxq.bat
 -showrp                    : Show Runtime plan
 -showtet                   : Show translated expression tree
 -timing                    : Produce timing information
+-hdfs-conf VAL             : The folder containing the HDFS configuration files
 ----------------------------------------
 
 * Java Options
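
As a sketch of how the new -hdfs-conf option from the listing above might be invoked; the launcher path, query file name, and configuration directory below are illustrative assumptions, not taken from this commit.

```shell
# Illustrative only: the launcher path, query file, and conf directory are
# placeholders; the -hdfs-conf flag itself comes from the listing above.
VXQ=./vxq                     # VXQuery CLI launcher
QUERY=my_query.xq             # an XQuery file to execute
HDFS_CONF=/etc/hadoop/conf    # directory holding core-site.xml etc.

# Compose the invocation that passes the HDFS configuration explicitly.
CMD="$VXQ $QUERY -hdfs-conf $HDFS_CONF"
echo "$CMD"
```

When HADOOP_CONF_DIR is already exported in the environment, the flag can be omitted.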

http://git-wip-us.apache.org/repos/asf/vxquery/blob/e791fe39/src/site/apt/user_query_hdfs.apt
----------------------------------------------------------------------
diff --git a/src/site/apt/user_query_hdfs.apt b/src/site/apt/user_query_hdfs.apt
index 12b04e8..fa736b8 100644
--- a/src/site/apt/user_query_hdfs.apt
+++ b/src/site/apt/user_query_hdfs.apt
@@ -18,25 +18,20 @@ Executing a Query in HDFS
 
 * 1. Connecting VXQuery with HDFS
 
-    The only configuration you need to define, is the ip address of the node(s) that
-    you want to run the queries.
-
-    This information should be defined in the <local.xml> or <cluster.xml> file at 
-    <vxquery-server/src/main/resources/conf/> .
-
-    You can find the ip of each node located in the /etc/hosts file.There should be at
-    least two different ips defined in that file.One for localhost and one with the hostname
-    of the node.The correct one for this configuration is the one of the host.
-
-    For example:
-    An ubuntu /etc/hosts file could look like this:
-
------------------------------
-127.0.0.1       localhost
-127.0.1.1       node1
-----------------------------
+  In order to read HDFS data, VXQuery needs access to the HDFS configuration
+  directory, which contains:
+  
+    core-site.xml
+    hdfs-site.xml
+    mapred-site.xml
     
-    The ip <127.0.1.1> along with the hostname <node1> should be defined in the <local.xml> or <cluster.xml> file.
+  Some systems automatically set this directory in the "HADOOP_CONF_DIR"
+  environment variable. If so, VXQuery will retrieve it automatically when
+  performing HDFS queries.
+
+  When this variable is not set, users need to provide this directory as a
+  command line option when executing VXQuery:
+    -hdfs-conf /path/to/hdfs/conf_folder
 
 
 * 2. Running the Query
@@ -53,9 +48,9 @@ Executing a Query in HDFS
 
 ** a. Reading them as whole files.
 
-  For this option you only need to change the path to files.To define that your 
+  For this option you only need to change the path to files. To define that your 
   file(s) exist and should be read from HDFS you must add <"hdfs:/"> in front 
-  of the path.VXQuery will read the path of the files you request in your query 
+  of the path. VXQuery will read the path of the files you request in your query 
   and try to locate them.
 
 
@@ -63,7 +58,8 @@ Executing a Query in HDFS
   to make sure that
 
 
-  a) The configuration path is correctly set to the one of your HDFS system.
+  a) The "HADOOP_CONF_DIR" environment variable is set, or you pass the 
+  directory location using -hdfs-conf.
 
 
   b) The path defined in your query begins with <hdfs://> and the full path to 
@@ -106,17 +102,20 @@ return $x/title
 ** b. Reading them block by block
 
 
-  In order to use that option you need to modify your query.Instead of using the 
+  In order to use that option you need to modify your query. Instead of using the 
   <collection> or <doc> function to define your input file(s) you need to use 
   <collection-with-tag>.
 
 
  <collection-with-tag> accepts two arguments, one is the path to the HDFS directory 
  you have stored your input files, and the second is a specific <<tag>> that exists 
-  in the input file(s).This is the tag of the element that contains the fields that 
+  in the input file(s). This is the tag of the element that contains the fields that 
   your query is looking for.
 
  Other than these arguments, you do not need to change anything else in the query.
+  
+  Note: since this strategy is optimized to read block by block, the result will 
+  include all elements with the given tag, regardless of depth within the XML tree.
 
 
 *** Example
@@ -169,13 +168,13 @@ return $x/title
 
 
 ----------------------------
-for $x in collectionwithtag("hdfs://user/hduser/store","book")/book
+for $x in collection-with-tag("hdfs://user/hduser/store","book")/book
 where $x/year>2004
 return $x/title
 ----------------------------
 
 
   Take notice that I defined the path to the directory containing the file(s) 
-  and not the file, <collection-with-tag> expects path to the directory. I also
-  added the </book> after the function.This is also needed, like <collection> and
+  and not the file, <collection-with-tag> expects the path to the directory. I also
+  added the </book> after the function. This is also needed, like <collection> and
   <doc> functions, for the query to be parsed correctly.
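
Taken together, the pieces changed in this commit might be exercised as follows; the file names and configuration directory are illustrative assumptions, and the query body mirrors the collection-with-tag example above.

```shell
# Illustrative only: file names and directories are placeholders; the
# query text mirrors the collection-with-tag example in the docs above.
cat > books.xq <<'XQ'
for $x in collection-with-tag("hdfs://user/hduser/store","book")/book
where $x/year>2004
return $x/title
XQ

# With HADOOP_CONF_DIR unset, pass the configuration directory explicitly:
#   ./vxq books.xq -hdfs-conf /etc/hadoop/conf
```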
