[jena-site] 01/02: More on xloader

andy Mon, 15 Nov 2021 09:05:39 -0800

This is an automated email from the ASF dual-hosted git repository.

andy pushed a commit to branch more-doc
in repository https://gitbox.apache.org/repos/asf/jena-site.git


commit 018c0a824a738cd9fd8732b5d62cd70a00744525
Author: Andy Seaborne <[email protected]>
AuthorDate: Mon Nov 15 17:03:29 2021 +0000

    More on xloader
---
 source/documentation/tdb/commands.md    | 53 +++++++++++++++++++--------------
 source/documentation/tdb/faqs.md        | 26 ++++++++++++++--
 source/documentation/tdb/tdb-xloader.md | 29 ++++++++++--------
 3 files changed, 71 insertions(+), 37 deletions(-)

diff --git a/source/documentation/tdb/commands.md b/source/documentation/tdb/commands.md
index baa2e27..9a9af88 100644
--- a/source/documentation/tdb/commands.md
+++ b/source/documentation/tdb/commands.md
@@ -13,7 +13,7 @@ title: TDB Command-line Utilities
 -   [TDB Commands](#tdb-commands)
     -   [Store description](#store-description)
     -   [tdbloader](#tdbloader)
-    -   [tdbloader2](#tdbloader2)
+    -   [TDB xloader](#tdb-xloader)
     -   [tdbquery](#tdbquery)
     -   [tdbdump](#tdbdump)
     -   [tdbstats](#tdbstats)
@@ -98,10 +98,37 @@ are loaded into the dataset according to the name or the default graph.
 Bulk loader and index builder. Performs bulk load operations more
 efficiently than simply reading RDF into a TDB-back model.
 
+### tdb.xloader
+
+`tdb1.xloader` and `tdb2.xloader` are bulk loaders for very large data for TDB1
+and TDB2.
+
+See [TDB xloader](./tdb-xloader.html) for more information. These loaders only
+work on Linux and Mac OS/X since they rely on some Unix system utilities.
+
+### `tdbquery`
+
+Invoke a SPARQL query on a store. Use `--time` for timing
+information. The store is attached on each run of this command so
+timing includes some overhead not present in a running system.
+
+Details about query execution can be obtained -- see notes on the
+[TDB Optimizer](optimizer.html#investigating-what-is-going-on).
+
+### `tdbdump`
+
+Dump the store in
+[N-Quads](http://www.w3.org/TR/n-quads/)
+format.
+
+### `tdbstats`
+
+Produce statistics for the dataset. See the
+[TDB Optimizer description](optimizer.html#statistics-rule-file).
+
 ### `tdbloader2`
 
-Bulk loader and index builder. Faster than `tdbloader` but only works
-on Linux and Mac OS/X since it relies on some Unix system utilities.
+*This has been replaced by [TDB xloader](./tdb-xloader.html).*
 
 This bulk loader can only be used to create a database. It may
 overwrite existing data. It requires accepts the `--loc` argument and a
@@ -130,23 +157,3 @@ If you are building a large dataset (i.e. gigabytes of input data) you may
 wish to have the [PipeViewer](http://www.ivarch.com/programs/pv.shtml)
 tool installed on your system as this will provide extra progress information during the indexing phase of the build.
-
-### `tdbquery`
-
-Invoke a SPARQL query on a store. Use `--time` for timing
-information. The store is attached on each run of this command so
-timing includes some overhead not present in a running system.
-
-Details about query execution can be obtained -- see notes on the
-[TDB Optimizer](optimizer.html#investigating-what-is-going-on).
-
-### `tdbdump`
-
-Dump the store in
-[N-Quads](http://www.w3.org/TR/n-quads/)
-format.
-
-### tdbstats
-
-Produce a statistics for the dataset. See the
-[TDB Optimizer description.](optimizer.html#statistics-rule-file).
diff --git a/source/documentation/tdb/faqs.md b/source/documentation/tdb/faqs.md
index b7f9f19..e7479c8 100644
--- a/source/documentation/tdb/faqs.md
+++ b/source/documentation/tdb/faqs.md
@@ -4,6 +4,8 @@ title: TDB FAQs
 
 ## FAQs
 
+
+-   [What are TDB1 and TDB2?](#tdb1-tdb2)
 -   [Does TDB support Transactions?](#transactions)
 -   [Can I share a TDB dataset between multiple applications?](#multi-jvm)
 -   [What is the *Impossibly Large Object* exception?](#impossibly-large-object)
@@ -18,6 +20,15 @@ title: TDB FAQs
 -   [What is the *Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database* error?](#tdb2-lock)
 -   [My question isn't answered here?](#not-answered)
 
+<a name="tdb1-tdb2"></a>
+## TDB1 and TDB2
+
+TDB2 is a later generation of database for Jena. It is more robust and can
+handle large update transactions.
+
+These are different database systems: they have different on-disk file formats
+and a database built for one is not compatible with the other database engine.
+
 <a name="transactions"></a>
 ## Does TDB support transactions?
 
@@ -37,11 +48,11 @@ transactionally.
 ## Can I share a TDB dataset between multiple applications?
 
 Multiple applications, running in multiple JVMs, using the same
-file databases is **not** supported and has a high risk of data corruption.  Once corrupted a database cannot be repaired
+file databases is **not** supported and has a high risk of data corruption.  Once corrupted, a database cannot be repaired
 and must be rebuilt from the original source data. Therefore there **must** be a single JVM
 controlling the database directory and files.
 
-From 1.1.0 onwards TDB includes automatic prevention of multi-JVM usage which prevents this under most circumstances and helps
+TDB includes automatic prevention of multi-JVM usage which prevents this under most circumstances and helps
 protect your data from corruption.
 
 If you wish to share a TDB dataset between applications use our [Fuseki](../fuseki2/) component which provides a 
@@ -77,11 +88,22 @@ As noted above to resolve this problem you **must** rebuild your database from t
 be repaired. This is why we **strongly** recommend you use [transactions](tdb_transactions.html) since this protects your dataset against 
 corruption.
 
+## What is `tdb.xloader`?
+
+`tdb1.xloader` and `tdb2.xloader` are bulk loaders for very large datasets that
+take several hours to load.
+
+See [TDB xloader](./tdb-xloader.html) for more information.
+
 <a name="tdbloader-vs-tdbloader2"></a>
 ## What is the different between `tdbloader` and `tdbloader2`?
 
+`tdbloader2` has been replaced by `tdb1.xloader` and `tdb2.xloader` for TDB1 and TDB2 respectively.
+
+
 `tdbloader` and `tdbloader2` differ in how they build databases.
 
+
 `tdbloader` is Java based and uses the same TDB APIs that you would use in your own Java code to perform the data load.  The advantage of this is that
 it supports incremental loading of data into a TDB database.  The downside is that the loader will be slower for initial database builds.
 
diff --git a/source/documentation/tdb/tdb-xloader.md b/source/documentation/tdb/tdb-xloader.md
index 443e18e..c6feaec 100644
--- a/source/documentation/tdb/tdb-xloader.md
+++ b/source/documentation/tdb/tdb-xloader.md
@@ -7,16 +7,19 @@ is stability and reliability for long running loading, running on modest and
 
 xloader is not a replacement for regular TDB1 and TDB2 loaders.
 
-"tdb1.xloader" was called "tdbloader2" and has some improvements.
+There are two scripts to load data using the xloader subsystem.
+
+"tdb1.xloader" was previously called "tdbloader2" and has some improvements.
 
 It is not as fast as other TDB loaders on dataset where the general loaders work
 on without encountering progressive slowdown.
 
-The xloaders for TDB1 and TDB2 are not identical. The TDB2 is more capable; it
-is based on the same design approach with further refinements to building the
-node table and to reduce the total amount of temporary file space used.
+The xloaders for TDB1 and TDB2 are not identical. The TDB2 xloader is more
+capable; it is based on the same design approach with further refinements to
+building the node table and to reduce the total amount of temporary file space
+used.
 
-The xloader does not run on MS Windows. It uses and external sort program from
+The xloader does not run on MS Windows. It uses an external sort program from
 unix - `sort(1)`.
 
 The xloader only builds a fresh database from empty.
@@ -30,22 +33,24 @@ or
 
 `tdb1.xloader --loc DIRECTORY` FILE...
 
-Additioally, there is an argument `--tmpdir` to use a different directory for
+Additionally, there is an argument `--tmpdir` to use a different directory for
 temporary files.
 
-`FILE` is any RDF syntax supported by Jena.
+`FILE` is any RDF syntax supported by Jena. Syntax is determined by file
+extension and can include an additional ".gz" or ".bz2" for compressed files.
 
 ### Advice
 
-`xloader` uses a lot of temporary disk space. 
-
 To avoid a load failing due to a syntax or other data error, it is advisable to
 run `riot --check` on the data first. Parsing is faster than loading.
 
-If desired, the data can be converted to [RDF Thrift](../io/rdf-binary.html) at
-this stage by adding `--stream rdf-thrift` to the riot checking run.
-Parsing RDF Thrift is faster than parsing N-Triples although the bulk of the loading process is not limited by parser speed.
+TDB databases take up a lot of disk space and, in addition, `xloader` uses a
+significant amount of temporary disk space during loading.
 
+If desired, the data can be converted to [RDF Thrift](../io/rdf-binary.html) at
+this stage by adding `--stream rdf-thrift` to the riot checking run.  Parsing
+RDF Thrift is faster than parsing N-Triples although the bulk of the loading
+process is not limited by parser speed.
 
 Do not capture the bulk loader output in a file on the same disk as the database
 or temporary directory; it slows loading down.
