This is an automated email from the ASF dual-hosted git repository.

andy pushed a commit to branch more-doc
in repository https://gitbox.apache.org/repos/asf/jena-site.git

commit 018c0a824a738cd9fd8732b5d62cd70a00744525
Author: Andy Seaborne <[email protected]>
AuthorDate: Mon Nov 15 17:03:29 2021 +0000

    More on xloader
---
 source/documentation/tdb/commands.md    | 53 +++++++++++++++++++--------------
 source/documentation/tdb/faqs.md        | 26 ++++++++++++++--
 source/documentation/tdb/tdb-xloader.md | 29 ++++++++++--------
 3 files changed, 71 insertions(+), 37 deletions(-)

diff --git a/source/documentation/tdb/commands.md b/source/documentation/tdb/commands.md
index baa2e27..9a9af88 100644
--- a/source/documentation/tdb/commands.md
+++ b/source/documentation/tdb/commands.md
@@ -13,7 +13,7 @@ title: TDB Command-line Utilities
  - [TDB Commands](#tdb-commands)
  - [Store description](#store-description)
  - [tdbloader](#tdbloader)
- - [tdbloader2](#tdbloader2)
+ - [TDB xloader](#tdb-xloader)
  - [tdbquery](#tdbquery)
  - [tdbdump](#tdbdump)
  - [tdbstats](#tdbstats)
@@ -98,10 +98,37 @@ are loaded into the dataset according to the name or the default graph.
 Bulk loader and index builder. Performs bulk load operations more efficiently than simply reading RDF into a TDB-back model.
+### tdb.xloader
+
+`tdb1.xloader` and `tdb2.xloader` are bulk loaders for very large data for TDB1
+and TDB2.
+
+See [TDB xloader](./tdb-xloader.html) for more information. These loaders only
+work on Linux and Mac OS/X since it relies on some Unix system utilities.
+
+### `tdbquery`
+
+Invoke a SPARQL query on a store. Use `--time` for timing
+information. The store is attached on each run of this command so
+timing includes some overhead not present in a running system.
+
+Details about query execution can be obtained -- see notes on the
+[TDB Optimizer](optimizer.html#investigating-what-is-going-on).
+
+### `tdbdump`
+
+Dump the store in
+[N-Quads](http://www.w3.org/TR/n-quads/)
+format.
+
+### `tdbstats`
+
+Produce a statistics for the dataset. See the
+[TDB Optimizer description.](optimizer.html#statistics-rule-file).
+
 ### `tdbloader2`
-Bulk loader and index builder. Faster than `tdbloader` but only works
-on Linux and Mac OS/X since it relies on some Unix system utilities.
+*This has been replace by [TDB xloader](./tdb-xloader.html).*
 This bulk loader can only be used to create a database. It may overwrite existing data. It requires accepts the `--loc` argument and a
@@ -130,23 +157,3 @@ If you are building a large dataset (i.e. gigabytes of input data) you may wish to have the [PipeViewer](http://www.ivarch.com/programs/pv.shtml) tool installed on your system as this will provide extra progress information during the indexing phase of the build.
-
-### `tdbquery`
-
-Invoke a SPARQL query on a store. Use `--time` for timing
-information. The store is attached on each run of this command so
-timing includes some overhead not present in a running system.
-
-Details about query execution can be obtained -- see notes on the
-[TDB Optimizer](optimizer.html#investigating-what-is-going-on).
-
-### `tdbdump`
-
-Dump the store in
-[N-Quads](http://www.w3.org/TR/n-quads/)
-format.
-
-### tdbstats
-
-Produce a statistics for the dataset. See the
-[TDB Optimizer description.](optimizer.html#statistics-rule-file).
diff --git a/source/documentation/tdb/faqs.md b/source/documentation/tdb/faqs.md
index b7f9f19..e7479c8 100644
--- a/source/documentation/tdb/faqs.md
+++ b/source/documentation/tdb/faqs.md
@@ -4,6 +4,8 @@ title: TDB FAQs
 ## FAQs
+
+- [What are TDB1 and TDB2?](#tdv1-tdb2)
 - [Does TDB support Transactions?](#transactions)
 - [Can I share a TDB dataset between multiple applications?](#multi-jvm)
 - [What is the *Impossibly Large Object* exception?](#impossibly-large-object)
@@ -18,6 +20,15 @@ title: TDB FAQs
 - [What is the *Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database* error?](#tdb2-lock)
 - [My question isn't answered here?](#not-answered)
+<a name="tdb1-tdb2></a>
+## TDB1 and TDB2
+
+TDB2 is a later generation of database for Jena. It is more robust and can
+handle large update transactions.
+
+These are different databases systems - the have different on-disk file formats
+and databases for one are not compatible with other database engine.
+
 <a name="transactions"></a>
 ## Does TDB support transactions?
@@ -37,11 +48,11 @@ transactionally.
 ## Can I share a TDB dataset between multiple applications?
 Multiple applications, running in multiple JVMs, using the same
-file databases is **not** supported and has a high risk of data corruption. Once corrupted a database cannot be repaired
+file databases is **not** supported and has a high risk of data corruption. Once corrupted, a database cannot be repaired
 and must be rebuilt from the original source data. Therefore there **must** be a single JVM controlling the database directory and files.
-From 1.1.0 onwards TDB includes automatic prevention of multi-JVM usage which prevents this under most circumstances and helps
+TDB includes automatic prevention of multi-JVM usage which prevents this under most circumstances and helps
 protect your data from corruption. If you wish to share a TDB dataset between applications use our [Fuseki](../fuseki2/) component which provides a
@@ -77,11 +88,22 @@ As noted above to resolve this problem you **must** rebuild your database from t
 be repaired. This is why we **strongly** recommend you use [transactions](tdb_transactions.html) since this protects your dataset against corruption.
+## What is `tdb.xloader`?
+
+`tdb1.xloader` and `tdb2.xloader` are bulk laodrs for very large dataset that
+take several hours to load.
+
+See [TDB xloader](./tdb-xloader.html) for more information.
+
 <a name="tdbloader-vs-tdbloader2"></a>
 ## What is the different between `tdbloader` and `tdbloader2`?
+`tdbloader2` has been replaced by `tdb1.xloader` and `tdb2.xloader` for TDB1 and TDB2 respectively.
+
+
 `tdbloader` and `tdbloader2` differ in how they build databases.
+
 `tdbloader` is Java based and uses the same TDB APIs that you would use in your own Java code to perform the data load. The advantage of this is that it supports incremental loading of data into a TDB database. The downside is that the loader will be slower for initial database builds.
diff --git a/source/documentation/tdb/tdb-xloader.md b/source/documentation/tdb/tdb-xloader.md
index 443e18e..c6feaec 100644
--- a/source/documentation/tdb/tdb-xloader.md
+++ b/source/documentation/tdb/tdb-xloader.md
@@ -7,16 +7,19 @@ is stability and reliability for long running loading, running on modest and
 xloader is not a replacement for regular TDB1 and TDB2 loaders.
-"tdb1.xloader" was called "tdbloader2" and has some improvements.
+There are two scripts to load data using the xlaoder subsystem.
+
+"tdb1.xloader", which was called "tdbloader2" and has some improvements.
 It is not as fast as other TDB loaders on dataset where the general loaders work on without encountering progressive slowdown.
-The xloaders for TDB1 and TDB2 are not identical. The TDB2 is more capable; it
-is based on the same design approach with further refinements to building the
-node table and to reduce the total amount of temporary file space used.
+The xloaders for TDB1 and TDB2 are not identical. The TDB2 xlaoder is more
+capable; it is based on the same design approach with further refinements to
+building the node table and to reduce the total amount of temporary file space
+used.
-The xloader does not run on MS Windows. It uses and external sort program from
+The xloader does not run on MS Windows. It uses an external sort program from
 unix - `sort(1)`. The xloader only builds a fresh database from empty.
@@ -30,22 +33,24 @@ or
 `tdb1.xloader --loc DIRECTORY` FILE...
-Additioally, there is an argument `--tmpdir` to use a different directory for
+Additionally, there is an argument `--tmpdir` to use a different directory for
 temporary files.
-`FILE` is any RDF syntax supported by Jena.
+`FILE` is any RDF syntax supported by Jena. Syntax is detemined by file
+extension and can include an addtional ".gz" or ".bz2" for compresses files.
 ### Advice
-`xloader` uses a lot of temporary disk space.
-
 To avoid a load failing due to a syntax or other data error, it is advisable to run `riot --check` on the data first. Parsing is faster than loading.
-If desired, the data can be converted to [RDF Thrift](../io/rdf-binary.html) at
-this stage by adding `--stream rdf-thrift` to the riot checking run.
-Parsing RDF Thrift is faster than parsing N-Triples although the bulk of the loading process is not limited by parser speed.
+The TDB databases will take up a lot of disk space and in addition during
+loading `xloader` uses a significant amout of temporary disk space.
+If desired, the data can be converted to [RDF Thrift](../io/rdf-binary.html) at
+this stage by adding `--stream rdf-thrift` to the riot checking run. Parsing
+RDF Thrift is faster than parsing N-Triples although the bulk of the loading
+process is not limited by parser speed.
 Do not capture the bulk loader output in a file on the same disk as the database or temporary directory; it slows loading down.
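
Editorial illustration, not part of the commit: the Advice section in the tdb-xloader.md hunk describes a check-then-load workflow, and it could be sketched roughly as follows. The sketch is a dry run that only prints the two commands, so it makes no claim about what the real tools output. The helper name `xload_plan`, the database directory `DB`, the temporary directory `/mnt/scratch` and the input file `data.nt.gz` are all hypothetical; only `riot --check`, `tdb1.xloader --loc` and `--tmpdir` come from the documentation text in the diff.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them,
# so it works without Jena installed.
# xload_plan, DB, /mnt/scratch and data.nt.gz are hypothetical names.
xload_plan() {
    db_dir=$1
    tmp_dir=$2
    shift 2
    # Step 1: validate the input first; parsing is faster than loading.
    echo "riot --check $*"
    # Step 2: build a fresh database, with temporary files in a separate directory.
    echo "tdb1.xloader --loc $db_dir --tmpdir $tmp_dir $*"
}

xload_plan DB /mnt/scratch data.nt.gz
```

Printing the plan first makes it easy to confirm that the temporary directory is the intended one before a multi-hour load starts. The documentation's other advice still applies when running the real commands: do not write the loader output to the same disk as the database or the temporary directory.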
