[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608534#comment-14608534 ]
ASF GitHub Bot commented on JENA-977:
-------------------------------------
GitHub user rvesse opened a pull request:
https://github.com/apache/jena/pull/84
[JENA-977] tdbloader2 script rewrite
This pull request contains a substantial rewrite of the `tdbloader2`
scripts to make them more user-friendly, flexible and robust.
## Dev Environment Notes
Previously it was a pain to run the scripts in a dev environment because
they assume a class path of `$JENA_HOME/lib/*`, and that directory does not
exist in a working copy. Therefore the POM for the distribution module was
updated to use the Maven Dependency Plugin (`copy-dependencies` goal) to
generate the `lib/` directory during the `package` phase, and the Clean Plugin
to remove it again during the `clean` phase. This makes it much easier to set
`JENA_HOME` to the distribution module directory in your working copy and
work on the scripts.
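As a rough illustration of why the generated `lib/` directory matters, the sketch below shows how a launcher script might check `JENA_HOME` before building the `$JENA_HOME/lib/*` class path. The `jena_classpath` function and its messages are hypothetical, not taken from the actual scripts; the only thing relied on is that `java -cp` expands a trailing `dir/*` wildcard itself.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: verify that JENA_HOME/lib exists and contains jars,
# then print the wildcard class path to pass to java -cp.
jena_classpath() {
  local home="${JENA_HOME:-}"
  if [ -z "$home" ]; then
    echo "JENA_HOME is not set" >&2
    return 1
  fi
  # In a working copy lib/ only exists after running 'mvn package'
  # in the distribution module.
  if ! compgen -G "$home/lib/*.jar" > /dev/null; then
    echo "No jars found in $home/lib, has 'mvn package' been run?" >&2
    return 1
  fi
  # Printed literally: java expands lib/* itself, not the shell.
  printf '%s\n' "$home/lib/*"
}
```

A caller would then quote the result, e.g. `java -cp "$(jena_classpath)" ...`, so that the shell does not glob the `*`.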
## Script Changes
The script changes are fairly extensive and cover a number of areas. The
two existing scripts were split into four:
- `tdbloader2` - Main entry point which coordinates running the other
scripts
- `tdbloader2data` - Script which runs the data phase of the build
- `tdbloader2index` - Script which runs the index phase of the build
- `tdbloader2common` - Script which provides functions common to all scripts
The now-defunct `tdbloader2worker` script was removed, as were some
outdated and broken scripts in `jena-tdb/bin/`.
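Splitting the scripts means each one has to pull in the shared functions from `tdbloader2common`. A minimal sketch of that pattern, which also bails out when the common script cannot be found, is below. It assumes the common script sits at `$JENA_HOME/bin/tdbloader2common`; the `load_common` helper is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: source the shared functions, failing if they are missing.
# Assumes the common script is installed as $JENA_HOME/bin/tdbloader2common.
load_common() {
  if [ -z "${JENA_HOME:-}" ]; then
    echo "JENA_HOME is not set" >&2
    return 1
  fi
  local common="$JENA_HOME/bin/tdbloader2common"
  if [ ! -f "$common" ]; then
    echo "Unable to locate common functions script $common" >&2
    return 1
  fi
  # Functions defined by the sourced file become available to this script.
  # shellcheck source=/dev/null
  source "$common"
}
```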
### Symbolic Link and relative path handling
In rewriting the scripts, some bugs in the existing treatment of `JENA_HOME`
were fixed:
- If `JENA_HOME` is not set, the scripts try to locate it from the script's
own path. When the script is a symbolic link this relied on `readlink -f`,
but the `-f` option has a completely different meaning on BSD/OS X, so
detection could fail there. The scripts now all contain a `resolveLink`
function which handles the OS-specific behaviour appropriately.
- If `JENA_HOME` is itself set to a symbolic link, the scripts could fail to
invoke the other scripts. Such a link is now resolved appropriately.
Similar bugs could occur if the given database location or data file paths
were relative and/or symbolic links. The scripts now resolve symbolic links
and make these paths absolute at the appropriate points, which makes them
less error-prone.
### Option Handling
The scripts now all support a variety of user-friendly options and have
built-in help for them. The main `tdbloader2` script accepts all the options
and passes the relevant ones through to the appropriate child scripts.
All settings that were previously only exposed via environment variables
are now also available as command line options. For backwards compatibility
the existing `JVM_ARGS` and `SORT_ARGS` environment variables are still
honoured if the corresponding command line options are not specified.
Each of the tdbloader2 scripts now provides a `printUsage` function with a
detailed, user-friendly help summary, which can be viewed by running any of
the scripts with the `-h` or `--help` option.
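The option handling described above could look roughly like the sketch below: a `case` loop over the arguments, `--` to end option processing, and a fallback to the legacy environment variables only when no option was given. Only `--phase`, `-k/--keep-work`, `--debug`, `--trace`, `-h/--help`, `--` and the `JVM_ARGS`/`SORT_ARGS` variables come from this PR; `parseArgs`, `--jvm-args`, `--sort-args` and the `EFFECTIVE_*` variables are hypothetical names, and the `printUsage` body is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of tdbloader2-style option handling.
# --jvm-args/--sort-args are illustrative names, not the real option names.
printUsage() {
  echo "Usage: tdbloader2 [options] [--] datafile ..."   # placeholder text
}

parseArgs() {
  PHASE="all"; KEEP_WORK=0; DEBUG=0; TRACE=0
  OPT_JVM_ARGS=""; OPT_SORT_ARGS=""
  DATA_FILES=()
  while [ $# -gt 0 ]; do
    case "$1" in
      -h|--help) printUsage; exit 0 ;;
      -k|--keep-work) KEEP_WORK=1 ;;
      --debug) DEBUG=1 ;;
      --trace) TRACE=1 ;;
      --phase)
        shift
        case "${1:-}" in
          all|data|index) PHASE="$1" ;;
          *) echo "Invalid --phase '${1:-}', expected all, data or index" >&2
             return 1 ;;
        esac ;;
      --jvm-args) shift; OPT_JVM_ARGS="${1:-}" ;;
      --sort-args) shift; OPT_SORT_ARGS="${1:-}" ;;
      --) shift; DATA_FILES+=("$@"); break ;;  # everything after -- is a file
      -*) echo "Unknown option $1" >&2; return 1 ;;
      *) DATA_FILES+=("$1") ;;
    esac
    shift
  done
  # Legacy environment variables apply only when the option was not given.
  EFFECTIVE_JVM_ARGS="${OPT_JVM_ARGS:-${JVM_ARGS:-}}"
  EFFECTIVE_SORT_ARGS="${OPT_SORT_ARGS:-${SORT_ARGS:-}}"
}
```

With this shape, `tdbloader2 -- -odd-name.nt` treats `-odd-name.nt` as a data file rather than an unknown option, which is the case the explicit `--` support is meant for.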
### Incremental Builds
`tdbloader2` now supports a `--phase` option which takes a value of `all`,
`data` or `index`. `all` does a full build and is the default if `--phase`
is omitted.
The other two values run only the correspondingly named phase of the build.
This allows a build to be done in smaller incremental steps and also allows
the index phase to be restarted on its own. That is useful because, in my
experience, once you get past the data phase the index phase has far more
scope for error.
### Indexing Improvements
The indexing script has received a number of improvements:
- Warns if the disk where `sort` stores its temporary files appears to be
low on space (10% or less free)
- Aborts early if there is insufficient free disk space to sort an input
file: the sorted output is the same size as the input, so if fewer bytes are
free than the size of the input the sort cannot succeed
- Warns if a given sort is likely to be external, with an additional warning
if that sort may run short of space on the disk where `sort` stores its
temporary files
- Reports progress for `sort` when running in the foreground, provided that
the `pv` ([PipeViewer](http://www.ivarch.com/programs/pv.shtml)) tool is
available
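The disk space checks and `pv` progress reporting above can be sketched as follows. The helper names are hypothetical. Space is read from `df -Pk`, whose POSIX output format is one line per filesystem on both GNU and BSD. "Foreground" is approximated here as "stderr is a terminal", since `pv` draws its progress bar on stderr. `PIPESTATUS` is used so that a failing `pv` or `sort` is detected even though they run in a pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of pre-sort disk checks and pv progress reporting.

# Print "<used KB> <available KB>" for the filesystem holding directory $1.
disk_usage_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $3, $4 }'
}

# Fail if fewer bytes are free under $2 than the input size $1 (the sorted
# output is as large as the input); warn when 10% or less of the disk is free.
check_sort_space() {
  local input_bytes="$1" dir="$2" used_kb avail_kb free_pct
  read -r used_kb avail_kb < <(disk_usage_kb "$dir")
  if [ -z "$avail_kb" ]; then
    echo "ERROR: unable to determine free space for $dir" >&2
    return 1
  fi
  if [ "$((avail_kb * 1024))" -lt "$input_bytes" ]; then
    echo "ERROR: insufficient space in $dir to sort $input_bytes bytes" >&2
    return 1
  fi
  if [ "$((used_kb + avail_kb))" -gt 0 ]; then
    free_pct=$(( avail_kb * 100 / (used_kb + avail_kb) ))
    if [ "$free_pct" -le 10 ]; then
      echo "WARNING: only ${free_pct}% free on the disk holding $dir" >&2
    fi
  fi
}

# Sort $1 into $2, passing any remaining arguments to sort.
sort_with_progress() {
  local input="$1" output="$2"
  local -a status
  shift 2
  if command -v pv > /dev/null 2>&1 && [ -t 2 ]; then
    pv "$input" | sort "$@" > "$output"
    status=("${PIPESTATUS[@]}")
    # Succeed only if both pv and sort succeeded.
    [ "${status[0]}" -eq 0 ] && [ "${status[1]}" -eq 0 ]
  else
    sort "$@" "$input" > "$output"
  fi
}
```

Checking only the pipeline's overall exit status would report just `sort`'s result, so a `pv` failure (for example an unreadable input) would otherwise go unnoticed.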
### Debugging
All scripts now support `--debug` and `--trace` options which add extra
output:
- `--debug` adds various additional debugging output during a build
- `--trace` enables `set -x` in the scripts
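A small sketch of how these flags might be wired up is below: a `debug` helper that only prints when `--debug` was given, a helper that collects the flags to forward to the child scripts, and `set -x` for tracing. All of the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of --debug/--trace support.
DEBUG=0
TRACE=0

# Print a message to stderr only when --debug was given.
debug() {
  if [ "$DEBUG" -eq 1 ]; then
    echo "DEBUG: $*" >&2
  fi
}

# Collect the flags the main script forwards to its child scripts.
build_child_flags() {
  CHILD_FLAGS=()
  if [ "$DEBUG" -eq 1 ]; then CHILD_FLAGS+=(--debug); fi
  if [ "$TRACE" -eq 1 ]; then CHILD_FLAGS+=(--trace); fi
}

# Echo each command before it runs when --trace was given.
enable_trace_if_requested() {
  if [ "$TRACE" -eq 1 ]; then
    set -x
  fi
}
```

The main script would then pass `"${CHILD_FLAGS[@]}"` on each child invocation so the data and index phases produce the same level of output.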
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/jena JENA-977
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/jena/pull/84.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #84
----
commit d92e336263da3f0f2a58dfc24cb9b5f23449cc5c
Author: Rob Vesse
Date: 2015-06-25T15:56:29Z
Initial work on refactoring tdbloader2 scripts (JENA-977)
- Better option processing
- Split tdbloader2worker into a data and index phase script
- Support only running a specific phase
commit 7b61a144854d81acbd180b5debfd5c8638d2af57
Author: Rob Vesse
Date: 2015-06-25T16:04:36Z
Further tweak new tdbloader2 scripts (JENA-977)
- Add proper usage to tdbloader2
- Check for temporary data files needed for index phase in
tdbloader2index
commit a96b0164c43142791ac030e5332b3f54df6fb4ba
Author: Rob Vesse
Date: 2015-06-26T11:25:57Z
Further refactoring of tdbloader2 scripts (JENA-977)
- Proper usage summaries in all scripts
- -k/--keep-work option instead of hidden environment variable
for keeping work
- Short forms for all options
commit 7770596bc94613409fe2753240b603ae22a38b57
Author: Rob Vesse
Date: 2015-06-26T15:15:18Z
Various further improvements to the scripts (JENA-977)
- Validate sort temporary directory when indexing and WARN if the disk
it is on is low on space (10% or less free)
- Support --debug and --trace flags in all scripts, add various debug
output throughout scripts
- Fix a bug with not detecting sort failure when pv is used to monitor
progress
- Fix a bug in size calculations used for progress monitoring and sort
failure detection
This commit includes some temporary DEV changes that will be reverted
later
commit 3c59213e273711836628d9d030df23dac142ee1b
Author: Rob Vesse
Date: 2015-06-29T12:12:03Z
Fix script usage in dev environment (JENA-977)
This commit enhances the distribution module to make it much easier to
use in dev environments. The dependency plugin is used with the
copy-dependencies goal to produce the lib/ directory during a package
phase and then clean plugin is configured to clean the lib/ directory
during a clean. This means that developers can now set JENA_HOME to the
distribution module directory in their working copy and provided they
have done a mvn package all the scripts should work.
This also allows the temporary hacks in the new tdbloader2 scripts to be
removed so these scripts now run against Jena 3 libraries and don't need
the path to the new scripts to be hacked.
commit c55c1f74b4571eee2c9e333967b5671e862adff7
Author: Rob Vesse
Date: 2015-06-29T16:21:18Z
Further refactoring of tdbloader2 scripts (JENA-977)
- Move common functions into tdbloader2common script
- Remove duplicated definitions from other scripts and source in the new
common script
- Add helper function for getting drive information
- Add check in tdbloader2index script which will abort the build if
there is insufficient free space to sort the data file, since the
sorted output will be the same size as the input, so if there are fewer
bytes free than the size of the input we can abort early
commit a7ac2797856bf60476204b8997b5a5bf4cfa15c5
Author: Rob Vesse
Date: 2015-06-30T12:44:29Z
Further improvements to tdbloader2 scripts (JENA-977)
- Auto-detection of JENA_HOME now exports it so it is visible to the
child scripts
- Force making database directory path absolute and resolving any
symbolic links in the path
- Additional checks in tdbloader2index to warn if sort is going to be
external and it may run out of temporary disk space for the sort
commit cc4a80ac3c44d738a8904ac91b1ece71b446d74a
Author: Rob Vesse
Date: 2015-06-30T13:25:46Z
Check for return codes from children in tdbloader2 (JENA-977)
Ensures that the main script checks for the return code of the child
scripts and aborts if they fail
commit d4a0bc50a6d82ab5bbb43ab90e65216e5b165621
Author: Rob Vesse
Date: 2015-06-30T14:04:50Z
Finish up first pass of work on tdbloader2 script refactoring (JENA-977)
- Add options for setting the JVM and sort arguments that do not rely on
environment variables. NB - For backwards compatibility the existing
environment variables are still honoured if the new command line
options are not used
- Improve some error messages
- Explicitly support -- for separating data files from options for cases
where file names may be confused
commit f64dbdcb6ac77cfb6654916e43797fdca3d4fb5c
Author: Rob Vesse
Date: 2015-06-30T14:33:09Z
Ensure data file paths are absolute (JENA-977)
This commit improves the tdbloader2 script to ensure that data file
paths are made absolute and any symbolic links are resolved.
commit d9ff26ec96b6cbf15d6649704dbcfe7f1d8d09eb
Author: Rob Vesse
Date: 2015-06-30T14:59:33Z
Fix bug where JENA_HOME is a symbolic link (JENA-977)
This commit fixes a bug that can occur when JENA_HOME is a symbolic
link; the scripts need to resolve the link as otherwise they cannot
source the common function scripts successfully.
Scripts now also bail out if they can't find the common functions script
to source.
commit c25ad5d800779ca829a7bde581f98d62c417719b
Author: Rob Vesse
Date: 2015-06-30T15:04:42Z
Minor clean up of OS type testing (JENA-977)
commit 12dc2cc66640e432a4e2f5b45ebf2fb16c995440
Author: Rob Vesse
Date: 2015-06-30T15:08:52Z
Final pieces of tdbloader2 script clean up (JENA-977)
- Fix white space inconsistencies in tdbloader2 scripts
- Removed defunct tdbloader2worker script
- Removed defunct and broken scripts from jena-tdb/bin/
----
> tdbloader2 script refactoring
> -----------------------------
>
> Key: JENA-977
> URL: https://issues.apache.org/jira/browse/JENA-977
> Project: Apache Jena
> Issue Type: Improvement
> Components: TDB
> Affects Versions: Jena 2.13.0
> Reporter: Rob Vesse
> Assignee: Rob Vesse
> Fix For: Jena 2.13.1, Jena 3.0.0
>
>
> As noted on the dev list the current scripts are a little rough around the
> edges, work items include:
> - Splitting data and index phase into separate scripts
> - Being able to restart a build from a later phase
> - Progress monitoring for the sort portion of indexing
> - Warning if sort is using a disk where you may have insufficient space
> - Better usage summaries
> - Better argument handling (avoid relying on magic environment variables
> wherever possible)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)