[ https://issues.apache.org/jira/browse/JENA-977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608534#comment-14608534 ]

ASF GitHub Bot commented on JENA-977:
-------------------------------------

GitHub user rvesse opened a pull request:

    https://github.com/apache/jena/pull/84

    [JENA-977] tdbloader2 script rewrite

    This pull request contains a substantial rewrite of the `tdbloader2`
    scripts to make them more user-friendly, flexible and robust.
    
    ## Dev Environment Notes
    
    Previously it was a pain to run the scripts in a dev environment because
    they assume a class path of `$JENA_HOME/lib/*`, and that directory does not
    exist in a working copy.  The POM for the distribution module has therefore
    been updated to use the Maven Dependency Plugin (`copy-dependencies` goal)
    to generate the `lib/` directory during the `package` phase and to remove it
    again during the `clean` phase.  This makes it much easier to point
    `JENA_HOME` at the distribution module directory in your working copy while
    enhancing the scripts.
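
    For illustration only, the sketch below shows the kind of sanity check a
    launcher script could run before relying on the `$JENA_HOME/lib/*` class
    path.  The function `check_jena_home` is hypothetical and not part of the
    actual scripts; it assumes nothing beyond the `lib/` layout described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the real scripts): verify that JENA_HOME has a
# populated lib/ directory, then print the class path entry the scripts expect.
check_jena_home() {
  local home="$1"
  if [ -z "$home" ] || [ ! -d "$home/lib" ]; then
    echo "ERROR: '$home/lib' not found; in a working copy run 'mvn package' in the distribution module first" >&2
    return 1
  fi
  # Any JARs present? Without nullglob the pattern stays literal when nothing matches
  local jars=( "$home"/lib/*.jar )
  if [ ! -e "${jars[0]}" ]; then
    echo "ERROR: no JARs found in '$home/lib'" >&2
    return 1
  fi
  # Print the entry with a literal '*'; Java, not the shell, expands it to every JAR
  printf '%s\n' "$home/lib/*"
}
```

    A caller might use `CP="$(check_jena_home "$JENA_HOME")" || exit 1` and then
    pass `"$CP"` to `java -cp`, keeping it quoted so that Java rather than the
    shell expands the `*`.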
    
    ## Script Changes
    
    The script changes are fairly extensive and cover a number of areas.  The
    two existing scripts were split into four:
    
    - `tdbloader2` - Main entry point which coordinates running the other scripts
    - `tdbloader2data` - Script which runs the data phase of the build
    - `tdbloader2index` - Script which runs the index phase of the build
    - `tdbloader2common` - Script which provides functions common to all scripts
    
    The now-defunct `tdbloader2worker` script was removed, as were some outdated
    and broken scripts in `jena-tdb/bin/`.
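
    Splitting the scripts this way means each one has to pull in
    `tdbloader2common` at start-up.  A minimal sketch of that step follows,
    assuming (not confirmed here) that the common script lives in
    `$JENA_HOME/bin/`; `load_common` is a hypothetical name.  It reports an error
    rather than carrying on when the file is missing.

```shell
#!/usr/bin/env bash
# Sketch: source the shared functions, failing loudly if they cannot be found.
# ASSUMPTION: tdbloader2common sits in "$JENA_HOME/bin"; load_common is hypothetical.
load_common() {
  if [ -z "${JENA_HOME:-}" ]; then
    echo "ERROR: JENA_HOME is not set" >&2
    return 1
  fi
  local common="$JENA_HOME/bin/tdbloader2common"
  if [ ! -r "$common" ]; then
    echo "ERROR: cannot find common functions script '$common'" >&2
    return 1
  fi
  # '.' runs the file in the current shell, so its functions become available here
  . "$common"
}
```

    A script would then call `load_common || exit 1` before using any shared
    function.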
    
    ### Symbolic Link and Relative Path Handling
    
    Rewriting the scripts also addressed some bugs in the existing treatment of
    `JENA_HOME`:
    
    - If `JENA_HOME` is not set, the scripts try to locate it from the script's
      own path.  If the script is a symbolic link they used `readlink -f` to
      resolve it, but the `-f` option has a completely different meaning on
      BSD/OS X, so this could fail in some cases.  The scripts now all contain a
      `resolveLink` function which handles the OS-specific behaviour
      appropriately.
    - If `JENA_HOME` was itself set to a symbolic link, the scripts could fail to
      invoke the other scripts; such a link is now resolved appropriately.
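
    One portable way to resolve links without depending on GNU `readlink -f` is
    to follow the chain one hop at a time with plain `readlink`, which prints a
    link's immediate target on both GNU and BSD/OS X, and then let `cd -P` and
    `pwd -P` canonicalise the directory.  The `resolve_link` function below is a
    hypothetical sketch of that approach, not the `resolveLink` function from
    this pull request.

```shell
#!/usr/bin/env bash
# Sketch only, NOT the actual resolveLink from the Jena scripts: resolve a
# symbolic link without GNU-only 'readlink -f'. Plain 'readlink PATH' (no
# options) prints a link's immediate target on both GNU and BSD/OS X.
resolve_link() {
  local path="$1" target hops=0 dir
  # Follow the chain of links one hop at a time
  while [ -L "$path" ]; do
    hops=$((hops + 1))
    if [ "$hops" -gt 40 ]; then   # arbitrary guard against link cycles
      echo "ERROR: too many levels of symbolic links: $1" >&2
      return 1
    fi
    target="$(readlink "$path")"
    case "$target" in
      /*) path="$target" ;;                     # absolute target
      *)  path="$(dirname "$path")/$target" ;;  # relative to the link's directory
    esac
  done
  # A directory can be canonicalised directly; 'cd -P' resolves symlinked parents
  if [ -d "$path" ]; then
    (CDPATH= cd -P "$path" && pwd -P)
    return
  fi
  # For a file, canonicalise its parent directory and re-append the file name
  dir="$(CDPATH= cd -P "$(dirname "$path")" && pwd -P)" || return 1
  printf '%s/%s\n' "${dir%/}" "$(basename "$path")"
}
```

    For example, `JENA_HOME="$(resolve_link "$JENA_HOME")"` would replace a
    symlinked `JENA_HOME` with its real location before any other script is
    sourced or invoked.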
    
    Similar bugs could occur if the given database location or data file paths
    were relative and/or symbolic links.  At various points the scripts now
    resolve symbolic links and make paths absolute, which makes them less
    error-prone.
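
    A relative path only means something relative to the current working
    directory, which a script or its children could change.  The hypothetical
    `abs_path` helper below sketches one way to make paths absolute: relative
    paths are anchored to `$PWD`, and when the parent directory exists it is
    canonicalised with `cd -P`/`pwd -P`, which also resolves symbolic links in
    the directory part.  It deliberately accepts a database location that does
    not exist yet.

```shell
#!/usr/bin/env bash
# Sketch: make a (possibly relative) path absolute. abs_path is a hypothetical
# name; only the directory part is resolved through symbolic links.
abs_path() {
  local path="$1" dir base
  case "$path" in
    /*) ;;                      # already absolute
    *)  path="$PWD/$path" ;;    # relative to the caller's working directory
  esac
  dir="$(dirname "$path")"
  base="$(basename "$path")"
  if [ -d "$dir" ]; then
    # cd -P / pwd -P canonicalise the parent directory, resolving symlinks
    dir="$(CDPATH= cd -P "$dir" && pwd -P)" || return 1
    printf '%s/%s\n' "${dir%/}" "$base"
  else
    # Parent does not exist (yet): return the absolute but unresolved path
    printf '%s\n' "$path"
  fi
}
```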
    
    ### Option Handling
    
    The scripts now all support a variety of user-friendly options with
    built-in help.  The main `tdbloader2` script accepts all of the options and
    passes the relevant ones through to the appropriate child scripts as
    necessary.
    
    All settings that were previously only exposed via environment variables
    are now also exposed as command line options.  For backwards compatibility,
    the existing `JVM_ARGS` and `SORT_ARGS` environment variables are still
    honoured if the corresponding options are not specified.
    
    Each of the `tdbloader2` scripts now provides a `printUsage` function
    containing a detailed, user-friendly help summary, which can be viewed by
    running any of the scripts with the `-h` or `--help` option.
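
    The option handling described above can be sketched as a single
    `while`/`case` loop.  In the sketch below, `-h`/`--help`, `--phase`,
    `-k`/`--keep-work`, `--debug`, `--trace` and `--` come from this pull
    request, while `--loc`, `--jvm-args` and `--sort-args` are placeholder
    spellings assumed for illustration; the real scripts' `--help` output is
    authoritative.  The fallback to `JVM_ARGS` and `SORT_ARGS` mirrors the
    backwards-compatible behaviour described above.

```shell
#!/usr/bin/env bash
# Sketch of option handling, not the real scripts' code.
# ASSUMPTION: --loc, --jvm-args and --sort-args are placeholder spellings.
printUsage() {
  cat <<'EOF'
Usage: tdbloader2 [options] --loc DIR datafile ...
  -h, --help        Print this help summary
  --phase PHASE     all (default), data or index
  -k, --keep-work   Keep temporary work files
  --jvm-args ARGS   JVM arguments (default: $JVM_ARGS)
  --sort-args ARGS  Arguments for sort (default: $SORT_ARGS)
  --debug           Print extra debugging output
  --trace           Run with 'set -x'
  --                Treat all remaining arguments as data files
EOF
}

# Sets PHASE, LOC, KEEP_WORK, DEBUG, TRACE, JVM_OPTS, SORT_OPTS and DATA_FILES.
# Returns 2 after printing help, 1 on a usage error.
parse_args() {
  PHASE=all LOC="" KEEP_WORK=0 DEBUG=0 TRACE=0 JVM_OPTS="" SORT_OPTS=""
  DATA_FILES=()
  while [ $# -gt 0 ]; do
    case "$1" in
      -h|--help) printUsage; return 2 ;;
      --phase|--loc|--jvm-args|--sort-args)
        if [ $# -lt 2 ]; then
          echo "ERROR: $1 requires a value" >&2
          return 1
        fi
        case "$1" in
          --phase)     PHASE="$2" ;;
          --loc)       LOC="$2" ;;
          --jvm-args)  JVM_OPTS="$2" ;;
          --sort-args) SORT_OPTS="$2" ;;
        esac
        shift 2 ;;
      -k|--keep-work) KEEP_WORK=1; shift ;;
      --debug)        DEBUG=1; shift ;;
      --trace)        TRACE=1; shift ;;
      --)             shift; DATA_FILES+=("$@"); break ;;  # rest are data files
      -*)             echo "ERROR: unknown option '$1'" >&2; return 1 ;;
      *)              DATA_FILES+=("$1"); shift ;;
    esac
  done
  case "$PHASE" in
    all|data|index) ;;
    *) echo "ERROR: --phase must be all, data or index, not '$PHASE'" >&2; return 1 ;;
  esac
  # Backwards compatibility: fall back to the old environment variables
  JVM_OPTS="${JVM_OPTS:-${JVM_ARGS:-}}"
  SORT_OPTS="${SORT_OPTS:-${SORT_ARGS:-}}"
}
```

    Because `parse_args` returns 2 after printing help, a caller would typically
    map 2 to `exit 0`, any other non-zero value to `exit 1`, and then run
    `set -x` if `TRACE` is 1.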
    
    ### Incremental Builds
    
    `tdbloader2` now supports a `--phase` option which takes a value of `all`,
    `data` or `index`.  `all` does a full build and is the default if the option
    is omitted.
    
    The other two values run only the correspondingly named phase of the build.
    This allows a build to be done in smaller incremental steps and also allows
    the index phase to be restarted, which is useful because, in my experience,
    once a build gets past the data phase the index phase has far more scope
    for error.
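
    A minimal sketch of how `tdbloader2` might dispatch on `--phase` is shown
    below.  It also checks each child's exit code and stops at the first
    failure.  The assumptions are that both child scripts sit in one directory
    and take the database location as their first argument; `run_phase` is a
    hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: run the requested phase(s) and abort on the first failing child.
# ASSUMPTIONS: children live in "$bin_dir" and take the database location
# first; run_phase is hypothetical.
run_phase() {
  local phase="$1" bin_dir="$2" loc="$3" rc
  shift 3   # remaining arguments are data files
  if [ "$phase" = all ] || [ "$phase" = data ]; then
    rc=0
    "$bin_dir/tdbloader2data" "$loc" "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
      echo "ERROR: data phase failed with exit code $rc" >&2
      return "$rc"
    fi
  fi
  if [ "$phase" = all ] || [ "$phase" = index ]; then
    rc=0
    "$bin_dir/tdbloader2index" "$loc" || rc=$?
    if [ "$rc" -ne 0 ]; then
      echo "ERROR: index phase failed with exit code $rc; re-run with --phase index" >&2
      return "$rc"
    fi
  fi
  return 0
}
```

    Capturing the status with `|| rc=$?` keeps the function usable even when the
    caller runs under `set -e`.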
    
    ### Indexing Improvements
    
    There have been many improvements to the indexing script:
    
    - Warns if it looks like the disk where sort stores its temporary files may
      be too full (10% or less free space)
    - Aborts if there is insufficient free disk space to sort an input file
    - Warns if a given sort is likely to be external, with additional warnings
      if that sort may run short of space on the disk where sort stores its
      temporary files
    - Provides progress reporting for sort when running in the foreground,
      provided that the `pv` ([PipeViewer](http://www.ivarch.com/programs/pv.shtml))
      tool is available
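
    The space checks and progress reporting above could look roughly like the
    following.  This is an illustrative sketch with hypothetical function
    names, assuming POSIX `df -P -k` and `wc -c` for sizes and treating
    "running in the foreground" as "stderr is a terminal"; the external-sort
    heuristic is omitted because its exact rule is not described here.

```shell
#!/usr/bin/env bash
# Sketch of pre-sort disk checks and pv progress reporting; all names are
# hypothetical. Sizes come from POSIX 'df -P -k' and 'wc -c'.

# Print "<available KiB> <total KiB>" for the filesystem holding directory $1
disk_space_kb() {
  df -P -k "$1" | awk 'NR == 2 { print $4, $2 }'
}

# Warn at 10% or less free space; fail if the input cannot fit, since the
# sorted output is the same size as the input
check_sort_space() {
  local input="$1" dir="$2" size_kb avail_kb total_kb
  if [ ! -r "$input" ]; then
    echo "ERROR: cannot read $input" >&2
    return 1
  fi
  size_kb=$(( ($(wc -c < "$input") + 1023) / 1024 ))
  if ! read -r avail_kb total_kb < <(disk_space_kb "$dir"); then
    echo "ERROR: cannot determine free space for $dir" >&2
    return 1
  fi
  if [ $(( avail_kb * 10 )) -le "$total_kb" ]; then
    echo "WARN: only ${avail_kb} KiB free on the disk holding $dir" >&2
  fi
  if [ "$size_kb" -gt "$avail_kb" ]; then
    echo "ERROR: sorting $input needs ~${size_kb} KiB but only ${avail_kb} KiB are free" >&2
    return 1
  fi
}

# Sort $1 into $2 using temporary directory $3; extra arguments go to sort
sort_with_progress() {
  local input="$1" output="$2" tmp_dir="$3" rc=0
  shift 3
  if command -v pv >/dev/null 2>&1 && [ -t 2 ]; then
    # pipefail makes a failure of either pv or sort visible in the status
    ( set -o pipefail; pv "$input" | sort -T "$tmp_dir" "$@" > "$output" ) || rc=$?
  else
    sort -T "$tmp_dir" "$@" "$input" > "$output" || rc=$?
  fi
  if [ "$rc" -ne 0 ]; then
    echo "ERROR: sort of $input failed (exit code $rc)" >&2
  fi
  return "$rc"
}
```

    Running the `pv | sort` pipeline under `set -o pipefail` in a subshell means
    a failure of either command yields a non-zero status, whereas by default a
    pipeline reports only the exit status of its last command.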
    
    ### Debugging
    
    All scripts now support `--debug` and `--trace` options, which add extra
    output:
    
    - `--debug` adds various additional debugging output during a build
    - `--trace` runs the scripts with `set -x` enabled
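
    A small sketch of how the two flags might be wired up, assuming option
    parsing has already set `DEBUG` or `TRACE` to `1` (the `debug` and
    `enable_tracing` names are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: DEBUG/TRACE are assumed to be set to 1 by option parsing.

# Print a debug message to stderr only when --debug was given
debug() {
  if [ "${DEBUG:-0}" = 1 ]; then
    echo "DEBUG: $*" >&2
  fi
}

# Turn on shell tracing only when --trace was given
enable_tracing() {
  if [ "${TRACE:-0}" = 1 ]; then
    set -x   # echo each command, after expansion, to stderr
  fi
}
```

    Shell options such as `-x` are not normally inherited by separately
    executed scripts, so the main script has to pass `--debug` and `--trace`
    through to its children explicitly.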
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/jena JENA-977

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/jena/pull/84.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #84
    
----
commit d92e336263da3f0f2a58dfc24cb9b5f23449cc5c
Author: Rob Vesse <[email protected]>
Date:   2015-06-25T15:56:29Z

    Initial work on refactoring tdbloader2 scripts (JENA-977)
    
    - Better option processing
    - Split tdbloader2worker into a data and index phase script
    - Support only running a specific phase

commit 7b61a144854d81acbd180b5debfd5c8638d2af57
Author: Rob Vesse <[email protected]>
Date:   2015-06-25T16:04:36Z

    Further tweak new tdbloader2 scripts (JENA-977)
    
    - Add proper usage to tdbloader2
    - Check for temporary data files needed for index phase in
      tdbloader2index

commit a96b0164c43142791ac030e5332b3f54df6fb4ba
Author: Rob Vesse <[email protected]>
Date:   2015-06-26T11:25:57Z

    Further refactoring of tdbloader2 scripts (JENA-977)
    
    - Proper usage summaries in all scripts
    - -k/--keep-work option instead of hidden environment variable
      for keeping work
    - Short forms for all options

commit 7770596bc94613409fe2753240b603ae22a38b57
Author: Rob Vesse <[email protected]>
Date:   2015-06-26T15:15:18Z

    Various further improvements to the scripts (JENA-977)
    
    - Validate sort temporary directory when indexing and WARN if the disk
      it is on is low on space (10% or less free)
    - Support --debug and --trace flags in all scripts, add various debug
      output throughout scripts
    - Fix a bug with not detecting sort failure when pv is used to monitor
      progress
    - Fix a bug in size calculations used for progress monitoring and sort
      failure detection
    
    This commit includes some temporary DEV changes that will be reverted
    later

commit 3c59213e273711836628d9d030df23dac142ee1b
Author: Rob Vesse <[email protected]>
Date:   2015-06-29T12:12:03Z

    Fix script usage in dev environment (JENA-977)
    
    This commit enhances the distribution module to make it much easier to
    use in dev environments.  The dependency plugin is used with the
    copy-dependencies goal to produce the lib/ directory during a package
    phase and then clean plugin is configured to clean the lib/ directory
    during a clean.  This means that developers can now set JENA_HOME to the
    distribution module directory in their working copy and provided they
    have done a mvn package all the scripts should work.
    
    This also allows the temporary hacks in the new tdbloader2 scripts to be
    removed so these scripts now run against Jena 3 libraries and don't need
    the path to the new scripts to be hacked.

commit c55c1f74b4571eee2c9e333967b5671e862adff7
Author: Rob Vesse <[email protected]>
Date:   2015-06-29T16:21:18Z

    Further refactoring of tdbloader2 scripts (JENA-977)
    
    - Move common functions into tdbloader2common script
    - Remove duplicated definitions from other scripts and source in the new
      common script
    - Add helper function for getting drive information
    - Add check in tdbloader2index script which will abort the build if
      there is insufficient free space to sort the data file, since the
      sorted output will be the same size as the input; if there are fewer
      bytes free than the size of the input we can abort early

commit a7ac2797856bf60476204b8997b5a5bf4cfa15c5
Author: Rob Vesse <[email protected]>
Date:   2015-06-30T12:44:29Z

    Further improvements to tdbloader2 scripts (JENA-977)
    
    - Auto-detection of JENA_HOME now exports it so it is visible to the
      child scripts
    - Force making database directory path absolute and resolving any
      symbolic links in the path
    - Additional checks in tdbloader2index to warn if sort is going to be
      external and it may run out of temporary disk space for the sort

commit cc4a80ac3c44d738a8904ac91b1ece71b446d74a
Author: Rob Vesse <[email protected]>
Date:   2015-06-30T13:25:46Z

    Check for return codes from children in tdbloader2 (JENA-977)
    
    Ensures that the main script checks for the return code of the child
    scripts and aborts if they fail

commit d4a0bc50a6d82ab5bbb43ab90e65216e5b165621
Author: Rob Vesse <[email protected]>
Date:   2015-06-30T14:04:50Z

    Finish up first pass of work on tdbloader2 script refactoring (JENA-977)
    
    - Add options for setting the JVM and sort arguments that do not rely on
      environment variables.  NB - For backwards compatibility the existing
      environment variables are still honoured if the new command line
      options are not used
    - Improve some error messages
    - Explicitly support -- for separating data files from options for cases
      where file names may be confused

commit f64dbdcb6ac77cfb6654916e43797fdca3d4fb5c
Author: Rob Vesse <[email protected]>
Date:   2015-06-30T14:33:09Z

    Ensure data file paths are absolute (JENA-977)
    
    This commit improves the tdbloader2 script to ensure that data file
    paths are made absolute and any symbolic links are resolved.

commit d9ff26ec96b6cbf15d6649704dbcfe7f1d8d09eb
Author: Rob Vesse <[email protected]>
Date:   2015-06-30T14:59:33Z

    Fix bug where JENA_HOME is a symbolic link (JENA-977)
    
    This commit fixes a bug that can occur when JENA_HOME is a symbolic
    link, the scripts need to resolve the link as otherwise they cannot
    source the common function scripts successfully.
    
    Scripts now also bail out if they can't find the common functions script
    to source.

commit c25ad5d800779ca829a7bde581f98d62c417719b
Author: Rob Vesse <[email protected]>
Date:   2015-06-30T15:04:42Z

    Minor clean up of OS type testing (JENA-977)

commit 12dc2cc66640e432a4e2f5b45ebf2fb16c995440
Author: Rob Vesse <[email protected]>
Date:   2015-06-30T15:08:52Z

    Final pieces of tdbloader2 script clean up (JENA-977)
    
    - Fix white space inconsistencies in tdbloader2 scripts
    - Removed defunct tdbloader2worker script
    - Removed defunct and broken scripts from jena-tdb/bin/

----


> tdbloader2 script refactoring
> -----------------------------
>
>                 Key: JENA-977
>                 URL: https://issues.apache.org/jira/browse/JENA-977
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: TDB
>    Affects Versions: Jena 2.13.0
>            Reporter: Rob Vesse
>            Assignee: Rob Vesse
>             Fix For: Jena 2.13.1, Jena 3.0.0
>
>
> As noted on the dev list, the current scripts are a little rough around the
> edges.  Work items include:
> - Splitting data and index phase into separate scripts
> - Being able to restart a build from a later phase
> - Progress monitoring for the sort portion of indexing
> - Warning if sort is using a disk where you may have insufficient space
> - Better usage summaries
> - Better argument handling (avoid relying on magic environment variables
>   wherever possible)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
