jenkins-bot has submitted this change and it was merged.

Change subject: Add new REST API data loader, which is faster.
......................................................................


Add new REST API data loader, which is faster.

Bug: T126956
Change-Id: I79363dc83eea44a1590dd4d3f4098bcab4a8f185
---
A dist/src/script/loadRestAPI.sh
M docs/getting-started.md
2 files changed, 62 insertions(+), 6 deletions(-)

Approvals:
  Smalyshev: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/dist/src/script/loadRestAPI.sh b/dist/src/script/loadRestAPI.sh
new file mode 100755
index 0000000..318547b
--- /dev/null
+++ b/dist/src/script/loadRestAPI.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+
+HOST=http://localhost:9999
+CONTEXT=bigdata
+LOAD_PROP_FILE=/tmp/$$.properties
+NAMESPACE=wdq
+
+pushd "$(dirname "$0")" > /dev/null
+SCRIPTPATH=$(pwd)
+popd > /dev/null
+
+export NSS_DATALOAD_PROPERTIES="$SCRIPTPATH/RWStore.properties"
+
+while getopts h:c:n:d: option
+do
+  case "${option}"
+  in
+    h) HOST=${OPTARG};;
+    c) CONTEXT=${OPTARG};;
+    n) NAMESPACE=${OPTARG};;
+    d) LOCATION=${OPTARG};;
+  esac
+done
+
+if [ -z "$LOCATION" ]
+then
+  echo "Usage: $0 -d <directory>] [-n <namespace>] [-h <host>] [-c <context>]"
+  exit 1
+fi
+
+# Some of the properties below are probably unused, but all are copied to be safe.
+
+cat <<EOT > $LOAD_PROP_FILE
+quiet=false
+verbose=0
+closure=false
+durableQueues=true
+#Needed for quads
+#defaultGraph=
+com.bigdata.rdf.store.DataLoader.flush=false
+com.bigdata.rdf.store.DataLoader.bufferCapacity=100000
+com.bigdata.rdf.store.DataLoader.queueCapacity=10
+#Namespace to load
+namespace=$NAMESPACE
+#Files to load
+fileOrDirs=$LOCATION
+#Property file (if creating a new namespace)
+propertyFile=$NSS_DATALOAD_PROPERTIES
+EOT
+
+echo "Loading with properties..."
+cat $LOAD_PROP_FILE
+
+curl -X POST --data-binary @${LOAD_PROP_FILE} --header 'Content-Type:text/plain' $HOST/$CONTEXT/dataloader
+#Let the output go to STDOUT/ERR to allow script redirection
+
+rm -f $LOAD_PROP_FILE
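
For the curious, the script boils down to a single REST call: it writes the
properties shown above to a temp file and POSTs that file to Blazegraph's
dataloader servlet. A hand-rolled equivalent using the script's default host,
context, and namespace would look roughly like this (the properties file path
here is illustrative; the script uses /tmp/$$.properties):

  curl -X POST --data-binary @/tmp/load.properties \
       --header 'Content-Type:text/plain' \
       http://localhost:9999/bigdata/dataloader
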
diff --git a/docs/getting-started.md b/docs/getting-started.md
index a69084f..a2e8def 100644
--- a/docs/getting-started.md
+++ b/docs/getting-started.md
@@ -55,24 +55,23 @@
 * Download the dump file from https://dumps.wikimedia.org/wikidatawiki/entities/ (for subdirectory `20150427` the filename will be something like `wikidata-20150427-all-BETA.ttl.gz`) into the `data` directory.
 * Pre-process the dump with Munger utility:
 ```
-$ ./munge.sh -f data/wikidata-20150427-all-BETA.ttl.gz -d data -l en -s
+$ mkdir data/split
+$ ./munge.sh -f data/wikidata-20150427-all-BETA.ttl.gz -d data/split -l en -s
 ```
 The option `-l en` only imports English labels.  The option `-s` skips the sitelinks, for smaller storage and better performance.
 If you need labels in other languages, either add them to the list - `-l en,de,ru` - or skip the language option altogether. If you need sitelinks, remove the `-s` option.
 
 * The Munger will produce a lot of data files named like `wikidump-000000001.ttl.gz`, `wikidump-000000002.ttl.gz`, etc. To load these files, you can use the following script:
 ```
-$ ./loadData.sh -n wdq -d `pwd`/data
+$ ./loadRestAPI.sh -n wdq -d `pwd`/data/split
 ```
 
 This will load the data files one by one into the Blazegraph data store. Note that you need `curl` to be installed for it to work.
 
-You can also specify which files to load:
+You can also load specific files:
 ```
-$ ./loadData.sh -n wdq -d `pwd`/data -s 1 -e 3
+$ ./loadRestAPI.sh -n wdq -d `pwd`/data/split/wikidump-000000001.ttl.gz
 ```
-This will load files from with numbers from 1 to 3.
-
 
 ## Run updater
 

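Once the dataloader call returns, a quick sanity check on the import is to
count triples in the target namespace. A minimal sketch, assuming the default
host and Blazegraph's standard per-namespace SPARQL endpoint layout:

  curl -G http://localhost:9999/bigdata/namespace/wdq/sparql \
       --data-urlencode 'query=SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }'
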
-- 
To view, visit https://gerrit.wikimedia.org/r/273539
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I79363dc83eea44a1590dd4d3f4098bcab4a8f185
Gerrit-PatchSet: 1
Gerrit-Project: wikidata/query/rdf
Gerrit-Branch: master
Gerrit-Owner: Smalyshev <smalys...@wikimedia.org>
Gerrit-Reviewer: Smalyshev <smalys...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>
