jenkins-bot has submitted this change and it was merged. ( 
https://gerrit.wikimedia.org/r/373918 )

Change subject: Cache regex check results
......................................................................


Cache regex check results

SparqlHelper::matchesRegularExpression is changed to be a wrapper around
a new function containing the previous code, caching its result. This
should hopefully reduce the number of requests to the query service.

Currently, there are 1786 format constraints defined on Wikidata, for a
total of 77,862,938 format constraint checks on statements. This was
determined with a script that may be found in paste P5921, and does not
include the statement count for the property P2093 (author name string),
which cannot be counted on WDQS. According to SQID, there are 24,150,684
statements for that property [1], so the total count is approximately
one hundred million format check results which might be cached.
(However, most of these checks will probably never be performed since
the corresponding items are rarely visited, and the set of checks that
are performed frequently is likely much smaller.)

To avoid putting arbitrary user input into the cache key, both regex and
text are hashed. If desired, the hash for the regex can be “reversed”
with a WDQS query like the following:

    SELECT ?property ?regex WHERE {
      ?property p:P2302 [
        ps:P2302 wd:Q21502404;
        pq:P1793 ?regex
      ].
      FILTER(SHA256(?regex) = "...")
    }

When the property is known, the hash for the value can similarly be
“reversed”, as long as the property doesn’t have too many statements and
isn’t a “Commons link” property (those map to URIs on the query
service).

[1]: https://tools.wmflabs.org/sqid/#/view?id=P2093

Bug: T173696
Change-Id: Iaaac950b483aaff83aa13b9f1b7d5090cd6c627f
---
M includes/ConstraintCheck/Helper/SparqlHelper.php
M includes/ConstraintReportFactory.php
M tests/phpunit/Helper/SparqlHelperTest.php
3 files changed, 103 insertions(+), 11 deletions(-)

Approvals:
  Aaron Schulz: Looks good to me, approved
  Krinkle: Looks good to me, but someone else must approve
  jenkins-bot: Verified



diff --git a/includes/ConstraintCheck/Helper/SparqlHelper.php 
b/includes/ConstraintCheck/Helper/SparqlHelper.php
index 13400b4..5bb6e48 100644
--- a/includes/ConstraintCheck/Helper/SparqlHelper.php
+++ b/includes/ConstraintCheck/Helper/SparqlHelper.php
@@ -6,6 +6,7 @@
 use IBufferingStatsdDataFactory;
 use MediaWiki\MediaWikiServices;
 use MWHttpRequest;
+use WANObjectCache;
 use Wikibase\DataModel\Entity\EntityIdParser;
 use Wikibase\DataModel\Entity\EntityIdParsingException;
 use Wikibase\DataModel\Statement\Statement;
@@ -43,6 +44,11 @@
        private $entityIdParser;
 
        /**
+        * @var WANObjectCache
+        */
+       private $cache;
+
+       /**
         * @var IBufferingStatsdDataFactory
         */
        private $dataFactory;
@@ -50,10 +56,12 @@
        public function __construct(
                Config $config,
                RdfVocabulary $rdfVocabulary,
-               EntityIdParser $entityIdParser
+               EntityIdParser $entityIdParser,
+               WANObjectCache $cache
        ) {
                $this->config = $config;
                $this->entityIdParser = $entityIdParser;
+               $this->cache = $cache;
 
                $this->entityPrefix = $rdfVocabulary->getNamespaceUri( 
RdfVocabulary::NS_ENTITY );
                $this->prefixes = <<<EOT
@@ -179,6 +187,40 @@
         * @throws ConstraintParameterException if the $regex is invalid
         */
        public function matchesRegularExpression( $text, $regex ) {
+               // caching wrapper around matchesRegularExpressionWithSparql
+               return (bool)$this->cache->getWithSetCallback(
+                       $this->cache->makeKey(
+                               'WikibaseQualityConstraints', // extension
+                               'regex', // action
+                               'WDQS-Java', // regex flavor
+                               hash( 'sha256', $regex ),
+                               hash( 'sha256', $text )
+                       ),
+                       WANObjectCache::TTL_DAY,
+                       function() use ( $text, $regex ) {
+                               $this->dataFactory->increment( 
'wikibase.quality.constraints.regex.cachemiss' );
+                               // convert to int because boolean false is 
interpreted as value not found
+                               return 
(int)$this->matchesRegularExpressionWithSparql( $text, $regex );
+                       },
+                       [
+                               // avoid querying cache servers multiple times 
in a request
+                               // (e. g. when checking format of a reference 
URL used multiple times on an entity)
+                               'pcTTL' => WANObjectCache::TTL_PROC_LONG,
+                       ]
+               );
+       }
+
+       /**
+        * This function is only public for testing purposes;
+        * use matchesRegularExpression, which is equivalent but caches results.
+        *
+        * @param string $text
+        * @param string $regex
+        * @return boolean
+        * @throws SparqlHelperException if the query times out or some other 
error occurs
+        * @throws ConstraintParameterException if the $regex is invalid
+        */
+       public function matchesRegularExpressionWithSparql( $text, $regex ) {
                $textStringLiteral = $this->stringLiteral( $text );
                $regexStringLiteral = $this->stringLiteral( '^' . $regex . '$' 
);
 
diff --git a/includes/ConstraintReportFactory.php 
b/includes/ConstraintReportFactory.php
index 9f0046a..322cb24 100644
--- a/includes/ConstraintReportFactory.php
+++ b/includes/ConstraintReportFactory.php
@@ -208,7 +208,8 @@
                                $sparqlHelper = new SparqlHelper(
                                        $this->config,
                                        $this->rdfVocabulary,
-                                       $this->entityIdParser
+                                       $this->entityIdParser,
+                                       
MediaWikiServices::getInstance()->getMainWANObjectCache()
                                );
                        } else {
                                $sparqlHelper = null;
diff --git a/tests/phpunit/Helper/SparqlHelperTest.php 
b/tests/phpunit/Helper/SparqlHelperTest.php
index b767d0f..fdc9714 100644
--- a/tests/phpunit/Helper/SparqlHelperTest.php
+++ b/tests/phpunit/Helper/SparqlHelperTest.php
@@ -2,7 +2,9 @@
 
 namespace WikibaseQuality\ConstraintReport\Test\Helper;
 
+use HashBagOStuff;
 use HashConfig;
+use WANObjectCache;
 use Wikibase\DataModel\Entity\EntityIdValue;
 use Wikibase\DataModel\Entity\ItemId;
 use Wikibase\DataModel\Entity\ItemIdParser;
@@ -39,7 +41,8 @@
                                                          
'http://www.wikidata.org/entity/',
                                                          
'http://www.wikidata.org/wiki/Special:EntityData/'
                                                  ),
-                                                 new ItemIdParser()
+                                                 new ItemIdParser(),
+                                                 WANObjectCache::newEmpty()
                                          ] )
                                          ->setMethods( [ 'runQuery' ] )
                                          ->getMock();
@@ -76,7 +79,8 @@
                                                          
'http://www.wikidata.org/entity/',
                                                          
'http://www.wikidata.org/wiki/Special:EntityData/'
                                                  ),
-                                                 new ItemIdParser()
+                                                 new ItemIdParser(),
+                                                 WANObjectCache::newEmpty()
                                          ] )
                                          ->setMethods( [ 'runQuery' ] )
                                          ->getMock();
@@ -110,7 +114,7 @@
                );
        }
 
-       public function testMatchesRegularExpression() {
+       public function testMatchesRegularExpressionWithSparql() {
                $text = '"&quot;\'\\\\"<&lt;'; // "&quot;'\\"<&lt;
                $regex = '\\"\\\\"\\\\\\"'; // \"\\"\\\"
                $query = 'SELECT (REGEX("\\"&quot;\'\\\\\\\\\\"<&lt;", 
"^\\\\\\"\\\\\\\\\\"\\\\\\\\\\\\\\"$") AS ?matches) {}';
@@ -124,12 +128,12 @@
                        ->with( $this->equalTo( $query ) )
                        ->willReturn( [ 'results' => [ 'bindings' => [ [ 
'matches' => [ 'value' => 'false' ] ] ] ] ] );
 
-               $result = $sparqlHelper->matchesRegularExpression( $text, 
$regex );
+               $result = $sparqlHelper->matchesRegularExpressionWithSparql( 
$text, $regex );
 
                $this->assertFalse( $result );
        }
 
-       public function testMatchesRegularExpressionBadRegex() {
+       public function testMatchesRegularExpressionWithSparqlBadRegex() {
                $text = '';
                $regex = '(.{2,5)?';
                $query = 'SELECT (REGEX("", "^(.{2,5)?$") AS ?matches) {}';
@@ -145,9 +149,9 @@
                        ->willReturn( [ 'results' => [ 'bindings' => [ [] ] ] ] 
);
 
                try {
-                       call_user_func_array( [ $sparqlHelper, 
'matchesRegularExpression' ], [ $text, $regex ] );
+                       call_user_func_array( [ $sparqlHelper, 
'matchesRegularExpressionWithSparql' ], [ $text, $regex ] );
                        $this->assertTrue( false,
-                               "matchesRegularExpression should have thrown a 
ConstraintParameterException with message ⧼$messageKey⧽." );
+                               "matchesRegularExpressionWithSparql should have 
thrown a ConstraintParameterException with message ⧼$messageKey⧽." );
                } catch ( ConstraintParameterException $exception ) {
                        $checkResult = new CheckResult(
                                $this->getMock( Context::class ),
@@ -161,6 +165,49 @@
        }
 
        /**
+        * @dataProvider haveCache
+        */
+       public function testMatchesRegularExpressionCache( $haveCache ) {
+               $text = '/.';
+               $regex = '.?';
+               $query = 'SELECT (REGEX("/.", "^.?$") AS ?matches) {}';
+               $cache = $haveCache ?
+                       new WANObjectCache( [ 'cache' => new HashBagOStuff() ] 
) :
+                       WANObjectCache::newEmpty();
+               $sparqlHelper = $this->getMockBuilder( SparqlHelper::class )
+                       ->setConstructorArgs( [
+                               $this->getDefaultConfig(),
+                               new RdfVocabulary(
+                                       'http://www.wikidata.org/entity/',
+                                       
'http://www.wikidata.org/wiki/Special:EntityData/'
+                               ),
+                               new ItemIdParser(),
+                               $cache
+                       ] )
+                       ->setMethods( [ 'runQuery' ] )
+                       ->getMock();
+
+               $sparqlHelper->expects( $haveCache ? $this->once() : 
$this->exactly( 2 ) )
+                       ->method( 'runQuery' )
+                       ->with( $this->equalTo( $query ) )
+                       ->willReturn( [ 'results' => [ 'bindings' => [ [ 
'matches' => [ 'value' => 'false' ] ] ] ] ] );
+
+               $result1 = $sparqlHelper->matchesRegularExpression( $text, 
$regex );
+               $cache->clearProcessCache();
+               $result2 = $sparqlHelper->matchesRegularExpression( $text, 
$regex );
+
+               $this->assertFalse( $result1 );
+               $this->assertFalse( $result2 );
+       }
+
+       public function haveCache() {
+               return [
+                       'with cache' => [ true ],
+                       'without cache' => [ false ],
+               ];
+       }
+
+       /**
         * @dataProvider isTimeoutProvider
         */
        public function testIsTimeout( $content, $expected ) {
@@ -170,7 +217,8 @@
                                'http://www.wikidata.org/entity/',
                                
'http://www.wikidata.org/wiki/Special:EntityData/'
                        ),
-                       new ItemIdParser()
+                       new ItemIdParser(),
+                       WANObjectCache::newEmpty()
                );
 
                $actual = $sparqlHelper->isTimeout( $content );
@@ -191,7 +239,8 @@
                                'http://www.wikidata.org/entity/',
                                
'http://www.wikidata.org/wiki/Special:EntityData/'
                        ),
-                       new ItemIdParser()
+                       new ItemIdParser(),
+                       WANObjectCache::newEmpty()
                );
                $content = '(x+x+)+y';
 

-- 
To view, visit https://gerrit.wikimedia.org/r/373918
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Iaaac950b483aaff83aa13b9f1b7d5090cd6c627f
Gerrit-PatchSet: 6
Gerrit-Project: mediawiki/extensions/WikibaseQualityConstraints
Gerrit-Branch: master
Gerrit-Owner: Lucas Werkmeister (WMDE) <lucas.werkmeis...@wikimedia.de>
Gerrit-Reviewer: Aaron Schulz <asch...@wikimedia.org>
Gerrit-Reviewer: Daniel Kinzler <daniel.kinz...@wikimedia.de>
Gerrit-Reviewer: Jonas Kress (WMDE) <jonas.kr...@wikimedia.de>
Gerrit-Reviewer: Krinkle <krinklem...@gmail.com>
Gerrit-Reviewer: Ladsgroup <ladsgr...@gmail.com>
Gerrit-Reviewer: Lucas Werkmeister (WMDE) <lucas.werkmeis...@wikimedia.de>
Gerrit-Reviewer: WMDE-leszek <leszek.mani...@wikimedia.de>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to