On Mon, Apr 11, 2011 at 11:11 PM, Dan McGee <[email protected]> wrote: > On Tue, Apr 12, 2011 at 1:04 AM, elij <[email protected]> wrote: >> On Mon, Apr 11, 2011 at 10:15 PM, Dan McGee <[email protected]> wrote: >>> The majority of "real world" info requests [1] come in hefty batches. We >>> would be better served to handle these in one request rather than >>> multiple by allowing AUR clients to send multiple arguments. >>> >>> This enables things like this to work: >>> >>> http://aur.test/rpc.php?type=info&arg[]=cups-xerox&arg[]=cups-mc2430dl&arg[]=10673 >> >> This kind of breaks the query format convention (thanks php!). >> It might be cleaner to use a separator (like a space) to break up args >> (and simply split on space). >> >> http://aur.test/rpc.php?type=info&arg=cups-xerox+cups-mc1430dl+10673 >> >> (note using plus signs here as an example since spaces or %20s are >> harder to 'see') > > I know it breaks "convention", but introducing another seemed a bit > silly to me. All old queries work 100% however, if that wasn't clear. > If we want to expand this to msearch or search down the road (which > seemed totally plausible to me when designing the code here), where a > space has totally different meaning and you would not want to split on > it, this new syntax would not work at all.
That makes sense. You have convinced me. :) >>> Note to RPC users: unfortunately due to the asinine design of PHP, you >>> unfortunately have to use the 'arg[]' syntax if you want more than one >>> query argument, or you will only get the package satisfying the last arg >>> you pass. >>> >>> [1] Rough data from April 11, 2011, with a total hit count of 1,109,163: >>> 12 /login.php >>> 13 /rpc.php?type=sarch >>> 15 /rpc.php?type=msearch >>> 16 /pingserver.php >>> 16 /rpc.php >>> 22 /logout.php >>> 163 /passreset.php >>> 335 /account.php >>> 530 /pkgsubmit.php >>> 916 /rss2.php >>> 3838 /index.php >>> 6752 /rss.php >>> 9699 / >>> 42478 /rpc.php?type=search >>> 184737 /packages.php >>> 681725 /rpc.php?type=info >>> >>> That means a whopping 61.5% of our requests were for info over the RPC >>> interface; package pages are a distant second at only 16.7%. >>> >>> Signed-off-by: Dan McGee <[email protected]> >>> --- >>> web/lib/aurjson.class.php | 59 >>> ++++++++++++++++++++++++++++++++++++-------- >>> 1 files changed, 48 insertions(+), 11 deletions(-) >>> >>> diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php >>> index 9b0e1a0..57096d8 100644 >>> --- a/web/lib/aurjson.class.php >>> +++ b/web/lib/aurjson.class.php >>> @@ -101,6 +101,36 @@ class AurJSON { >>> } >>> >>> /** >>> + * Parse the args to the info function. We may have a string or an >>> array, >>> + * so do the appropriate thing. Within the elements, both * package IDs >>> + * and package names are valid; sort them into the relevant arrays and >>> + * escape/quote the names. >>> + * @param $args the arg string or array to parse. >>> + * @return mixed An array containing 'ids' and 'names'. >>> + **/ >>> + private function parse_info_args($args) { >>> + if (!is_array($args)) { >>> + $args = array($args); >>> + } >>> + >>> + $id_args = array(); >>> + $name_args = array(); >>> + foreach ($args as $arg) { >>> + if (!$arg) { >>> + continue; >>> + } >>> + if (is_numeric($arg)) { >>> + $id_args[] = intval($arg); >>> + } else { >>> + $escaped = mysql_real_escape_string($arg, $this->dbh); >>> + $name_args[] = "'" . $escaped . "'"; >>> + } >>> + } >>> + >>> + return array('ids' => $id_args, 'names' => $name_args); >>> + } >>> + >>> + /** >>> * Performs a fulltext mysql search of the package database. >>> * @param $keyword_string A string of keywords to search with. >>> * @return mixed Returns an array of package matches. >>> @@ -129,21 +159,28 @@ class AurJSON { >>> **/ >>> private function info($pqdata) { >>> $fields = implode(',', self::$fields); >>> + $args = $this->parse_info_args($pqdata); >>> + $ids = $args['ids']; >>> + $names = $args['names']; >>> >>> - $base_query = "SELECT {$fields} " . >>> - " FROM Packages WHERE "; >>> + if (!$ids && !$names) { >>> + return $this->json_error('Invalid query arguments'); >>> + } >>> >>> - if ( is_numeric($pqdata) ) { >>> - // just using sprintf to coerce the pqd to an int >>> - // should handle sql injection issues, since sprintf will >>> - // bork if not an int, or convert the string to a number 0 >>> - $query_stub = "ID={$pqdata}"; >>> + $query = "SELECT {$fields} " . >>> + " FROM Packages WHERE "; >>> + if ($ids) { >>> + $ids_value = implode(',', $args['ids']); >>> + $query .= "ID IN ({$ids_value})"; >> >> I thought 'where in' queries in mysql were known to be terribly slow >> compared to using 'where ='. > > Nope, at least not in this case- if things are indexed properly you > have no problems, and we aren't using a subquery for the IN clause. > mysql is at least smart enough to recognize the columns being filtered > on are unique so it knows the row count is extremely limited before it > even runs the query. Oh, that's right. It was an IN with a *subquery* that was the pathological case. Nice. > mysql> explain select * from Packages where id in (10673, 6941) or > Name in ('cups-xerox', 'cups-mc2430dl'); > +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+ > | id | select_type | table | type | possible_keys | key > | key_len | ref | rows | Extra > | > +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+ > | 1 | SIMPLE | Packages | index_merge | PRIMARY,Name | > Name,PRIMARY | 66,4 | NULL | 4 | Using sort_union(Name,PRIMARY); > Using where | > +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+ > 1 row in set (0.00 sec) >
