Greetings, I've been plugging away on search v1 for a while now. As part 
of that change, the syntax for search is going to be expanded a fair 
bit. I thought I'd outline here what my plans are for the new search 
command, and get feedback before I turn to the implementation. Please 
remember that this is my attempt to outline everything I might possibly 
want to put into search. It's possible that some of these features may 
disappear if the implementation is too difficult, or I think there isn't 
much demand for some non-trivial feature. If there are features you'd 
like to see, from an API or user perspective, whether or not they're in 
this list, please let me know and I'll do my best to accommodate 
everyone's desires.

From a high level, the syntax will remain roughly the same:
pkg search [options] [tokens/query]

API:
Search will move into the client API. It will provide an option about 
whether only packages should be returned, or whether entire actions 
should be sent back.

I think there may also be an option for a client to grab a list of 
possible search tokens from the server. This will allow the client to 
suggest expansions on its own, instead of querying the server as the 
user types. Considering the number of search tokens, we may want to pare 
this down, or determine some way to update it, rather than simply dumping 
the entire batch each time, since a simple list of tokens is 47M 
uncompressed for the server. Compression drops that to 15M, but that's 
still a lot of information to send repeatedly across the wire. Perhaps 
this feature isn't needed, but I suspect we can be clever about things 
and find a way to send only the new and deleted tokens after an initial 
transfer.
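
As a rough sketch of the "send only the new and deleted tokens" idea, the 
snippet below diffs two snapshots of the token list. The function names 
(token_delta, apply_delta) are hypothetical, and it assumes the server can 
still see the snapshot the client last received (for example by keeping it 
under a version number), which nothing above commits to.

```python
def token_delta(old_tokens, new_tokens):
    """Return (added, removed) between two snapshots of the token list."""
    old, new = set(old_tokens), set(new_tokens)
    added = sorted(new - old)      # tokens the client does not have yet
    removed = sorted(old - new)    # tokens the client should drop
    return added, removed


def apply_delta(cached_tokens, added, removed):
    """Update a client's cached token set with a delta from the server."""
    return (set(cached_tokens) - set(removed)) | set(added)
```

After the initial full transfer, only the two (usually short) lists would 
need to cross the wire.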

Output:
By default, search will display a list of packages instead of the 
4-tuple it currently returns. There will be an option to switch the 
output back to the old format. In addition, displaying the entire text 
of matching actions will be possible.

By default search will be limited to returning the first N results. N is 
up for debate, but I suggest 100.

Options:
The simplest change is that remote search will become the default 
instead of local search. This better fits the majority of use cases 
(most users are searching for things to install, not for what's 
already on their system) and matches what other packaging systems' 
search commands seem to do.

New options -p, -a, a third (-4?), and -o will be added. -p (the default 
behavior) will tell the search command to display only the package 
names, and not the detailed action info. -a will tell search to display 
the entire action as text to the user. -4 (or whatever the chosen name 
is) will cause the old 4-tuple output to be displayed. -o can be 
repeated, and functions much like the pkg contents -o option. It 
gives the user total control over what is returned as a result. Note 
that unlike the current behavior of contents, all the information about 
an action will be listed on a single line.
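
To illustrate the single-line output that -o is meant to produce, here is a 
minimal sketch. The function name format_action, the dict representation of 
an action, the tab separator, and the "-" placeholder for missing attributes 
are all assumptions made for illustration, not decided behavior.

```python
def format_action(action, columns):
    """Render the requested attributes of one action on a single line.

    action:  dict mapping attribute names to values (hypothetical shape)
    columns: attribute names requested by the user, in output order
    """
    # Missing attributes are shown as "-" so columns stay aligned.
    return "\t".join(str(action.get(col, "-")) for col in columns)
```

For example, format_action({"path": "usr/bin/ls", "mode": "0555"}, 
["path", "owner", "mode"]) yields the three fields separated by tabs, 
with "-" standing in for the missing owner.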

New options -z(?) and --start-point will also be added. -z will allow 
the user to set the maximum number of results returned. --start-point 
will allow the user to start at some result after the first one. This 
will allow users (especially things like the BUI) to easily paginate 
the results and will prevent the server from getting blocked on one 
query for an excessive amount of time.
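
The pagination behavior can be sketched as a window over a lazily produced 
result stream. The name paginate and its parameter names are hypothetical; 
the default of 100 is the suggested value of N from the Output section.

```python
from itertools import islice


def paginate(results, start_point=0, max_results=100):
    """Return at most max_results items, skipping the first start_point.

    results may be any iterable, including a generator, so a server could
    stop producing matches as soon as the window is full.
    """
    return list(islice(results, start_point, start_point + max_results))
```

A client such as the BUI could request page k by passing 
start_point = k * max_results.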

Tokens/query:
Here is where the majority of visible change will take place. The query 
syntax is being vastly expanded. The simplest change is that multiple 
tokens will be allowed, and implicitly anded together. In addition, 
structured queries will be supported; more detail is provided below. 
Boolean queries will be added as well. I also plan/hope to be able to 
search against an incorporation, though that may have to wait till v2.

Because search will now display at both a package and an action level, I 
believe it makes sense to distinguish between those uses in the query 
syntax. Thus, I'm suggesting that instead of 'and' and 'or' there will be 
'pand', 'por', 'aand', and 'aor' (for package-op and action-op). The 
package ops will come with a restriction: they cannot be used unless 
the user is asking for a package list instead of an action list. The 
idea of pand (for example) is that it returns the list of packages 
containing actions that satisfy each of its components. The domains of 
these operations would be both lists of packages and lists of actions. 
When provided two sets of packages, it simply takes the intersection (or 
union in the case of 'por') of the sets. When actions are provided, it 
first converts each set of actions into a set of packages, each of which 
contains at least one of the actions in the original set. 'aand' and 
'aor' (which might become just 'and' and 'or') would have a domain of 
actions. They allow the user to specify an action which has two features 
(for example, an action with both "fun" and "games" in it). When the and 
operation happens implicitly (for example in pkg search foo bar baz) I 
suggest that whether a pand or aand is used depends on whether the user 
has requested only package information, or some form of action level 
information. 'pnot' and 'anot' may or may not be added, depending both 
on demand and the difficulty of implementing those efficiently.
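
The difference between the package ops and the action ops can be sketched 
with plain sets. This is only an illustration of the semantics described 
above: representing an action result as a (package, action text) pair and 
a package result as a bare package name is an assumption, as is the helper 
name as_packages.

```python
def as_packages(result):
    """Reduce a result set to package names.

    Accepts either a set of package names or a set of
    (package, action_text) pairs.
    """
    return {item[0] if isinstance(item, tuple) else item for item in result}


def pand(left, right):
    """Packages that satisfy both operands."""
    return as_packages(left) & as_packages(right)


def por(left, right):
    """Packages that satisfy either operand."""
    return as_packages(left) | as_packages(right)


def aand(left, right):
    """Actions that appear in both action sets."""
    return left & right


def aor(left, right):
    """Actions that appear in either action set."""
    return left | right


# Hypothetical matches for the tokens "fun" and "games".
fun = {("pkg-a", "set description='fun and games'"),
       ("pkg-b", "set description='fun'")}
games = {("pkg-a", "set description='fun and games'"),
         ("pkg-b", "file path=usr/bin/games")}
```

Here aand(fun, games) keeps only the single pkg-a action containing both 
tokens, while pand(fun, games) returns both packages, since pkg-b has one 
action matching "fun" and a different one matching "games".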

Another new feature is that instead of being limited to searching the 
text of all actions, users will be able to specify the action type, 
subtype (which may be a poor name; the examples below show what I mean), 
or both over which they want to search. If a user wanted to list all 
files that are delivered into /usr/bin, they could do 
'file.path:/usr/bin/*'. If a user wanted a list of all files, they could 
do "file:*". Or, if a user wanted to find all actions (directory, file, 
or link) which delivered into /usr/bin, they could search on 
"*.path:/usr/bin/*". As a final example, to search the descriptions of 
packages for "game", a user would use "set.description:game". Some 
details of the syntax need to be worked out. It's quite possible that 
"." as a separator won't work well there. This will also include things 
like searching for a particular version string.
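
One way to read the "type.subtype:pattern" examples is sketched below using 
shell-style wildcards. Everything here is an assumption for illustration: 
the function names, the dict shape of an action, treating a missing type or 
subtype as "*", and matching the pattern against either the whole value or 
any whitespace-separated word in it. The "." and ":" separators are taken 
from the examples above and, as noted, may change.

```python
from fnmatch import fnmatchcase


def parse_query(query):
    """Split 'type.subtype:pattern' into (type, subtype, pattern)."""
    if ":" not in query:
        # A bare token searches every type and subtype.
        return "*", "*", query
    field, _, pattern = query.partition(":")
    action_type, _, subtype = field.partition(".")
    return action_type or "*", subtype or "*", pattern


def matches(action, query):
    """Return True if the action satisfies the structured query.

    action is a dict like {"type": "file", "attrs": {"path": "/usr/bin/ls"}}.
    """
    type_pat, sub_pat, value_pat = parse_query(query)
    if not fnmatchcase(action["type"], type_pat):
        return False
    for key, value in action["attrs"].items():
        if not fnmatchcase(key, sub_pat):
            continue
        # Try the full value first, then each word in it.
        candidates = [value] + value.split()
        if any(fnmatchcase(c, value_pat) for c in candidates):
            return True
    return False


actions = [
    {"type": "file", "attrs": {"path": "/usr/bin/ls"}},
    {"type": "link", "attrs": {"path": "/usr/bin/vi", "target": "vim"}},
    {"type": "file", "attrs": {"path": "/etc/passwd"}},
    {"type": "set", "attrs": {"description": "a fun game"}},
]
usr_bin_files = [a for a in actions if matches(a, "file.path:/usr/bin/*")]
```

With this sample data, usr_bin_files contains only the /usr/bin/ls file 
action, while "*.path:/usr/bin/*" would also pick up the link.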

I would also like to add searching against an incorporation. Especially 
for publishing, it appears that having this as part of the API would be 
useful. Also, it may allow us to better control the noise that's 
currently spit to a user's screen by only returning the version(s) of 
actions/packages which satisfy the incorporations currently installed on 
their machine by default. This would only be used for remote search, and 
might well end up as an option, rather than as part of the query space.



That's roughly what I have in mind to do. Depending on how difficult the 
various features turn out to be, and what time pressure appears, some 
may be dropped, or others added based on feedback here.

Thanks for the help,
Brock
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
