Hello all,

I'd like to request some advice/thoughts/help on selecting a name for a module. The working name is 'Regexp::Query', but as it happens the implementation has moved somewhat beyond the original idea, so I'm no longer confident the name is
suitable...

The Scratched Itch:

I have a large number of commandline tools, and quite frequently I want the user
to be able to express, with some flag(s), a selection among something.

Example: the user gives the command

  SomeCommand /some/path

and it will scan the path and do something useful for every file it finds. Now, I also want to provide flags for the command so that the user can say

  SomeCommand -exclude 'some_regexp' /some/path

Obviously not a problem, and I also provide the reverse if that is more
convenient:

  SomeCommand -include 'another_regexp' /some/path

and this can be extended so flags can be given multiple times and interleaved:

  SomeCommand -include 'rx1' -exclude 'rx2' -include 'rx3' /some/path

coupled with rules that shrink the target set based on rx1, then shrink that
set using rx2, and so on. I think you get the idea.
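
To make the chaining concrete, here is a minimal sketch of that shrinking rule. The helper name apply_filters and the sample file names are made up for illustration; this is not code from the actual tools.

```perl
use strict;
use warnings;

# Hypothetical helper: apply an ordered list of [mode, pattern] filters,
# each one shrinking the set left over by the previous one.
sub apply_filters {
    my ($filters, @items) = @_;
    for my $filter (@$filters) {
        my ($mode, $pattern) = @$filter;
        my $rx = qr/$pattern/;
        if ($mode eq 'include') {
            @items = grep { /$rx/ } @items;     # keep only matches
        }
        else {
            @items = grep { !/$rx/ } @items;    # drop matches
        }
    }
    return @items;
}

# Equivalent of: -include '^src/' -exclude '\.o$' -include 'main'
my @files = qw(src/main.c src/main.o doc/readme.txt src/test_main.c);
my @kept  = apply_filters(
    [ [ include => '^src/' ], [ exclude => '\.o$' ], [ include => 'main' ] ],
    @files,
);
print "$_\n" for @kept;    # prints src/main.c and src/test_main.c
```

Since every flag can only narrow the current set, there is no way to say "this OR that" across flags, which is the limitation described next.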

What I found, however, is that it becomes hard to string regexps together to find the exact subset you want. In fact, while regexps are powerful, they're not well suited to being mixed, and some selections are basically
impossible to express, especially through a commandline interface...

Thus, I'd instead like to give the user a more capable way to express a complex query, i.e. one where it's possible to use AND/OR/NOT, including
parenthesized groups, e.g. something very contrived:

  (
    REGEXP/some_rx_1/ AND REGEXP/some_rx_2/
  ) OR
  (
    REGEXP/some_rx_3/ AND NOT REGEXP/some_rx_4/
  ) OR
  NOT (
        REGEXP/some_rx_5/ OR NOT REGEXP/some_rx_6/
      )

Basically, feed 'something' the query and a list of scalars and get back a list of the subset of scalars that fulfills the query. In short, behaving like a
grep, you might say.
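
As a rough illustration of the grep-like behaviour, the sketch below builds a query out of closures instead of parsing the query text. The q_rx/q_and/q_or/q_not names are invented for this example and are not the module's interface; a real implementation would produce something similar from the parsed query string.

```perl
use strict;
use warnings;

# Hypothetical combinators: each returns a predicate (code ref) taking one scalar.
sub q_rx {
    my ($pattern) = @_;
    my $rx = qr/$pattern/;
    return sub { $_[0] =~ $rx ? 1 : 0 };
}

sub q_not {
    my ($pred) = @_;
    return sub { $pred->($_[0]) ? 0 : 1 };
}

sub q_and {
    my @preds = @_;
    return sub {
        my ($value) = @_;
        for my $pred (@preds) { return 0 unless $pred->($value) }
        return 1;
    };
}

sub q_or {
    my @preds = @_;
    return sub {
        my ($value) = @_;
        for my $pred (@preds) { return 1 if $pred->($value) }
        return 0;
    };
}

# (REGEXP/foo/ AND REGEXP/bar/) OR (REGEXP/baz/ AND NOT REGEXP/qux/)
my $query = q_or(
    q_and( q_rx('foo'), q_rx('bar') ),
    q_and( q_rx('baz'), q_not( q_rx('qux') ) ),
);

my @items    = qw(foobar foo bazqux baz bar);
my @selected = grep { $query->($_) } @items;
print "@selected\n";    # prints "foobar baz"
```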

This was my original goal, and stopping there, arguably calling such a module
"Regexp::Query" could make sense IMHO.

===

However, moving beyond this I realized two things:

1) It would be generically useful to allow simple numerical comparisons on the
scalars, i.e. the usual suspects ==, !=, >, >=, <, <=
2) Why only scalars? With some additions it can handle lists of raw hashes, or
even arbitrary objects

The first now provides the opportunity to write queries like this:

  EQ(42) OR GTE(99) OR REGEXP(1\d\d\d2)
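
One way to picture the operators is as a table of predicate factories. The sketch below hand-builds the query above rather than parsing it; the %op table and the sample values are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: operator name => factory returning a predicate.
my %op = (
    EQ     => sub { my $n = shift; return sub { $_[0] == $n } },
    NE     => sub { my $n = shift; return sub { $_[0] != $n } },
    GT     => sub { my $n = shift; return sub { $_[0] >  $n } },
    GTE    => sub { my $n = shift; return sub { $_[0] >= $n } },
    LT     => sub { my $n = shift; return sub { $_[0] <  $n } },
    LTE    => sub { my $n = shift; return sub { $_[0] <= $n } },
    REGEXP => sub {
        my $pattern = shift;
        my $rx = qr/$pattern/;
        return sub { $_[0] =~ $rx ? 1 : 0 };
    },
);

# EQ(42) OR GTE(99) OR REGEXP(1\d\d\d2), written out by hand
my @terms = (
    $op{EQ}->(42),
    $op{GTE}->(99),
    $op{REGEXP}->('1\d\d\d2'),
);

my @values = (7, 42, 98, 99, 10002);
my @hits   = grep {
    my $value = $_;
    grep { $_->($value) } @terms;    # true if at least one term matches
} @values;
print "@hits\n";    # prints "42 99 10002"
```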

This is not so interesting with a list of plain scalars, but with the second capability I introduce the notion of 'fields'. With plain hashes, this equates to simply looking up the keys, but with objects we can provide a special hash whose keys map to small anonymous subs that know how to dig out
the data we want. Given a query like this:

  name.REGEXP(^A) and age.GT(30)

and any kind of Perl objects that have these 'fields', we can do this:

  my $fa = FieldAccessor(
             {
               name => sub { $_[0]->getName() },
               age => sub { $_[0]->calculateAge() },
             });

which, passed along with the list of objects, will perform the query.
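
For the curious, here is a small self-contained sketch of how such an accessor hash could drive the query name.REGEXP(^A) and age.GT(30). The Person class, the field_test helper and the sample data are all invented for this example, and the query is again built by hand instead of being parsed.

```perl
use strict;
use warnings;

# Hypothetical class standing in for "any kind of Perl object".
package Person;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub getName      { return $_[0]{name} }
sub calculateAge { return $_[0]{age} }    # stored directly to keep this short

package main;

# Field name => anonymous sub that digs the value out of an object.
my %accessor = (
    name => sub { $_[0]->getName() },
    age  => sub { $_[0]->calculateAge() },
);

# Hypothetical helper: extract the named field, then apply a test to it.
sub field_test {
    my ($field, $test) = @_;
    my $get = $accessor{$field} or die "unknown field '$field'";
    return sub { $test->( $get->($_[0]) ) ? 1 : 0 };
}

# name.REGEXP(^A) and age.GT(30)
my @tests = (
    field_test( name => sub { $_[0] =~ /^A/ } ),
    field_test( age  => sub { $_[0] > 30 } ),
);

my @people = (
    Person->new( name => 'Alice',  age => 34 ),
    Person->new( name => 'Bob',    age => 45 ),
    Person->new( name => 'Anna',   age => 25 ),
    Person->new( name => 'Albert', age => 52 ),
);

my @hits;
for my $person (@people) {
    my $failed = grep { !$_->($person) } @tests;    # number of failing tests
    push @hits, $person unless $failed;
}
print join(', ', map { $_->getName() } @hits), "\n";    # prints "Alice, Albert"
```

The same tests would work on plain hashes if the accessor subs simply returned $_[0]{name} and $_[0]{age}.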

Given these last features, I feel I'm slightly beyond something that belongs in the Regexp:: namespace, but I have no good feel for what it could be instead. It's pretty generic in a way, so perhaps just plain 'Data::Query' (ignoring for a second that that name appears to be taken by a module whose purpose I don't quite
understand at present... perhaps there's a fit...?)

Maybe I should just stick with the R::Q naming...:-)

Any feedback helpful.

TIA,

ken1 (CPAN id: KNTH)

