> Q: I'm assuming any glob patterns would implicitly be anchored to the end
> of the path string (as they are in bash)?

Yes. In ctags, '(<pattern>)' is matched against the file name, not the
path name, just as suffixes like '.c.h' are.

> Yes I know... In fact after originally looking at global and ctags
> I thought how potentially dangerous ctags's --force-language option
> was and that's why I called my extension suffixless_langmap.
> My intention was that this option wouldn't force anything but instead
> provide a default language when there wasn't a file suffix.
>
> For example, in project include directories you quite often get other
> artefacts like .c, .texi, .html (I know that these get excluded) and
> .inc files (MSVS). If the --force-language override option is used on
> those include directories then files with a suffix don't automatically
> get handled the way they should. Instead you'd possibly have to put in
> additional more specific --force-language overrides to reinstate default
> behaviour for certain extensions. E.g.:

You are right. It is an important point. You should be able to control
this finely.

How about using a 'file list' instead of a direct path?

--language-force=<lang>:<file list>

A file list is a file which lists file names.

e.g.
[cppfiles]
+-----------------------------
|include/c++/4.8/algorithm
|include/c++/4.8/bits/stl_algo.h
|include/c++/5.1/algorithm

$ gtags --language-force=cpp:cppfiles

You can use the find(1) command to make a file list.
This will satisfy your request too, because find(1) supports both glob
and regex matching. :)
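
As a rough sketch of that idea, a file list of suffixless files could be
generated as below. The helper name list_suffixless is invented for
illustration, and the gtags invocation in the comment uses the
--language-force=<lang>:<file list> syntax proposed above, which is not
an existing option.

```shell
#!/bin/sh
# Hypothetical helper (not part of GNU Global): print every regular file
# under directory $1 whose file name contains no period, one path per line.
list_suffixless() {
    # -name tests the file name only, so '*.*' means "contains a period".
    find "$1" -type f ! -name '*.*' | sort
}

# Intended use, left as a comment because the file list form of
# --language-force is only the proposal made in this thread:
#   list_suffixless include > cppfiles
#   gtags --language-force=cpp:cppfiles
```

Since -name looks at the file name only, "! -name '*.*'" selects files
not containing a period, which a single glob pattern cannot express.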

New priority:
[high]
1. --language-force=<lang>:<file list>
2. langmap=<lang>:<suffix or glob pattern list>
[low]
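
To make the intended lookup order concrete, here is a minimal sketch. It
assumes that a file list holds one path per line and that an entry must
match the path string exactly, and it hard-codes the
langmap=c\:.x([Mm]ake) setting from the earlier example. The function
name resolve_lang and its arguments are hypothetical; this does not
describe the actual implementation.

```shell
#!/bin/sh
# Sketch of the proposed priority; resolve_lang and its arguments are
# hypothetical, not GNU Global code.
#   $1 = path of the source file
#   $2 = language given in --language-force=<lang>:<file list>
#   $3 = name of the file list (assumed: one path per line)
resolve_lang() {
    # 1. [high] the path appears verbatim as a whole line of the file list
    if grep -Fxq -e "$1" "$3"; then
        echo "$2"
        return 0
    fi
    # 2. [low] langmap=c\:.x([Mm]ake), matched against the file name only
    case ${1##*/} in
        *.x | [Mm]ake) echo c ;;
        *) echo unknown ;;
    esac
}
```

With a list containing dir1/test.x, that path resolves to the forced
language, while dir2/test.x is not listed and falls through to the
langmap.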

What do you think?

> If/when someone comes to work on this, my patch is probably still worth
> a look as 70-80% of it is done with respect to the proposal above.
> Either way some of it may be of use.

Thank you so much.

Regards,
Shigio


2016-10-05 4:09 GMT+09:00 Cooper, Anthony <[email protected]>:

> SECURITY CLASSIFICATION: OFFICIAL
>
>
> Good morning :-)
>
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of
> > Shigio YAMAGUCHI
> > Sent: 04 October 2016 01:19
> > To: Cooper, Anthony
> > Cc: [email protected]
> > Subject: Re: GNU Global Parsing Suffixless Files Patch
> >
> > Good morning :)
> > I understood regex version of --language-force is very powerful.
> > However, it seems too powerful for us to manage it completely.
> >
> > How about releasing the real path version and '()' syntax first?
> > It's simple and easy to understand, and is similar to ctags.
> > At the stage now, no one can judge whether regex version is needed,
> > because no one has used even the real path version.
> >
> > >        E.g. If I had:
> > >        Default: \
> > >        :GTAGS_OPTIONS=--force-language=yacc\:(sys\$): \
> > >                               --force-language='cpp\:(^\\./Microsoft Visual)':
> > >
> > > Then this would say match all files ending in sys and treat them as
> > > yacc and any suffixless files with a path starting with `./Microsoft
> > > Visual' are to be treated as cpp files.
> >
> > Using the real path version and '()' syntax, that is realized easily
> > like this:
> >
> >         [gtags.conf]
> >         :langmap=yacc\:(*sys):
> >
> >         $ gtags --force-language='yacc:Microsoft Visual'
> >
>
> A very minor point: the `Microsoft Visual' examples are different as my RE
> only matches at the head of the path.
>
> I guess I get nervous putting in more limited matching mechanisms inside
> an option that is designed to override the normal default/sane behaviour; I
> would like to be as precise as possible in my overrides. Also most would
> use the simple substring match, but regex's are there for edge cases that
> we haven't thought of. Most devs are comfortable with REs.
>
> Q: I'm assuming any glob patterns would implicitly be anchored to the end
> of the path string (as they are in bash)?
>
> > > One thing to note, made in the man page and help text, is this
> > > switch won't affect any files with a suffix, which some people might
> > > expect with `force' in the name of the switch.
> >
> > In ctags, --language-force option ignores suffixes. I'd like to follow
> > ctags method.
>
> Yes I know... In fact after originally looking at global and ctags I
> thought how potentially dangerous ctags's --force-language option was and
> that's why I called my extension suffixless_langmap. My intention was that
> this option wouldn't force anything but instead provide a default language
> when there wasn't a file suffix.
>
> For example, in project include directories you quite often get other
> artefacts like .c, .texi, .html (I know that these get excluded) and .inc
> files (MSVS). If the --force-language override option is used on those
> include directories then files with a suffix don't automatically get
> handled the way they should. Instead you'd possibly have to put in
> additional more specific --force-language overrides to reinstate default
> behaviour for certain extensions. E.g.:
>
>         --force-language=cpp:include --force-language=c:.c --force-language=makefile:([Mm]akefile) ...
>
> However with REs you could be more selective in your initial
> --force-language setting and avoid the subsequent detailed extension
> overrides.
>
>         --force-language='cpp:(/include/(.*/)*[^/.]+$)'
>
> In a glob pattern as far as I'm aware there's no way of saying `select
> files not containing a period' :-(.
>
> >
> > $ ctags --language-force=c test.php # test.php is treated as C source file
> >
> > How about setting the following priority?
> > (This --language-force is the real path version)
> >
> > [high]
> > 1. --language-force=<lang>:<file>
> > 2. --language-force=<lang>:<directory>
> > 3. langmap=<lang>:<suffix or glob pattern list>
> > [low]
> >
> > e.g.
> > [gtags.conf]
> > :langmap=c\:.x([Mm]ake):
> >
> > $ gtags --language-force=perl:dir1 --language-force=php:php.x
> >
> > ./
> >  |-dir1/
> >  |  |-test.x    => perl by --language-force=perl:dir1
> >  |  |-Make      => perl by --language-force=perl:dir1
> >  |  |-php.x     => php by --language-force=php:php.x
> >  |-dir2
> >     |-test.x    => c by langmap=c\:.x([Mm]ake):
> >     |-Make      => c by langmap=c\:.x([Mm]ake):
> >
>
> The priorities look fine to me.
>
> Whilst I think it's a _bit_ of a pity not to have REs for the reasons
> pointed out above, none of the issues are insurmountable with a glob
> implementation, just possibly less obvious? But more consistent as you say
> with ctags. So as you say start off with globs and see :-).
>
> Many thanks for being so helpful and constructive, it is appreciated as is
> Global.
>
> If/when someone comes to work on this, my patch is probably still worth a
> look as 70-80% of it is done with respect to the proposal above. Either way
> some of it may be of use.
>
> Regards,
>
> Tony.
>
> > > Did you correctly receive the new patch for 6.5.5?
> >
> > Sorry but I did not read that at all. I would like to discuss about
> > the specification not about the implementation.
> >
> > Regards,
> > Shigio
> >
> >
> > 2016-10-03 21:34 GMT+09:00 Cooper, Anthony
> > <[email protected]>:
> >
> >
> >       SECURITY CLASSIFICATION: OFFICIAL
> >
> >
> >       Good morning :-) (See comments below)
> >
> >       > -----Original Message-----
> >       > From: [email protected] [mailto:[email protected]] On Behalf Of
> >       > Shigio YAMAGUCHI
> >       > Sent: 01 October 2016 00:17
> >       > To: Cooper, Anthony
> >       > Cc: [email protected]
> >       > Subject: Re: GNU Global Parsing Suffixless Files Patch
> >       >
> >       > Before implementation, I would like to make clear the specification.
> >       >
> >       > > Assorted projects I've come across have include and Include (the
> >       > > example below is a trivial but a real one relating to MS-Windows)
> >       > > and some even have include dirs names XInclude or something similar
> >       > > (can't remember the project now, wasn't X11 but probably an X client).
> >       >
> >       > Let me ask a couple of questions, please.
> >       >
> >       >
> >       > Q1: Is the following (1) and (2) equal?
> >       >
> >       >         (1) --language-force='cpp:([Ii]nclude)'
> >       >         (2) --language-force='cpp:include' --language-force='cpp:Include'
> >       >
> >       >     If so, you think that (1) is better than (2) since it is shorter?
> >
> >       Yes precisely. Although perhaps I gave a rather weak example. A
> >       stronger case would be when differentiating between say:
> >               /usr/include/C++/4.8/algorithm
> >               /usr/include/C++/5.1/algorithm
> >               /usr/include/C++/..../algorithm
> >       And:
> >               ./project/helper-programs/algorithm/sort/qsort  <- script or binary
> >
> >       Or to match:
> >               .../include/sys
> >       But not:
> >               .../include/system_errors
> >
> >       If I wanted to catch the first set of files in both examples without
> >       tripping up over the second then I could do
> >       --language-force=cpp:(algorithm\$) and --language-force=cpp:(sys\$).
> >
> >       >
> >       > Q2: Does (1) above match to the followings?
> >       >
> >       >         ./XXXincludeYYY/
> >       >         ./XXXincludeYYY.php
> >       >         ./project/include/release/
> >       >         ./project/include/release/test.php
> >
> >       Yes. The matching is a dumb substring or regex match on the path
> >       string available around where decide_lang() is called. No anchoring by
> >       default.
> >
> >       >
> >       > Q3: Regex '^' and '$' are available? If so, what does they mean?
> >
> >       Yes they are. `^' would mean start matching at the beginning of the
> >       path and `$' would mean match the end of the path (particularly useful
> >       for just picking up matches against a file name as directories in
> >       themselves aren't processed beyond traversal). File globbing doesn't
> >       make ^ and $ available and I have come across other
> >       programs/situations where I have been frustrated by this for want of a
> >       regex. E.g. If I had:
> >               Default: \
> >               :GTAGS_OPTIONS=--force-language=yacc\:(sys\$): \
> >                                      --force-language='cpp\:(^\\./Microsoft Visual)':
> >       Then this would say match all files ending in sys and treat them as
> >       yacc and any suffixless files with a path starting with `./Microsoft
> >       Visual' are to be treated as cpp files.
> >
> >       One thing to note, made in the man page and help text, is this
> >       switch won't affect any files with a suffix, which some people might
> >       expect with `force' in the name of the switch.
> >
> >       Did you correctly receive the new patch for 6.5.5?
> >
> >       Many thanks once again :-).
> >
> >       Regards Tony.
> >       >
> >       > Regards,
> >       > Shigio
> >       >
> >       > --
> >       >
> >       > Shigio YAMAGUCHI <[email protected]>
> >       > PGP fingerprint: D1CB 0B89 B346 4AB6 5663  C4B6 3CA5 BBB3 57BE DDA3
> >       >
> >       >
> >       >
> >       > ______________________________________________________________________
> >       > This email has been scanned by the Symantec Email Security.cloud service.
> >       > For more information please visit http://www.symanteccloud.com
> >       > ______________________________________________________________________
> >
> >
> >       ****************************************************************************
> >       Communications with GCHQ may be monitored and/or recorded
> >       for system efficiency and other lawful purposes. Any views or
> >       opinions expressed in this e-mail do not necessarily reflect GCHQ
> >       policy.  This email, and any attachments, is intended for the
> >       attention of the addressee(s) only. Its unauthorised use,
> >       disclosure, storage or copying is not permitted.  If you are not the
> >       intended recipient, please notify [email protected].
> >
> >       This information is exempt from disclosure under the Freedom of
> >       Information Act 2000 and may be subject to exemption under
> >       other UK information legislation. Refer disclosure requests to
> >       GCHQ on 01242 221491 ext 30306 (non-secure) or email
> >       [email protected]
> >
> >       ****************************************************************************
> >
> >
> >
> >
> >
> >
> > --
> >
> > Shigio YAMAGUCHI <[email protected]>
> > PGP fingerprint: D1CB 0B89 B346 4AB6 5663  C4B6 3CA5 BBB3 57BE DDA3
> >
> >
>
>


-- 
Shigio YAMAGUCHI <[email protected]>
PGP fingerprint: D1CB 0B89 B346 4AB6 5663  C4B6 3CA5 BBB3 57BE DDA3
_______________________________________________
Bug-global mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-global
