On 18/06/2018 12:33, Chris Angelico wrote:
> On Mon, Jun 18, 2018 at 9:16 PM, Bart <b...@freeuk.com> wrote:
>
>> What will those look like? If copyright/licence comments have their own
>> specific syntax, then they just become another token which has to be
>> recognised.
>
> If they have specific syntax, they're not comments, are they?

So how is it possible for ANY program to determine what kind of comments they are?

I've used 'smart' comments myself: they contain special information, but are designed to be detected very easily by even the simplest program that scans the source code. To that end they can start with a special prefix, so that a tool only has to detect the prefix and need not parse the information that follows.

For example, comments that start with #T# (and, in my case, begin at the start of a line). Funnily enough, these also provided type information, although for a different purpose from the one discussed here.
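
A minimal sketch of that kind of scan, assuming the marker is #T# at the very start of a line. The function name and the "x: int" payload format are made up for illustration and are not the format I actually used:

```python
def find_marked_comments(source, prefix="#T#"):
    """Return (line_number, payload) for every line that begins with prefix.

    No parsing is done: a line qualifies purely because of how it starts.
    """
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.startswith(prefix):
            # Keep whatever follows the marker, without interpreting it.
            found.append((lineno, line[len(prefix):].strip()))
    return found


# Hypothetical input: two marked comments among ordinary code and comments.
sample = """\
#T# x: int
x = 1  # an ordinary comment
# another ordinary comment
#T# name: str
name = "bart"
"""

print(find_marked_comments(sample))
```

Because nothing is tokenised, a line inside a multi-line string that happens to start with #T# would also match. That is the price of keeping the scanner this simple.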


>> The main complication I can see is that, if this is really a one-time
>> source-to-source translator so that you will be working with the result,
>> then usually you will want to keep the comments.
>>
>> Then it is a question of more precisely defining the task that such a
>> translator is to perform.
>
> Right, exactly. So you need to do an actual smart parse, which - as
> mentioned - is functionally equivalent whether you're stripping
> comments or some lexical token.

The subject is type annotations. Presumably there is some way to distinguish a type annotation inside a comment from a regular comment, such as the marker I suggested above?

Then the tokeniser just needs to detect that kind of comment; it does not need to understand the contents.

The tokeniser will need to work a little differently, though: it has to keep the position of every token within its line, information that is usually discarded.
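
As a rough illustration, Python's standard tokenize module already keeps that information: each token carries its (row, column) start position. The sketch below assumes the same #T# marker as above; the function name and sample source are hypothetical:

```python
import io
import tokenize


def marked_comments(source, prefix="#T#"):
    """Return (row, column, payload) for each comment token starting with prefix.

    The payload is kept as opaque text; only the prefix is examined.
    """
    results = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.string.startswith(prefix):
            row, col = tok.start  # position of the comment within the source
            results.append((row, col, tok.string[len(prefix):].strip()))
    return results


# Hypothetical input: the marker inside a string literal must not be reported.
sample = 's = "#T# not a comment"\nx = 1  #T# int\n# plain comment\n'

print(marked_comments(sample))
```

Unlike a plain line scan, this does not mistake the marker inside a string literal for a comment, and the recorded column lets a translator put the comment back where it found it.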

--
bart