[issue32198] \b reports false-positives in Indic strings involving combining marks
New submission from Shriramana Sharma <samj...@gmail.com>:

Code:

    import re
    cons_taml = "[கஙசஞடணதநபமயரலவழளறன]"
    print(re.findall("\\b" + cons_taml + "ை|ஐ", "ஐவர் பையன் இசை சிவிகை இல்லை இவ்ஐ"))
    cons_deva = "[कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह]"
    print(re.findall("\\b" + cons_deva + "ै|ऐ", "ऐषमः तैलम् ईडै समीशै ईक्षै ईक्ऐ"))

Specs: Kubuntu Xenial 64-bit, Python 3.5.2 (default, Nov 23 2017, 16:37:01) [GCC 5.4.0 20160609] on linux

Actual output:

    ['ஐ', 'பை', 'கை', 'லை', 'ஐ']
    ['ऐ', 'तै', 'शै', 'षै', 'ऐ']

Expected output:

    ['ஐ', 'பை']
    ['ऐ', 'तै']

Rationale: The RE is meant to identify words *starting* with the vowel /ai/ (\u0BC8 ை in Tamil script and \u0948 ै in Devanagari as vowel signs, or \u0B90 ஐ and \u0910 ऐ as independent vowels). ஐவர் பையன் and ऐषमः तैलम् are the only words fitting this criterion. \b is defined to mark a word boundary and is applied here at the beginning of the RE.

Observation: There seems to be an assumption that only GC=Lo characters constitute words. Hence the false positives at ச ி வ ி (க ை) and स म ी (श ै), where ி and ी are vowel signs, and at இ ல ் (ல ை) and ई क ् (ष ै), where ் and ् are virama characters (vowel-cancelling signs). In Indic scripts, such GC=Mc and GC=Mn characters are inalienable parts of words. They should be identified as parts of words, and no word boundary answering to \b should be generated at their positions.

--
components: Regular Expressions
messages: 307430
nosy: ezio.melotti, jamadagni, mrabarnett
priority: normal
severity: normal
status: open
title: \b reports false-positives in Indic strings involving combining marks
type: behavior
versions: Python 3.5

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32198>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
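A minimal workaround sketch for the behaviour described above, assuming marks should simply count as word characters. Python's re module treats a character as a word character when str.isalnum() is true for it (or it is the underscore); vowel signs (GC=Mc) and viramas (GC=Mn) fail that test, so \b fires right after them. Since re has no \p{M} class to use in a lookbehind, the hypothetical helpers is_word_char() and find_word_initial() post-filter the matches with unicodedata.category() instead. The sketch also wraps the alternation in (?:...): in the original pattern "\\b" + cons + "ை|ஐ" the | binds loosely, so \b applies only to the consonant-plus-sign branch and never to the independent vowel.

```python
import re
import unicodedata


def is_word_char(ch):
    # Letters and digits (str.isalnum), the underscore, and combining marks
    # (general categories Mn, Mc, Me) all count as parts of a word here.
    return ch.isalnum() or ch == "_" or unicodedata.category(ch).startswith("M")


def find_word_initial(pattern, text):
    """Return the matches of `pattern` that begin a word under the rule above."""
    found = []
    for m in re.finditer(pattern, text):
        start = m.start()
        # Keep a match only if it starts the text or the preceding
        # character is not part of a word (marks included).
        if start == 0 or not is_word_char(text[start - 1]):
            found.append(m.group())
    return found


cons_taml = "[கஙசஞடணதநபமயரலவழளறன]"
cons_deva = "[कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह]"

# (?:...) makes the word-start check apply to both alternatives.
print(find_word_initial("(?:" + cons_taml + "ை|ஐ)", "ஐவர் பையன் இசை சிவிகை இல்லை இவ்ஐ"))
print(find_word_initial("(?:" + cons_deva + "ै|ऐ)", "ऐषमः तैलम् ईडै समीशै ईक्षै ईक्ऐ"))
```

With the strings from the report this should print the expected output listed above. Because finditer() still consumes a rejected match, this post-filter is only safe for short patterns like these, where a rejected match cannot hide a valid word-initial one.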
[issue10984] argparse add_mutually_exclusive_group should accept existing arguments to register conflicts
Changes by Shriramana Sharma <samj...@gmail.com>:

--
nosy: +jamadagni

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10984>
___
[issue10984] argparse add_mutually_exclusive_group should accept existing arguments to register conflicts
Shriramana Sharma added the comment:

I would also like argparse to let me define a group of arguments that conflicts with another argument or another group of arguments. FWIW, I feel the help output should look like:

    prog [ --conflicter | [ --opt1 ] [ --opt2 ] ]

where --conflicter conflicts with --opt1 and --opt2, but those two don't conflict with each other, and all are optional.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10984>
___
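argparse has no built-in way to declare that one option conflicts with a group of options that are compatible with each other. A common workaround, sketched here under the assumption that post-parse validation is acceptable, is to check the combination after parse_args() and report it with parser.error(), supplying a hand-written usage string in the layout suggested above. The names build_parser() and parse() are hypothetical.

```python
import argparse


def build_parser():
    # The usage string is written by hand so the help output shows the
    # intended structure; argparse does not derive it from the check below.
    parser = argparse.ArgumentParser(
        prog="prog",
        usage="%(prog)s [-h] [--conflicter | [--opt1 OPT1] [--opt2 OPT2]]",
    )
    parser.add_argument("--conflicter", action="store_true")
    parser.add_argument("--opt1")
    parser.add_argument("--opt2")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # --opt1 and --opt2 may be combined freely, but neither may be
    # combined with --conflicter.
    if args.conflicter and (args.opt1 is not None or args.opt2 is not None):
        parser.error("--conflicter cannot be used with --opt1 or --opt2")
    return args
```

With this, parse(["--opt1", "a", "--opt2", "b"]) succeeds, while parse(["--conflicter", "--opt1", "a"]) prints the usage line and the error to stderr and exits with status 2.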
[issue20433] add aliasedname() and namedaliases() methods to unicodedata module
New submission from Shriramana Sharma:

Currently we have unicodedata.name(), which returns the formal character name of the character chr as per the second column of UnicodeData.txt from http://www.unicode.org/Public/UNIDATA/. However, the formal names of a few characters contain spelling mistakes. Also, the control characters in the Basic Latin and Latin-1 blocks aren't really given meaningful character names. In one case, that of FEFF, the formal name ZERO WIDTH NO-BREAK SPACE refers to a deprecated usage of the character (while the alternate name BYTE ORDER MARK refers to the recommended usage).

In all these cases, improved names are provided as stable aliases in NameAliases.txt from the same UNIDATA source. These are also part of the stable standard and are intended to alleviate the naming problems listed above. For their stability, see: http://www.unicode.org/policies/stability_policy.html#Formal_Name_Alias

Hence it would be most useful if the unicodedata module added an aliasedname() method with the same signature as name(), returning the official aliased name for characters that have aliases and, for a character without an alias, the same output as name(). As of Python 3.3, unicodedata.lookup() already uses NameAliases.txt to return the character given a name. The present request is to use it to return the name given the character.

Note that NameAliases.txt has abbreviated names for some characters (where the third column reads "abbreviation"). While these are useful for lookup(), they would not be useful as return values of aliasedname(). For instance, one would prefer to see SPACE returned for 0020 rather than SP. So these entries should be disregarded by aliasedname().

Also, NameAliases.txt has multiple entries for some characters even after discarding the abbreviation entries. In these cases, the first entry should be used (for want of a better rule); it is presumed that the entries are listed in some order of preference. It should be noted that discussion of this topic on the unicore (Unicode members) mailing list (in the thread "When normative aliases exist..." started 2014-01-21) indicates that the order of entries is subject to change, although the entries themselves will not be removed. In that case, the first non-abbreviation entry may change; this is acceptable for the behaviour of aliasedname(). Also note that new aliases may be defined in future. Thus the string returned by aliasedname() for a given character is not guaranteed to stay the same, but whatever it returns will always be valid to use with lookup(). Those who want a single immutable name and do not need the improvements provided by the aliases should use name(), not aliasedname().

Finally, for extended support, a namealiases() function should return all the aliases together with their types, giving the user full choice of the desired official alias.

The attached code should clarify the required behaviour. (It is not a patch, just an illustration.)

--
components: Unicode
files: aliasedname.py
messages: 209618
nosy: ezio.melotti, haypo, jamadagni
priority: normal
severity: normal
status: open
title: add aliasedname() and namedaliases() methods to unicodedata module
type: enhancement
versions: Python 3.3
Added file: http://bugs.python.org/file33788/aliasedname.py

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20433>
___
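The behaviour requested above can be sketched in pure Python. This is not the attached aliasedname.py (which is not reproduced here); the names parse_name_aliases(), name_aliases() and aliased_name() are hypothetical, and the sample lines only mimic the codepoint;alias;type layout of NameAliases.txt rather than loading the real file. Abbreviation aliases are skipped, the first remaining alias wins, and characters without one fall back to unicodedata.name().

```python
import unicodedata

# A few lines in the NameAliases.txt layout: code point;alias;type.
# This is an illustrative sample; a real implementation would read the
# file shipped with the Unicode Character Database.
SAMPLE_NAME_ALIASES = """\
# Lines starting with '#' are comments.
0000;NULL;control
0000;NUL;abbreviation
0020;SP;abbreviation
01A2;LATIN CAPITAL LETTER GHA;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
"""


def parse_name_aliases(text):
    """Map each character to its list of (alias, type) pairs, in file order."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        code, alias, kind = (field.strip() for field in line.split(";"))
        table.setdefault(chr(int(code, 16)), []).append((alias, kind))
    return table


ALIASES = parse_name_aliases(SAMPLE_NAME_ALIASES)


def name_aliases(ch):
    # Hypothetical counterpart of the proposed namealiases(): every alias with its type.
    return list(ALIASES.get(ch, []))


def aliased_name(ch):
    # Hypothetical counterpart of the proposed aliasedname(): the first
    # non-abbreviation alias if there is one, otherwise unicodedata.name(ch)
    # (which, like name(), raises ValueError for a character with no name).
    for alias, kind in ALIASES.get(ch, []):
        if kind != "abbreviation":
            return alias
    return unicodedata.name(ch)
```

For example, aliased_name("\ufeff") gives 'BYTE ORDER MARK', and aliased_name(" ") falls back to 'SPACE' rather than 'SP'. Because unicodedata.lookup() accepts aliases since Python 3.3, each result round-trips through lookup().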
[issue17991] ctypes.c_char gives a misleading error when passed a one-character unicode string
Shriramana Sharma added the comment:

I came upon this too. In Python 2, c_char expected a one-character string. Apparently the same error message has been carried forward to Python 3, though there the expected input is now either a one-character bytes object (not a str) or an int corresponding to the ord() value of that character. Minimal demonstration:

    $ python
    Python 2.7.4 (default, Apr 19 2013, 18:28:01)
    [GCC 4.7.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from ctypes import *
    >>> class test ( Structure ) :
    ...     _fields_ = [ ( "ch", c_char ) ]
    ...
    >>> a = test()
    >>> a.ch = ord('a')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: one character string expected
    >>> a.ch = 'c'
    >>> a.ch
    'c'

    $ python3
    Python 3.3.1 (default, Apr 17 2013, 22:30:32)
    [GCC 4.7.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from ctypes import *
    >>> class test ( Structure ) :
    ...     _fields_ = [ ( "ch", c_char ) ]
    ...
    >>> a = test()
    >>> a.ch = 'c'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: one character string expected
    >>> a.ch = b'c'
    >>> a.ch
    b'c'
    >>> a.ch = ord('c')
    >>> a.ch
    b'c'

--
nosy: +jamadagni

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17991>
___
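A short Python 3 check of the behaviour demonstrated above: a c_char structure field stores a one-byte bytes object or an int such as ord('c'), and rejects a str with TypeError. The names Record and try_assign() are illustrative only, and the sketch deliberately does not depend on the exact error message, since that message is what this issue is about.

```python
from ctypes import Structure, c_char


class Record(Structure):
    _fields_ = [("ch", c_char)]


def try_assign(value):
    """Assign `value` to a c_char field; return the stored bytes, or TypeError if rejected."""
    rec = Record()
    try:
        rec.ch = value
    except TypeError as exc:
        return type(exc)
    return rec.ch


for value in (b"c", ord("c"), "c"):
    print(repr(value), "->", try_assign(value))
```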
[issue6386] importing yields unexpected results when initial script is a symbolic link
Shriramana Sharma added the comment:

I'm sorry, but I don't get why this is a WONTFIX. I reported what is (now) apparently a duplicate: issue 18067. Just like the OP of this bug, I feel that when testing and such, one would naturally symlink and expect the library in the *current* directory to be imported.

As for the CWD, I have demonstrated in issue 18067 that the CWD is in fact reported to be the directory of the *source* of the symlink (i.e. the directory containing the symlink inode), not the *target* of the symlink. This is precisely what is frustrating about this bug: Python does not import anything from a directory that it reports, via os.getcwd(), to be the current directory.

While I lack the knowledge of CPython internals to fix this myself, I can't imagine it would be too difficult, given that os.getcwd() already reports the correct current directory: when setting up the import path list, you just have to use that instead of whatever is used now.

Thanks.

--
nosy: +jamadagni

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6386>
___
[issue6386] importing yields unexpected results when initial script is a symbolic link
Shriramana Sharma added the comment:

> The current behaviour is also needed to sanely support Python scripts symlinked from Linux /bin directories.

OK, that clinched it for me; I can't argue against that! And obviously it is not meaningful to copy or symlink *all* the current-directory modules a particular script depends on into the symlink directory as well. And searching both directories (containing the source and the target of the symlink) is not good for security, I guess.

I also checked the contents of sys.path with my test case, and sure enough the directory corresponding to the actual output was printed. It just takes some effort to keep in mind that sys.path differs from os.getcwd().

So I think someone should clearly mention this behaviour in the documentation at http://docs.python.org/3/tutorial/modules.html#the-module-search-path and its Python 2 equivalent. Specifically, this point needs clarification: "the directory containing the input script (or the current directory)". It would be best to remove the text in parentheses (which immediately makes one think of os.getcwd()) and add a clarification like:

"Note that on filesystems that support symlinks, this means the directory containing the actual script file. Symlinks to the script may be present elsewhere and may be used to invoke the script, but the directories containing those symlinks will *not* be searched for dependency modules."

Thank you very much for these clarifications and for your work on Python! Please do add the above documentation clarification, though.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6386>
___
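A small sketch of the distinction the proposed documentation note draws, assuming a POSIX system where os.symlink() works: it runs a script through a symlink from inside the symlink's directory and reports sys.path[0] alongside os.getcwd(). The directory names mirror the test case in issue 18067; the function name run_via_symlink() is hypothetical. On CPython the first value should be the directory of the resolved target, while the second is the symlink's directory.

```python
import os
import subprocess
import sys
import tempfile

# The child script prints the directory at the front of sys.path, then its CWD.
SCRIPT = "import os, sys\nprint(sys.path[0])\nprint(os.getcwd())\n"


def run_via_symlink():
    """Return (sys.path[0], os.getcwd(), target_dir, link_dir) for a symlinked script."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)  # the temp dir itself may sit behind a symlink
        target_dir = os.path.join(root, "english")
        link_dir = os.path.join(root, "sanskrit")
        os.mkdir(target_dir)
        os.mkdir(link_dir)
        target = os.path.join(target_dir, "run.py")
        with open(target, "w") as f:
            f.write(SCRIPT)
        os.symlink(target, os.path.join(link_dir, "run-slink.py"))
        # Run from inside the symlink's directory so the CWD is that directory.
        # -E ignores PYTHON* environment variables that could alter sys.path[0].
        lines = subprocess.run(
            [sys.executable, "-E", "run-slink.py"],
            cwd=link_dir, capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        return lines[0], lines[1], target_dir, link_dir


print(run_via_symlink()[:2])
```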
[issue18067] In considering the current directory for importing modules, Python does not honour the output of os.getcwd()
New submission from Shriramana Sharma:

Hello. I first asked about this at https://groups.google.com/d/topic/comp.lang.python/ZOGwXGU_TV0/discussion and am only filing this issue because there was no reply there.

I am using Python 2.7.4 and Python 3.3.1 (default packages) on Kubuntu Raring, and I experience the same bug on both. The bug concerns a Python script called via a symlink. To illustrate it, I have attached a minimal test-case tarball. Just run "sh test.sh" at the root of the extracted tree to see what's happening.

I am quite unpleasantly surprised that when a Python script is called via a symlink and that script imports a module, Python searches the directory in which the *target* of the link exists, uses the version of the library present *there* if any, and raises an exception if there is none. It even follows a chain of symlinks. Any version of the library present in the same directory as the (user-called) symlink is ignored. This is totally counter-intuitive behaviour and should be treated as a bug and fixed.

This is all the more frustrating since running print(os.getcwd()) from the same script correctly prints the current directory, in which the *symlink*, not its target, exists. (See the output of the attached scripts.)

A symlink is only a user-level filesystem convenience: I create a virtual file in one place pointing to another file elsewhere. The rest of the contents of the directory containing that other file is immaterial to me; I am only interested in the one file I am symlinking to. I am executing a script from a given directory. os.getcwd() correctly prints the path of that directory. I also have a library in that same directory for the script to import. I would expect Python to honour the output of os.getcwd() when importing too.

I read through http://docs.python.org/3/reference/import.html and couldn't find any explanation for the current illogical behaviour. (Please point it out if I have missed it.)

(Note that the same behaviour does not occur with hardlinks, probably since the filesystem itself shows the whole file as existing at the current location.)

Output of the test.sh script:

    Trying english/run.py
    CWD: /tmp/symlink-bug/english
    Hello Shriramana!
    Trying english/run-link.py symlinked to ./run.py
    CWD: /tmp/symlink-bug/english
    Traceback (most recent call last):
      File "run-link.py", line 3, in <module>
        from greet import greet
    ImportError: No module named greet
    Trying english/run-link-link.py symlinked to ./run-link.py symlinked to english/run.py
    CWD: /tmp/symlink-bug/english
    Hello Shriramana!
    Trying sanskrit/run-slink.py symlinked to english/run.py
    CWD: /tmp/symlink-bug/sanskrit
    Hello Shriramana!
    Trying sanskrit/run-hlink.py hardlinked to english/run.py
    CWD: /tmp/symlink-bug/sanskrit
    Namaste Shriramana!

Expected output (see especially the items marked 1 and 2 below):

    Trying english/run.py
    CWD: /tmp/symlink-bug/english
    Hello Shriramana!
    (1) Trying english/run-link.py symlinked to ./run.py
    CWD: /tmp/symlink-bug/english
    Hello Shriramana!
    Trying english/run-link-link.py symlinked to ./run-link.py symlinked to english/run.py
    CWD: /tmp/symlink-bug/english
    Hello Shriramana!
    (2) Trying sanskrit/run-slink.py symlinked to english/run.py
    CWD: /tmp/symlink-bug/sanskrit
    Namaste Shriramana!
    Trying sanskrit/run-hlink.py hardlinked to english/run.py
    CWD: /tmp/symlink-bug/sanskrit
    Namaste Shriramana!

--
components: Extension Modules
files: symlink-bug.tar.gz
messages: 190098
nosy: jamadagni
priority: normal
severity: normal
status: open
title: In considering the current directory for importing modules, Python does not honour the output of os.getcwd()
Added file: http://bugs.python.org/file30386/symlink-bug.tar.gz

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18067>
___
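If a particular script really should prefer modules sitting next to the symlink it was invoked through, it can opt in explicitly. A sketch, assuming a layout like the attached test case (english/ and sanskrit/ each holding a greet.py, with sanskrit/run-slink.py symlinked to english/run.py); the file contents and the names RUN_PY, greet_module() and run_sanskrit_symlink() are hypothetical reconstructions, not the tarball's. The run.py below prepends the directory of os.path.abspath(sys.argv[0]), which does not resolve symlinks, to sys.path before importing. This changes import precedence for that script only, and it lets modules next to the symlink shadow the script's own dependencies.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical run.py: searches the directory of the path the user invoked
# (sys.argv[0], symlinks NOT resolved) before anything else on sys.path.
RUN_PY = """\
import os, sys
sys.path.insert(0, os.path.dirname(os.path.abspath(sys.argv[0])))
from greet import greet
greet("Shriramana")
"""


def greet_module(word):
    # Source for a greet.py whose greet() prints "<word> <name>!".
    return "def greet(name):\n    print(%r + ' ' + name + '!')\n" % word


def run_sanskrit_symlink():
    """Build english/ and sanskrit/ like the test case, then run the symlink."""
    with tempfile.TemporaryDirectory() as tmp:
        english = os.path.join(tmp, "english")
        sanskrit = os.path.join(tmp, "sanskrit")
        os.mkdir(english)
        os.mkdir(sanskrit)
        with open(os.path.join(english, "run.py"), "w") as f:
            f.write(RUN_PY)
        with open(os.path.join(english, "greet.py"), "w") as f:
            f.write(greet_module("Hello"))
        with open(os.path.join(sanskrit, "greet.py"), "w") as f:
            f.write(greet_module("Namaste"))
        os.symlink(os.path.join(english, "run.py"),
                   os.path.join(sanskrit, "run-slink.py"))
        result = subprocess.run(
            [sys.executable, "-E", "run-slink.py"],
            cwd=sanskrit, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


print(run_sanskrit_symlink())  # Namaste Shriramana!
```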
[issue6294] Improve shutdown exception ignored message
Changes by Shriramana Sharma <samj...@gmail.com>:

--
nosy: +jamadagni

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6294>
___