[issue32198] \b reports false-positives in Indic strings involving combining marks

2017-12-02 Thread Shriramana Sharma

New submission from Shriramana Sharma <samj...@gmail.com>:

Code:

import re
cons_taml = "[கஙசஞடணதநபமயரலவழளறன]"
print(re.findall("\\b" + cons_taml + "ை|ஐ", "ஐவர் பையன் இசை சிவிகை இல்லை இவ்ஐ"))
cons_deva = "[कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह]"
print(re.findall("\\b" + cons_deva + "ै|ऐ", "ऐषमः तैलम् ईडै समीशै ईक्षै ईक्ऐ"))

Specs:
Kubuntu Xenial 64 bit
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux

Actual Output:
['ஐ', 'பை', 'கை', 'லை', 'ஐ']
['ऐ', 'तै', 'शै', 'षै', 'ऐ']

Expected Output:
['ஐ', 'பை']
['ऐ', 'तै']

Rationale:

The RE is meant to identify words *starting* with the vowel /ai/ 
(\u0BC8 ை in Tamil script and \u0948 ै in Devanagari as vowel sign, or \u0B90 ஐ 
and \u0910 ऐ as independent vowel). ஐவர் பையன் and ऐषमः तैलम् are the only words 
fitting this criterion. \b is defined to mark a word boundary and is applied 
here at the beginning of the RE.

Observation:

There seems to be some assumption that only GC=Lo characters constitute words. 
Hence the false positives at ச ி வ ி (க ை) and स म ी (श ै) where the ி and ी 
are vowel signs, and இ ல ் (ல ை) and ई क ् (ष ै) where the ் and ् are virama 
characters or vowel cancelling signs.

In Indic, such GC=Mc and GC=Mn characters are inalienable parts of words. They 
should be properly identified as parts of words and no word boundary answering 
to \b should be generated at their positions.
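One possible workaround, offered only as a sketch: replace \b with a negative 
lookbehind over an explicit class of "word" characters for the script in 
question. The helper script_word_class(), the names TAMIL_WORD and AI_INITIAL, 
and the choice of general categories L*, M* and N* are assumptions made for 
this illustration; they are not part of the re module. The alternatives are 
also wrapped in a group so that the boundary check applies to both of them.

```python
import re
import unicodedata


def script_word_class(first, last):
    """Return a regex character class holding every letter (L*), mark (M*)
    and number (N*) code point between first and last, inclusive."""
    chars = [
        chr(cp)
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp))[0] in "LMN"
    ]
    return "[" + "".join(chars) + "]"


# Tamil block U+0B80..U+0BFF; the consonant list is copied from the report.
TAMIL_WORD = script_word_class(0x0B80, 0x0BFF)
CONS_TAML = "[கஙசஞடணதநபமயரலவழளறன]"

# (?<!...) stands in for \b: no Tamil letter, mark or digit may precede.
AI_INITIAL = re.compile(
    "(?<!" + TAMIL_WORD + ")(?:" + CONS_TAML + "\u0BC8|\u0B90)"
)


def ai_initial_syllables(text):
    """Return the /ai/ syllables that begin a word in text."""
    return AI_INITIAL.findall(text)


print(ai_initial_syllables("ஐவர் பையன் இசை சிவிகை இல்லை இவ்ஐ"))
```

On the Tamil test string from the report this should print only the two items 
listed under Expected Output. The same approach would need a separate class for 
Devanagari (U+0900..U+097F).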

--
components: Regular Expressions
messages: 307430
nosy: ezio.melotti, jamadagni, mrabarnett
priority: normal
severity: normal
status: open
title: \b reports false-positives in Indic strings involving combining marks
type: behavior
versions: Python 3.5

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32198>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue10984] argparse add_mutually_exclusive_group should accept existing arguments to register conflicts

2014-02-28 Thread Shriramana Sharma

Changes by Shriramana Sharma <samj...@gmail.com>:


--
nosy: +jamadagni

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10984
___



[issue10984] argparse add_mutually_exclusive_group should accept existing arguments to register conflicts

2014-02-28 Thread Shriramana Sharma

Shriramana Sharma added the comment:

I also wish to see argparse allowing me to define a group of arguments that 
conflict with another argument or another group of arguments and FWIW I feel 
the help output should be like:

prog [ --conflicter | [ --opt1 ] [ --opt2 ] ]

where --conflicter conflicts with --opt1 and --opt2 but those two don't 
conflict with each other and all are optional.
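Until argparse supports this directly, the conflict can be checked by hand 
after parsing. The following is only a sketch of that workaround: the function 
names build_parser() and parse() are made up for this example, and the usage 
string is written out manually to mimic the help output proposed above.

```python
import argparse


def build_parser():
    # The usage line is supplied by hand; argparse does not generate it.
    parser = argparse.ArgumentParser(
        prog="prog",
        usage="prog [ --conflicter | [ --opt1 ] [ --opt2 ] ]",
    )
    parser.add_argument("--conflicter", action="store_true")
    parser.add_argument("--opt1", action="store_true")
    parser.add_argument("--opt2", action="store_true")
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # --conflicter excludes the whole {--opt1, --opt2} group, while the two
    # options inside the group may be freely combined with each other.
    if args.conflicter and (args.opt1 or args.opt2):
        parser.error("--conflicter cannot be used with --opt1 or --opt2")
    return args
```

Calling parse(["--opt1", "--opt2"]) succeeds, while 
parse(["--conflicter", "--opt1"]) exits through parser.error().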

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10984
___



[issue20433] add aliasedname() and namedaliases() methods to unicodedata module

2014-01-28 Thread Shriramana Sharma

New submission from Shriramana Sharma:

Currently we have unicodedata.name() which returns the formal character name of 
the character chr as per the second column in UnicodeData.txt from 
http://www.unicode.org/Public/UNIDATA/.

However, there are a few characters where the formal character name has 
spelling mistakes. Also, the control characters in the Basic Latin and Latin-1 
blocks aren't really given meaningful character names. In one case, that of 
FEFF, the formal name ZERO WIDTH NO-BREAK SPACE refers to a deprecated usage of 
the character (and the alternate name BYTE ORDER MARK refers to the recommended 
usage).

In all these cases, improved names are provided as stable aliases in 
NameAliases.txt from the same UNIDATA source. These are also part of the stable 
standard and are intended to alleviate the naming situation w.r.t. the above 
issues. For the stability, see: 
http://www.unicode.org/policies/stability_policy.html#Formal_Name_Alias

Hence it would be most useful if the unicodedata module would add an 
aliasedname() method with the same signature as name() to provide the official 
aliased name in the case of characters with aliases, and when a character does 
not have an alias, to provide the same output as name().

As of Py 3.3, unicodedata.lookup() already uses/supports NameAliases.txt for 
returning the character given the name. The present requirement is to use it 
for returning the name given the character.

Note that NameAliases.txt has abbreviated names for some characters (where the 
third column reads "abbreviation"). While these would be useful for lookup(), 
they would not be useful as return values of aliasedname(). For instance, one 
would prefer to see "SPACE" returned for 0020 rather than "SP". So these 
entries should be disregarded for aliasedname().

Also, NameAliases.txt has multiple entries for some characters even after 
discarding the abbreviation entries. In these cases, the first entry should be 
used (for want of a better rule). It is presumed that these are provided in 
some order of preference.

It should be noted that discussion on this topic on the unicore (Unicode 
members) mailing list (on the thread "When normative aliases exist..." started 
2014-01-21) indicates that the order of entries is subject to change although 
the entries themselves will not be removed. In this case, the first 
non-abbreviation entry may change. This is acceptable for the behaviour of 
aliasedname(). Also note that aliases may be defined in future. Thus the string 
returned by aliasedname() for a given character is not guaranteed to be the 
same, but whatever is returned by it will surely be valid to use with lookup(). 
Those who desire a single immutable name and do not require the improvements 
provided by the aliases should use name() and not aliasedname().

Finally, for extended support, a namealiases() function should return all the 
aliases together with their types, allowing the user full choice of the desired 
but official alias.

The attached code should clarify the required behaviour. (It is not a patch, 
just an illustration.)
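For readers without the attachment, the requested behaviour might be sketched 
roughly as follows. This is not the attached aliasedname.py. The SAMPLE text is 
a handful of lines in the "code;alias;type" format of NameAliases.txt, used 
here in place of the full file, and parse_aliases(), ALIASES and _MISSING are 
names invented for the illustration.

```python
import unicodedata

# A few entries in the "code;alias;type" format of NameAliases.txt.
# A real implementation would read the complete file instead.
SAMPLE = """\
0000;NULL;control
0000;NUL;abbreviation
0020;SP;abbreviation
01A2;LATIN CAPITAL LETTER GHA;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""


def parse_aliases(text):
    """Map each character to its list of (alias, type) pairs, in file order."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = line.split(";")
        aliases.setdefault(chr(int(code, 16)), []).append((alias, kind))
    return aliases


ALIASES = parse_aliases(SAMPLE)
_MISSING = object()


def namealiases(ch):
    """Return every alias of ch together with its type."""
    return list(ALIASES.get(ch, []))


def aliasedname(ch, default=_MISSING):
    """Return the first non-abbreviation alias of ch, else unicodedata.name()."""
    for alias, kind in ALIASES.get(ch, []):
        if kind != "abbreviation":
            return alias
    if default is _MISSING:
        return unicodedata.name(ch)
    return unicodedata.name(ch, default)
```

With these sample entries, aliasedname("\ufeff") gives "BYTE ORDER MARK", 
whereas aliasedname(" ") falls back to name() because the only alias of 0020 is 
an abbreviation.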

--
components: Unicode
files: aliasedname.py
messages: 209618
nosy: ezio.melotti, haypo, jamadagni
priority: normal
severity: normal
status: open
title: add aliasedname() and namedaliases() methods to unicodedata module
type: enhancement
versions: Python 3.3
Added file: http://bugs.python.org/file33788/aliasedname.py

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20433
___



[issue17991] ctypes.c_char gives a misleading error when passed a one-character unicode string

2013-06-02 Thread Shriramana Sharma

Shriramana Sharma added the comment:

I came upon this too. In Python 2 it used to expect a one-character string. 
Apparently the same error message has been carried forward to Python 3, 
though the expected input is now either a one-character bytes object (not a 
str) or an int equal to the ord() value of that character.

Minimal demonstration:

$ python
Python 2.7.4 (default, Apr 19 2013, 18:28:01)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes import *
>>> class test ( Structure ) :
...     _fields_ = [ ( "ch", c_char ) ]
...
>>> a = test()
>>> a.ch = ord('a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: one character string expected
>>> a.ch = 'c'
>>> a.ch
'c'
>>>

$ python3
Python 3.3.1 (default, Apr 17 2013, 22:30:32)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes import *
>>> class test ( Structure ) :
...     _fields_ = [ ( "ch", c_char ) ]
...
>>> a = test()
>>> a.ch = 'c'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: one character string expected
>>> a.ch = b'c'
>>> a.ch
b'c'
>>> a.ch = ord('c')
>>> a.ch
b'c'
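
The Python 3 half of the session can also be run as a script. The sketch below 
is only an illustration: the helper try_assign() is a name made up here, and it 
reports whatever TypeError message the running interpreter produces rather than 
assuming a particular wording.

```python
from ctypes import Structure, c_char


class Test(Structure):
    _fields_ = [("ch", c_char)]


def try_assign(value):
    """Assign value to a c_char field.

    Returns the stored field value, or the TypeError text if the
    assignment is rejected."""
    a = Test()
    try:
        a.ch = value
    except TypeError as exc:
        return "TypeError: %s" % exc
    return a.ch


for candidate in (b"c", ord("c"), "c"):
    print(repr(candidate), "->", repr(try_assign(candidate)))
```

On Python 3 the bytes and int assignments are accepted and the str assignment 
is rejected.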


--
nosy: +jamadagni

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17991
___



[issue6386] importing yields unexpected results when initial script is a symbolic link

2013-05-27 Thread Shriramana Sharma

Shriramana Sharma added the comment:

I'm sorry but I don't get why this is a WONTFIX. I reported what is (now) 
apparently a dup: issue 18067. Just like the OP of this bug, I feel that in 
doing testing and such, one would naturally symlink and expect the library in 
the *current* directory to be imported. 

And about the CWD, I have demonstrated in issue 18067 how the CWD is in fact 
reported to be the directory of the *source* of the symlink (i.e. the dir 
containing the symlink inode) and not the *target* of the symlink. This is 
precisely what is frustrating about this bug: the fact that Python does not 
import something from a directory which it reports to be the current directory 
as per os.getcwd(). 

While I myself lack the internal CPython code knowledge to fix this, I can't 
imagine this would be too difficult to fix, given that os.getcwd() already 
reports the correct current directory -- in setting up the import path list, 
you just have to use that instead of whatever else you are using now.

Thanks.

--
nosy: +jamadagni

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6386
___



[issue6386] importing yields unexpected results when initial script is a symbolic link

2013-05-27 Thread Shriramana Sharma

Shriramana Sharma added the comment:

> The current behaviour is also needed to sanely support Python 
> scripts symlinked from Linux /bin directories.

OK that clinched it for me -- I can't argue against that! And obviously it is 
not meaningful to copy/symlink *all* the current-directory modules a particular 
script depends upon to the symlink directory as well. And searching both 
directories (containing the source and target of the symlink) is not good for 
security I guess.

And I also checked the contents of sys.path with my test case -- and sure 
enough the directory corresponding to the actual output was printed. It just 
takes some effort to keep in mind that sys.path is different from os.getcwd().
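
The difference can be observed with a small self-contained experiment, sketched 
below. It assumes a filesystem that supports symlinks and an interpreter 
started without options that alter sys.path[0]. The directory names "real" and 
"links" and the function compare_path0_and_cwd() are invented for this example.

```python
import os
import subprocess
import sys
import tempfile

# Throwaway script: print the first import-path entry, then the CWD.
SCRIPT = "import os, sys\nprint(sys.path[0])\nprint(os.getcwd())\n"


def compare_path0_and_cwd():
    """Run a script through a symlink and report where Python looks.

    Returns (sys.path[0] is the real script's directory,
             os.getcwd() is the symlink's directory)."""
    with tempfile.TemporaryDirectory() as root:
        root = os.path.realpath(root)
        real_dir = os.path.join(root, "real")
        link_dir = os.path.join(root, "links")
        os.mkdir(real_dir)
        os.mkdir(link_dir)
        script = os.path.join(real_dir, "run.py")
        with open(script, "w") as f:
            f.write(SCRIPT)
        os.symlink(script, os.path.join(link_dir, "run-link.py"))
        # Invoke the symlink from the directory that contains it.
        out = subprocess.check_output(
            [sys.executable, "run-link.py"],
            cwd=link_dir,
            universal_newlines=True,
        )
        path0, cwd = out.splitlines()
        return (
            os.path.realpath(path0) == real_dir,
            os.path.realpath(cwd) == link_dir,
        )


print(compare_path0_and_cwd())
```

Both values should come out true: the module search starts in the directory of 
the real file, while the current directory is the one holding the symlink.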

So I think someone should please clearly mention this behaviour in the 
documentation under 
http://docs.python.org/3/tutorial/modules.html#the-module-search-path and the 
Py2 equivalent. Specifically this point needs clarification:

the directory containing the input script (or the current directory).

It would be best to remove the text in parentheses (which immediately makes one 
think of os.getcwd()) and add a clarification like:

Note that on filesystems that support symlinks, this means the directory 
containing the actual script file. Symlinks to the script may be present 
elsewhere and may be used to invoke the script, but the directories containing 
those symlinks will *not* be searched for dependency modules.

Thank you very much for these clarifications and for your work on Python! 
Please do add the above documentation clarification, though.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6386
___



[issue18067] In considering the current directory for importing modules, Python does not honour the output of os.getcwd()

2013-05-26 Thread Shriramana Sharma

New submission from Shriramana Sharma:

Hello. I first asked about this at 
https://groups.google.com/d/topic/comp.lang.python/ZOGwXGU_TV0/discussion and 
am only posting this issue due to no reply there.

I am using Python 2.7.4 and Python 3.3.1 (default packages) on Kubuntu Raring. 
On both I experience this same bug.

This bug has to do with a Python script called via a symlink. To illustrate it 
I have included a minimal test case tarball as an attachment. Just run "sh 
test.sh" at the root of the extracted tree to see what's happening.

I am quite unpleasantly surprised that when one calls a Python script via a 
symlink, and that script asks for a module to be imported, Python searches the 
directory in which the *target* of the link exists, and uses the version of the 
library present *there* if any, and raises an exception if not. It even follows 
a chain of symlinks. Any version of the library present in the same directory 
as the (user-called) symlink is ignored. This is totally counter-intuitive 
behaviour and should be treated as a bug and fixed.

This is all the more frustrating since running print(os.getcwd()) from the same 
script correctly prints the current directory in which the *symlink* and not 
its target exists. (See output of the attached scripts.)

Now the symlink is only a user-level file system convenience indicating that I 
create a virtual file in one place pointing to another file elsewhere. Whatever 
else the directory containing the other file holds is immaterial to me -- I am 
only interested in the one file I am symlinking to. 

I am executing a script from a given directory. os.getcwd() correctly prints 
the path of that directory. I also have a library in that same directory for 
the script to import. I would expect Python to honour the output of os.getcwd() 
in doing import too.
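
A script that wants this behaviour today could arrange it for itself. The 
following is only a sketch of such a workaround, not a proposed fix for 
CPython: it rebuilds a layout similar to the attached test case in a temporary 
directory (the file contents are guesses, since the tarball is not reproduced 
here), and the script puts the directory of the path it was invoked by at the 
front of sys.path. It assumes a filesystem that supports symlinks; demo() is a 
name invented for the example.

```python
import os
import subprocess
import sys
import tempfile

# Script body: search the directory of the invoked path (possibly a symlink)
# before the directory of the real file that Python adds on its own.
RUN_PY = """\
import os, sys
sys.path.insert(0, os.path.dirname(os.path.abspath(sys.argv[0])))
from greet import greet
greet()
"""


def demo():
    """Run english/run.py through sanskrit/run-slink.py; return its output."""
    with tempfile.TemporaryDirectory() as root:
        for name, word in (("english", "Hello"), ("sanskrit", "Namaste")):
            directory = os.path.join(root, name)
            os.mkdir(directory)
            with open(os.path.join(directory, "greet.py"), "w") as f:
                f.write("def greet():\n    print(%r)\n" % (word + " Shriramana!"))
        script = os.path.join(root, "english", "run.py")
        with open(script, "w") as f:
            f.write(RUN_PY)
        os.symlink(script, os.path.join(root, "sanskrit", "run-slink.py"))
        out = subprocess.check_output(
            [sys.executable, "run-slink.py"],
            cwd=os.path.join(root, "sanskrit"),
            universal_newlines=True,
        )
        return out.strip()


print(demo())
```

With the extra sys.path entry, the greet module next to the symlink is found 
first, so the run through sanskrit/run-slink.py should greet with "Namaste".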

I read through http://docs.python.org/3/reference/import.html and didn't seem 
to find any explanation for the current illogical behaviour. (Please point out 
if I have missed it.)

(Note that the same behaviour does not happen with hardlinks, probably since 
the filesystem itself shows the whole file as existing at the current location.)

Output of the test.sh script:

 Trying english/run.py 
CWD: /tmp/symlink-bug/english
Hello Shriramana!
 Trying english/run-link.py symlinked to ./run.py 
CWD: /tmp/symlink-bug/english
Traceback (most recent call last):
  File "run-link.py", line 3, in <module>
    from greet import greet
ImportError: No module named greet
 Trying english/run-link-link.py symlinked to ./run-link.py symlinked to 
english/run.py 
CWD: /tmp/symlink-bug/english
Hello Shriramana!
 Trying sanskrit/run-slink.py symlinked to english/run.py 
CWD: /tmp/symlink-bug/sanskrit
Hello Shriramana!
 Trying sanskrit/run-hlink.py hardlinked to english/run.py 
CWD: /tmp/symlink-bug/sanskrit
Namaste Shriramana!

Expected output: (see esp items marked 1 and 2 below):

 Trying english/run.py 
CWD: /tmp/symlink-bug/english
Hello Shriramana!
1  Trying english/run-link.py symlinked to ./run.py 
CWD: /tmp/symlink-bug/english
Hello Shriramana!
 Trying english/run-link-link.py symlinked to ./run-link.py symlinked to 
english/run.py 
CWD: /tmp/symlink-bug/english
Hello Shriramana!
2  Trying sanskrit/run-slink.py symlinked to english/run.py 
CWD: /tmp/symlink-bug/sanskrit
Namaste Shriramana!
 Trying sanskrit/run-hlink.py hardlinked to english/run.py 
CWD: /tmp/symlink-bug/sanskrit
Namaste Shriramana!

--
components: Extension Modules
files: symlink-bug.tar.gz
messages: 190098
nosy: jamadagni
priority: normal
severity: normal
status: open
title: In considering the current directory for importing modules, Python does 
not honour the output of os.getcwd()
Added file: http://bugs.python.org/file30386/symlink-bug.tar.gz

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18067
___



[issue6294] Improve shutdown exception ignored message

2013-05-19 Thread Shriramana Sharma

Changes by Shriramana Sharma <samj...@gmail.com>:


--
nosy: +jamadagni

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6294
___