Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-19 Thread random832
On Wed, Sep 18, 2013, at 16:13, Dave Angel wrote:
 So is the bug in Excel, in Windows, or in the Python library?  Somebody
 is falling down on the job;  if Windows defines the string as ending at
 the first null, then the Python interface should use that when defining
 the text defined with CF_UNICODETEXT.
 
 Or maybe it's an example of ill-defined Windows specs.

I think it's a matter of ill-defined windows specs - the clipboard data
has an exact size, and in principle that size is always used, for
example, for numerous binary formats. But the text clipboard has often
been treated as a C string by a wide variety of different applications,
so one could argue that has become a de facto standard.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-19 Thread Neil Cerutti
On 2013-09-18, Dave Angel da...@davea.name wrote:
 On 18/9/2013 17:40, Neil Hodgson wrote:

 Dave Angel:

 So is the bug in Excel, in Windows, or in the Python library?  Somebody
 is falling down on the job;  if Windows defines the string as ending at
 the first null, then the Python interface should use that when defining
 the text defined with CF_UNICODETEXT.

 Everything is performing correctly. win32clipboard is low-level 
 direct access to the Win32 clipboard API. A higher level API which is 
 more easily used from Python could be defined on top of this if anyone 
 was motivated.

 Neil

 Clearly you miss the point.  If the clipboard API is defined to
 return a null-terminated string, then the problem is in the
 Python library which doesn't do a strlen() (or the
 wide-character equivalent;  I forget its name) on the results.

Python can't really know if you're pasting text or a screenshot
or what.

-- 
Neil Cerutti
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-19 Thread Dave Angel
On 19/9/2013 11:53, Neil Cerutti wrote:

 On 2013-09-18, Dave Angel da...@davea.name wrote:
 On 18/9/2013 17:40, Neil Hodgson wrote:

 Dave Angel:

 So is the bug in Excel, in Windows, or in the Python library?  Somebody
 is falling down on the job;  if Windows defines the string as ending at
 the first null, then the Python interface should use that when defining
 the text defined with CF_UNICODETEXT.

 Everything is performing correctly. win32clipboard is low-level 
 direct access to the Win32 clipboard API. A higher level API which is 
 more easily used from Python could be defined on top of this if anyone 
 was motivated.

 Neil

 Clearly you miss the point.  If the clipboard API is defined to
 return a null-terminated string, then the problem is in the
 Python library which doesn't do a strlen() (or the
 wide-character equivalent;  I forget its name) on the results.

 Python can't really know if you're pasting text or a screenshot
 or what.


 CF_UNICODETEXT  gives it a pretty good clue.

But random832 has already mentioned that it's an ill-specified Windows
interface;  different programs deal with it differently.  No way that a
the Python library should deal with that.

-- 
DaveA


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-18 Thread random832
On Fri, Sep 13, 2013, at 12:09, random...@fastmail.us wrote:
 Anyway, to match behavior found in other applications when pasting from
 the clipboard, I would suggest using:
 
 if s.contains('\0'): s = s[:s.index('\0')]
 
 Which will also remove non-null bytes after the first null (but if the
 clipboard contains these, it won't be pasted into e.g. notepad).

I did some testing and confirmed you MUST do this - if you copy multiple
things from Excel, it may reuse this buffer and not clear the end of it.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-18 Thread stephen . boulet
Thanks to everyone for their help. Using everyone's suggestions, this seems to 
work:

import win32clipboard, win32con

def getclipboard():
win32clipboard.OpenClipboard()
s = win32clipboard.GetClipboardData(win32con.CF_UNICODETEXT)
win32clipboard.CloseClipboard()
if '\0' in s:
s = s[:s.index('\0')]
return s
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-18 Thread MRAB

On 18/09/2013 20:28, stephen.bou...@gmail.com wrote:

Thanks to everyone for their help. Using everyone's suggestions, this seems to 
work:

import win32clipboard, win32con

def getclipboard():
 win32clipboard.OpenClipboard()
 s = win32clipboard.GetClipboardData(win32con.CF_UNICODETEXT)
 win32clipboard.CloseClipboard()
 if '\0' in s:
 s = s[:s.index('\0')]
 return s


That'll look for a null character and, if it finds one, then look for
it again. Of course, that probably isn't an issue in practice.

However, an alternative way of doing it is to use the .partition method:

 return s.partition('\0')[0]

--
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-18 Thread Dave Angel
On 18/9/2013 15:40, MRAB wrote:

 On 18/09/2013 20:28, stephen.bou...@gmail.com wrote:
 Thanks to everyone for their help. Using everyone's suggestions, this seems 
 to work:

 import win32clipboard, win32con

 def getclipboard():
  win32clipboard.OpenClipboard()
  s = win32clipboard.GetClipboardData(win32con.CF_UNICODETEXT)
  win32clipboard.CloseClipboard()
  if '\0' in s:
  s = s[:s.index('\0')]
  return s

 That'll look for a null character and, if it finds one, then look for
 it again. Of course, that probably isn't an issue in practice.

 However, an alternative way of doing it is to use the .partition method:

   return s.partition('\0')[0]

So is the bug in Excel, in Windows, or in the Python library?  Somebody
is falling down on the job;  if Windows defines the string as ending at
the first null, then the Python interface should use that when defining
the text defined with CF_UNICODETEXT.

Or maybe it's an example of ill-defined Windows specs.

-- 
DaveA


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-18 Thread Neil Hodgson

Dave Angel:


So is the bug in Excel, in Windows, or in the Python library?  Somebody
is falling down on the job;  if Windows defines the string as ending at
the first null, then the Python interface should use that when defining
the text defined with CF_UNICODETEXT.


   Everything is performing correctly. win32clipboard is low-level 
direct access to the Win32 clipboard API. A higher level API which is 
more easily used from Python could be defined on top of this if anyone 
was motivated.


   Neil

--
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-18 Thread Dave Angel
On 18/9/2013 17:40, Neil Hodgson wrote:

 Dave Angel:

 So is the bug in Excel, in Windows, or in the Python library?  Somebody
 is falling down on the job;  if Windows defines the string as ending at
 the first null, then the Python interface should use that when defining
 the text defined with CF_UNICODETEXT.

 Everything is performing correctly. win32clipboard is low-level 
 direct access to the Win32 clipboard API. A higher level API which is 
 more easily used from Python could be defined on top of this if anyone 
 was motivated.

 Neil

Clearly you miss the point.  If the clipboard API is defined to return a
null-terminated string, then the problem is in the Python library which
doesn't do a strlen() (or the wide-character equivalent;  I forget its
name) on the results.

But there's a big if there.  Somebody is either ill specified or poorly
implemented here.

-- 
DaveA


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-17 Thread stephen . boulet
On Thursday, September 12, 2013 6:01:20 PM UTC-5, stephen...@gmail.com wrote:
 I have an excel file. When I select cells, copy from excel, and then use 
 win32clipboard to get the contents of the clipboard, I have a 131071 
 character string.
 
 
 
 When I save the file as a text file, and use the python 3.3 open command to 
 read its contents, I only have 80684 characters.
 
 
 
 Excel (and  other programs too) appends a bunch of b'\x00' (or similar) 
 characters.
 
 
 
 Is there a pythonic way to strip these out?

Odd behavior from excel. It seems as though the clipboard contents will vary 
according to the clipboard size. For example, in my 13 column row I select 2023 
rows, copy to the clipboard, and 

def getclipboard():
win32clipboard.OpenClipboard()
s = win32clipboard.GetClipboardData(win32con.CF_TEXT)
win32clipboard.CloseClipboard()
return s

s = getclipboard()

len(s) gives me 16298.

If I select one additional row, len(s) balloons to 32767.

In the first case, s[-10:] is b'0.032573\r\n'.

In the second case, s[-10:] is b'\x008\x008\x000\x005\x00\t'

Does anyone know what is going on with that?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-13 Thread stephen . boulet
On Thursday, September 12, 2013 10:43:46 PM UTC-5, Neil Hodgson wrote:
 Stephen Boulet:
 
 
 
   From the clipboard contents copied from the spreadsheet, the characters 
  s[:80684] were the visible cell contents, and s[80684:] all started with 
  b'\x0 and lack any useful info for what I'm trying to accomplish.
 
 
 
 Looks like Excel is rounding up its clipboard allocation to the next 
 
 64K. There used to be good reasons for trying to leave some extra room 
 
 on the clipboard and avoid reallocating the block but I thought that was 
 
 over a long time ago.
 
 
 
 To strip NULs off the end of the string use s.rstrip('\0')

Hm, that gives me a Type str doesn't support the buffer API message.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-13 Thread stephen . boulet
On Friday, September 13, 2013 9:31:45 AM UTC-5, stephen...@gmail.com wrote:
 On Thursday, September 12, 2013 10:43:46 PM UTC-5, Neil Hodgson wrote:
 
  Stephen Boulet:
 
  
 
  
 
  
 
From the clipboard contents copied from the spreadsheet, the characters 
   s[:80684] were the visible cell contents, and s[80684:] all started with 
   b'\x0 and lack any useful info for what I'm trying to accomplish.
 
  
 
  
 
  
 
  Looks like Excel is rounding up its clipboard allocation to the next 
 
  
 
  64K. There used to be good reasons for trying to leave some extra room 
 
  
 
  on the clipboard and avoid reallocating the block but I thought that was 
 
  
 
  over a long time ago.
 
  
 
  
 
  
 
  To strip NULs off the end of the string use s.rstrip('\0')
 
 
 
 Hm, that gives me a Type str doesn't support the buffer API message.

Aha, I need to use str(s, encoding='utf8').rstrip('\0').
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-13 Thread Neil Cerutti
On 2013-09-13, stephen.bou...@gmail.com stephen.bou...@gmail.com wrote:
 On Thursday, September 12, 2013 10:43:46 PM UTC-5, Neil Hodgson wrote:
 Stephen Boulet:
 
 
 
   From the clipboard contents copied from the spreadsheet, the characters 
  s[:80684] were the visible cell contents, and s[80684:] all started with 
  b'\x0 and lack any useful info for what I'm trying to accomplish.
 
 
 
 Looks like Excel is rounding up its clipboard allocation to the next 
 
 64K. There used to be good reasons for trying to leave some extra room 
 
 on the clipboard and avoid reallocating the block but I thought that was 
 
 over a long time ago.
 
 
 
 To strip NULs off the end of the string use s.rstrip('\0')

 Hm, that gives me a Type str doesn't support the buffer API
 message.

Type mismatch. Try:

s.rstrip(b\0)

-- 
Neil Cerutti
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-13 Thread random832
On Fri, Sep 13, 2013, at 10:38, stephen.bou...@gmail.com wrote:
  Hm, that gives me a Type str doesn't support the buffer API message.
 
 Aha, I need to use str(s, encoding='utf8').rstrip('\0').

It's not a solution to your problem, but why aren't you using
CF_UNICODETEXT, particularly if you're using python 3? And if you're
not, utf8 is the incorrect encoding, you should be using encoding='mbcs'
to interact with the CF_TEXT clipboard.

Anyway, to match behavior found in other applications when pasting from
the clipboard, I would suggest using:

if s.contains('\0'): s = s[:s.index('\0')]

Which will also remove non-null bytes after the first null (but if the
clipboard contains these, it won't be pasted into e.g. notepad).
-- 
https://mail.python.org/mailman/listinfo/python-list


Stripping characters from windows clipboard with win32clipboard from excel

2013-09-12 Thread stephen . boulet
I have an excel file. When I select cells, copy from excel, and then use 
win32clipboard to get the contents of the clipboard, I have a 131071 character 
string.

When I save the file as a text file, and use the python 3.3 open command to 
read its contents, I only have 80684 characters.

Excel (and  other programs too) appends a bunch of b'\x00' (or similar) 
characters.

Is there a pythonic way to strip these out?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-12 Thread Steven D'Aprano
On Thu, 12 Sep 2013 16:01:20 -0700, stephen.boulet wrote:

 I have an excel file. When I select cells, copy from excel, and then use
 win32clipboard to get the contents of the clipboard, I have a 131071
 character string.

How exactly are you using win32clipboard, and what exact result are you 
getting? Start with a smaller selection of cells, say, two cells, so the 
amount of data is manageable.

I doubt very much you are getting 131071 *characters*. Perhaps you are 
getting that many *bytes*. Or perhaps you are getting HTML-formatted 
text. Without seeing what is in the clipboard, who can tell? Remember 
that the clipboard can contain multiple versions of the same data.


 When I save the file as a text file, and use the python 3.3 open command
 to read its contents, I only have 80684 characters.

How exactly are you using the open function? The result you get may 
differ drastically depending on what you do.


 Excel (and  other programs too) appends a bunch of b'\x00' (or similar)
 characters.

Append to what? The .xls file? The text file? What tool are you using to 
see this?



 Is there a pythonic way to strip these out?

It's hard to tell from the limited description, but my guess is that you 
are misinterpreting what you are seeing. Part of the evidence for this is 
that you are conflating bytes and characters, e.g. above where you refer 
to the NUL *byte* b'\x00' as a character. Bytes are not characters. Scrub 
that out of your brain. They never were, not even back in the old ASCII 
days (may they soon be forgotten), despite what many people think.

My guess is that when you save the spreadsheet as text, either Excel by 
default, or possibly because you have ticked a checkbox, have set it to 
save using the UTF-16 encoding. That's a good thing (not ideal, ideally 
they ought to use UTF-8, but UTF-16 is not a bad choice).

But when you open the file in Python, Python defaults to UTF-8, which 
means you get an bunch of extraneous NULs when opening the file in text 
mode, or b'\x00' null bytes in binary mode. For example:


py with open('/tmp/rubbish', 'w', encoding='utf-16be') as f:
... f.write('hello world blah blah blah\n')
...
27
py with open('/tmp/rubbish', 'r') as f:   # default encoding is UTF-8
... f.read()
...
'\x00h\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00 \x00b\x00l
\x00a\x00h\x00 \x00b\x00l\x00a\x00h\x00 \x00b\x00l\x00a\x00h\x00\n'

If you look carefully, you will see that every character appears to be 
preceded by the NUL control character, \x00. But that's because you've 
used the wrong encoding to decode the bytes in the file back to 
characters. The right way to do this is:

py with open('/tmp/rubbish', 'r', encoding='utf-16be') as f:
... f.read()
...
'hello world blah blah blah\n'


Which encoding should you use? Unfortunately, there is no clean way for 
text files to record which encoding to use inside the file itself, so 
it's often impossible to know for sure. This is why everyone should move 
towards using UTF-8 everywhere. But I digress. 

The fact that you show the NULs as *bytes* b'\x00' rather than characters 
'\x00' suggests that you might be reading the file in binary mode. 
Presumably you did this because you got an error when trying to read it 
in text mode. If you got this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: 
invalid start byte

then you should go back to text mode, but use 'utf-16' as the codec:

open(filename, 'w', encoding='utf-16')


Otherwise, if the NULs appear before the letters, as in my example, use 
'utf-16be'. If they appear after the letters, use 'utf-16le'. The 
existence of three oh-so-very-slightly different versions of UTF-16 is 
why it is a sub-optimal encoding, and it is oh-so-bloody-typical that 
Microsoft screwed it up for everyone by picking it as their default 
implementation. If they had sensibly used UTF-8, you wouldn't be having 
this problem.


Of course, it is possible I've misdiagnosed your problem. I've had to 
guess a lot because you didn't show us what you actually did to get the 
results you say you got.



-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-12 Thread stephen . boulet
Hi Steven. Here is my code:

import win32clipboard, win32con

def getclipboard():
win32clipboard.OpenClipboard()
s = win32clipboard.GetClipboardData(win32con.CF_TEXT)
win32clipboard.CloseClipboard()
return s

I use this helper function to grab the text on the clipboard and do useful 
things with it.

Sorry about the description; I have stuff to learn about strings and python 3.3.

To get the string length, I just do len(s). There were 10 columns and 700+ rows 
of data, so len(s) returned 80684 from an excel spreadsheet saved as a text 
file.

From the clipboard contents copied from the spreadsheet, the characters 
s[:80684] were the visible cell contents, and s[80684:] all started with 
b'\x0 and lack any useful info for what I'm trying to accomplish.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-12 Thread MRAB

On 13/09/2013 01:58, stephen.bou...@gmail.com wrote:

Hi Steven. Here is my code:

import win32clipboard, win32con

def getclipboard():
 win32clipboard.OpenClipboard()
 s = win32clipboard.GetClipboardData(win32con.CF_TEXT)
 win32clipboard.CloseClipboard()
 return s

I use this helper function to grab the text on the clipboard and do useful 
things with it.

Sorry about the description; I have stuff to learn about strings and python 3.3.

To get the string length, I just do len(s). There were 10 columns and 700+ rows 
of data, so len(s) returned 80684 from an excel spreadsheet saved as a text 
file.


From the clipboard contents copied from the spreadsheet, the characters s[:80684] were 
the visible cell contents, and s[80684:] all started with b'\x0 and lack any 
useful info for what I'm trying to accomplish.



From my own experiments (although not with Excel) it appears that
GetClipboardData returns bytes (a bytestring), not (Unicode) strings.
--
https://mail.python.org/mailman/listinfo/python-list


Re: Stripping characters from windows clipboard with win32clipboard from excel

2013-09-12 Thread Neil Hodgson

Stephen Boulet:


 From the clipboard contents copied from the spreadsheet, the characters s[:80684] were 
the visible cell contents, and s[80684:] all started with b'\x0 and lack any 
useful info for what I'm trying to accomplish.


   Looks like Excel is rounding up its clipboard allocation to the next 
64K. There used to be good reasons for trying to leave some extra room 
on the clipboard and avoid reallocating the block but I thought that was 
over a long time ago.


   To strip NULs off the end of the string use s.rstrip('\0')

   Neil

--
https://mail.python.org/mailman/listinfo/python-list