Re: Stripping characters from windows clipboard with win32clipboard from excel
On Wed, Sep 18, 2013, at 16:13, Dave Angel wrote: So is the bug in Excel, in Windows, or in the Python library? Somebody is falling down on the job; if Windows defines the string as ending at the first null, then the Python interface should use that when defining the text defined with CF_UNICODETEXT. Or maybe it's an example of ill-defined Windows specs. I think it's a matter of ill-defined windows specs - the clipboard data has an exact size, and in principle that size is always used, for example, for numerous binary formats. But the text clipboard has often been treated as a C string by a wide variety of different applications, so one could argue that has become a de facto standard. -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On 2013-09-18, Dave Angel da...@davea.name wrote: On 18/9/2013 17:40, Neil Hodgson wrote: Dave Angel: So is the bug in Excel, in Windows, or in the Python library? Somebody is falling down on the job; if Windows defines the string as ending at the first null, then the Python interface should use that when defining the text defined with CF_UNICODETEXT. Everything is performing correctly. win32clipboard is low-level direct access to the Win32 clipboard API. A higher level API which is more easily used from Python could be defined on top of this if anyone was motivated. Neil Clearly you miss the point. If the clipboard API is defined to return a null-terminated string, then the problem is in the Python library which doesn't do a strlen() (or the wide-character equivalent; I forget its name) on the results. Python can't really know if you're pasting text or a screenshot or what. -- Neil Cerutti -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On 19/9/2013 11:53, Neil Cerutti wrote: On 2013-09-18, Dave Angel da...@davea.name wrote: On 18/9/2013 17:40, Neil Hodgson wrote: Dave Angel: So is the bug in Excel, in Windows, or in the Python library? Somebody is falling down on the job; if Windows defines the string as ending at the first null, then the Python interface should use that when defining the text defined with CF_UNICODETEXT. Everything is performing correctly. win32clipboard is low-level direct access to the Win32 clipboard API. A higher level API which is more easily used from Python could be defined on top of this if anyone was motivated. Neil Clearly you miss the point. If the clipboard API is defined to return a null-terminated string, then the problem is in the Python library which doesn't do a strlen() (or the wide-character equivalent; I forget its name) on the results. Python can't really know if you're pasting text or a screenshot or what. CF_UNICODETEXT gives it a pretty good clue. But random832 has already mentioned that it's an ill-specified Windows interface; different programs deal with it differently. No way that a the Python library should deal with that. -- DaveA -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On Fri, Sep 13, 2013, at 12:09, random...@fastmail.us wrote: Anyway, to match behavior found in other applications when pasting from the clipboard, I would suggest using: if s.contains('\0'): s = s[:s.index('\0')] Which will also remove non-null bytes after the first null (but if the clipboard contains these, it won't be pasted into e.g. notepad). I did some testing and confirmed you MUST do this - if you copy multiple things from Excel, it may reuse this buffer and not clear the end of it. -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
Thanks to everyone for their help. Using everyone's suggestions, this seems to work: import win32clipboard, win32con def getclipboard(): win32clipboard.OpenClipboard() s = win32clipboard.GetClipboardData(win32con.CF_UNICODETEXT) win32clipboard.CloseClipboard() if '\0' in s: s = s[:s.index('\0')] return s -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On 18/09/2013 20:28, stephen.bou...@gmail.com wrote: Thanks to everyone for their help. Using everyone's suggestions, this seems to work: import win32clipboard, win32con def getclipboard(): win32clipboard.OpenClipboard() s = win32clipboard.GetClipboardData(win32con.CF_UNICODETEXT) win32clipboard.CloseClipboard() if '\0' in s: s = s[:s.index('\0')] return s That'll look for a null character and, if it finds one, then look for it again. Of course, that probably isn't an issue in practice. However, an alternative way of doing it is to use the .partition method: return s.partition('\0')[0] -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On 18/9/2013 15:40, MRAB wrote: On 18/09/2013 20:28, stephen.bou...@gmail.com wrote: Thanks to everyone for their help. Using everyone's suggestions, this seems to work: import win32clipboard, win32con def getclipboard(): win32clipboard.OpenClipboard() s = win32clipboard.GetClipboardData(win32con.CF_UNICODETEXT) win32clipboard.CloseClipboard() if '\0' in s: s = s[:s.index('\0')] return s That'll look for a null character and, if it finds one, then look for it again. Of course, that probably isn't an issue in practice. However, an alternative way of doing it is to use the .partition method: return s.partition('\0')[0] So is the bug in Excel, in Windows, or in the Python library? Somebody is falling down on the job; if Windows defines the string as ending at the first null, then the Python interface should use that when defining the text defined with CF_UNICODETEXT. Or maybe it's an example of ill-defined Windows specs. -- DaveA -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
Dave Angel: So is the bug in Excel, in Windows, or in the Python library? Somebody is falling down on the job; if Windows defines the string as ending at the first null, then the Python interface should use that when defining the text defined with CF_UNICODETEXT. Everything is performing correctly. win32clipboard is low-level direct access to the Win32 clipboard API. A higher level API which is more easily used from Python could be defined on top of this if anyone was motivated. Neil -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On 18/9/2013 17:40, Neil Hodgson wrote: Dave Angel: So is the bug in Excel, in Windows, or in the Python library? Somebody is falling down on the job; if Windows defines the string as ending at the first null, then the Python interface should use that when defining the text defined with CF_UNICODETEXT. Everything is performing correctly. win32clipboard is low-level direct access to the Win32 clipboard API. A higher level API which is more easily used from Python could be defined on top of this if anyone was motivated. Neil Clearly you miss the point. If the clipboard API is defined to return a null-terminated string, then the problem is in the Python library which doesn't do a strlen() (or the wide-character equivalent; I forget its name) on the results. But there's a big if there. Somebody is either ill specified or poorly implemented here. -- DaveA -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On Thursday, September 12, 2013 6:01:20 PM UTC-5, stephen...@gmail.com wrote: I have an excel file. When I select cells, copy from excel, and then use win32clipboard to get the contents of the clipboard, I have a 131071 character string. When I save the file as a text file, and use the python 3.3 open command to read its contents, I only have 80684 characters. Excel (and other programs too) appends a bunch of b'\x00' (or similar) characters. Is there a pythonic way to strip these out? Odd behavior from excel. It seems as though the clipboard contents will vary according to the clipboard size. For example, in my 13 column row I select 2023 rows, copy to the clipboard, and def getclipboard(): win32clipboard.OpenClipboard() s = win32clipboard.GetClipboardData(win32con.CF_TEXT) win32clipboard.CloseClipboard() return s s = getclipboard() len(s) gives me 16298. If I select one additional row, len(s) balloons to 32767. In the first case, s[-10:] is b'0.032573\r\n'. In the second case, s[-10:] is b'\x008\x008\x000\x005\x00\t' Does anyone know what is going on with that? -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On Thursday, September 12, 2013 10:43:46 PM UTC-5, Neil Hodgson wrote: Stephen Boulet: From the clipboard contents copied from the spreadsheet, the characters s[:80684] were the visible cell contents, and s[80684:] all started with b'\x0 and lack any useful info for what I'm trying to accomplish. Looks like Excel is rounding up its clipboard allocation to the next 64K. There used to be good reasons for trying to leave some extra room on the clipboard and avoid reallocating the block but I thought that was over a long time ago. To strip NULs off the end of the string use s.rstrip('\0') Hm, that gives me a Type str doesn't support the buffer API message. -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On Friday, September 13, 2013 9:31:45 AM UTC-5, stephen...@gmail.com wrote: On Thursday, September 12, 2013 10:43:46 PM UTC-5, Neil Hodgson wrote: Stephen Boulet: From the clipboard contents copied from the spreadsheet, the characters s[:80684] were the visible cell contents, and s[80684:] all started with b'\x0 and lack any useful info for what I'm trying to accomplish. Looks like Excel is rounding up its clipboard allocation to the next 64K. There used to be good reasons for trying to leave some extra room on the clipboard and avoid reallocating the block but I thought that was over a long time ago. To strip NULs off the end of the string use s.rstrip('\0') Hm, that gives me a Type str doesn't support the buffer API message. Aha, I need to use str(s, encoding='utf8').rstrip('\0'). -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On 2013-09-13, stephen.bou...@gmail.com stephen.bou...@gmail.com wrote: On Thursday, September 12, 2013 10:43:46 PM UTC-5, Neil Hodgson wrote: Stephen Boulet: From the clipboard contents copied from the spreadsheet, the characters s[:80684] were the visible cell contents, and s[80684:] all started with b'\x0 and lack any useful info for what I'm trying to accomplish. Looks like Excel is rounding up its clipboard allocation to the next 64K. There used to be good reasons for trying to leave some extra room on the clipboard and avoid reallocating the block but I thought that was over a long time ago. To strip NULs off the end of the string use s.rstrip('\0') Hm, that gives me a Type str doesn't support the buffer API message. Type mismatch. Try: s.rstrip(b\0) -- Neil Cerutti -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On Fri, Sep 13, 2013, at 10:38, stephen.bou...@gmail.com wrote: Hm, that gives me a Type str doesn't support the buffer API message. Aha, I need to use str(s, encoding='utf8').rstrip('\0'). It's not a solution to your problem, but why aren't you using CF_UNICODETEXT, particularly if you're using python 3? And if you're not, utf8 is the incorrect encoding, you should be using encoding='mbcs' to interact with the CF_TEXT clipboard. Anyway, to match behavior found in other applications when pasting from the clipboard, I would suggest using: if s.contains('\0'): s = s[:s.index('\0')] Which will also remove non-null bytes after the first null (but if the clipboard contains these, it won't be pasted into e.g. notepad). -- https://mail.python.org/mailman/listinfo/python-list
Stripping characters from windows clipboard with win32clipboard from excel
I have an excel file. When I select cells, copy from excel, and then use win32clipboard to get the contents of the clipboard, I have a 131071 character string. When I save the file as a text file, and use the python 3.3 open command to read its contents, I only have 80684 characters. Excel (and other programs too) appends a bunch of b'\x00' (or similar) characters. Is there a pythonic way to strip these out? -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On Thu, 12 Sep 2013 16:01:20 -0700, stephen.boulet wrote: I have an excel file. When I select cells, copy from excel, and then use win32clipboard to get the contents of the clipboard, I have a 131071 character string. How exactly are you using win32clipboard, and what exact result are you getting? Start with a smaller selection of cells, say, two cells, so the amount of data is manageable. I doubt very much you are getting 131071 *characters*. Perhaps you are getting that many *bytes*. Or perhaps you are getting HTML-formatted text. Without seeing what is in the clipboard, who can tell? Remember that the clipboard can contain multiple versions of the same data. When I save the file as a text file, and use the python 3.3 open command to read its contents, I only have 80684 characters. How exactly are you using the open function? The result you get may differ drastically depending on what you do. Excel (and other programs too) appends a bunch of b'\x00' (or similar) characters. Append to what? The .xls file? The text file? What tool are you using to see this? Is there a pythonic way to strip these out? It's hard to tell from the limited description, but my guess is that you are misinterpreting what you are seeing. Part of the evidence for this is that you are conflating bytes and characters, e.g. above where you refer to the NUL *byte* b'\x00' as a character. Bytes are not characters. Scrub that out of your brain. They never were, not even back in the old ASCII days (may they soon be forgotten), despite what many people think. My guess is that when you save the spreadsheet as text, either Excel by default, or possibly because you have ticked a checkbox, have set it to save using the UTF-16 encoding. That's a good thing (not ideal, ideally they ought to use UTF-8, but UTF-16 is not a bad choice). But when you open the file in Python, Python defaults to UTF-8, which means you get an bunch of extraneous NULs when opening the file in text mode, or b'\x00' null bytes in binary mode. For example: py with open('/tmp/rubbish', 'w', encoding='utf-16be') as f: ... f.write('hello world blah blah blah\n') ... 27 py with open('/tmp/rubbish', 'r') as f: # default encoding is UTF-8 ... f.read() ... '\x00h\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00 \x00b\x00l \x00a\x00h\x00 \x00b\x00l\x00a\x00h\x00 \x00b\x00l\x00a\x00h\x00\n' If you look carefully, you will see that every character appears to be preceded by the NUL control character, \x00. But that's because you've used the wrong encoding to decode the bytes in the file back to characters. The right way to do this is: py with open('/tmp/rubbish', 'r', encoding='utf-16be') as f: ... f.read() ... 'hello world blah blah blah\n' Which encoding should you use? Unfortunately, there is no clean way for text files to record which encoding to use inside the file itself, so it's often impossible to know for sure. This is why everyone should move towards using UTF-8 everywhere. But I digress. The fact that you show the NULs as *bytes* b'\x00' rather than characters '\x00' suggests that you might be reading the file in binary mode. Presumably you did this because you got an error when trying to read it in text mode. If you got this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte then you should go back to text mode, but use 'utf-16' as the codec: open(filename, 'w', encoding='utf-16') Otherwise, if the NULs appear before the letters, as in my example, use 'utf-16be'. If they appear after the letters, use 'utf-16le'. The existence of three oh-so-very-slightly different versions of UTF-16 is why it is a sub-optimal encoding, and it is oh-so-bloody-typical that Microsoft screwed it up for everyone by picking it as their default implementation. If they had sensibly used UTF-8, you wouldn't be having this problem. Of course, it is possible I've misdiagnosed your problem. I've had to guess a lot because you didn't show us what you actually did to get the results you say you got. -- Steven -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
Hi Steven. Here is my code: import win32clipboard, win32con def getclipboard(): win32clipboard.OpenClipboard() s = win32clipboard.GetClipboardData(win32con.CF_TEXT) win32clipboard.CloseClipboard() return s I use this helper function to grab the text on the clipboard and do useful things with it. Sorry about the description; I have stuff to learn about strings and python 3.3. To get the string length, I just do len(s). There were 10 columns and 700+ rows of data, so len(s) returned 80684 from an excel spreadsheet saved as a text file. From the clipboard contents copied from the spreadsheet, the characters s[:80684] were the visible cell contents, and s[80684:] all started with b'\x0 and lack any useful info for what I'm trying to accomplish. -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
On 13/09/2013 01:58, stephen.bou...@gmail.com wrote: Hi Steven. Here is my code: import win32clipboard, win32con def getclipboard(): win32clipboard.OpenClipboard() s = win32clipboard.GetClipboardData(win32con.CF_TEXT) win32clipboard.CloseClipboard() return s I use this helper function to grab the text on the clipboard and do useful things with it. Sorry about the description; I have stuff to learn about strings and python 3.3. To get the string length, I just do len(s). There were 10 columns and 700+ rows of data, so len(s) returned 80684 from an excel spreadsheet saved as a text file. From the clipboard contents copied from the spreadsheet, the characters s[:80684] were the visible cell contents, and s[80684:] all started with b'\x0 and lack any useful info for what I'm trying to accomplish. From my own experiments (although not with Excel) it appears that GetClipboardData returns bytes (a bytestring), not (Unicode) strings. -- https://mail.python.org/mailman/listinfo/python-list
Re: Stripping characters from windows clipboard with win32clipboard from excel
Stephen Boulet: From the clipboard contents copied from the spreadsheet, the characters s[:80684] were the visible cell contents, and s[80684:] all started with b'\x0 and lack any useful info for what I'm trying to accomplish. Looks like Excel is rounding up its clipboard allocation to the next 64K. There used to be good reasons for trying to leave some extra room on the clipboard and avoid reallocating the block but I thought that was over a long time ago. To strip NULs off the end of the string use s.rstrip('\0') Neil -- https://mail.python.org/mailman/listinfo/python-list