Re: [Pharo-users] protobufs for Pharo
On Mon, May 11, 2015, at 12:27 PM, Paul DeBruicker wrote:

If you've gotta have protobuf then you've gotta have protobuf but there is a thrift implementation (https://thrift.apache.org/) and also a message pack implementation (https://code.google.com/p/stomp-serializer/)

Unfortunately, a lot of protocols are protobuf-based (including the one I mentioned, which RethinkDB uses), so using one of those won't work too well. If interfacing with a third-party tool *weren't* necessary, I'd honestly probably just use Fuel anyway.

--Benjamin
[Pharo-users] protobufs for Pharo
Hey all,

Has anyone implemented protobufs for Pharo yet? I’ve not managed to find any examples, so I assume the answer is “no”, but I thought I’d check here before rolling up my sleeves and writing one myself. (My specific target here is to write a RethinkDB driver, so if someone has magically also done *that*, please let me know, too.)

Thanks,
--Benjamin
Re: [Pharo-users] Spur images
On Sun, 05 Oct 2014 16:51:25 -0400, Sven Van Caekenberghe s...@stfx.eu wrote:

[snip]

Apart from that, the tokenisation is not very efficient, #lines is a copy of your whole contents, so is the #split: and #trimmed. The algorithm sounds a bit lazy as well, writing it 'on purpose' with an eye for performance might yield better results.

So I was reflecting on this more. If String and WideString were immutable, it'd be easy to avoid all of these copies: you could instead pass around very tiny objects with only three members (a String, a start position, and a stop position) and copy very little data. It's the mutability of String and WideString that precludes that.

For fun, since I know I won't mutate the strings in this example, I did a quick spike where I replaced #copyFrom:to: with a new method called #viewFrom:to: that returns a StringView. I'll post the code when I have a chance to clean it up if there's interest, but it looks like it pretty handily chops 120-150ms off that runtime (i.e., roughly doubles the speed).

Has there been any thought to introducing some immutable collections? Or maybe I'm just missing them? They'd be useful not just for String and WideString, but for basically any of the collection types. The implementation in most cases would be as simple as overriding #at:put: and friends to throw self shouldNotImplement, and then providing methods/classes like the one I introduced to take advantage of the newfound immutability.

If there's interest, I'd be happy to submit a Slice we could use as a concrete RFC of what this could look like.

--Benjamin
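To make the "three members" idea concrete, here is a minimal sketch in Python rather than Pharo. It is not the spike described above: the names StringView, view and split_views are hypothetical, and it leans on Python's str already being immutable, which is exactly the property the message argues String and WideString lack.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StringView:
    """Hypothetical read-only window onto source[start:stop]; copies nothing."""
    source: str
    start: int
    stop: int

    def __len__(self) -> int:
        return self.stop - self.start

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.source[self.start + index]

    def view(self, start: int, stop: int) -> "StringView":
        # Narrowing a view only adjusts offsets; the text is still shared.
        if not 0 <= start <= stop <= len(self):
            raise ValueError("view bounds out of range")
        return StringView(self.source, self.start + start, self.start + stop)

    def __str__(self) -> str:
        # The only place where characters are actually copied.
        return self.source[self.start:self.stop]


def split_views(text: str, separator: str) -> List[StringView]:
    """Split text on a single-character separator, returning views."""
    views, begin = [], 0
    for position, character in enumerate(text):
        if character == separator:
            views.append(StringView(text, begin, position))
            begin = position + 1
    views.append(StringView(text, begin, len(text)))
    return views


fields = split_views("alpha,beta,gamma", ",")
print([str(field) for field in fields])
```

Every view shares the one original string, so splitting allocates only the small view objects. The cost is that a view keeps the whole source alive for as long as the view itself is referenced.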
[Pharo-users] Spur images
My apologies if this is already spelled out somewhere and I simply can't find it, but are there any Spur images for Pharo yet? I can only find Squeak ones. It's not a big deal either way; I was just playing with an algorithm that, after very heavy optimization, I was able to get down to about 278ms per pass (from ~700ms for the initial naive implementation). For contrast, the equivalent Python runs in ~80ms. Looking at MessageTally, it seems at least half of that 278ms is spent noodling around in WideStrings and other small things that the Spur object format ought to help with, so it'd be interesting to see how much of a speed boost that gives without any further work.
Re: [Pharo-users] Ridiculous we are
On Tue, 23 Sep 2014 08:51:54 -0400, Hilaire hila...@drgeo.eu wrote:

On 23/09/2014 14:09, Damien Cassou wrote:

I recently read documents about utf-8 encoding. In all of them, the author says that pathnames should be kept as is because you never know which encoding the filesystem uses. So, a filename should probably be a bytearray.

yes, but a #é should be encoded in two bytes.

As noted in my previous message, é could be represented as either one or two Unicode code points, and these in turn could validly be either two or three bytes in UTF-8. My gut says that $é should be U+00E9, because otherwise you would have to use two Characters ($e and $´), but you could legitimately argue otherwise as well, and at any rate, #é could definitely be either. This is likely the core of the issue you're hitting.
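The "one or two code points, two or three bytes" arithmetic can be checked with a short Python sketch using only the standard unicodedata module. The variable names are illustrative, not taken from any Pharo code.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

for label, text in (("composed", composed), ("decomposed", decomposed)):
    encoded = text.encode("utf-8")
    print(label, len(text), "code point(s),", len(encoded), "UTF-8 byte(s):", encoded.hex(" "))

# The two spellings render identically but do not compare equal...
print(composed == decomposed)
# ...until both are brought to the same normalization form.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The composed form is two bytes in UTF-8 and the decomposed form is three (one for the plain e, two for the combining accent), so "é is two bytes" only holds for one of the two spellings.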
Re: [Pharo-users] Ridiculous we are
On Mon, 22 Sep 2014 17:58:41 -0400, Sven Van Caekenberghe s...@stfx.eu wrote:

I also find the way some problems are reported quite disturbing. How much testing did you do ? On which platforms ? I can do this (in Pharo 3) without any problems (we're talking about arbitrary Unicode characters in path names):

    ('/tmp' asFileReference / 'été') ensureCreateDirectory.
    '/tmp/été' asFileReference exists.
    ('/tmp/été' asFileReference / 'Ελλάδα.txt') writeStreamDo: [ :out | out << 'What about Greece ?' ].
    ('/tmp/été' asFileReference / 'Ελλάδα.txt') exists.
    ('/tmp/été' asFileReference / 'Ελλάδα.txt') contents.

And in a terminal, I get:

    $ ls /tmp/été/Ελλάδα.txt
    /tmp/été/Ελλάδα.txt
    $ cat !$
    cat /tmp/été/Ελλάδα.txt
    What about Greece ?

This is on Mac OS X. So this part fundamentally works in the image and on one VM. There might of course be problems in how paths are used in certain places or on certain VM/platforms.

Focusing purely on Unicode itself (not the encoding systems), a letter like é can be represented as U+00E9 (LATIN SMALL LETTER E WITH ACUTE), or as U+0065 (LATIN SMALL LETTER E) followed by U+0301 (COMBINING ACUTE ACCENT). These will appear identical to the user, but are emphatically *not* identical for most software. The way you're testing here, you will never hit any error relating to this, because you're using Pharo for both generating and consuming the strings. At the very least, we'd need to generate a file named été with both forms explicitly and see what happens.

Things get even more exciting, though, because Unix says that file names are simply arbitrary byte patterns that do not contain the null byte.* Thus, you can trivially create a file named été using Latin-1 encoding, and again using UTF-8 encoding, and again using UTF-7 encoding, and these might all be shown to the user as identically named, but I guarantee you that Pharo will not act sanely with all of these.
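As a rough illustration of how many distinct byte patterns can sit behind the one visible name été, the sketch below (Python, standard library only) encodes it several ways without touching the file system. The choice of these four variants and the helper decodes_as_utf8 are assumptions made for the example, not a list taken from the message.

```python
import unicodedata

name = "\u00e9t\u00e9"  # "été", written with escapes so the source file's own encoding cannot matter
nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# Four byte strings that a Unix file system would treat as four different names.
candidates = {
    "Latin-1": nfc.encode("latin-1"),
    "UTF-8 (NFC)": nfc.encode("utf-8"),
    "UTF-8 (NFD)": nfd.encode("utf-8"),
    "UTF-7": nfc.encode("utf-7"),
}
for label, raw in candidates.items():
    print(f"{label:12} {len(raw):2} bytes  {raw!r}")


def decodes_as_utf8(raw: bytes) -> bool:
    """Report whether raw happens to be well-formed UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A consumer that assumes UTF-8 fails outright on the Latin-1 name and
# silently misreads the UTF-7 one, which is plain ASCII.
print({label: decodes_as_utf8(raw) for label, raw in candidates.items()})
```

Only two of the four decode to the intended text under a UTF-8 assumption, and even those two differ byte for byte.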
Even on Windows, where things are a bit saner (NTFS mandates UTF-16), and where an explicit normalization form is preferred (NFC), I just explicitly verified that I can trivially inject other normalization forms into the file system. Thus, you can still have two files named été that nevertheless have different names as far as the OS is concerned.

In this case, as far as I can tell, Pharo assumes that all path names are Unicode, and does not do any work to convert strings to or from the various normalization schemes (looking in Path class>>canonicalizeElements:, Path class>>from:delimiter:, and FileSystemStore>>pathFromString: here). There's therefore a pretty straightforward fix that Pharo could do:

1. Path would use ByteArrays as the actual canonical store, and provide convenience methods to see what the array decodes to in various encodings. The developer and application can make decisions about what encoding system they want to use.
2. The VM likely needs to be modified to handle this (didn't check).

As much as I wish Hilaire had provided more details in his bug report, it's worth keeping in mind that not all users, or even all programmers, understand the full implications of how the various Unicode normalization and encoding schemes interact in practice with Unix's very vague concept of what a file name actually is, so I usually try to approach these bug reports carefully and with an open mind.

--Benjamin

* On OS X, HFS+ uses UTF-16 with an Apple-specific variant of NFD, whereas I do not believe this holds for e.g. UFS or FUSE-backed file systems, so things are a bit subtler there, but the general rule holds.
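Point 1 of the proposed fix might look roughly like the following, sketched in Python rather than Pharo. BytePath and its methods are hypothetical names invented for this illustration and do not correspond to any existing Pharo or Python API. The raw bytes are the canonical store, and decoding is an optional convenience layered on top.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BytePath:
    """Hypothetical path type whose canonical form is the raw bytes."""
    raw: bytes

    @classmethod
    def from_text(cls, text: str, encoding: str = "utf-8") -> "BytePath":
        # Convenience constructor for the common case of a UTF-8 name.
        return cls(text.encode(encoding))

    def decoded(self, encoding: str = "utf-8") -> Optional[str]:
        """Return the name in the given encoding, or None if the bytes do not fit."""
        try:
            return self.raw.decode(encoding)
        except UnicodeDecodeError:
            return None

    def display(self) -> str:
        # Lossy but never fails, so it is only suitable for showing to a user.
        return self.raw.decode("utf-8", errors="replace")

    def __truediv__(self, child: "BytePath") -> "BytePath":
        return BytePath(self.raw + b"/" + child.raw)


legacy = BytePath(b"/tmp") / BytePath(b"\xe9t\xe9")  # name stored as Latin-1 bytes
modern = BytePath(b"/tmp") / BytePath.from_text("\u00e9t\u00e9")

# ascii() keeps the output unambiguous about which code points came back.
print(ascii(legacy.decoded()), ascii(legacy.decoded("latin-1")), ascii(legacy.display()))
print(ascii(modern.decoded()), legacy == modern)
```

Both paths remain usable and distinguishable because equality is decided on bytes. The application, not the path class, chooses which encoding to try when it needs text.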
Re: [Pharo-users] Ridiculous we are
On Wed, 24 Sep 2014 13:03:57 -0400, Sven Van Caekenberghe s...@stfx.eu wrote:

Did you read the actual conversation in the issue ? https://pharo.fogbugz.com/f/cases/14054/Issue-with-path-with-accented-characters It has been renamed and there is a fix (as a change set, not as a slice, yet). Basically, there was a primitive call into a plugin that failed to do encoding.

No, I apologize; I missed the bug link. Thanks for reposting it.

Now regarding the issues you raised. Pharo does not do Unicode canonicalisation or any of that other fancy stuff (like categorisation, proper ordering and so on). This is another orthogonal and way more general issue. Regarding the pathnames encoding: if the OS itself does not know it, how can we ?

That's actually the argument *against* using UTF-8 as the standard Pharo way to represent filenames--at least on Unix systems. If Pharo used ByteArrays to represent paths, with convenience methods for working with UTF-8 (since I do agree that's the most likely thing for a user/dev to want), then you'd be able to work with all files no matter what, *and* have a convenient way of doing so for the common case.

This is an old discussion, and I do see both sides of it. In terms of SCMs, Mercurial and Git both just say it's a collection of bytes, whereas Subversion says it's Unicode code points. This has some uncomfortable implications for both systems when working on multiple platforms.

--Benjamin