[9fans] no disk; none at all

2013-05-02 Thread erik quanstrom
or, a spruced-up authentication server.

due to a failed cx4 switch, i've been updating a number of machines
to sfp+.

in the process, it seemed easier to replace the authentication server
than to just slap a new nic in it.

since cinap's iplfat gives one the ability to boot from usb, and usb
drives are much cheaper than small ssds, i thought it might be worth
doing it that way.  in theory they're just as easy to swap out as an
ssd.

one problem.  ipl lacks menus.  so i spent a bit of time adding them
yesterday.  the syntax is a little different due to the need to keep
ipl as small as possible.

while at it, it was easy to update to the 64-bit kernel.  and—bonus—
it turned out the new box had an e3-v2 cpu which supports the rdrand
instruction.  it's possible with rdrand to generate random numbers
at several gigabytes/s.  nix's k10 kernel can use rdrand to generate
/dev/random randomness.  and this saves a lot of cpu time.
(~2000 cpu minutes/week).

that sounds like a bit of trivia, but the bottom line is
a 64-bit usb-booting auth server with no 9load in sight.

that almost feels like progress.

- erik



Re: [9fans] Octets regexp

2013-05-02 Thread 9p-st
> On Thu, May 02, 2013 at 03:22:21PM -0400, erik quanstrom wrote:
> > can you give an example of xd outputting something that's not a rune?

if we're talking about xd, i'll suggest 'tcs -f 8859-1' again in which case:

> Indeed, if the regexp is an ASCII representation matching xd output,
> there is not _this_ problem. But this is a limited regexp, since one
> cannot use "character" ranges (it depends on the size); not '.';

these problems go away

> because the conversion has to be done;

this remains

> because there is still the newline problem (that is added; not something in
> the original data) (if functions have been added to not deal with the
> newline, it is because the newline is a problem, and because regexps have a
> wider use than "text").

and this problem goes away.

i imagine you'll still have problems with embedded NULs, but that's C
strings for you...

if you want a library function, use rregexec(2) and rregsub(2) with only
the low byte of each Rune filled...

(and yes, your data does quadruple itself)
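A minimal C sketch of the widening step, assuming a 32-bit Rune (which is what makes the data quadruple). The `Rune32` type and `widenbytes` are hypothetical names, and the actual rregexec(2) call is left out. Note that a NUL octet becomes a zero rune, which ends a rune string just as it ends a C string.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 32-bit rune type: 4 bytes per input octet. */
typedef uint32_t Rune32;

/*
 * Copy each octet of buf[0..n) into the low byte of a Rune32,
 * leaving the high bytes zero, and append a terminating zero rune.
 * A 0x00 octet in the input also becomes a zero rune, so a
 * string-based matcher would stop there.  Caller frees the result.
 */
Rune32 *
widenbytes(const unsigned char *buf, size_t n)
{
	Rune32 *r;
	size_t i;

	r = malloc((n + 1) * sizeof *r);
	if(r == NULL)
		return NULL;
	for(i = 0; i < n; i++)
		r[i] = buf[i];
	r[n] = 0;
	return r;
}
```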

tristan

-- 
All original matter is hereby placed immediately under the public domain.



Re: [9fans] Octets regexp

2013-05-02 Thread erik quanstrom
> Indeed, if the regexp is an ASCII representation matching xd output,
> there is not _this_ problem. But this is a limited regexp, since one
> cannot use "character" ranges (it depends on the size); not '.'; because 

now you're at both ends.  the whole reason for this approach is to
match bytes that aren't valid runes.  so why complain that it does
what you want?

- erik



Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 03:22:21PM -0400, erik quanstrom wrote:
> 
> can you give an example of xd outputting something that's not a rune?
> 

Indeed, if the regexp is an ASCII representation matching xd output,
there is not _this_ problem. But this is a limited regexp, since one
cannot use "character" ranges (it depends on the size); not '.'; because 
the conversion has to be done; because there is still the newline
problem (that is added; not something in the original data) (if
functions have been added to not deal with the newline, it is because
the newline is a problem, and because regexps have a wider use than
"text").

-- 
Thierry Laronde 
  http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



Re: [9fans] Octets regexp

2013-05-02 Thread erik quanstrom
> On Thu, May 02, 2013 at 08:45:28PM +0200, dexen deVries wrote:
> > please pardon the silly question, but... how about piping the binary data 
> > through xd(1) before sending it to regexp(3)?
> > 
> 
> Because it will only work in some cases: newlines and formatting
> come into the picture, and it still requires the regexp to be rune
> compatible, i.e. not every sequence is allowed; it has to be a UTF-8 compatible one.

can you give an example of xd outputting something that's not a rune?

- erik



Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 08:45:28PM +0200, dexen deVries wrote:
> please pardon the silly question, but... how about piping the binary data 
> through xd(1) before sending it to regexp(3)?
> 

Because it will only work in some cases: newlines and formatting
come into the picture, and it still requires the regexp to be rune
compatible, i.e. not every sequence is allowed; it has to be a UTF-8 compatible one.




Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 12:53:19PM -0400, erik quanstrom wrote:
> 
> i see we're at an impasse.  since i don't agree that utf-8 is a user
> interface thing.  it's more entrenched than that.
> 
> why don't you code something up?

Because I have started sketching (this was for kerTeX/RISK) "basys", i.e.
basic system tools, but I'm trying to decide whether to start mainly from
BSD tools (ash, libregex, sed, ed and the small set of utilities
used by RISK or by the kerTeX package framework), or from Plan9 ones
(rc has some features that make it worthwhile). But I want "basys" to
be a "C language" system---the system speaks Cee, and that's all;
a non-integer number is written with a '.' and not a ',', even for the
French, and so on (this is an example of POSIX hell: *printf() and
*scanf() use the locale to decide how to interpret or render
numbers, and even when they are used to read files, not to interact
with the user, any user environment value spoils things if you
have not protected against it in the code...), dealing with octet
strings (for the user's language, let them be UTF-8; but the system strictly
doesn't care: these are octet strings), and for libregex(p) the rune
thing does not appeal to me (correction: the rune-only thing, even
if, for one definition of "character", it does make sense).

I might as well end up with a modified sh or rc that deals with C
strings (with an L---for "hell"?---for UTF-8, nothing for octets, W for
wydes, T for tetras and O for octas, and even a modifier for endianness).

But contrary to what is "state of the art", I take a long time to study and
make things clear (to myself... YMMV), and after that I press on with
implementing in the direction I have chosen (it may take "calendar"
time, but this is simply because of limited slots of time; during
these slots I don't wonder about what has to be done: it is already
decided...). Until I have made the choice...

I have already decided that I will implement a bar(1) that only packs
the data with a "volume" listing, in text, whatever attributes, in the form
attribute=value, are linked to the data (this is, in some sense, what RISK
already does with rkinstall(1), except that it uses tar(1) to pack data).
That is, bar(1) will be a pure C89 program without any system-dependent
part (this will allow doing whatever is needed with the data, for example
changing names to fit local conventions---the man hierarchy; compressing man
pages; caching the rendering; adding extensions etc.).
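The message only fixes the shape of the listing ("attribute=value" lines in text); it names no attributes. Purely as an illustration, a C89-friendly way to split one such line at the first '=' might look like this; `splitattr` and the attribute names in the example are hypothetical.

```c
#include <string.h>

/*
 * Split one "attribute=value" line of a hypothetical bar(1) volume
 * listing at the first '='.  attr and val each have room for cap
 * bytes.  Returns 0 on success, -1 if there is no '=' or a field
 * does not fit.  The value itself may contain further '=' signs.
 */
int
splitattr(const char *line, char *attr, char *val, size_t cap)
{
	const char *eq;
	size_t alen, vlen;

	eq = strchr(line, '=');
	if(eq == NULL)
		return -1;
	alen = (size_t)(eq - line);
	vlen = strlen(eq + 1);
	if(alen >= cap || vlen >= cap)
		return -1;
	memcpy(attr, line, alen);
	attr[alen] = '\0';
	memcpy(val, eq + 1, vlen + 1);	/* includes the terminating NUL */
	return 0;
}
```

Keeping the split at the first '=' means values are never reinterpreted, in line with packing the data without system-dependent meaning.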



Re: [9fans] Octets regexp

2013-05-02 Thread dexen deVries
please pardon the silly question, but... how about piping the binary data 
through xd(1) before sending it to regexp(3)?
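To make the xd(1) idea concrete, a rough C sketch (the helper names `tohex` and `hexfind` are made up for illustration, and the output is bare hex with none of xd's offsets, grouping or newlines). It shows one catch with matching on a hex dump: a pattern can also hit at an odd offset, straddling two bytes, so only even offsets count as real byte matches.

```c
#include <string.h>

/* Write buf[0..n) as lowercase hex into out (room for 2n+1 bytes). */
void
tohex(const unsigned char *buf, size_t n, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for(i = 0; i < n; i++){
		out[2*i] = digits[buf[i] >> 4];
		out[2*i + 1] = digits[buf[i] & 0xF];
	}
	out[2*n] = '\0';
}

/*
 * Find hex pattern pat in hex text.  Only even offsets line up with
 * byte boundaries; odd hits straddle two bytes and are skipped.
 * Returns the byte offset of the match, or -1.
 */
long
hexfind(const char *hex, const char *pat)
{
	const char *p = hex;

	while((p = strstr(p, pat)) != NULL){
		if((p - hex) % 2 == 0)
			return (long)((p - hex) / 2);
		p++;
	}
	return -1;
}
```

For example, the dump of 0x13 0xbe 0xef contains the text "3b", but only across a byte boundary, so hexfind reports no match.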

-- 
dexen deVries

[[[↓][→]]]

I have seen the Great Pretender and he is not what he seems.




Re: [9fans] CODE AS THOU WILT summer sponsored by the Loonie Revolution

2013-05-02 Thread Kurt H Maier
On Thu, May 02, 2013 at 12:11:26PM -0400, Scott Elcomb wrote:
> On Wed, May 1, 2013 at 10:46 PM, mycroftiv 9gridchan
>  wrote:
> > The Loonie Revolution is proud to announce and sponsor:
> >
> > CODE AS THOU WILT SHALL BE THE WHOLE OF THE LAW summer
> 
> Is this statement available on a web page somewhere?  Might be worth
> sharing on social sites. ;-)

Gross.

http://9fans.net/archive/2013/05/38



Re: [9fans] Octets regexp

2013-05-02 Thread erik quanstrom
> > is your proposal 
> > - to change programs that take regular expressions to be exceptions to
> > the plan 9 text model, or
> 
> No: to have a libregexp that is agnostic about any encoding. The tools can
> stay the same for the user; libregexp would simply not be "text" based but
> octet based.

there's always an encoding.

> > - to change the plan 9 text model
> 
> Neither. The "text" model is a user interface. My question is simply how
> difficult it is to have an alternative, special purpose, user interface
> that does not have the UTF-8 filter for input from and output to the user
> interface.

i see we're at an impasse.  since i don't agree that utf-8 is a user
interface thing.  it's more entrenched than that.

why don't you code something up?

- erik



Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 02:38:25PM +0200, tlaro...@polynum.com wrote:
> Regexp(6) handles "characters" that are runes.
> 

Answering myself: regexp deals with entities called "characters".
Some regexp specifications ('.', ranges, classes etc.) apply to 
"characters".

This means that the size of the character has to be known, and one
cannot deal directly with UTF-8, for example, while ignoring that it is
UTF-8, since '.', for example, matches a variable-size sequence whose
start depends on what came before.

So a libregexp dealing with more than runes would be possible, but would
need to specify the fixed size of the characters, i.e. the "encoding"
of the input (this has nothing to do with localization, but with what
an elementary entity is). 
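A small C sketch of why a rune-based '.' has no fixed width over raw UTF-8: the lead byte alone decides whether the next "character" is 1, 2, 3 or 4 bytes, and a continuation byte cannot start one at all. The function names are hypothetical, and the check is structural only (overlong forms and surrogates are not rejected).

```c
#include <stddef.h>

/*
 * Length of the UTF-8 sequence introduced by lead byte c,
 * or 0 if c cannot start a sequence (10xxxxxx or 0xF8-0xFF).
 */
int
utf8len(unsigned char c)
{
	if(c < 0x80) return 1;			/* 0xxxxxxx */
	if((c & 0xE0) == 0xC0) return 2;	/* 110xxxxx */
	if((c & 0xF0) == 0xE0) return 3;	/* 1110xxxx */
	if((c & 0xF8) == 0xF0) return 4;	/* 11110xxx */
	return 0;
}

/*
 * Count how many "characters" a rune-based '.' would step over in
 * s[0..n), or -1 if s is not structurally valid UTF-8 from s[0].
 */
long
countchars(const unsigned char *s, size_t n)
{
	size_t i = 0, k;
	long count = 0;
	int len;

	while(i < n){
		len = utf8len(s[i]);
		if(len == 0 || (size_t)len > n - i)
			return -1;
		for(k = 1; k < (size_t)len; k++)
			if((s[i + k] & 0xC0) != 0x80)
				return -1;	/* expected a 10xxxxxx continuation byte */
		i += len;
		count++;
	}
	return count;
}
```

For "aþ€" (61, C3 BE, E2 82 AC) this gives 3 characters in 6 bytes, while starting one byte into þ fails immediately: where a character starts depends on what came before.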




Re: [9fans] CODE AS THOU WILT summer sponsored by the Loonie Revolution

2013-05-02 Thread Scott Elcomb
On Wed, May 1, 2013 at 10:46 PM, mycroftiv 9gridchan
 wrote:
> The Loonie Revolution is proud to announce and sponsor:
>
> CODE AS THOU WILT SHALL BE THE WHOLE OF THE LAW summer

Is this statement available on a web page somewhere?  Might be worth
sharing on social sites. ;-)

-- 
  Scott Elcomb
  @psema4 on Twitter / Identi.ca / Github & more

  Atomic OS: Self Contained Microsystems
  http://code.google.com/p/atomos/

  Member of the Pirate Party of Canada
  http://www.pirateparty.ca/



Re: [9fans] problem with a here document in rc

2013-05-02 Thread Rudolf Sykora
On 2 May 2013 17:24, erik quanstrom  wrote:
> i usually solve this problem like this
>
> for(i in 1 2){
> 	mkdir -p $i || fatal
> 	cp POSCAR $i || fatal
> 	@{
> 		cd $i
> 		{
> 			echo 2c
> 			echo $i
> 			echo .
> 			echo w
> 			echo q
> 		} | ed POSCAR
> 	}
> }
>
> - erik


So you avoid a here document...
So is it the case that a here document can never serve this purpose (i.e.
it is limited in that any variables present are evaluated just
once; and if so, when?)?

Thanks for a clarification!
Ruda



Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 11:19:38AM -0400, erik quanstrom wrote:
> 
> the plan 9 model is that all text is utf-8, with the exception of
> internal encodings which may be Runes.
> 
> is your proposal 
> - to change programs that take regular expressions to be exceptions to
> the plan 9 text model, or

No: to have a libregexp that is agnostic about any encoding. The tools can
stay the same for the user; libregexp would simply not be "text" based but
octet based.

> - to change the plan 9 text model

Neither. The "text" model is a user interface. My question is simply how
difficult it is to have an alternative, special purpose, user interface
that does not have the UTF-8 filter for input from and output to the user
interface.




Re: [9fans] Octets regexp

2013-05-02 Thread erik quanstrom
> For the moment, I don't want to change anything; I'm trying to convince
> myself of where the border has to be: "characters" (for me, user
> level) on one side, octet strings on the other, system and
> library, side (on a distributed system, it makes sense that filenames,
> being user-level nicknames, be UTF-8---supposed to be UTF-8 without any
> per-filename codepage or whatever).  

there is currently no such distinction between user and library.
this eliminates context.  one never is confronted with, "oh, i can't
call that because that's a user function, not a library function".

- erik



Re: [9fans] problem with a here document in rc

2013-05-02 Thread erik quanstrom
> s = (1 2)
> for(i in $s) {
>   mkdir -p $i
>   cp POSCAR $i
>   @{
>   cd $i
>   ed POSCAR <<EOF >[2]/dev/null
>   }
> }
> 2c
> $i
> .
> w
> q

i usually solve this problem like this

for(i in 1 2){
	mkdir -p $i || fatal
	cp POSCAR $i || fatal
	@{
		cd $i
		{
			echo 2c
			echo $i
			echo .
			echo w
			echo q
		} | ed POSCAR
	}
}

- erik



Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 11:10:34AM -0400, Kurt H Maier wrote:
> Why does this functionality have to be overloaded into existing tools
> that are already in common use?  
> 

I'm speaking about libregexp, not about the use existing tools make
of it.



Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 05:02:45PM +0200, Bence Fábián wrote:
> you want to change default behaviour and make the usual usecase special?
> 

For the moment, I don't want to change anything; I'm trying to convince
myself of where the border has to be: "characters" (for me, user
level) on one side, octet strings on the other, system and
library, side (on a distributed system, it makes sense that filenames,
being user-level nicknames, be UTF-8---supposed to be UTF-8 without any
per-filename codepage or whatever).  

The usual behavior could stay exactly the same (the
leading "L" was just an example; it could be reversed; and octet
matching could simply be provided by new functions---new names---, the
historical ones calling these new character-agnostic ones). The
problem is not there. The problem is: are regexps only useful with
"text", implying "characters", or more widely useful? My feeling is
that they are more generally useful.




Re: [9fans] Octets regexp

2013-05-02 Thread erik quanstrom
> But that is exactly my point: to have localization far from regexp.
> Regexp taking simply a string of bytes and matching strings of bytes.

the plan 9 model is that all text is utf-8, with the exception of
internal encodings which may be Runes.

is your proposal 
- to change programs that take regular expressions to be exceptions to
the plan 9 text model, or
- to change the plan 9 text model
?

either way, i think the bar should be high to change the text model
for plan 9, and higher to make exceptions.

- erik



Re: [9fans] Octets regexp

2013-05-02 Thread Kurt H Maier
Why does this functionality have to be overloaded into existing tools
that are already in common use?  

khm



Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 10:58:30AM -0400, a...@9srv.net wrote:
> 
> i think the answer is just "no, there's no way to do that today".
> and i'd strongly advise keeping that tool as far away from any
> discussion of localization or character sets or runes or the like.
> there oughtn't be any mode switching or the like: it's utf-8
> encoded unicode runes, or it's binary, not characters at all.
> 

But that is exactly my point: to keep localization far from regexp,
with regexp simply taking a string of bytes and matching strings of bytes.
(The main advantage of UTF-8 is not, for me, Unicode (UTF-8 could
survive as an encoding for something other than Unicode), but
precisely that it is still strings of octets, and that the system
can be left alone, far from localization.)

A side effect of tools that are not Unicode (UTF-8) aware is that they
can be used with any string of bytes, since no interpretation is done.

One could even imagine using regexp to find a pattern in an image (even
a sed-like program, trying to match a first-row pattern, and then looking
for the following rows if some patterns are matched too).




Re: [9fans] Octets regexp

2013-05-02 Thread Bence Fábián
you want to change default behaviour and make the usual use case special?


2013/5/2 

>
> Or even a specification à la C: by adding a leading 'L' meaning:
> treat the string as UTF-8, that is, match runes. And if not, leave
> it alone.
>
>


Re: [9fans] Octets regexp

2013-05-02 Thread a
your exact problem still isn't clear to me, but certainly there've
been times when I want to search for some array of characters
in a binary blob. i don't believe i've needed anything beyond a
literal string of bytes, but i could imagine from there the utility of
something regexp-like.
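For the literal case described here, nothing regexp-like is needed. A naive C sketch (`findbytes` is a hypothetical name; a real tool would likely use a faster algorithm) that treats the data purely as octets, so NULs and invalid UTF-8 are harmless:

```c
#include <string.h>

/*
 * Find the literal byte string pat[0..m) in buf[0..n).  Nothing
 * here treats the data as C strings or runes, so 0x00 and bytes
 * that are not valid UTF-8 are just bytes.  Returns the offset of
 * the first match, or -1.
 */
long
findbytes(const unsigned char *buf, size_t n, const unsigned char *pat, size_t m)
{
	size_t i;

	if(m == 0)
		return 0;
	if(m > n)
		return -1;
	for(i = 0; i + m <= n; i++)
		if(buf[i] == pat[0] && memcmp(buf + i, pat, m) == 0)
			return (long)i;
	return -1;
}
```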

i think the answer is just "no, there's no way to do that today".
and i'd strongly advise keeping that tool as far away from any
discussion of localization or character sets or runes or the like.
there oughtn't be any mode switching or the like: it's utf-8
encoded unicode runes, or it's binary, not characters at all.

hex editors are useful sometimes. being able to do more
complicated searches/edits/replaces/whatever could be
similarly useful sometimes. but don't go anywhere near the
character set or localization discussions with it.

anthony




Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 09:43:10AM -0400, Tristan wrote:
> > And after some thought, I don't see an obvious reason why the regexp
> > could not be used with byte strings (so UTF-8 is OK) without trying to
> > match runes (since not every byte string is a correct UTF-8 sequence).
> 
> with octet based regexps, [Þþ] doesn't match þ, but 0xc3, 0xbe and 0x9e
> independently.
> 

Regexp knows subexpressions. So it could be achieved, and one could even
have the present functions be higher-level ones, calling more basic ones
dealing with bytes (a rune specified by a UTF-8 sequence being replaced
by a subexpression) or even dealing with various sizes of element
(character; but one fixed size for the processing).

Or even a specification à la C: by adding a leading 'L' meaning:
treat the string as UTF-8, that is, match runes. And if not, leave
it alone.
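One way to read the subexpression idea, sketched in C under stated assumptions: each rune in a class is encoded as UTF-8 and the class becomes an alternation of byte sequences. The \xHH escape notation in the output is an assumption for illustration (regexp(6) does not define it), and `class2alt` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode code point r (assumed valid and <= 0x10FFFF) as UTF-8. */
static int
encodeutf8(uint32_t r, unsigned char *b)
{
	if(r < 0x80){
		b[0] = (unsigned char)r;
		return 1;
	}
	if(r < 0x800){
		b[0] = (unsigned char)(0xC0 | (r >> 6));
		b[1] = (unsigned char)(0x80 | (r & 0x3F));
		return 2;
	}
	if(r < 0x10000){
		b[0] = (unsigned char)(0xE0 | (r >> 12));
		b[1] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
		b[2] = (unsigned char)(0x80 | (r & 0x3F));
		return 3;
	}
	b[0] = (unsigned char)(0xF0 | (r >> 18));
	b[1] = (unsigned char)(0x80 | ((r >> 12) & 0x3F));
	b[2] = (unsigned char)(0x80 | ((r >> 6) & 0x3F));
	b[3] = (unsigned char)(0x80 | (r & 0x3F));
	return 4;
}

/*
 * Rewrite a rune class such as [Þþ] as an alternation of octet
 * sequences, (\xc3\x9e|\xc3\xbe), so a byte matcher sees whole
 * characters.  out must be large enough; returns strlen(out).
 */
size_t
class2alt(const uint32_t *runes, size_t n, char *out)
{
	char *p = out;
	unsigned char b[4];
	size_t i;
	int k, len;

	p += sprintf(p, "(");
	for(i = 0; i < n; i++){
		if(i > 0)
			p += sprintf(p, "|");
		len = encodeutf8(runes[i], b);
		for(k = 0; k < len; k++)
			p += sprintf(p, "\\x%02x", (unsigned)b[k]);
	}
	sprintf(p, ")");
	return strlen(out);
}
```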




Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 09:44:38AM -0400, erik quanstrom wrote:
> > This is a remark made to me by a developer who, when needed, can use
> > regexps (ed(1) or sed(1)) on a Unix where they still deal
> > with "char" (bytes), to search for a string of bytes in a binary.
> 
> i have never needed to do this.  could you provide some motivation
> for grepping for a weird byte in an executable?  surely the debugger
> is better suited for this.
> 

Because not everything is a program? Some of it may be data. For example,
the TeX (or METAFONT etc.) predigested dumps are binary, but not programs.

> > And after some thought, I don't see an obvious reason why the regexp
> > could not be used with byte strings (so UTF-8 is OK) without trying to
> > match runes (since not every byte string is a correct UTF-8 sequence).
> 
> because it makes things more complicated and probably worse for the
> common case, while not providing any new functionality not already in
> other tools.
> 

Ah? I thought the purpose was to avoid duplicated tools... And I'm
not quite sure it would be more complicated for common cases, since the
already defined functions could be wrappers calling lower-level functions,
with the definition of the size of the "entity"---byte, wyde, tetra,
octa (while I'm at it: endianness too) or UTF-8.
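A sketch of the fixed-size "entity" idea in C: given a declared size (byte, wyde, tetra, octa) and a byte order, the i-th character is found by arithmetic alone, so '.' and ranges become well defined again. `entity` is a hypothetical helper, not part of any existing libregexp.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return entity i of buf when every "character" is size bytes wide
 * (1 byte, 2 wyde, 4 tetra, 8 octa), read in big-endian order if
 * bigendian is nonzero, little-endian otherwise.  size must be 1-8.
 */
uint64_t
entity(const unsigned char *buf, size_t i, int size, int bigendian)
{
	const unsigned char *p = buf + i * (size_t)size;
	uint64_t v = 0;
	int k, idx;

	for(k = 0; k < size; k++){
		idx = bigendian ? k : size - 1 - k;	/* most significant byte first */
		v = (v << 8) | p[idx];
	}
	return v;
}
```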

> 
> i think you've missed the point of making utf-8 *the* character set.
> it's not sometimes the character set.  or only on tuesday.  it's always
> the character set.
> 
No: I have understood this. What I'm not totally sure about is this: the
system deals with octet strings (as it should), and UTF-8, i.e.
Unicode, is on the user interface; but is there a means to keep the
interface from interpreting the strings as UTF-8? Because not everything
is text.




Re: [9fans] Octets regexp

2013-05-02 Thread Tristan
putting a little more thought into your actual problem, use tcs:

tcs -f 8859-1

which (as i remember) will map 0x80-ff to U0080-00ff and you can use
normal utf8 regular expressions.
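What the tcs trick amounts to, as a minimal C sketch (not tcs's source; `latin1toutf8` is a hypothetical name): every octet is taken as the code point with the same value, U+0000 to U+00FF, and written out as UTF-8. Bytes below 0x80 pass through; the rest become two-byte sequences, so the result is always valid UTF-8 and '.' matches exactly one original byte.

```c
#include <stddef.h>

/*
 * Map each octet of in[0..n) to the code point of the same value
 * and emit it as UTF-8 into out (room for 2n bytes).  Returns the
 * number of bytes written.
 */
size_t
latin1toutf8(const unsigned char *in, size_t n, unsigned char *out)
{
	size_t i, o = 0;
	unsigned char b;

	for(i = 0; i < n; i++){
		b = in[i];
		if(b < 0x80)
			out[o++] = b;
		else{
			out[o++] = (unsigned char)(0xC0 | (b >> 6));
			out[o++] = (unsigned char)(0x80 | (b & 0x3F));
		}
	}
	return o;
}
```

A raw 0xFE byte thus becomes the rune þ (U+00FE), which an ordinary rune-based regexp can match; the cost is that NUL bytes still pass through as NUL.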

tristan




Re: [9fans] Octets regexp

2013-05-02 Thread Tristan
> And after some thought, I don't see an obvious reason why the regexp
> could not be used with byte strings (so UTF-8 is OK) without trying to
> match runes (since not every byte string is a correct UTF-8 sequence).

with octet based regexps, [Þþ] doesn't match þ, but 0xc3, 0xbe and 0x9e
independently.
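To see the effect concretely, a minimal C sketch of a byte-level bracket class (a 256-bit set; `ByteClass`, `classadd` and `classhas` are hypothetical names). Feeding it the UTF-8 text of [Þþ], i.e. C3 9E C3 BE, yields exactly the three independent members listed above, and nothing ties C3 to the byte after it.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
	unsigned char bits[32];	/* 256 bits, one per octet value */
} ByteClass;

/* Add every octet of s[0..n) to the class, each on its own. */
void
classadd(ByteClass *c, const unsigned char *s, size_t n)
{
	size_t i;

	for(i = 0; i < n; i++)
		c->bits[s[i] >> 3] |= (unsigned char)(1u << (s[i] & 7));
}

/* Does the class contain octet b? */
bool
classhas(const ByteClass *c, unsigned char b)
{
	return (c->bits[b >> 3] >> (b & 7)) & 1;
}
```

So such a class matches 0xC3 on its own, the first half of þ, of Þ, or of any other character from U+00C0 to U+00FF.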

tristan




Re: [9fans] Octets regexp

2013-05-02 Thread erik quanstrom
> This is a remark made to me by a developer who, when needed, can use
> regexps (ed(1) or sed(1)) on a Unix where they still deal
> with "char" (bytes), to search for a string of bytes in a binary.

i have never needed to do this.  could you provide some motivation
for grepping for a weird byte in an executable?  surely the debugger
is better suited for this.

> And after some thought, I don't see an obvious reason why the regexp
> could not be used with byte strings (so UTF-8 is OK) without trying to
> match runes (since not every byte string is a correct UTF-8 sequence).

because it makes things more complicated and probably worse for the
common case, while not providing any new functionality not already in
other tools.

> Corollary: I don't know if there is a UTF-8 sequence that can say:
> stop interpreting as UTF-8, take bytes "as is" (except every incorrect
> sequence; the problem being how to come back from there: if everything
> is OK "as is", what can be interpreted as "stop raw, restart
> UTF-8"? Solution: this is at user level, not low level, and it is in
> the shell, explicitly delimiting chunks, just as "'" is the only
> delimiter and every embedded "'" has to be "escaped" by doubling it).

i think you've missed the point of making utf-8 *the* character set.
it's not sometimes the character set.  or only on tuesday.  it's always
the character set.

- erik



Re: [9fans] Octets regexp

2013-05-02 Thread tlaronde
On Thu, May 02, 2013 at 08:48:06AM -0400, erik quanstrom wrote:
> > Regexp(6) handles "characters" that are runes.
> 
> perhaps the man page is misleading.  rune in this context means utf-8.
> see regexp(2).  all the functions take char*s.

But the source files deal with runes...

> 
> one of the points of plan 9 was to standardize on one character set,
> utf-8.  imho, localization and character set aren't related unless one
> is dealing with 8859-x overlays or some other character set insufficient
> to represent the range of languages.
> 

Localization (as "handled" in POSIX, for example) is a mess. So the Plan9
solution, still with octets (UTF-8), makes far more sense, since it
allows extending, for the user, the "characters" that can be used in
naming computer objects, but this is just for nicknames: the system 
still speaks C/9P. 

So it is better, except perhaps for one thing: for me, the system
"speaks" C or even, obviously, "Plan9" (well: 9P). It does not have
to speak French, Hebrew, etc. or even English! So it takes or gives
bytes, and this is good. UTF-8 is the main convention for the user
interface, but can it be unset? I mean, can one use a
"raw" window, putting in uninterpreted bytes and rendering bytes (with
a special "ASCII" font with either ASCII plus "0xdd" glyphs or whatever,
using fonts to do what is done with vis(1) on Unices or od(1)/xd(1)),
without imposing the assumption that the octet string is UTF-8? Can 
one create a file by entering bytes---i.e. binary values that yield 
incorrect UTF-8 sequences?

This is a remark made to me by a developer who, when needed, can use
regexps (ed(1) or sed(1)) on a Unix where they still deal
with "char" (bytes), to search for a string of bytes in a binary.

And after some thought, I don't see an obvious reason why the regexp
could not be used with byte strings (so UTF-8 is OK) without trying to
match runes (since not every byte string is a correct UTF-8 sequence).

Corollary: I don't know if there is a UTF-8 sequence that can say:
stop interpreting as UTF-8, take bytes "as is" (except every incorrect
sequence; the problem being how to come back from there: if everything
is OK "as is", what can be interpreted as "stop raw, restart
UTF-8"? Solution: this is at user level, not low level, and it is in
the shell, explicitly delimiting chunks, just as "'" is the only
delimiter and every embedded "'" has to be "escaped" by doubling it).
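The shell-level convention referred to, where "'" is the only delimiter and an embedded quote is doubled, is how rc quotes words. A small C sketch of producing such a quoted word (`rcquote` is a hypothetical name):

```c
#include <string.h>

/*
 * Quote s the rc way: wrap it in single quotes and write every
 * embedded quote twice.  out has room for cap bytes; the worst case
 * needs 2*strlen(s)+3.  Returns 0 on success, -1 if out is too small.
 */
int
rcquote(const char *s, char *out, size_t cap)
{
	if(2 * strlen(s) + 3 > cap)
		return -1;
	*out++ = '\'';
	for(; *s != '\0'; s++){
		if(*s == '\'')
			*out++ = '\'';	/* an embedded quote is written twice */
		*out++ = *s;
	}
	*out++ = '\'';
	*out = '\0';
	return 0;
}
```

Because only the doubled quote is special, a reader can always tell where the quoted chunk ends, which is the property the raw/UTF-8 switch above would need.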



[9fans] problem with a here document in rc

2013-05-02 Thread Rudolf Sykora
Hello,

I have a problem writing a here document correctly in rc.
I wrote, say:

s = (1 2)
for(i in $s) {
  mkdir -p $i
  cp POSCAR $i
  @{
  cd $i
  ed POSCAR <<EOF >[2]/dev/null
  }
}
2c
$i
.
w
q
EOF

and I wanted to have the 2nd line of individual POSCAR files replaced
with the immediate value of $i.
But the script doesn't do it, and I don't know if it can be modified
in a simple way so that it does.
(I could possibly use sed [i.e. without a here document] but ...)

If anyone can tell me...
Thanks!
Ruda



Re: [9fans] Octets regexp

2013-05-02 Thread erik quanstrom
> Regexp(6) handles "characters" that are runes.

perhaps the man page is misleading.  rune in this context means utf-8.
see regexp(2).  all the functions take char*s.

> I wonder if Plan9 developers, when trying to design a way towards some
> localization, have ever thought of byte (octet) regexps, that is, using
> regexps not on runes but on octet strings (maybe UTF-8 as is), allowing
> regexps to be used with binary too, not only with newline-terminated chunks etc.?

one of the points of plan 9 was to standardize on one character set,
utf-8.  imho, localization and character set aren't related unless one
is dealing with 8859-x overlays or some other character set insufficient
to represent the range of languages.

however, sam and acme allow for structured regular expressions,
and are generally not line oriented:

http://doc.cat-v.org/bell_labs/structural_regexps/se.pdf

and iirc, cinap has written a cifs bit that uses a bit of binary matching.

- erik



[9fans] Octets regexp

2013-05-02 Thread tlaronde
Regexp(6) handles "characters" that are runes.

I wonder if Plan9 developers, when trying to design a way towards some
localization, have ever thought of byte (octet) regexps, that is, using
regexps not on runes but on octet strings (maybe UTF-8 as is), allowing
regexps to be used with binary too, not only with newline-terminated chunks etc.?




Re: [9fans] [GSOC] graphics projects

2013-05-02 Thread David Hoskin
Hello again,

Thanks for the discussion!

I wound up submitting a proposal for the web draw server:

http://www.google-melange.com/gsoc/proposal/review/google/gsoc2013/drh/1

-- 
David Hoskin 



Re: [9fans] CODE AS THOU WILT summer sponsored by the Loonie Revolution

2013-05-02 Thread hiro
mveety, are you mentally ill? i only take bitcoin and human sacrifices.

On 5/2/13, Matthew Veety  wrote:
> I feel like you're taking rips off the wrong bong tonight. Amazon payments
> are way better.
>
> On May 1, 2013, at 22:46, mycroftiv 9gridchan
>  wrote:
>
>> The Loonie Revolution is proud to announce and sponsor:
>>
>> CODE AS THOU WILT SHALL BE THE WHOLE OF THE LAW summer
>>
>> What are the rules?
>>
>> NONE. From May 1 to Aug 30.
>>
>> What is going to happen?
>>
>> Some people (maybe you) are going to write some original Plan 9 code
>> that does something at least slightly interesting and new.  Then the
>> Loonie Revolution will send you $99 via PayPal.
>>
>> That's not much money for coding, is it?
>>
>> Nope.  Think of it as a tip of gratitude, not as an hourly wage for
>> your efforts.
>>
>> Can I just write Hello World and get money for it?
>>
>> Nope, not unless it said hello to a different world via the tachyon
>> telephone device.  There is no minimum specification, but the software
>> has to do something new for Plan 9, not just solve an Euler problem or
>> something.
>>
>> "Loonie Revolution" - is this just a troll?
>>
>> Nope.  The new lunar-powered revolution of peace and love to bring the
>> fantasy of a better world into our shared reality has been declared.
>> The fiction binds are forming and the harmonic time resonance is
>> vibrating in helical patterns.
>>
>> Ben Kidwell
>> Loonie
>
>



[9fans] Equivalent of unix vis(1) with font

2013-05-02 Thread tlaronde
Hello,

I was wondering if someone has already done the following for Plan9:

Rendering non-visible characters (controls and blanks) visibly,
not with a vis(1) encoding, but by substituting the font, that is,
switching to a font that has, as subfonts, the font already used for
display, but supplies additional glyphs for the "invisible"
characters, à la EOF, with, say, '->|' for a tab, '|_|' for a space
(as used by D.E.K. and Silvio Levy's CWEB rendering of strings in
source), a "diagonal" '^@' etc. for controls, or whatever?
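The proposal is about substituting glyphs in a font, which this does not attempt; as a rough illustration of the mapping only, here is a C sketch that substitutes text stand-ins: the '->|' and '|_|' from the message for tab and space, and caret notation (^@, ^A, ..., ^? for DEL) for other controls. `visglyph` and `visrender` are hypothetical names.

```c
#include <string.h>

/*
 * Visible stand-in for byte c, or NULL if c is already visible.
 * buf (3 bytes) holds caret notation for control characters.
 */
const char *
visglyph(unsigned char c, char buf[3])
{
	if(c == '\t')
		return "->|";
	if(c == ' ')
		return "|_|";
	if(c < 0x20 || c == 0x7F){
		buf[0] = '^';
		buf[1] = (char)(c ^ 0x40);	/* 0x00 -> '@', 0x01 -> 'A', 0x7F -> '?' */
		buf[2] = '\0';
		return buf;
	}
	return NULL;
}

/* Render s into out with stand-ins; out needs 3*strlen(s)+1 bytes. */
void
visrender(const char *s, char *out)
{
	char tmp[3];
	const char *g;
	size_t n;

	for(; *s != '\0'; s++){
		g = visglyph((unsigned char)*s, tmp);
		if(g != NULL){
			n = strlen(g);
			memcpy(out, g, n);
			out += n;
		}else
			*out++ = *s;
	}
	*out = '\0';
}
```

In a font-based version the same table would instead select glyphs, leaving the bytes themselves untouched.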
