Re: Optimization

2000-06-05 Thread Anatoly Vorobey

On Tue, Jun 06, 2000 at 08:47:42AM +0900, Daniel C. Sobral wrote:
 Can someone discuss the performance trade-offs of the following two
 alternative codes (and maybe suggest alternatives)?
 
 Problem: I need to retrieve two values from a table.
 
 Alternative A:
 
   x = table[i].x;
   y = table[i].y;
 
 Alternative B:
 
   d = table[i];
   x = d & MASK;
   y = d >> SHIFT;

Alternative A should be much faster. The compiler should be smart 
enough to cache (table[i]) in a register. OTOH I am not sure it'll
be smart enough to cache a structure (d) in a register even though
it might fit there.

The first line of Alternative B should take roughly as much as the whole
of Alternative A, and even more if the compiler is stupider (and sets
up an array-copying operation to retrieve d instead of explicitly copying
x and y). Of course, if table[i] includes anything else besides x and y,
B is slower yet.

You might want to declare x and y as register if that's alright with
you. It might speed you up wrt next operations you do with x and y.

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton


To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message



Re: Optimization

2000-06-05 Thread Anatoly Vorobey

On Mon, Jun 05, 2000 at 05:07:42PM -0700, Zach Brown wrote:
 On Tue, Jun 06, 2000 at 02:57:18AM +0300, Anatoly Vorobey wrote:
 
   Can someone discuss the performance trade-offs of the following two
   alternative codes (and maybe suggest alternatives)?
   
   Problem: I need to retrieve two values from a table.
   
   Alternative A:
   
 x = table[i].x;
 y = table[i].y;
   
   Alternative B:
   
 d = table[i];
 x = d & MASK;
 y = d >> SHIFT;
  
  Alternative A should be much faster. The compiler should be smart 
 
 Don't forget the effects of caching.  If x/y are always referenced
 together, and memory is slow slow slow (on, say, any processor made in
 the last few years)  then the cost of unmushing the data in the cpu
 could be much cheaper than the cost of going to memory to get x and y
 from different tables.

On the other hand, if the array is properly aligned, getting x will
get the whole dword (qword, etc.) into the cache, and CPU won't have
to run to the memory for y. Another problem with B is that I'm not sure
the compiler will be smart enough to squeeze a structure into a register
if it fits there, even with optimizations. Uhm, I think I'll run some
tests on that, just for kicks.

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Optimization

2000-06-05 Thread Anatoly Vorobey

On Tue, Jun 06, 2000 at 09:31:46AM +0900, Daniel C. Sobral wrote:
 Alternative A:
 
   x = table[i].x;
   y = table[i].y;
 
 Alternative B:
 
   d = table[i];
   x = d & MASK;
   y = d >> SHIFT;

Alternative A should be much faster. The compiler should be smart 
 [stuff about d being a structure]
 
 It isn't.

Ah, I didn't realize you were free to change table[i]'s type
between implementations.

Okay, I change my mind then. B is better. I ran a quick test with -O3
on i386. What happens in A is that it transfers 32-bit values anyway,
but isn't smart enough to do it only once. So it accesses *(table+i*2),
and then *(table+2+i*2),  both accesses taking one instruction (and
i*2 sitting precomputed in a register). It puts one in eax, stores ax
away, then puts the other in eax, and stores ax away.

In B, it accesses *(table+i*2) once, puts it in eax, stores ax away,
rotates eax, stores ax away. Rotation should win over memory access
even if it goes through cache, especially considering the memory
access has a constant displacement inside the instruction.

If you test it, be sure to declare x and y volatile, otherwise you'll
have the hardest time keeping gcc from holding them in registers. Don't
use a constant i, or it'll precompute addresses, etc. Use -O3 -g -S,
and .stabs entries in the assembly file will mark line boundaries in
source.
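
A minimal sketch of the test setup described above. The 16-bit field
width, the MASK and SHIFT values and the function names are assumptions
made for illustration; only the volatile x and y, the non-constant i and
the compiler flags come from the message.

```c
#include <stdint.h>

#define MASK  0xFFFFu   /* assumed: x lives in the low 16 bits  */
#define SHIFT 16        /* assumed: y lives in the high 16 bits */

/* Element type for Alternative A (assumed layout). */
struct pair { uint16_t x, y; };

/* volatile forces real stores, so gcc cannot keep x and y in registers. */
volatile uint16_t x, y;

/* Alternative A: two separate loads from the table. */
void alt_a(const struct pair *table, unsigned i)
{
    x = table[i].x;
    y = table[i].y;
}

/* Alternative B: one 32-bit load, then mask and shift. */
void alt_b(const uint32_t *table, unsigned i)
{
    uint32_t d = table[i];
    x = (uint16_t)(d & MASK);
    y = (uint16_t)(d >> SHIFT);
}
```

Passing i as a parameter keeps it from being a compile-time constant.
Compiling this file with -O3 -g -S and comparing the two functions in
the assembly output is one way to repeat the comparison.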

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: An IA-64 port?

2000-06-04 Thread Anatoly Vorobey

On Sun, Jun 04, 2000 at 11:12:22AM -0400, Will Andrews wrote:
 On Sun, Jun 04, 2000 at 01:18:39AM +0800, Belldandy wrote:
Is there any effort(or at least, any thought) on making an
  IA-64 port of FreeBSD? It seems Intel is trying to push IA-64
  to be 'the platform' for servers and workstations, and I think
  FreeBSD definitely can't be left out
 
 Hi,
 
 Just new information for you:  David O'Brien [EMAIL PROTECTED]
 imported GCC 2.96 into -current's tree early this morning.  The build
 succeeded on my system, so I should have a compiler with rudimentary
 (i.e. "pre-alpha") support for IA-64 once I reboot.

Do you mean just "you" or "anyone with -current"? I thought our binutils
(e.g. gas) had no IA-64 support. Has that changed?

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Proper uses for MFS?

2000-05-25 Thread Anatoly Vorobey

You, Matthew Dillon, were spotted writing this on Thu, May 25, 2000 at 10:57:33AM -0700:
 
 I don't particularly like to use MFS for 'large' partitions, mainly
 because cached data blocks wind up in core memory twice (once in MFS's
 memory map, and once in the VM page cache).

You've said this several times in threads on MFS during recent months,
and I've always wanted to ask: is that a necessary 'feature' of MFS's
architecture, or something which could possibly be fixed without
too much hard work? For instance, would it be possible to force
VM not to cache MFS pages, etc.?

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: commit MAKE_SHELL?

2000-04-26 Thread Anatoly Vorobey

On Tue, Apr 25, 2000 at 11:00:07PM -0700, Doug Barton wrote:
 Anatoly Vorobey wrote:
 
  Well, *should* we have a built-in "test"? I gather the original ash didn't
  have it due to the KIS principle. But if it speeds things up considerably,
  it's not much of a bloat, is it? I'd volunteer to write it.
 
   Unfortunately, the only way to tell for sure would be to do a couple
 make worlds with the current sh, then do some with super-sh with the
 built in 'test'. 

You are right. I will do it, and report the results.

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: commit MAKE_SHELL?

2000-04-26 Thread Anatoly Vorobey

On Wed, Apr 26, 2000 at 07:55:19PM +, Anatoly Vorobey wrote:

  Unfortunately, the only way to tell for sure would be to do a couple
  make worlds with the current sh, then do some with super-sh with the
  built in 'test'. 
 
 You are right. I will do it, and report the results.

Reporting back ;) Adding built-in "test" turns out to be frightfully easy,
by compiling into the shell the real test(1) code (builtin printf works
this way, too). I have broken world right now so I can't test it, but on
running several configure scripts from ports I generally save from 9% to 
18% of time. Maybe someone would like to test/review/comment on this?

Patches below. In test(1), errx() calls had to be changed into warnx()
calls because the latter are #define'd to be something else when compiling
within the shell, in order to capture the output when necessary.
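
The replacement idiom used in the patch below can be sketched on its
own. errx() prints a message and exits in one call; warnx() only prints,
so the status 2 is supplied through the comma operator. The function
name here is hypothetical and is not part of test(1).

```c
#include <err.h>

/* Hypothetical helper: report the problem, then hand back a status
 * instead of terminating the process. */
int report_missing_bracket(void)
{
    /* The comma operator evaluates warnx() first, then yields 2. */
    return (warnx("missing ]"), 2);
}
```

The same expression works as an argument to exit(), which is how the
second hunk of the test.c diff uses it.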

To test, 'sh /usr/src/bin/test/TEST.sh', run configures, build worlds, etc. ;)

Index: test.c
===
RCS file: /freebsd/cvs/src/bin/test/test.c,v
retrieving revision 1.29
diff -u -r1.29 test.c
--- test.c  1999/12/28 09:34:57 1.29
+++ test.c  2000/04/26 22:30:13
@@ -10,10 +10,12 @@
  * This program is in the Public Domain.
  */
 
+#if !defined(SHELL)
 #ifndef lint
 static const char rcsid[] =
   "$FreeBSD: src/bin/test/test.c,v 1.29 1999/12/28 09:34:57 sheldonh Exp $";
 #endif /* not lint */
+#endif /* not a sh builtin */
 
 #include <sys/types.h>
 #include <sys/stat.h>
@@ -26,6 +28,11 @@
 #include <string.h>
 #include <unistd.h>
 
+#ifdef SHELL
+#define main testcmd
+#include "bltin/bltin.h"
+#endif
+
 /* test(1) accepts the following grammar:
oexpr   ::= aexpr | aexpr "-o" oexpr ;
aexpr   ::= nexpr | nexpr "-a" aexpr ;
@@ -171,7 +178,7 @@
p++;
if (strcmp(p, "[") == 0) {
if (strcmp(argv[--argc], "]"))
-   errx(2, "missing ]");
+   return( (warnx("missing ]") , 2) );
argv[argc] = NULL;
}
 
@@ -195,9 +202,9 @@
 {
 
if (op && *op)
-   errx(2, "%s: %s", op, msg);
+   exit( (warnx("%s: %s", op, msg), 2) );
else
-   errx(2, "%s", msg);
+   exit( (warnx("%s", msg), 2) );
 }
 
 static int

Index: Makefile
===
RCS file: /freebsd/cvs/src/bin/sh/Makefile,v
retrieving revision 1.30
diff -u -r1.30 Makefile
--- Makefile  1999/09/08 15:40:43 1.30
+++ Makefile  2000/04/27 00:24:55
@@ -5,7 +5,7 @@
 SHSRCS=alias.c arith.y arith_lex.l cd.c echo.c error.c eval.c exec.c expand.c \
histedit.c input.c jobs.c mail.c main.c memalloc.c miscbltin.c \
mystring.c options.c output.c parser.c printf.c redir.c show.c \
-   trap.c var.c
+   test.c trap.c var.c
 GENSRCS= builtins.c init.c nodes.c syntax.c
 GENHDRS= builtins.h nodes.h syntax.h token.h y.tab.h
 SRCS= ${SHSRCS} ${GENSRCS} ${GENHDRS} y.tab.h
@@ -22,7 +22,7 @@
 # for debug:
 # CFLAGS+= -g -DDEBUG=2
 
-.PATH: ${.CURDIR}/bltin ${.CURDIR}/../../usr.bin/printf
+.PATH: ${.CURDIR}/bltin ${.CURDIR}/../../usr.bin/printf ${.CURDIR}/../test
 
 CLEANFILES+= mkinit mkinit.o mknodes mknodes.o \
mksyntax mksyntax.o
Index: builtins.def
===
RCS file: /freebsd/cvs/src/bin/sh/builtins.def,v
retrieving revision 1.7
diff -u -r1.7 builtins.def
--- builtins.def  1999/08/27 23:15:08 1.7
+++ builtins.def  2000/04/27 00:25:08
@@ -80,6 +80,7 @@
 setcmd     set
 setvarcmd  setvar
 shiftcmd   shift
+testcmd    test [
 trapcmd    trap
 truecmd    : true
 typecmd    type

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: commit MAKE_SHELL?

2000-04-25 Thread Anatoly Vorobey

On Sun, Apr 23, 2000 at 06:51:16PM -0400, Brian Fundakowski Feldman wrote:

 I certainly don't mind adding more shells to the ${MAKE_SHELL} logic, but
 so far have only done ksh because using pdksh as the ${MAKE_SHELL} does,
 for me, result in about 10% faster make world time, and speeds port
 building enormously 

Do you have any guesses about what causes this speed increase? What does
our shell suck at, in terms of speed? Maybe we could try and speed it up.

(this is not meant to counter your proposal).

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: sigjmp_buf question

2000-04-19 Thread Anatoly Vorobey

On Wed, Apr 19, 2000 at 09:39:44PM -0500, Steve Price wrote:
 Where does one look in the source for the definition of what
 each of the ints in sigjmp_buf._sjb (or jmp_buf._jb for that
 matter) contain?  The only occurrences of it (according to
 grep(1)) are in the header file machine/setjmp.h.  I also
 looked into src/sys/i386/i386/machdep.c and didn't see anything
 that struck me as being what I'm looking for.

Look into src/lib/libc/i386/gen/_setjmp.S and other files in
the same directory for setjmp() and sigsetjmp(). Basically, it'll store
edx, ebx, esp, ebp, esi, edi, and the CPU control word, in that order.
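
As a reading aid, the order given above can be written down as slot
indices. The enum and helper names are hypothetical, and the layout is
only as accurate as the list above; _setjmp.S remains the authority.

```c
/* Hypothetical names for the saved slots, in the order listed above. */
enum jb_slot {
    JB_EDX,     /* first saved word, per the message above */
    JB_EBX,
    JB_ESP,
    JB_EBP,
    JB_ESI,
    JB_EDI,
    JB_CW,      /* control word */
    JB_NSLOTS   /* number of slots described */
};

/* Fetch one saved word from an int array laid out as described. */
int jb_read(const int *jb, enum jb_slot slot)
{
    return jb[slot];
}
```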

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





memory in the kernel

2000-04-16 Thread Anatoly Vorobey

I have to malloc a lot of memory in the kernel, hence a few
questions:

1. The data must be absolutely present at all times, no page
faults or locking mechanisms, etc. Does that mean
I should use kmem_alloc_wired() or am I misunderstanding its purpose?
Does it make sense to alloc less than a pageful or is the rest simply
going to be wasted?

2. Unfortunately, I need to realloc a lot as data is dynamic and I
don't know sizes beforehand. How do I do that? Do I malloc a new
region, copy manually and release the old one?

Thanks a lot in advance,
Anatoly.

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: memory in the kernel

2000-04-16 Thread Anatoly Vorobey

On Sun, Apr 16, 2000 at 03:36:31PM +0200, Poul-Henning Kamp wrote:
 In message [EMAIL PROTECTED], Anatoly Vorobey writes
 :
 I have to malloc a lot of memory in the kernel, hence a few
 questions:
 
 How much is "a lot" ?

Apparently somewhere in the vicinity of 8Mb, and also coming in a form
of many hash tables, dynamic-size linked lists, variable-length
structs, etc. so it's not practical to estimate a high bound, allocate
and be done with it.

FWIW, I think Win32 got it right in providing growable private heaps,
where you can create your own heap and malloc() from it, and then just
return all the memory back with one destroy call. It makes a lot of sense
in some contexts.
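
To make the private-heap idea concrete, here is a userland sketch built
on malloc(3). It is not the Win32 API and not kernel code; every name in
it is hypothetical, and the 4096-byte minimum block size is an arbitrary
choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HEAP_BLOCK_MIN 4096   /* arbitrary minimum block size */

/* One chunk of the heap; data[] is aligned for any object type. */
struct heap_block {
    struct heap_block *next;
    size_t used, size;
    max_align_t data[];
};

struct heap { struct heap_block *head; };

/* Carve n bytes out of the newest block, growing the heap if needed. */
void *heap_alloc(struct heap *h, size_t n)
{
    const size_t a = _Alignof(max_align_t);
    struct heap_block *b = h->head;
    void *p;

    if (n > SIZE_MAX / 2)
        return NULL;                      /* avoid size overflow */
    n = (n + a - 1) / a * a;              /* round up to keep alignment */
    if (b == NULL || b->size - b->used < n) {
        size_t size = n > HEAP_BLOCK_MIN ? n : HEAP_BLOCK_MIN;
        b = malloc(sizeof(*b) + size);
        if (b == NULL)
            return NULL;
        b->next = h->head;
        b->used = 0;
        b->size = size;
        h->head = b;
    }
    p = (char *)b->data + b->used;
    b->used += n;
    return p;
}

/* Return every block to the system with a single call. */
void heap_destroy(struct heap *h)
{
    while (h->head != NULL) {
        struct heap_block *next = h->head->next;
        free(h->head);
        h->head = next;
    }
}
```

Individual allocations are never freed here; the point is that all the
hash tables and lists tied to one heap go away together in
heap_destroy().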

 1. The data must be absolutely present at all times, no page
 faults or locking mechanisms, etc. Does that mean
 I should use kmem_alloc_wired() or am I misunderstanding its purpose?
 Does it make sense to alloc less than a pageful or is the rest simply
 going to be wasted?
 
 malloc(9) should be used.

Thanks!

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: PC Keyboard Scancodes

2000-04-15 Thread Anatoly Vorobey

You, Warner Losh, were spotted writing this on Sat, Apr 15, 2000 at 12:01:05AM -0600:
 In message [EMAIL PROTECTED] Mike Pritchard writes:
 : Here are the codes for the Compaq "Easy Access Internet Keyboard".
 : They also have a newer version with even more buttons, but I don't
 : have access to one, so I can't supply the codes for it.  If someone
 : is going to do some work to get the Microsoft keyboard's extra keys
 : to work, it shouldn't be hard to integrate these keys at the same time.
 
 Thanks Mike.  If I move forward on this, I'll include these too.

To make FreeBSD grok them, go to sys/dev/kbd/atkbd.c (that's assuming
the keyboard is AT-style rather than USB), and modify atkbd_read_char():

--- atkbd.c Sat Apr 15 11:58:13 2000
+++ atkbd.c.new Sat Apr 15 12:09:28 2000
@@ -681,6 +681,16 @@
case 0x5d:  /* menu key */
keycode = 0x6b;
break;
+/* the following are super-duper extended MS keys */
+case 0x5f: /* Sleep key */
+keycode = 0x6d;
+break;
+case 0x65: /* Search key */
+keycode = 0x6e;
+break;
+case 0x66: /* Favourites key */
+keycode = 0x70;
+break;
default:/* ignore everything else */
goto next_code;
}

And so on for all the keys, using your scancodes in case
statements, and allocating new keycodes as you go along, starting
from the first available one now which is 0x6d.  Then
you just add new lines to keymap files, starting from 109=0x6d,
and it should work at once. We have 148 spare entries in keymap_t
at the moment, they should suffice for some time ;)
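
The mapping those case statements perform can be sketched as a
standalone function. The function name is hypothetical and this is not
code from atkbd.c; the scancode and keycode values are the ones used in
the patch above.

```c
/* Hypothetical helper: map an extended scancode to a keycode,
 * or return -1 for scancodes that should be ignored. */
int ext_scancode_to_keycode(int scancode)
{
    switch (scancode) {
    case 0x5d: return 0x6b;   /* menu key */
    case 0x5f: return 0x6d;   /* Sleep key */
    case 0x65: return 0x6e;   /* Search key */
    case 0x66: return 0x70;   /* Favourites key */
    default:   return -1;     /* ignore everything else */
    }
}
```

Each keycode returned this way then needs a matching line in the keymap
file, which is the second translation step (keycode to character or
action).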

Of course, with all those new keys on all those keyboards, we should
perhaps think about whether to add all of them as new keycodes,
and if so, in which order, etc. I've no idea if FreeBSD's concept
of 'keycode' (i.e. key number independent of keyboard model) is
synchronized with other BSD's, or Linux, etc.

Have no idea what to do about X though. 

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: PC Keyboard Scancodes

2000-04-14 Thread Anatoly Vorobey

You, Daniel O'Connor, were spotted writing this on Fri, Apr 14, 2000 at 05:54:16PM +0930:
 Hi,
 I put together a new PC and noticed the keyboard I bought has 3 extra keys
 (Wakeup, Sleep, and Power). I wondered if they could be used by mapping
 scancodes to the corresponding meanings, but I can't find the scan codes.
 
 I made a keymap file which mapped the scan codes from 109 to 255 to
 'debug' but pressing the keys don't trigger it :(
 
 Does anyone know if/how I can use them? Suggestions thus far have been to
 patch syscons to print all the scan codes it gets :)

No need to patch. Put syscons into the K_RAW mode (open the device, and
use the KDSKBMODE ioctl - search for it in syscons source for details),
and syscons'll give you back the scancodes when you read it. This is
what X does, by the way.

One reason why your approach might not have been working is that
keymaps really translate from keycodes to charcodes, not from scancodes
to charcodes, and the keyboard driver might've been unsuccessful
in matching nonstandard scancodes to keycodes.

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton









Re: PC Keyboard Scancodes

2000-04-14 Thread Anatoly Vorobey

 I will see if I can try what Anatoly suggests.. :)

Use this simple prog to cough up the scancodes:

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/kbio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

/* Print the error for the failing call and give up. */
void die(char *str) {
  perror(str);
  exit(1);
}

int main(void) {
  int err, mode;
  struct termios term_saved, term;
  int i;
  char ch;

  err = tcgetattr(0,&term);
  if(err==-1) die("tcgetattr");

  term_saved = term; /* keep a copy to restore on exit */
  cfmakeraw(&term);

  err = ioctl(0,KDGKBMODE,&mode); /* remember the current kb mode */
  if(err==-1) die("getkbdmode");
  else printf("current kb mode: %d\n", mode);

  err = ioctl(0,KDSKBMODE,K_RAW); /* ask syscons for raw scancodes */
  if(err==-1) die("setkbmode");
  else printf("K_RAW mode set\n");

  printf("Press Esc to end.\n");

  err = tcsetattr(0,TCSANOW,&term); /* set terminal to raw */
  if(err==-1) die("tcsetattr");

  for(i=0; i<1000; i++) {
    err=read(0,&ch,1);
    if(err!=1) break;
    printf("%d ",(unsigned char)ch); fflush(stdout);
    if(ch==1) break; /* break on Escape */
  }

  err = tcsetattr(0,TCSANOW,&term_saved);
  if(err==-1) die("tcsetattr");

  err = ioctl(0,KDSKBMODE,mode);
  if(err==-1) die("setkbmode");
  else printf("\nkb mode restored\n");

  return 0;
}


-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: PC Keyboard Scancodes

2000-04-14 Thread Anatoly Vorobey

On Sat, Apr 15, 2000 at 01:43:40AM +0930, Daniel O'Connor wrote:
  
 On 14-Apr-00 Warner Losh wrote: 
  I also yesterday got one of those damn microsoft internet keyboards 
  and it has lots of extra keys that don't show up either.  Including 
  the Wakeup, Sleep and power.  My belief is that maybe you have to 
  explicitly enable the extra keys? 
  
 Could be :-/ 
  
 I couldn't find any info about the technical jiggery pokery of them on the
 web though :(
 
Also this may be of help: 
 
http://www.microsoft.com/hwdev/desinit/scancode.htm 
 
It doesn't seem to contain anything about enabling them though, seems 
like they should just emit the scancodes listed. 


-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Unicode on FreeBSD

2000-04-07 Thread Anatoly Vorobey

I'm glad we are discussing specific technical issues now. Perhaps
we should move this discussion to freebsd-i18n once it's created?

You, Kazutaka YOKOTA, were spotted writing this on Fri, Apr 07, 2000 at 12:13:14PM +0900:
 
 I have suggested adding Unicode support in the keyboard driver and the
 vga driver (more precisely, vga and syscons). As a result of such changes:
 
 a) keymap files would map keycodes to the desired Unicode values rather
 than 8-bit values depending on a particular encoding, which should
 greatly simplify /usr/share/syscons/keymaps and let applications
 that desire so obtain Unicode input directly;
 
 As you are well aware, the keyboard driver (and the keyboard-related part
 of syscons) has no knowledge about the character code generated via the
 keymap.  Thus, we will need little or no modification to handle
 Unicode-based keymaps.

Well, new code must be written to translate Unicode values produced by
the (modified) keyboard driver back into 8bit for normal userland 
applications. This code would use the same encoding table that syscons
would use to translate 8bit output to Unicode before displaying it.

Moreover, a way should be provided for userland applications to receive 
Unicode input directly should they want that. One solution is 
to simply add another mode (ks_mode member of atkbd_state structure)
which would return Unicode codes directly. 
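
A minimal sketch of such a shared encoding table, used in both
directions. All names are hypothetical; the two sample entries are the
KOI8-R codes for the Cyrillic letter A, and a real table would fill in
every byte above 127.

```c
#include <stdint.h>

/* Hypothetical encoding table: 8-bit code -> UCS-2 value. */
static uint16_t enc_table[256];

/* Fill the table with ASCII plus two sample KOI8-R entries. */
void enc_init_sample(void)
{
    for (int i = 0; i < 256; i++)
        enc_table[i] = i < 128 ? (uint16_t)i : 0xFFFD; /* unmapped */
    enc_table[0xC1] = 0x0430;   /* CYRILLIC SMALL LETTER A */
    enc_table[0xE1] = 0x0410;   /* CYRILLIC CAPITAL LETTER A */
}

/* Output path: 8-bit byte from userland -> Unicode for display. */
uint16_t enc_to_unicode(uint8_t byte)
{
    return enc_table[byte];
}

/* Input path: Unicode from the keyboard -> 8-bit byte for userland.
 * Returns -1 when the encoding has no byte for this character. */
int enc_from_unicode(uint16_t ucs)
{
    if (ucs == 0xFFFD)
        return -1;
    for (int i = 0; i < 256; i++)
        if (enc_table[i] == ucs)
            return i;
    return -1;
}
```

The linear search on the input path is only there to show that one table
can serve both directions; a precomputed reverse map would be the
obvious refinement.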

 b) font files would map Unicode chars, rather than encoding-dependent
 chars, to glyphs. That would greatly simplify /usr/share/syscons/fonts,
 get rid of a huge amount of redundant information there, and allow
 creation of unified font files describing many languages at once.
 
 Um, well, we may be able to use a unified font file for many
 languages.  But, do not expect that we will be able to create a single
 font file which will be suitable for ALL languages.

You are right. I won't expect that.

 c) vga code would be changed to allow 512-characters hardware fonts in
 text modes, which will suffice to hold several languages at once. Moreover,
 
 The pcvt driver already uses 512 chars.

True text modes create an additional problem to consider: given some
(Unicode) font files loaded into kernel, and a limited supply (512 minus
128) of available char slots, which glyphs should be loaded into the
VGA font table? In other words, which glyphs are more important than
the others? One solution is to let userland dictate this, but this isn't
completely satisfying, because then userland has two additional control
structures now to provide for the kernel: encoding table for 8bit<->Unicode
translation and mapping table for Unicode->512chars translation, the latter
being also irrelevant for the raster modes. 
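
One simple policy, sketched here only to make the problem concrete, is
to hand out hardware slots on first use until the table is full. The
names are hypothetical, and the sketch assumes slots 0..127 are fixed
for ASCII, which is what "512 minus 128" suggests.

```c
#include <stdint.h>

#define HW_SLOTS  512   /* size of the VGA font table */
#define FIXED_LOW 128   /* assumed: slots 0..127 stay ASCII */

static uint16_t slot_owner[HW_SLOTS];   /* code point held by each slot */
static int next_free = FIXED_LOW;

/* Return the hardware slot for a code point, assigning one on first
 * use; -1 means the table is full and the glyph cannot be shown. */
int glyph_slot(uint16_t ucs)
{
    if (ucs < FIXED_LOW)
        return ucs;
    for (int s = FIXED_LOW; s < next_free; s++)
        if (slot_owner[s] == ucs)
            return s;
    if (next_free == HW_SLOTS)
        return -1;
    slot_owner[next_free] = ucs;
    return next_free++;
}
```

First-come allocation avoids a second userland-supplied table, at the
price of failing once 384 distinct non-ASCII glyphs have been seen; it
does not answer the question of which glyphs matter most.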

I'll look into how Linux people handled this issue.

 in raster modes (which are pseudo-text modes -- graphic modes with
 fast text rendering) any amount of Unicode glyphs could be displayed
 at once. 
 
 If we intend to display any languages at once in the console, the
 raster mode is the only solution.  I agree.  But, we need a fair
 amount of knowledge about the language/script we are dealing with, in
 order to display its text correctly.

Let's try to enumerate the issues we will run into here. After all a
new font file format depends crucially on that. We need to reach a
conclusion on what is realistic and what isn't to provide on a 
fixed-width console. For instance, I would love to be able to handle
bidirectional output and Hebrew diacritics, but I am not sure at all
this is realistic to provide.

 UTF-8 may play a role of
 one such particular table, which will in future allow easy way
 to modify userland applications to support UTF-8 if desired.
 
 Multilingual text processing in the userland is a completely different
 issue which, I think, should be discussed separately.

I agree, but I'm rather talking here about allowing (future) userland
multilingual processing, rather than what and how it should be done. 
What I mean here is that the encoding table format should be more flexible
than "one byte <-> one UCS-2 code" because that will not allow
simple and easy UTF-8 translation in the future, should we want that.
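
To see why a one-byte table cannot express UTF-8, consider how a single
UCS-2 code turns into one to three bytes. The function below is a
generic sketch with a hypothetical name, not a proposed syscons
interface, and it does not treat surrogate values specially.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one UCS-2 code as UTF-8; returns the number of bytes written. */
size_t ucs2_to_utf8(uint16_t ucs, unsigned char out[3])
{
    if (ucs < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)ucs;
        return 1;
    }
    if (ucs < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (ucs >> 6));
        out[1] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (ucs >> 12));
    out[1] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (ucs & 0x3F));
    return 3;
}
```

A Cyrillic or Hebrew letter comes out as two bytes, so the table format
has to allow byte sequences on the 8-bit side.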

 We need more discussion to design a reasonable implementation
 (compromise :-) which does not make lives of some people difficult by
 imposing a single rigid scheme.

Great, let's have this discussion right here and now ;)

 Unicode, as it stands now, does not seem to be THE solution which
 addresses all the issues/problems/complexities of the languages in the
 world...  It can be viewed/used as a tool, though.

I agree with that completely.

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Unicode on FreeBSD

2000-04-06 Thread Anatoly Vorobey

On Wed, Apr 05, 2000 at 08:02:28PM -0700, Alex Belits wrote:
 
   Can you guess, which one of multiple cyrillic charsets never was
 actually used in Russia?
 
   ISO 8859-5.

It's actually being used quite often now by users of MS Outlook 2000
(those of them not sophisticated enough to select their own outgoing
encoding).

   And which is still the standard in Russian-language newsgroups,
 for russian Unix users and most of Russian-language web pages?

Cyrillic!=Russian.

   koi8-r, one of the oldest cyrillic charsets, primarily designed to keep

This is untrue. cp1251 is used in almost all Russian web pages, and
koi8-r is the minority (for no good reason, of course, primarily because
too many people never learned to set the right charset in the outgoing
HTTP headers).

 "intuitive" mapping to ASCII, to remain usable after passing through
 characters-mangling old software and to be readable on 7-bit dumb
 terminals -- and the last mentioned property is still saving a lot of
 trouble for Russians that use mail-to-pager systems. History is more 
 complex than some people think.

And with all its attractive properties, it's still missing the letter
"yat'" that I need. It's there in Unicode, of course (and in 8859-5).

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Unicode on FreeBSD

2000-04-05 Thread Anatoly Vorobey

You, Alex Belits, were spotted writing this on Tue, Apr 04, 2000 at 05:34:55PM -0700:
 On Tue, 4 Apr 2000, Alex Belits wrote:
 
   You mean, MIME multipart documents are better than Unicode if I, for instance, 
   want to handle Tolstoy's "War and Peace" with French quotes in the middle of 
   Russian sentences? 
   
   I don't think so.
  
This is what multipart format exists for -- to combine documents or 
  sections in the document with possibly different metadata in the
  headers. The idea of "mail attachment" appeared later.
 
   I have to add that I agree that the way, MIME multipart is handled is
 primitive and inconvenient for such applications, however this is not the
 result of any flaw in its design, only of the lack of progress after
 "everything should adopt Unicode" doctrine was declared. 

 One may argue
 that the way that TeX handles such a text is even more inconvenient,
 however even now it's most likely that TeX would be used for this kind of
 typesetting.

But we're *not* talking about typesetting -- rather about multilingual 
text handling. TeX, indeed, does typesetting and thus solves the wrong 
problem. In "real life" someone who needs to handle text with Russian 
and French in it -- type it, send it, read it, study it, etc. -- not 
*typeset* it -- won't use TeX for it, but will rather walk over to the 
Windows machine and fire up Word. This is the solution that's used in 
"real life" right now -- and incidentally, one of the reasons it's 
become so annoyingly common to email Word files as some kind of 
universal text standard. I don't like this, but currently the Unix 
world doesn't have a good alternative to offer. UTF-8 changes that,
and I think that's a wonderful thing. It's fine for you to talk about
what would happen if MIME were to evolve into a general-purpose text-marking
standard powerful enough to handle a Czech word inside a French sentence,
but that didn't happen, which means that neither you nor anyone else took
it there. Frankly, I don't think MIME would have been up for the task 
anyway, but that's a moot point because it just didn't happen.


-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Unicode on FreeBSD

2000-04-05 Thread Anatoly Vorobey

You, G. Adam Stanislav, were spotted writing this on Tue, Apr 04, 2000 at 07:59:55PM -0500:
 On Tue, Apr 04, 2000 at 05:08:56PM +0300, Giorgos Keramidas wrote:
 Of course, it still remains to be seen if having Unicode support on the
 console is a Good Thing(TM).
 
 I don't see how it would be even possible, due to hardware limitations.
 The console can only support an 8-bit font (I mean 8-bit encoding). If
 you change it for one character, you change it for everything on the
 console. And this was designed by *International* Business Machines! :)

a) VGA actually supports 512-character fonts; this is not currently
supported by FreeBSD, but can be.

b) FreeBSD supports "raster modes", which are graphics VGA modes
used as if they were text modes -- the characters get drawn very
quickly by the VGA renderer code using their representation in
the font file (it is my understanding, though I might be wrong,
that Linux doesn't support these). In these modes, you could draw
arbitrarily many different glyphs at the same time, once Unicode support
is added.

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton

























Re: Unicode on FreeBSD

2000-04-05 Thread Anatoly Vorobey

You, Marco van de Voort, were spotted writing this on Wed, Apr 05, 2000 at 02:10:35PM +0100:
 
 
 I'm sorry that I maybe missed part of the thread, but what parts that should get 
 UNICODE support are we thinking of?

I have suggested adding Unicode support in the keyboard driver and the
vga driver (more precisely, vga and syscons). As a result of such changes:

a) keymap files would map keycodes to the desired Unicode values rather
than 8-bit values depending on a particular encoding, which should
greatly simplify /usr/share/syscons/keymaps and let applications
that desire so obtain Unicode input directly;

b) font files would map Unicode chars, rather than encoding-dependent
chars, to glyphs. That would greatly simplify /usr/share/syscons/fonts,
get rid of a huge amount of redundant information there, and allow
creation of unified font files describing many languages at once.

c) vga code would be changed to allow 512-character hardware fonts in
text modes, which will suffice to hold several languages at once. Moreover,
in raster modes (which are pseudo-text modes -- graphic modes with
fast text rendering) any number of Unicode glyphs could be displayed
at once.

d) userland applications wouldn't feel a thing, and would continue
to receive a pure 8-bit stream translated from/to Unicode by syscons by
way of a user-supplied encoding table. UTF-8 could serve as one such
table, which would later make it easy to modify userland applications
to support UTF-8 if desired.
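
For (d), UTF-8 differs from an ordinary 8-bit table only in that one
Unicode value may become several bytes. A minimal sketch of that
direction of the translation is below; the function name is invented
here, and the 4-byte limit follows the current definition of UTF-8
rather than anything syscons does.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-8 into out[].
 * Returns the number of bytes written (1..4), or 0 if cp is not
 * encodable (a surrogate, or above U+10FFFF).
 */
static size_t
utf8_encode(uint32_t cp, unsigned char out[4])
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp < 0x10000) {
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* surrogates are not characters */
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

ASCII passes through unchanged, which is why 8-bit-clean programs that
only look at ASCII bytes keep working.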

I am willing to do this work ( a)-d) ), have a good understanding of
the issues involved, etc. However I am neither a committer nor a 
member of -core. If -core thinks this whole thing is a Bad Idea,
my changes won't get reviewed and/or committed, and I don't want to do
a lot of work to find out later it won't get into FreeBSD. This
is why I've asked for an endorsement from the People Who Decide
Things: not a guarantee, of course, that whatever I do will be
welcomed, but rather an acknowledgement that this is a Worthy Issue
and if my diffs are working well and answer the needed criteria,
they will be reviewed and committed.

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Unicode on FreeBSD

2000-04-04 Thread Anatoly Vorobey

You, Alex Belits, were spotted writing this on Mon, Apr 03, 2000 at 03:23:42PM -0700:
 On Mon, 20 Mar 2000, MikeM wrote:
 
  Has anyone thought of Unicode support on FreeBSD? 
 
   Really the question is much more basic -- who benefits from having
 Unicode (or Unicode in the form of UTF-8) support. It isn't me for sure
 -- I am Russian.

So am I, and guess what? I'd really love being able to handle French and
Russian together smoothly and transparently. Not to mention Hebrew.

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Unicode on FreeBSD

2000-04-04 Thread Anatoly Vorobey

You, Alex Belits, were spotted writing this on Mon, Apr 03, 2000 at 08:59:51PM -0700:

  -- I am Russian.
  
  So?
 
   So I don't want UTF-8 to be forced on me.

No one is trying to force UTF-8 on you.

In fact, userland support of UTF-8 can (and should IMHO) be based around
an environment variable a la LANG which would tell programs whether they
should expect pure 8-bit text or UTF-8 text. This will give you a pretty
easy option to leave things as they are.
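
A sketch of what such a check might look like in a program. The variable
name CONSOLE_ENCODING and its "UTF-8" value are purely hypothetical,
picked for illustration; nothing of the kind exists today. Anything other
than an explicit request for UTF-8 leaves the program in 8-bit mode.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, used only for this illustration. */
#define TEXT_ENCODING_VAR "CONSOLE_ENCODING"

enum text_mode { TEXT_8BIT, TEXT_UTF8 };

/* Decide the mode from the variable's value; NULL means "not set". */
static enum text_mode
text_mode_from_value(const char *value)
{
	if (value != NULL && strcmp(value, "UTF-8") == 0)
		return TEXT_UTF8;
	return TEXT_8BIT;	/* default: leave things as they are */
}

/* Convenience wrapper that consults the environment. */
static enum text_mode
text_mode_from_env(void)
{
	return text_mode_from_value(getenv(TEXT_ENCODING_VAR));
}
```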

 Charset definitions in MIME
 headers exist for a reason.

Yes, and the better mail clients (e.g. mutt) are already able to translate
transparently between different equivalent charsets by using internally
a common superset -- Unicode. Everyone should be able to use whatever 
charset they desire.

   One of the most basic strengths of Unix is the ease with which text can
 be manipulated, and how "non-text" data can be processed using the same
 tools without any complex "this is text and this is not"
 application-specific procedures. UTF-8 turns "text" into something that
 gives us a dilemma -- to redesign everything to treat "text" as the stream
 of UTF-8 encoded Unicode (and make it impossible to combine text and
 "non-text" without a lot of pain), or to leave tools as they are and deal
 with "invalid" output from perfectly valid operations. 

This is not a dilemma. Just about the only really different aspect of handling
UTF-8 text is the algorithm for calculating the number of characters.
Most of the existing programs can easily be tailored to treat the byte
stream as either a pure 8-bit stream or a UTF-8 stream based on YOUR preferences.
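
To show how small that difference is, here is a minimal sketch of
character counting for UTF-8. It assumes the input is valid UTF-8 and
simply skips continuation bytes (those of the form 10xxxxxx); the
function name is mine.

```c
#include <stddef.h>

/*
 * Count characters (code points) in a NUL-terminated UTF-8 string.
 * Every byte that is not a continuation byte starts a new character.
 * Assumes the input is well-formed UTF-8.
 */
static size_t
utf8_strlen(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

For pure ASCII text this gives the same answer as strlen(3).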

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Unicode on FreeBSD

2000-04-01 Thread Anatoly Vorobey

On Wed, Mar 29, 2000 at 03:39:08AM +, Anatoly Vorobey wrote:
 I wonder how useful it would be to teach syscons/kbd to handle Unicode.

No replies so far; let me try again. I was not trying to convey the
attitude "let it be done" in my message; I was rather hoping for a
reply of the "this would be a nice thing to have; if you do it, I'll review
it and if it works right, I'll commit it" kind. I believe that what I am
suggesting is a Good Thing, but if this belief is not shared by others,
there's not much sense in me trying to do it.

Thus I suggest teaching kbd/syscons/vga to use Unicode internally.
The picture would look as follows. A keymap specifies Unicode values 
(rather than 8-bit values as now) for keycodes; the console driver 
receives Unicode values from the keyboard driver. On the video side,
the console driver has a bunch of Unicode characters (rather than
8-bit characters) to render; in text mode, it translates them into
8-bit codes and puts them on the screen, the correct font having
been previously loaded; in raster mode, it uses the current font to
draw them out directly on the screen.
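
To make the keyboard half of this picture concrete, here is a rough
sketch of a keymap entry that yields Unicode values. The struct layout
and names are invented for illustration and are not the existing keymap
format; the two sample keys follow the usual Russian layout and are
assumptions as well.

```c
#include <stddef.h>
#include <stdint.h>

enum { MOD_NONE = 0, MOD_SHIFT = 1, MOD_STATES = 2 };

/* One key: its keycode and the Unicode value for each modifier state. */
struct ukey {
	uint8_t  keycode;
	uint32_t ucs[MOD_STATES];
};

/* Illustrative fragment of a Russian layout (values are assumptions). */
static const struct ukey ru_keys[] = {
	{ 30, { 0x0444, 0x0424 } },	/* 'A' key: CYRILLIC EF, small/capital */
	{ 33, { 0x0430, 0x0410 } },	/* 'F' key: CYRILLIC A, small/capital */
};

/* Return the Unicode value for a key press, or 0 if the key is unmapped. */
static uint32_t
keymap_lookup(const struct ukey *map, size_t n, uint8_t keycode, int mod)
{
	size_t i;

	if (mod < 0 || mod >= MOD_STATES)
		return 0;
	for (i = 0; i < n; i++)
		if (map[i].keycode == keycode)
			return map[i].ucs[mod];
	return 0;
}
```

Note that nothing in such a table mentions KOI8-R or any other encoding.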

The benefits, rather considerable I think, are as follows:

- keymaps for different languages don't need to depend on encodings as
they do now (most of the languages currently have two or more different
encoding schemes arranged for in /usr/share/syscons/keymaps ; if Unicode
values are specified in keymaps, they all go away and only different
key layouts will require separate keymap files); in fact, kbdcontrol(1)
can then be written to be aware of the symbolic Unicode names which
then would be used in keymap files, simplifying them greatly.

- screen fonts as well don't need to depend on encodings - they will map
Unicode symbols into screen shapes. The redundant screen font files
go away.

- in raster modes (SC_PIXEL_MODE on, etc.) more than 256 characters can
now be trivially drawn. In fact, different languages that previously
occupied the same codespace in 8-bit (i.e. all languages except English)
can now be displayed together in these modes. Maybe there are consequences
for scripts such as hiragana etc.? Consider the convenience for users
of scripts with relatively many characters.

- /usr/share/syscons/scrnmaps goes away, this kludge being no longer needed.

- the road is wide open for Unicode support in userland, through UTF-8. 

The drawbacks, as I see them, are as follows:

- The format of screen font files must be changed. They may not be
describing consecutive character codes anymore, and 8-bit indexed arrays
go away. One font file may now describe lots of languages at once.

- much more kernel memory used for font files if they are unified as
above and used as a whole. Some mechanism may be used for telling 
kbdcontrol(1) and friends which subset of the font to load (doing this 
strictly by user's LANG won't let him use several languages at once though).
In text modes, a mapping must be created to squeeze Unicode into 
the available 8-bit VGA font space, and if there isn't enough space, someone
must decide which Unicode chars to let go and convert into blanks -- 
syscons is the module which will be doing this job, and userland may 
tell it what the really important Unicode chars are based on the user's
LANG.

- some rendering routines are slowed down due to the fact that simple
8-bit array lookups are no longer available for getting characters'
information. This may be circumvented somewhat by smart searches/hash 
tables.
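
As one example of the "smart searches" mentioned in the last drawback:
if a font file keeps its glyphs sorted by Unicode value, a binary search
finds a glyph in O(log n) steps. This is only a sketch under that
assumption; the struct, the 8x16 glyph size and the names are
illustrative, not a proposed file format.

```c
#include <stddef.h>
#include <stdint.h>

/* One glyph: its Unicode value and an 8x16 bitmap, one byte per row. */
struct glyph {
	uint32_t ucs;
	uint8_t  rows[16];
};

/*
 * Find the glyph for ucs in a font sorted by ascending ucs.
 * Returns NULL if the font has no such glyph (the caller would
 * then draw a blank or a replacement shape).
 */
static const struct glyph *
font_find(const struct glyph *font, size_t n, uint32_t ucs)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (font[mid].ucs == ucs)
			return &font[mid];
		if (font[mid].ucs < ucs)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

A direct 256-entry array could still serve as a fast path for the
characters used most often.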

Implementation considerations:

- may be done in stages, which is good. For instance, keyboard driver
together with kbdcontrol(1) and keymap files may be modified at first,
with syscons translating Unicode codes into 8-bit using a translation
table conveyed to it by kbdcontrol(1) and handling video exactly
as before. Later video routines are changed, etc.

- kbd driver changes aren't significant in the kernel, mainly type changes
and the like (who else except syscons/pcvt is using the kbd driver?).

- in syscons, virtual buffer stuff, font support, and the VGA renderer
need to be significantly changed.

- in userland, data files in /usr/share/syscons need to be changed,
kbdcontrol(1) and vidcontrol(1) need to adapt to that, and a method
for relaying to syscons the current Unicode <-> 8-bit translation table
(so that userland programs won't feel anything) needs to be added.
The other alternative is to do that conversion in userland libraries
and make syscons completely Unicode.

What do you think?
-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton





Re: Unicode on FreeBSD

2000-03-28 Thread Anatoly Vorobey

On Sat, Mar 25, 2000 at 06:34:19PM +0100, Christian Weisgerber wrote:
 MikeM [EMAIL PROTECTED] wrote:
 
  Has anyone thought of Unicode support on FreeBSD? 
 
 It has crossed my mind...
 
  I think that it is inevitable that eventually FreeBSD
  will *need* to support unicode if it wants to continue
  as a viable operating system in the future.
 
 Probably. The demand for Unicode support is currently rather limited,
 but I expect it to pick up somewhat once it is pervasive under
 Linux and applications programmers come to expect its availability.

I wonder how useful it would be to teach syscons/kbd to handle Unicode.
If I mess up royally in the details below, someone please correct me,
I only browsed through the sources a bit.

Currently, if I understand this correctly, the keyboard driver, and
not the console driver, is aware of the current keymap. The keymap
maps keycodes with modifiers to actions, which are either some misc
actions or character codes. The console driver gets character codes from
the kbd driver and doesn't need to know anything about keymaps (?). It doesn't
know about screen fonts either; it just throws charcodes to the video driver,
which has the current font installed. It converses with applications via
charcodes as well; at that level, the charcodes also include terminal
emulation sequences.

The unhappy drawbacks are:

a) different language encodings require separate keymap files even when
the actual keyboard layout doesn't change.
b) different language encodings require separate screen fonts because those
are indexed by character codes. 
c) general 8-bit limitations.
d) ...? 

Now suppose we change the keymap files to assign Unicode codes to keycodes
with modifiers (the actual Unicode numbers seem better in this case than
short UTF-8 sequences?). The kbd driver returns those to syscons. Syscons
now has the notion of the current encoding table which translates
8-bit <-> Unicode. It translates the codes back to 8-bit and gives
them to applications. When applications give output, syscons translates
it back into Unicode after handling the terminal emulation stuff. The video
driver now uses a font which only depends on screen size, but not on encoding.
It displays the stuff. 
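
A minimal sketch of such an encoding table, assuming one 256-entry array
that maps each 8-bit code to a Unicode value and is scanned in reverse
for the other direction. All names are invented for illustration, and
ASCII is assumed to map to itself in every table.

```c
#include <stdint.h>

/* The current encoding: Unicode value for each 8-bit code (0 = unmapped). */
struct encoding {
	uint32_t to_ucs[256];
};

/* Start with ASCII mapped to itself and the upper half unmapped. */
static void
encoding_init(struct encoding *e)
{
	int i;

	for (i = 0; i < 128; i++)
		e->to_ucs[i] = (uint32_t)i;
	for (i = 128; i < 256; i++)
		e->to_ucs[i] = 0;
}

/* Application output -> Unicode (what the video side would render). */
static uint32_t
enc_to_unicode(const struct encoding *e, unsigned char c)
{
	return e->to_ucs[c];
}

/* Keyboard Unicode -> 8-bit for the application; -1 if not representable. */
static int
enc_from_unicode(const struct encoding *e, uint32_t ucs)
{
	int i;

	if (ucs < 128)
		return (int)ucs;
	for (i = 128; i < 256; i++)
		if (e->to_ucs[i] == ucs)
			return i;
	return -1;
}
```

For KOI8-R, for instance, userland would fill in entries such as
to_ucs[0xC1] = 0x0430 (CYRILLIC SMALL LETTER A) before handing the table
to syscons.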

If this works, the next step would be to offer UTF-8 as the ultimate
table and start teaching usermode apps to be happy with it. However, even
before that, the immediate effect would be great simplification and 
reduction of keymaps/screen fonts structure. 

Does this sound reasonable?

-- 
Anatoly Vorobey,
[EMAIL PROTECTED] http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton

