[Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Alan Post
I'm reaching a point where my PEG parser, gentufa'i[1], is going to
be ready to tag version 1.0.

I have an issue that I've been putting off that I would like some
input on.

If possible, I would like to parse utf8 input.  I currently have
utf8 enabled in my egg.

gentufa'i works by reading the entire input port into a string and
creating position objects that refer to the rest of the string as I
parse.

This means I need to perform the following:

1) reference a character by index
2) compare a character, string, or regular expression starting
   at an index.

Without utf8, step 1 is O(1).  With utf8 enabled, that step becomes
O(n).  Step 2 is also more expensive with utf8, as I have to pay that
same O(n) to get to the correct index, and I suspect I have some
O(n*1) operations that become O(n*m), namely character class
comparisons.  I think that means step 2 becomes O(x*y*z) rather than
O(1*y*1) (x=index, y=string comparison, z=character class comparison),
but don't hold me to that!
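To make the O(n) cost of step 1 concrete, here is a minimal C sketch (C being the language of the `foreign-lambda*` bodies later in this thread) of mapping a character index to a byte offset in a UTF-8 buffer. Every earlier character has to be stepped over, because each one may be 1 to 4 bytes wide. The names `utf8_seq_len` and `utf8_index_to_offset` are hypothetical, and the sketch assumes RFC 3629 UTF-8 (at most 4 bytes per character) without rejecting overlong forms.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with lead byte b,
   or 0 if b cannot start a sequence (a continuation byte or 0xF8..0xFF). */
size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (b < 0xC0) return 0;   /* 10xxxxxx: continuation byte */
    if (b < 0xE0) return 2;   /* 110xxxxx */
    if (b < 0xF0) return 3;   /* 1110xxxx */
    if (b < 0xF8) return 4;   /* 11110xxx */
    return 0;
}

/* Byte offset of character number `index` in the UTF-8 buffer s[0..len).
   Returns len when index equals the character count, and -1 if index is
   past the end or the buffer is malformed.  Cost is O(index). */
long utf8_index_to_offset(const unsigned char *s, size_t len, size_t index) {
    size_t off = 0;
    assert(s != NULL || len == 0);
    while (index > 0) {
        size_t n;
        if (off >= len) return -1;          /* ran off the end */
        n = utf8_seq_len(s[off]);
        if (n == 0 || off + n > len) return -1;  /* bad or truncated */
        off += n;
        index--;
    }
    return (long)off;
}
```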

I suspect there is a way to avoid the O(n) penalty for step 1, but
I'm uncertain how to do it.  There are some patterns to the way I
index, all of which are contained in the position object in my code[2]:

1) I increment the index position by one character
2) I increment the index position by the length of the string I just
   succeeded at matching.

Essentially, I take characters off the front of the string as I
parse, with the caveat that PEG parsers support full backtracking,
so I sometimes retrieve previous position objects and work from
there--I can't just throw away the prefix of the string I've
matched.

Can anyone point me in the right direction?

Also, I'm not 100% sure what the utf8 egg gets me, compared to treating
the string as binary data.  I suspect it would work if I had a utf8
input file and a utf8 string to match, and compared them as binary
data.  I couldn't compare a utf8 *character*, like #\¿, but I think I
could compare "¿".  (Those are both inverted question marks, if you
don't have utf8 e-mail support.)  Am I wrong about that?

Thank you for your help.

1: http://wiki.call-cc.org/eggref/4/genturfahi
2: http://bugs.call-cc.org/browser/release/4/genturfahi/trunk/lerfu-porsi.scm

-Alan
-- 
.i ko djuno fi le do sevzi

___
Chicken-users mailing list
Chicken-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/chicken-users


Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Peter Bex
On Wed, Nov 24, 2010 at 08:37:37AM -0700, Alan Post wrote:
> gentufa'i works by storing the entire input port in a string, and
> ceating position objects to refer to the rest of the string as I
> parse.
> 
> This means I need to perform the following:
> 
> 1) reference a character by index
> 2) compare a character, string, or regular expression starting
>    at an index.

Are you sure you need this?  If I understood the sentence above the
list correctly, it might be enough to use string->list and then work
with a list of characters.  This can be done pretty fast, and you
can store pointers into arbitrary places of the input simply by storing
the relevant cons cell.

> Without utf8, step 1 is O(1).  With utf8 enabled, that step becomes
> O(n).  step 2 is also more expensive with utf8, as I have to pay that
> same O(n) to get to the correct index, and I suspect I have some
> O(n*1) operations that become O(n*m), namely character class
> comparisons.

utf8 uses the fantastic iset egg for dealing with character sets.
Membership tests are as fast as O(1) for short ranges, and become
O(log(n)) for longer ones, where n is the number of ranges it needs to
store.  My knowledge of complexity theory is a little rusty so I'm
probably describing this in a very roundabout fashion, but the point is
that you shouldn't worry about charset performance.  It'll be slower
than built-in srfi-14 (which is insanely fast - because it's a damn hack :P)
but not by much.

> Essentially, I take characters off the front of the string as I
> parse, with the caveat that PEG parsers support full backtracking,
> so I sometimes retrieve previous position objects and work from
> there--I can't just throw away the prefix of the string I've
> matched.

With cons cells you should be able to implement this efficiently enough.
We don't have anything like string-pointers which can store arbitrary
indices in a string AFAIK.  That would be useful to have, I guess.

> Also, I'm not 100% sure what the utf8 gets me, compared to treating
> the string like binary data.  I suspect it would work if I had a
> utf8 input file, a utf8 string to match, but that I compare them as
> binary data.  I couldn't compare a utf8 *character*, like
> #\¿, but I think I could compare "¿".  (Those are both inverted
> question marks, if you don't have utf8 e-mail support)  Am I wrong
> about that?

No, that's correct. It's only when you start operating on the character
level (or when splitting strings for example) that utf8 makes a difference.

HTH,
Peter
-- 
http://sjamaan.ath.cx
--
The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music.
-- Donald Knuth



Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread F. Wittenberger
On Wednesday, 24.11.2010, 08:37 -0700, Alan Post wrote:
> Can anyone point me in the right direction?

I'll paste an example from my code, because an example is sometimes
better than lyrics.

If you were to keep the byte offset (called start in the code below)
with your position objects, you would get O(1) access to the next utf8
character.
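A minimal C sketch of that suggestion, assuming a position object can hold both a character index and a byte offset. `position`, `pos_next_char` and `pos_match_literal` are hypothetical names, not gentufa'i's or the utf8 egg's API. With the byte offset stored, both of the advance patterns described earlier (one character, or the length of a just-matched string) only touch the bytes involved, and backtracking is simply restoring a saved `position` value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser position: the character index travels together
   with the byte offset of that character, so nothing ever has to be
   rescanned from the start of the buffer. */
typedef struct {
    size_t index;   /* character index */
    size_t offset;  /* byte offset of that character */
} position;

/* Width of the UTF-8 sequence that starts with lead byte b, or 0 if b
   cannot start a sequence. */
static size_t seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if (b < 0xC0) return 0;
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    return 0;
}

/* Pattern 1: advance by one character.  O(1), since only the lead byte
   is inspected.  Returns 0 (leaving p unchanged) at end of input or on
   a malformed sequence. */
int pos_next_char(const unsigned char *s, size_t len, position *p) {
    size_t n;
    assert(p->offset <= len);
    if (p->offset == len) return 0;
    n = seq_len(s[p->offset]);
    if (n == 0 || n > len - p->offset) return 0;
    p->offset += n;
    p->index += 1;
    return 1;
}

/* Pattern 2: if the UTF-8 literal lit occurs at p, advance past it.
   Cost is proportional to the literal, not to the position. */
int pos_match_literal(const unsigned char *s, size_t len, position *p,
                      const char *lit) {
    size_t n = strlen(lit), i, chars = 0;
    assert(p->offset <= len);
    if (n > len - p->offset) return 0;
    if (memcmp(s + p->offset, lit, n) != 0) return 0;
    for (i = 0; i < n; i++)   /* count lead bytes, skip 10xxxxxx */
        if (((unsigned char)lit[i] & 0xC0) != 0x80)
            chars++;
    p->offset += n;
    p->index += chars;
    return 1;
}
```

Because `position` is a plain value, a PEG backtrack point costs no more than copying two integers.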


(define utf8-seek0
  (foreign-lambda*
   integer
   ((scheme-object str) ; the utf8 encoded string
    (integer sl)        ; its length in bytes
    (integer start)     ; byte offset into str
    (integer index)     ; utf8 char offset corresponding to start
    (integer pos))      ; char position to seek to
   ;; returns the byte offset of char position pos, or a negative error code
   #<<EOF
  unsigned char *s=(unsigned char *)C_c_string(str);
  unsigned char *scan=s+start;
  unsigned char *limit=s+sl;
  if( index < pos) {
    /* Seek forward: step over one lead byte plus its continuation bytes. */
    while (index < pos) {
      if( scan >= limit ) {
        return(-1);   /* index out of bounds */
      }
      ++index;
      if (*scan < 0x80) scan++;
      else if (*scan < 0xC0) return(-2); /* stray continuation byte */
      else if (*scan < 0xE0) scan+=2;
      else if (*scan < 0xF0) scan+=3;
      else if (*scan < 0xF8) scan+=4;
      else if (*scan < 0xFC) scan+=5;
      else if (*scan < 0xFE) scan+=6;
      else return(-2);                   /* bad string */
    }
    if( scan > limit ) return(-2);       /* last sequence was truncated */
    return(scan-s);
  } else if(index > pos) {
    /* Seek backward from start (not from the end of the string):
       skip continuation bytes 10xxxxxx back to each lead byte. */
    while( index > pos ) {
      int size=0;
      if( scan <= s ) {
        return(-1);   /* index out of bounds */
      }
      do {
        if( scan <= s || ++size > 6 ) {
          return(-2); /* bad string */
        }
        scan--;
      } while((*scan >= 0x80) && (*scan < 0xC0));
      index--;
    }
    return(scan-s);
  } else {
    return(start);
  }
EOF
))

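(define (utf8-seek s start index pos)
  ;; NB: string-length must be the core (byte-counting) procedure here,
  ;; not the character-counting one exported by the utf8 egg.
  (let ((v (utf8-seek0 s (string-length s) start index pos)))
    (if (fx>= v 0)
        v
        (raise
         (case v
           ((-1) "index out of bounds")
           ((-2) "bad string")
           (else 'utf8-seek))))))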

Here we still pay the O(n) penalty, because we really do need random
access, and to the best of my knowledge there is no way around that,
except to parse the utf8 sequence into a vector of strings, each one
utf8 char in size.  (Which looks kind of prohibitively expensive.)
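One cheaper variant of that idea, sketched in C under stated assumptions: decode once into an array of code points (in effect UTF-32) rather than a vector of one-character strings. That is a single O(n) pass and 4 bytes per character, after which indexing is O(1). `utf8_decode_all` is a hypothetical name; the sketch accepts only 1 to 4 byte sequences and does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode UTF-8 s[0..len) into a newly malloc'd array of code points and
   store the count in *out_n.  Returns NULL on malformed input or on
   allocation failure.  The caller frees the result. */
uint32_t *utf8_decode_all(const unsigned char *s, size_t len, size_t *out_n) {
    /* Never more code points than bytes, so len is a safe upper bound. */
    uint32_t *cps = malloc((len ? len : 1) * sizeof *cps);
    size_t i = 0, n = 0;
    assert(out_n != NULL);
    if (cps == NULL) return NULL;
    while (i < len) {
        unsigned char b = s[i];
        size_t k, width;
        uint32_t cp;
        if (b < 0x80)      { width = 1; cp = b; }
        else if (b < 0xC0) { free(cps); return NULL; }  /* stray continuation */
        else if (b < 0xE0) { width = 2; cp = b & 0x1F; }
        else if (b < 0xF0) { width = 3; cp = b & 0x0F; }
        else if (b < 0xF8) { width = 4; cp = b & 0x07; }
        else               { free(cps); return NULL; }
        if (i + width > len) { free(cps); return NULL; } /* truncated */
        for (k = 1; k < width; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80) { free(cps); return NULL; }
            cp = (cp << 6) | (c & 0x3F);
        }
        cps[n++] = cp;
        i += width;
    }
    *out_n = n;
    return cps;
}
```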

(define (utf8-substring str from to)
  (let ((start-offset (utf8-seek str 0 0 from)))
    (substring str start-offset (utf8-seek str start-offset from to))))

(define (utf8-string-ref str index)
  (utf8-string-getc str (utf8-seek str 0 0 index)))

;; Return the character at byte offset 'start' from the source string
;; (e.g., of a string port) and its length.

(define utf8-string-getc*
  (foreign-lambda*
   integer
   ((scheme-object str) (integer sl) (integer start)
    ((c-pointer integer) rsize))
   #<<EOF
  unsigned char *s=(unsigned char *)C_c_string(str);
  unsigned char *scan=s+start;
  unsigned char *limit=s+sl;
  unsigned int i, size=1, ch;
  if (*scan < 0x80) ch=*scan;
  else if (*scan < 0xC0) return(-1); /* continuation byte, not a lead byte */
  else if (*scan < 0xE0) {size=2; ch=*scan & 0x1F;}
  else if (*scan < 0xF0) {size=3; ch=*scan & 0x0F;}
  else if (*scan < 0xF8) {size=4; ch=*scan & 0x07;}
  else if (*scan < 0xFC) {size=5; ch=*scan & 0x3;}
  else if (*scan < 0xFE) {size=6; ch=*scan & 0x1;}
  else return(-1);                   /* bad character size */

  if( scan++ + size > limit )
    return(-2);                      /* short character */
  for(i=size-1; i ;--i) {
    if ((*scan < 0x80) || (*scan >= 0xC0))
      return(-3);                    /* bad byte: not a continuation byte */
    else { ch=(ch<<6) | (*scan++ & 0x3F); }
  }
  *rsize=size;
  return(ch);
EOF
))

(define (utf8-string-getc str start)
  (let-location ((size integer))
    (let ((c (utf8-string-getc* str (string-length str) start (location size))))
      (integer->char c))))

(define open-utf8-input-string
  (let ([make-input-port make-input-port]
        [string-length string-length])
    (lambda (str)
      (let ((index 0))
        (let-location ((size integer))
          (make-input-port
           (lambda ()                                   ; read-char
             (if (fx< index (string-length str))
                 (let ((c (utf8-string-getc*
                           str (string-length str) index (location size))))
                   (set! index (fx+ index size))
                   (integer->char c))
                 #!eof))
           (lambda () (fx< index (string-length str)))  ; char-ready?
           (lambda () #t)                               ; close
           (lambda ()                                   ; peek-char
             (if (fx< index (string-length str))
                 (let ((c (utf8-string-getc*
                           str (string-length str) index (location size))))
                   (integer->char c))
                 #!eof))))))))

(define (call-with-utf8-input-string str proc)
  (proc (open-utf8-input-string str)))

(define open-utf8-output-string open-output-string)

(define (call-with-utf8-output-string proc)
  (let ((port (open-utf8-output-string)))
(proc port)
(close-output-port port)))

;; Tell the string (character) index of a byte offset into a utf8
;; encoded string.  The inverse of utf8-seek.

(define utf8-tell0
  (foreign-lambda*
   integer
   ((scheme-object str) (integer sl)
    (integer start) (integer index)
    (integer pos))
   #<<EOF
  unsigned char *s=(unsigned char *)C_c_string(str);
  unsigned char *scan=s+start;
  unsigned char *limit=scan+sl;
  if( s+pos  limit ) {
   

Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Alan Post
On Wed, Nov 24, 2010 at 05:05:18PM +0100, Peter Bex wrote:
> On Wed, Nov 24, 2010 at 08:37:37AM -0700, Alan Post wrote:
> > gentufa'i works by storing the entire input port in a string, and
> > ceating position objects to refer to the rest of the string as I
> > parse.
> > 
> > This means I need to perform the following:
> > 
> > 1) reference a character by index
> > 2) compare a character, string, or regular expression starting
> >    at an index.
> 
> Are you sure you need this?  If I understood the sentence above the
> list correctly, it might be enough to use string->list and then work
> with a list of characters.  This can be done pretty fast, and you
> can store pointers into arbitrary places of the input simply by storing
> the relevant cons cell.
 

I will play with this; you're correct that I could work with lists
rather than strings.

I'm using irregex for character class matching.  It looks like I should be
using srfi-14/utf8+iset instead.  Do those work only on the character level,
or am I missing a string version of those?  I see char-set-contains?, with
which I can determine whether a character is in the class, but I
usually want to compare several characters in a row, as in I want to
match the input until something isn't in the character class.
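Whichever library ends up doing the class test, the operation described here is a "span" loop: decode characters starting at a byte offset and stop at the first one outside the class. A C sketch, assuming the class is available as a predicate on code points; `decode_at`, `utf8_span` and the example class `is_latin_letter` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 character at s[off]; store its code point in *cp and
   return its width in bytes, or 0 on malformed or truncated input. */
static size_t decode_at(const unsigned char *s, size_t len, size_t off,
                        uint32_t *cp) {
    unsigned char b = s[off];
    size_t width, k;
    if (b < 0x80) { *cp = b; return 1; }
    if (b < 0xC0) return 0;                 /* stray continuation byte */
    if (b < 0xE0)      { width = 2; *cp = b & 0x1F; }
    else if (b < 0xF0) { width = 3; *cp = b & 0x0F; }
    else if (b < 0xF8) { width = 4; *cp = b & 0x07; }
    else return 0;
    if (off + width > len) return 0;
    for (k = 1; k < width; k++) {
        if ((s[off + k] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[off + k] & 0x3F);
    }
    return width;
}

/* Consume characters from byte offset `off` for as long as in_class()
   accepts them; return the byte offset just past the run and store the
   number of characters consumed in *nchars.  A malformed sequence simply
   ends the run. */
size_t utf8_span(const unsigned char *s, size_t len, size_t off,
                 int (*in_class)(uint32_t), size_t *nchars) {
    size_t count = 0;
    assert(off <= len);
    while (off < len) {
        uint32_t cp;
        size_t w = decode_at(s, len, off, &cp);
        if (w == 0 || !in_class(cp)) break;
        off += w;
        count++;
    }
    *nchars = count;
    return off;
}

/* Example class: ASCII letters plus the Latin-1 letters U+00C0..U+00FF,
   excluding the two symbols U+00D7 and U+00F7 in that block. */
int is_latin_letter(uint32_t cp) {
    if ((cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')) return 1;
    return cp >= 0xC0 && cp <= 0xFF && cp != 0xD7 && cp != 0xF7;
}
```

Returning both the byte offset and the character count lets a caller update a position object without rescanning.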

-Alan


Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Alan Post
On Wed, Nov 24, 2010 at 05:05:38PM +0100, Jörg F. Wittenberger wrote:
> On Wednesday, 24.11.2010, 08:37 -0700, Alan Post wrote:
> > Can anyone point me in the right direction?
> 
> I'll paste an example from my code, because an example is sometimes
> better than lyrics.
> 
> If you were to keep the byte offset (called start in the code below)
> with your position objects, you would get O(1) access to the next utf8
> character.
 

The code is obviously a lot to digest.  I'll thank you for providing
it before I'm able to look at it, as I'd otherwise be sitting here a
couple days without you hearing from me!

Thank you,

-Alan


Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Peter Bex
On Wed, Nov 24, 2010 at 09:33:24AM -0700, Alan Post wrote:
> I'm using irregex for character class matching.

The Irregex in experimental is reasonably fast for charsets, giving
O(log(n)) performance for charset membership checking.  If the charset
is contiguous (i.e., with no gaps) it's actually O(1).

It's much less efficient than iset on fragmented character sets, but
on huge unbroken character sets it can be faster.  It stores vectors of
cons cells which hold the start/end ranges of subranges within the
character set, whereas iset stores small bit-vectors for subranges,
stored in a btree.
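The range-vector representation described above can be sketched in C as a sorted array of inclusive code point ranges searched by binary search, which gives the O(log(n)) membership test (n = number of ranges) and collapses to a single range check for a contiguous set. This illustrates the technique only; it is not irregex's code (irregex is written in Scheme), and `cp_range` and `charset_contains` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One inclusive range of code points, [lo, hi]. */
typedef struct { uint32_t lo, hi; } cp_range;

/* Membership test for a character set stored as a sorted array of
   non-overlapping ranges.  Binary search makes this O(log n) in the
   number of ranges. */
int charset_contains(const cp_range *r, size_t n, uint32_t cp) {
    size_t lo = 0, hi = n;          /* search window is r[lo..hi) */
    assert(r != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < r[mid].lo)
            hi = mid;               /* cp lies in an earlier range */
        else if (cp > r[mid].hi)
            lo = mid + 1;           /* cp lies in a later range */
        else
            return 1;               /* r[mid].lo <= cp <= r[mid].hi */
    }
    return 0;
}
```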

> It looks like I should be
> using srfi-14/utf8+iset instead.  Do those work only on the character level,
> am I missing a string version of those?

SRFI-14 is for dealing with characters.

> I see char-set-contains? for
> which I can determine whether a character is in the class, but I
> usually want to compare several characters in a row, as in I want to
> match the input until something isn't in the character class.

Then irregex might actually be the best way to go about it since that
can compile matchers for charset overlaps in alternatives in a smart way.

Cheers,
Peter


Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Felix
From: Peter Bex peter@xs4all.nl
Subject: Re: [Chicken-users] utf8 and string-ref performance
Date: Wed, 24 Nov 2010 17:05:18 +0100

 
> With cons cells you should be able to implement this efficiently enough.
> We don't have anything like string-pointers which can store arbitrary
> indices in a string AFAIK.  That would be useful to have, I guess.

How should that look? Would locatives be useful here?


cheers,
felix



Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Peter Bex
On Wed, Nov 24, 2010 at 07:13:10PM +0100, Felix wrote:
> > With cons cells you should be able to implement this efficiently enough.
> > We don't have anything like string-pointers which can store arbitrary
> > indices in a string AFAIK.  That would be useful to have, I guess.
> 
> How should that look? Would locatives be useful here?

I'm afraid this is just the shared substring/blob structure proposal
in another guise.  I don't know if locatives are useful; those can't
really be kept around for a long time, can they?

Cheers,
Peter


Re: [Chicken-users] Re: [Chicken Gazette - Issue 13] - ##sys#:keyword:'s

2010-11-24 Thread Felix
From: Jörg F. Wittenberger joerg.wittenber...@softeyes.net
Subject: [Chicken-users] Re: [Chicken Gazette - Issue 13] - ##sys#:keyword:'s
Date: Mon, 22 Nov 2010 22:47:51 +0100

> On Monday, 22.11.2010, 21:22 +0100, Peter Bex wrote:
> > On Saturday another new thread (!) was started by Alan Post in which
> > he reported a bug in Chicken's keyword argument handling. He created a
> > ticket in Trac to help track this bug, but with the help of Alex and
> > Felix he found out it was not a bug in Chicken but in his own code;
> > `string->symbol` does not produce keyword objects even when the string
> > ends with a colon. After he changed his code to use `string->keyword`
> > everything worked as it should. Keywords can be confusing things:
> > they're not quite the same as symbols because they're self-evaluating,
> > yet `symbol?` returns `#t`.
> 
> May I ask a simple question: what is the actual rationale behind keywords
> (wrt. symbols)?

They provide a syntactically and semantically distinct marker for things like
argument lists.

 
> Are there any good references?

Unfortunately not. Others have pointed out DSSSL, you can also
check out the bigloo and gambit documentation, but they don't
provide much more than the Chicken manual.

 
> Could we do away with them?

Why? Should we? 

 
> Boil them down to mere read syntax?  ( 'x same as x: ?)
 

That would make them indistinguishable from normal
symbols, and thus would make their usage more error-prone,
I think.


cheers,
felix



Re: [Chicken-users] Chicken Gazette - Issue 13

2010-11-24 Thread F. Wittenberger
On Tuesday, 23.11.2010, 09:19 +0100, Peter Bex wrote:
> On Tue, Nov 23, 2010 at 12:32:24AM +0100, Jörg F. Wittenberger wrote:
> > > == 2. Core development
> > > The scrutinizer was updated to give a warning when a one-armed `if`
> > > is used in tail-position, as suggested on chicken-users by Jörg
> > > Wittenberg.
> > 
> > sure?  or is -- no, wait, no git struggle now.  spare me, please!
> 
> Just wait for the next dev snapshot.
> 
> > As I said with a smiley in the other posting: I would never enter into a
> > religious war.
> > 
> > Therefore I have to raise a flag in favour of the great SQLite data
> > base!
> > 
> > Right here!
> > 
> > Why?:  How does your PostgreSQL handle master-master replication?
> 
> I've never had use for that so I don't really know.  But this wiki page
> sounds hopeful:
> http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling#Comparison_matrix

I did not find anything about the details of master-master replication
there.  Though the usual way is that all changes at either end are
replicated to the other one.  So a MiM attack will always corrupt your
database.

> I didn't know SQLite had any replication whatsoever at all.  Or did you
> roll your own?
Well, I told you with a grin that this is a letter to the editor.  So
kind of a "me too".

Yes, I did.  Using chicken.  Did you read my linked example code?
http://www.askemos.org/Adc5dd0c30f6e63932811ed60e019bb2d/Kalender?date=2010-11-01
It's intended to show how easy it can be to use a replicated database
(one which is safe against the MiM attack).

Adding yet another replica is as complicated as filling the id into this
form (screenshot)
http://www.askemos.org/Ab6c588dfa4ed826d7b387f19fbc60f10

> If so, you could do that with any database!

Maybe you could.  SQLite was the only one for which I found a way to do
it: hooking into its virtual file system interface.  A one-man show of
about 1500 LoC.  (Without the actual replication code, which I had
already before.)

> > What
> > if I mount a man in the middle attack on one of your master replicas and
> > inject fake update packets?  Will I be able to tamper with your data
> > base?  Too bad for you!!!  ;-)
> 
> That's what SSL connections (with client certificates) are for.
> 
> Cheers,
> Peter

Wait, security can be even stronger.  What if a replica is rooted?  Or
an admin gets bribed?

Or - as in my example code above: each replica owner has a different
interest in the database content.  Hence a reasonable fear of fraud.

/Jörg




Re: [Chicken-users] Chicken Gazette - Issue 13

2010-11-24 Thread Peter Bex
On Wed, Nov 24, 2010 at 06:15:24PM +0100, Jörg F. Wittenberger wrote:
> > http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling#Comparison_matrix
> 
> I did not find anything about details of the master-master replication
> there.  Though the usual way is, that all changes an any end are
> replicated to the other one.

Well, that's what master-master replication is, isn't it?

> So MiM will always corrupt your data base.

You'd need some kind of MiM protection, or some in-database system which
assigns trustworthiness to your data, but that seems like it would be an
extension to simple replication.

From what I've read some of those systems are based on triggers.  I guess
you could extend those triggers with your own custom weighting functions
or whatever.  I'm no expert in this subject matter, so I'm just guessing.

> > I didn't know SQLite had any replication whatsoever at all.  Or did you
> > roll your own?
> Well, I told you with a grin that this is a letters to the editor.  So
> kind of a me too.
> 
> Yes, I did.  Using chicken.  Did you read my linked example code?
> http://www.askemos.org/Adc5dd0c30f6e63932811ed60e019bb2d/Kalender?date=2010-11-01

I keep getting "connection refused" from that server, so I can't check.

> Adding yet another replica is as complicated as filling the id into this
> form (screenshot)
> http://www.askemos.org/Ab6c588dfa4ed826d7b387f19fbc60f10

Again, connection refused.

> > If so, you could do that with any database!
> 
> Maybe you could.

I have no need for such a system right now :)  I just wanted to let you
know it's unfair to cite it as an advantage of SQLite if you just hacked
it on top, since the same could be done with postgres.  Except that the
work has already been done for sqlite, of course ;)

> > That's what SSL connections (with client certificates) are for.
> 
> Wait, security can be even stronger.  What if replica is rooted?  Or you
> got an admin bribed?

That's not exactly a classical MitM situation, is it?  How do you deal
with that now?

Cheers,
Peter


Re: [Chicken-users] handling the undefined value

2010-11-24 Thread Felix
From: Jörg F. Wittenberger joerg.wittenber...@softeyes.net
Subject: Re: [Chicken-users] handling the undefined value
Date: Mon, 22 Nov 2010 15:08:46 +0100

> Have a compiler switch (since it may break some code), which changes the
> code to return zero values instead of the distinguished undefined value.

I don't think this is a great idea: this will change the
semantics of code using call-with-values, will be less efficient,
and may throw errors in some cases - R5RS (in contrast to CL and
R6RS) does not automatically adjust the number of result values
to the number of values expected by the location where the
result(s) is/are used.


cheers,
felix



Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Alex Shinn
On Wed, Nov 24, 2010 at 7:37 AM, Alan Post alanp...@sunflowerriver.org wrote:

> If possible, I would like to parse utf8 input.  I currently have
> utf8 enabled in my egg.
[...]
> Can anyone point me in the right direction?

Parsing is generally one of the things you get for
free with utf8.  Probably the only thing you need
to do is *remove* the reference to the utf8 egg
and everything will work.  The effect of this is
that parsing will work on bytes instead of characters,
but the results will be the same.

There may still be corner cases.  If the API allows
searching for individual characters, you need to
check if they are non-ASCII and if so convert them
into the relevant utf8 string.
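A minimal C sketch of that corner case, assuming the matcher works on bytes: encode the character to its UTF-8 bytes once, then compare with memcmp exactly as for a literal string. This is also why matching "¿" as a string already works without the utf8 egg. `utf8_encode` and `starts_with_char` are hypothetical names; the encoder rejects surrogates and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode code point cp as UTF-8 into out.  Returns the number of bytes
   written (1 to 4), or 0 for surrogates and values above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* UTF-16 surrogates */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Does the input at byte offset off start with character cp?  The
   character is encoded once and compared bytewise, like a literal. */
int starts_with_char(const unsigned char *s, size_t len, size_t off,
                     uint32_t cp) {
    unsigned char buf[4];
    size_t n = utf8_encode(cp, buf);
    assert(off <= len);
    return n != 0 && len - off >= n && memcmp(s + off, buf, n) == 0;
}
```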

Indexes on input and output would be in terms
of byte position.  If you want to make this char
position you have to convert once each on input
and output.  That's O(n), so no effect on asymptotic
performance.
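Converting a byte offset back to a character index is one linear pass that counts the bytes that are not UTF-8 continuation bytes (10xxxxxx). A C sketch with a hypothetical name, assuming the offset falls on a character boundary and the buffer is valid UTF-8:

```c
#include <assert.h>
#include <stddef.h>

/* Character index corresponding to byte offset `off` in a UTF-8 buffer:
   every byte before off that is not a continuation byte starts exactly
   one character.  One O(off) pass. */
size_t utf8_offset_to_index(const unsigned char *s, size_t off) {
    size_t i, chars = 0;
    assert(s != NULL || off == 0);
    for (i = 0; i < off; i++)
        if ((s[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}
```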

-- 
Alex



Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Alan Post
On Wed, Nov 24, 2010 at 07:15:49PM +0100, Peter Bex wrote:
> On Wed, Nov 24, 2010 at 07:13:10PM +0100, Felix wrote:
> > > With cons cells you should be able to implement this efficiently enough.
> > > We don't have anything like string-pointers which can store arbitrary
> > > indices in a string AFAIK.  That would be useful to have, I guess.
> > 
> > How should that look? Would locatives be useful here?
> 
> I'm afraid this is just the shared substring/blob structure proposal
> in another guise.  I don't know if locatives are useful; those can't
> really be kept around for a long time, can they?
 

Weeks ago, when I started the thread about using mmap, it was in
relation to this problem: I'm looking for the best performance I
can get in the inner loop of my parser, where I'm matching against
the input buffer.

-Alan


Re: [Chicken-users] utf8 and string-ref performance

2010-11-24 Thread Felix
From: Peter Bex peter@xs4all.nl
Subject: Re: [Chicken-users] utf8 and string-ref performance
Date: Wed, 24 Nov 2010 19:15:49 +0100

> On Wed, Nov 24, 2010 at 07:13:10PM +0100, Felix wrote:
> > > With cons cells you should be able to implement this efficiently enough.
> > > We don't have anything like string-pointers which can store arbitrary
> > > indices in a string AFAIK.  That would be useful to have, I guess.
> > 
> > How should that look? Would locatives be useful here?
> 
> I'm afraid this is just the shared substring/blob structure proposal
> in another guise.  I don't know if locatives are useful; those can't
> really be kept around for a long time, can they?
 

Sorry, I don't understand? They are not invalidated by GC (in case
you mean that).


cheers,
felix



[Chicken-users] Can an egg have a library and executable with the same name?

2010-11-24 Thread Alan Post
My egg, genturfa'i, has an executable and a library.  I've named the
library genturfahi and the executable genturfahi-peg.  I'd rather
name the executable genturfahi too, though I suspect I'm not able to
do that.

Is this true?  If it isn't, can someone point me to an egg that has
a library and an executable named after the egg?

-Alan


Re: [Chicken-users] New eggs: npdiff, format-textdiff

2010-11-24 Thread Daishi Kato
Hi Ivan,

It works now.
Thanks!

One very small request: diff --unified format.

Best,
Daishi

At Fri, 19 Nov 2010 13:42:20 +0900,
Ivan Raikov wrote:
 
 
> Hello,
> 
>   Thanks for trying to use format-textdiff. The problem below was
> actually caused by a bug in npdiff, which has been fixed in npdiff
> release 1.13. Please update your copy of npdiff and try again. Let me
> know if you encounter any other issues with those eggs.
> 
>   -Ivan
> 
> Daishi Kato dai...@axlight.com writes:
> 
> > Hi all,
> >
> > Anybody using format-textdiff?
> >
> > I encountered the following problem:
> > CHICKEN
> > (c)2008-2010 The Chicken Team
> > (c)2000-2007 Felix L. Winkelmann
> > Version 4.6.1
> > linux-unix-gnu-x86 [ manyargs dload ptables ]
> > compiled 2010-09-25 on lobule (Linux)
> >
> > #;1> (use format-textdiff)
> > #;2> (textdiff (with-input-from-string "abc\n" read-lines)
> > (with-input-from-string "def\n" read-lines))
> >
> > Error: bad argument count - received 4 but expected 6: #<procedure>
> >
> > Call history:
> >
> > ##sys#call-with-values
> > vector-lib#check-index
> > values
> > make-vector
> > vector->list
> > ##sys#call-with-values
> > vector-lib#check-index
> > values
> > make-vector
> > vector->list <--
 