Re: [Rd] Subscripting fails if name of element is "" (PR#8161)

2005-10-07 Thread Prof Brian Ripley

On Fri, 7 Oct 2005, Duncan Murdoch wrote:

> I haven't been following this conversation in order, but I think there's
> another bug here besides the one(s?) you identified:
>
> Jens had this example:
>
> > x <- 1:4
> > names(x) <- c(NA, "NA", "a", "")
> > x[names(x)]
> <NA> <NA>    a <NA>
>    1    1    3   NA
>
> Shouldn't the second entry in the result be 2, with name "NA"?  It seems the
> string "NA" has been converted to <NA> here.


Yes, but I don't see it in PR#8161 where there is no name "NA" that I can
see.  (In other words it is not an instance of the subject line.)

The issue is that <NA> is matching "NA", and it should not.  As in the
code

Rboolean NonNullStringMatch(SEXP s, SEXP t)
{
    if (CHAR(s)[0] && CHAR(t)[0] && strcmp(CHAR(s), CHAR(t)) == 0)
        return TRUE;
    else
        return FALSE;
}

and there are more instances around.
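
For readers who want to check the distinction from the R side, here is a minimal sketch. It assumes a recent R release; `lookup_by_name` is a hypothetical helper written for this illustration, not a function in R, and it does not reproduce what the C code does internally. It resolves names with match(), which keeps a missing name and the string "NA" apart, and then refuses to match missing or empty requests.

```r
x <- 1:4
names(x) <- c(NA, "NA", "a", "")

# is.na() separates the missing name from the literal string "NA"
is.na(names(x))                 # TRUE FALSE FALSE FALSE

# match() treats NA and "NA" as two different values
match(c(NA, "NA"), names(x))    # 1 2

# Hypothetical helper: look elements up by name, but never let a missing
# or empty request match anything (the semantics argued for above).
lookup_by_name <- function(v, nm) {
  idx <- match(nm, names(v))
  idx[is.na(nm) | !nzchar(nm)] <- NA_integer_
  v[idx]
}

lookup_by_name(x, names(x))     # values NA 2 3 NA
```

With this helper the request "NA" reaches the second element, while the requests NA and "" yield NA instead of silently picking some element.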












--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Subscripting fails if name of element is "" (PR#8161)

2005-10-07 Thread Duncan Murdoch
I haven't been following this conversation in order, but I think there's 
another bug here besides the one(s?) you identified:

Jens had this example:

 > x <- 1:4
 > names(x) <- c(NA, "NA", "a", "")
 > x[names(x)]
<NA> <NA>    a <NA>
   1    1    3   NA

Shouldn't the second entry in the result be 2, with name "NA"?  It seems
the string "NA" has been converted to <NA> here.

Duncan Murdoch




Re: [Rd] Subscripting fails if name of element is "" (PR#8161)

2005-10-07 Thread Jens Oehlschlägel
Dear Brian,

Thanks for picking this up. 
I think the critical point is that it is not a single isolated bug and it
would be a major effort to get this stuff consistent, because it (and its
implications) seems to be spread all over the code. The laudable
efforts to properly sort out "NA" vs. as.character(NA) have not been fully
successful yet, and "" is a similar issue. Please consider the following,
and sorry for the length:


# ERROR 1

# I agree that c() disallows "" and NA names
# it makes sense discouraging users from using such names

> c(as.character(NA)=1)
Error: syntax error in line "c(as.character(NA)="
> c("NA"=2, "a"=3)
NA  a 
 2  3 
> c(""=4)
Error: attempt to use zero-length variable name

# however, "NA" must be expected as a legal name, e.g. when importing data
# and in your example specifying "no-name" in fact results in a "" name

> names(c(a=1, 2))
[1] "a" "" 
> 

# My interpretation is that the user specifies a mixture of elements with
# and without names, and therefore the no-names must be coerced to "" names,
# and in principle that's completely fine

# a character vector is defined to have either as.character(NA) OR "NA" OR
# "" or another positive-length string
# (which is complicated enough)
# formally the names are an attribute (character vector) of an object and can
# be manipulated as such

> x <- 1:4
> names(x) <- c(NA, "NA", "a", "")
> names(x)
[1] NA   "NA" "a"  ""  
> # and in principle all of those can be properly distinguished
> x[match(names(x), names(x))]
<NA>   NA    a      
   1    2    3    4 


# introducing a fifth non-name state that sometimes equals "" and sometimes
# not, introduces inconsistency into the language
# e.g. the fact that elements can be selected by their name but not by their
# non-name
# Thus currently selecting by names is a mess from a consistency perspective


> x[names(x)]
<NA> <NA>    a <NA> 
   1    1    3   NA 
 
# in the following subscripting with "" works, but not with "NA"
> for (i in names(x))
+ print(x[[i]])
[1] 1
[1] 1
[1] 3
[1] 4


# ERROR 1a: If failing on "NA" is not a bug, I switch from programming to
# Kafka
> x["NA"]
<NA> 
   1 
# ERROR 1b: clearly wrong
> x[["NA"]]
[1] 1
# ERROR 1c: and from my humble understanding failing on "" is a bug as well
> x[""]
<NA> 
  NA 
# whereas interestingly this is correct
> x[[""]]
[1] 4

  
# I think it is obvious how to remove these inconsistencies
# (as long as we do not disallow "" in names altogether,
#  which is almost impossible, since every user can legally set the names
#  vector in a variety of ways)
 
# these are not easy, but perfectly fine
> x[as.character(NA)]
<NA> 
   1 
> x[as.integer(NA)]
<NA> 
  NA 
 
# and these are really debatable, difficult ones
> x[NA]
<NA> <NA> <NA> <NA> 
  NA   NA   NA   NA 
> x[as.logical(NA)]
<NA> <NA> <NA> <NA> 
  NA   NA   NA   NA 
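
One possible reading of these last results, offered as a sketch rather than a statement about the implementation: a bare NA is a logical value, so it is treated as a logical subscript and recycled to the length of the vector, while as.integer(NA) is a single positional subscript. The vector y below is made up for the illustration.

```r
y <- c(a = 1L, b = 2L, c = 3L)

typeof(NA)                 # "logical"

# A logical NA is recycled over all elements, so the result has length 3
length(y[NA])

# A typed NA is a single unknown position or name, so the result has length 1
length(y[NA_integer_])
length(y[NA_character_])
```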



## ERROR 2+3: the above inconsistencies generalize to lists

lx <- as.list(x)

> lx
$"NA"   (ERROR 2a)
[1] 1

$"NA"
[1] 2

$a
[1] 3

[[4]]   (ERROR 2b)
[1] 4

# and should read 

> lx
$NA (  or $as.character(NA) for clarity and warning )
[1] 1

$"NA"
[1] 2

$a
[1] 3

$""
[1] 4


# Note that - except for printing - match works perfectly in 
> lx[match(names(lx), names(lx))]
$"NA"
[1] 1

$"NA"
[1] 2

$a
[1] 3

[[4]]
[1] 4

# and also in 
> for (i in match(names(lx), names(lx)))
+ print(lx[[i]])
[1] 1
[1] 2
[1] 3
[1] 4
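
The match()-based selection used above can be wrapped in a small helper. This is only a sketch of the semantics argued for in this message, where "" and NA count as names like any other; `select_exact` is a hypothetical name, not an existing R function, and lx2 is rebuilt here so the block runs on its own.

```r
# Hypothetical helper: select by exact name, letting match() distinguish
# NA, "NA", "" and ordinary strings.
select_exact <- function(obj, nm) {
  obj[match(nm, names(obj))]
}

lx2 <- list(1, 2, 3, 4)
names(lx2) <- c(NA, "NA", "a", "")

res <- select_exact(lx2, names(lx2))
unlist(res, use.names = FALSE)   # 1 2 3 4
```

Note that match() returns the first match only, so duplicated names resolve to the first element carrying them.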


# Of course I consider the following behaviour as inconsistent
> lx[names(lx)]
$"NA"
[1] 1

$"NA"
[1] 1   (ERROR 3a)

$a
[1] 3

$"NA"
NULL    (ERROR 3b)


# using [[ the second one fails
> for (i in names(lx))
+ print(lx[[i]])
[1] 1
[1] 1   (ERROR 3c)
[1] 3
[1] 4   (interestingly correct)


# finally note that this works
> eval(substitute(lx$y, list(y=as.character(NA))))
# but not this
> get("$")(lx, as.character(NA))
Error in get("$")(lx, as.character(NA)) : invalid subscript type
# and both go wrong with "NA"



Re: [Rd] Subscripting fails if name of element is "" (PR#8161)

2005-10-06 Thread Prof Brian Ripley

On Thu, 6 Oct 2005, "Jens Oehlschlägel" wrote:

> Dear Thomas,
>
>> This looks deliberate (there is a function NonNullStringMatch that does
>> the matching).  I assume this is because there is no other way to
>> indicate that an element has no name.
>>
>> If so, it is a documentation bug -- help(names) and FAQ 7.14 should
>> specify this behaviour.  Too late for 2.2.0, unfortunately.
>
> I respectfully disagree: the element has a name, it's an empty string. Of
> course "" is a doubtful name for an element, but as long as we allow this
> name when assigning names()<- we also should handle it like a name in
> subscripting. The alternative would be to disallow "" in names at all.
> However, both alternatives rather look like code changes, not only
> documentation.


I think Thomas is right as to how S interprets this: "" is no name on 
assignment, whereas NA as a name is a different thing (there probably is a 
name, we just do not know what it is).


Here is the crux of the example.

p <- c(a=1, 2)

> p <- c(a=1, 2)
> names(p)
[1] "a" ""
> p
a  
1 2
> p2 <- c(1,2)
> names(p2) <- c("a", "")
> identical(p, p2)
[1] TRUE

so giving the name "" really is the same as giving no name.
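
A small sketch of how one might test for this "no name" state in practice. `has_name` is just an illustrative variable, and treating NA as "no usable name" is an assumption of the sketch, not something settled in this thread.

```r
p <- c(a = 1, 2)
nms <- names(p)

# An element has a usable name if the name is neither missing nor empty
has_name <- !is.na(nms) & nzchar(nms)
has_name        # TRUE FALSE

p[has_name]     # the element that can be addressed by name
p[!has_name]    # the element that can only be reached by position
```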

`Error 1' is said to be

> p[""]
<NA>
  NA

You haven't given a name, so I think that is right.  S (which has no 
character NAs) uses "" as the name, but here there may be a name or not.

> P <- list(a=1, 2)

I think Jens then meant as `error 2' that

> P
$a
[1] 1

[[2]]
[1] 2

shows no name for the second element, and that seems right to me (although 
S shows "" here).

Finally (`error 3')

> P[""]
$"NA"
NULL

is a length-one list with name character-NA.  (S has no name here.)  That 
seems the right answer but if so is printed inconsistently.


I would say that

> Q <- list(1, 2)
> names(Q) <- c("a", NA)
> Q
$a
[1] 1

$"NA"
[1] 2

was the only bug here (the name should be printed as <NA>).  Now that
comes from this bit of code

    if( isValidName(CHAR(PRINTNAME(TAG(s)))) )
        sprintf(ptag, "$%s", CHAR(PRINTNAME(TAG(s))));
    else
        sprintf(ptag, "$\"%s\"", CHAR(PRINTNAME(TAG(s))));

so non-syntactic names are printed surrounded by "".  Nowadays I think we 
would prefer ``, as in



> A <- list("a+b"=1)
> A
$"a+b"
[1] 1

> A$"a+b"
[1] 1
> A$`a+b`
[1] 1

but NA needs to be a special case as in

> A <- list(1, 2)
> names(A) <- c("NA", NA)
> A
$"NA"
[1] 1

$"NA"
[1] 2

> is.na(names(A))
[1] FALSE  TRUE
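
To make the proposal concrete, here is a sketch in R (not the C code of the print routine) of the tag formatting described above: syntactic names plain, non-syntactic names in backticks, a missing name as <NA>, and an empty name shown by position. `format_tag` is a hypothetical helper, and the comparison with make.names() is an assumed stand-in for isValidName.

```r
# Hypothetical helper: build the tag that would be printed for list
# element number i with name nm.
format_tag <- function(nm, i) {
  if (is.na(nm)) return("$<NA>")                   # missing name: special case
  if (!nzchar(nm)) return(sprintf("[[%d]]", i))    # no name: show the position
  if (identical(nm, make.names(nm))) return(paste0("$", nm))  # syntactic name
  paste0("$`", nm, "`")                            # non-syntactic: backticks
}

nms <- c("a", "a+b", "NA", NA, "")
tags <- vapply(seq_along(nms), function(i) format_tag(nms[i], i), character(1))
tags
# "$a"  "$`a+b`"  "$`NA`"  "$<NA>"  "[[5]]"
```

The string "NA" is a reserved word, so it ends up in backticks and stays visibly different from the missing name.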


--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [Rd] Subscripting fails if name of element is "" (PR#8161)

2005-10-06 Thread Jens Oehlschlägel
Dear Thomas,

> This looks deliberate (there is a function NonNullStringMatch that does 
> the matching).  I assume this is because there is no other way to 
> indicate that an element has no name.

> If so, it is a documentation bug -- help(names) and FAQ 7.14 should 
> specify this behaviour.  Too late for 2.2.0, unfortunately.

I respectfully disagree: the element has a name, it's an empty string. Of
course "" is a doubtful name for an element, but as long as we allow this
name when assigning names()<- we also should handle it like a name in
subscripting. The alternative would be to disallow "" in names at all.
However, both alternatives rather look like code changes, not only
documentation. 

Best regards


Jens Oehlschlägel



Re: [Rd] Subscripting fails if name of element is "" (PR#8161)

2005-09-30 Thread Thomas Lumley

On Fri, 30 Sep 2005, "Jens Oehlschlägel" wrote:

> Dear all,
>
> The following shows cases where accessing elements via their name fails (if
> the name is a string of length zero).



This looks deliberate (there is a function NonNullStringMatch that does 
the matching).  I assume this is because there is no other way to 
indicate that an element has no name.


If so, it is a documentation bug -- help(names) and FAQ 7.14 should 
specify this behaviour.  Too late for 2.2.0, unfortunately.


-thomas









Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle


[Rd] Subscripting fails if name of element is "" (PR#8161)

2005-09-30 Thread Jens Oehlschlägel
Dear all,

I resend this mail because it was blocked: I submitted a bug from the r-bug
webpage and hypatia seems to block mail that is sent from a different IP
than the one usually associated with the email address. It looks like it is
currently impossible to correctly submit bugs from the website. However, here
is the original bug report:

(PR#8161)

Dear all,

The following shows cases where accessing elements via their name fails (if
the name is a string of length zero).

Best regards


Jens Oehlschlägel


> p <- 1:3
> names(p) <- c("a","", as.character(NA))
> p
   a      <NA> 
   1    2    3 
> 
> for (i in names(p))
+ print(p[[i]])
[1] 1
[1] 2
[1] 3
> 
> # error 1: vector subscripting with "" fails in second element
> for (i in names(p))
+ print(p[i])
a 
1 
<NA> 
  NA 
<NA> 
   3 
> 
> # error 2: print method for list shows no name for second element
> p <- as.list(p)
> 
> 
> for (i in names(p))
+ print(p[[i]])
[1] 1
[1] 2
[1] 3
> 
> # error 3: list subscripting with "" fails in second element
> for (i in names(p))
+ print(p[i])
$a
[1] 1

$"NA"
NULL

$"NA"
[1] 3

> 
> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.1
year     2005
month    06
day      20
language R




# -- replication code --

p <- 1:3
names(p) <- c("a","", as.character(NA))
p

for (i in names(p))
 print(p[[i]])
 
# error 1: vector subscripting with "" fails in second element
for (i in names(p))
 print(p[i])

# error 2: print method for list shows no name for second element
p <- as.list(p)


for (i in names(p))
 print(p[[i]])
 
# error 3: list subscripting with "" fails in second element
for (i in names(p))
 print(p[i])



