Hi, Jeff.

Thanks for your thoughtful suggestions.

I do not plan to wait for the hash package to be redesigned to meet my 
expectations. As a matter of fact, I have:

a) Submitted a report of unexpected behavior in hash::values, to which the
package maintainer quickly replied, saying it would be examined.
b) Designed (with the help of this list) and implemented a workaround in the
form of wrapping the POSIXct objects in lists, which has my program working
correctly for now.
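
To make (b) concrete, here is a minimal sketch in base R of how the wrapped values can be turned back into a POSIXct vector. The named list is only a stand-in for what values() hands back once each value is wrapped in a list (the shape shown in the str() output quoted further down); the timestamps and variable names are made up for illustration.

```r
# Stand-in for the named list of length-one POSIXct objects that values(h)
# returns under the workaround (assumed shape, illustrative values).
t0 <- as.POSIXct("2013-05-21 09:54:03", tz = "UTC")
wrapped <- list(a = t0, b = t0 + 4, c = t0 + 8)

# unlist() strips the class and leaves bare seconds since the epoch.
stripped <- unlist(wrapped)

# do.call(c, ...) dispatches to c.POSIXct, so the class is retained.
times <- do.call(c, wrapped)

# min() now works on a proper POSIXct vector.
earliest <- min(times)
```

This avoids an explicit loop when collapsing the list, although it is still an extra conversion step that would be unnecessary if values() kept the class.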

If the hash package is updated and the workaround is no longer necessary, then
I'll reverse this change. Otherwise, I'll look more deeply into my alternatives,
which might involve maintaining this workaround permanently or analyzing
alternative architectures.

The hash package is a beautiful piece of code that is working perfectly for me 
in many situations. Even with the list wrapping around the POSIXct objects, it 
is meeting my performance requirements much better than the alternatives I 
tested. So I'd rather not completely re-engineer working complex code without a 
very good reason.

However, I would like to respectfully disagree with you that my reaction to 
hash::values behavior was illogical. I don't want to start a flame war or 
anything, so let's try to keep the discussion civil. :)

See, a hash table (or a queue, or a stack, or an R vector) is a data structure
that works as a container. You insert objects and you get them back according
to the rules of each data structure (stacks will have a FILO ordering,
queues will have FIFO ordering, hashes will maintain key/value pairs, and so on).

It is completely unreasonable to insert an object of class X into a container, 
and then get it back altered in a way that is not part of the 'contract' behind 
the data structure. If I assign X to key K on a hash, then however I choose to
ask the hash for the value associated with key K, I should get exactly X as a
response. I believe most computer scientists would agree that this is
self-evident.

And that is what one would expect from reading the hash::values documentation:

        Extract values from a hash object. This is a pseudo-accessor method
        that returns hash values (without keys) as a vector if possible, a
        list otherwise.


Moreover, it has this to say about non-primitive types:

        If the values are of different types or of a complex class than a
        named list is returned.


It never says it will unclass objects, or coerce them into primitive types. 
Hence the 'contract' implies I will get back what I inserted, unaltered, either 
in a vector or a list. And that is provably not what is happening. I would have 
been ok with a vector of POSIXct or a named list containing the POSIXct values, 
but instead I am getting a numeric vector.
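
As a point of comparison, here is a small sketch of the behavior I would expect from a key/value container, using a plain base R environment as the store. This is only an illustration and not a claim about how the hash package is implemented; the keys and timestamps are made up.

```r
# A plain environment used as a key/value store (illustrative stand-in only).
store <- new.env(hash = TRUE)
t_a <- as.POSIXct("2013-05-21 09:54:03", tz = "UTC")
t_b <- as.POSIXct("2013-05-21 09:54:07", tz = "UTC")
assign("a", t_a, envir = store)
assign("b", t_b, envir = store)

# mget() looks up several keys at once and returns a named list.
out <- mget(c("a", "b"), envir = store)

# Each element comes back exactly as it was inserted, class included.
same_a <- identical(out[["a"]], t_a)
same_b <- identical(out[["b"]], t_b)
```

A named list like this is what I read the "list otherwise" part of the documentation as promising.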

I understand R is based on S, and that OOP concepts were introduced later in
its history. However, one of the key concepts in OOP is encapsulation: as an
outside entity you do not get to see the internal implementation of a class,
you interact with it exclusively through its published "interface" (methods,
public member variables, etc.).

I cannot find any justification for why an object "losing" its class
unintentionally is ever acceptable, as it violates the concept of
encapsulation. That is essentially what's happening if I look up several keys
using values(). So this violates the encapsulation of the POSIXct class, as I
am exposed to its internal numeric value. Moreover, it breaks the
"method-dispatch" of R functions that know to treat POSIXct values differently.
All of a sudden, the POSIXct objects I inserted are being treated, for example,
by format as numeric instead of being dispatched to format.POSIXct as expected.
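
The dispatch problem is easy to reproduce in base R without the hash package at all. The sketch below uses an arbitrary timestamp and made-up variable names, with as.numeric() standing in for whatever strips the class.

```r
# A POSIXct value with an explicit time zone so the output is reproducible.
x <- as.POSIXct("2013-05-20 16:54:28", tz = "UTC")

# Dropping the class leaves only the internal seconds-since-epoch number.
raw <- as.numeric(x)

# format() dispatches on class: format.POSIXct for x, the default method for raw.
as_time <- format(x)
as_number <- format(raw)

# The class has to be put back by hand before date-time methods apply again.
restored <- as.POSIXct(raw, origin = "1970-01-01", tz = "UTC")
```

Here as_time is a readable timestamp while as_number is just a formatted number, which is exactly the difference my vectorized code trips over.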

So I don't think my reaction to this issue was illogical at all. Hope you'll 
agree now that I've explained myself a little better. :)

-- 
Alexandre Sieira
CISA, CISSP, ISO 27001 Lead Auditor

"The truth is rarely pure and never simple."
Oscar Wilde, The Importance of Being Earnest, 1895, Act I
On 21 May 2013 at 22:44:19, Jeff Newmiller (jdnew...@dcn.davis.ca.us)
wrote:
I recommend that you not plan on waiting for the hash package to be redesigned 
to meet your expectations. Also, your response to discovering this feature of 
the hash package seems illogical.  

From a computer science perspective, the hash mechanism is an implementation 
trick that is intended to improve lookup speed. It does not actually represent 
a fundamental data structure like a vector or a set does. You can always put 
your keys in a vector and search through them (e.g. vector indexing by string) 
to get an equivalent data retrieval. If the hash package is not improving the 
speed of your data access, adding an extra layer of data structure is hardly an 
appropriate solution.  

Why are you not using normal vectors or data frames and accessing with string 
or logical indexing?  

If you are avoiding vectors because they seem slow in loops, perhaps you just 
need to preallocate the vectors you will store your results in before your loop 
to regain acceptable speed. Or, perhaps the duplicated() or merge() functions 
could save you from this mess of incremental data processing.  
---------------------------------------------------------------------------  
Jeff Newmiller The ..... ..... Go Live...  
DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...  
Live: OO#.. Dead: OO#.. Playing  
Research Engineer (Solar/Batteries O.O#. #.O#. with  
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k  
---------------------------------------------------------------------------  
Sent from my phone. Please excuse my brevity.  

Alexandre Sieira <alexandre.sie...@gmail.com> wrote:  

>You are absolutely right.  
>  
>I am storing POSIXct objects into a hash (from the hash package).  
>However, if I try to get them out as a vector using the values()  
>function, they are unclassed. And that breaks my (highly vectorized)  
>code. Take a look at this:  
>  
>  
>> h = hash()  
>> h[["a"]] = Sys.time()  
>> str(h[["a"]])  
> POSIXct[1:1], format: "2013-05-20 16:54:28"  
>> str(values(h))  
> Named num 1.37e+09  
> - attr(*, "names")= chr "a"  
>  
>  
>I have reported this to the hash package maintainers. In the meantime,  
>however, I am storing, for each key, a list containing a single  
>POSIXct. Then, when I extract all using values(), I get a list  
>containing all POSIXct entries with class preserved.   
>  
>  
>> h = hash()  
>> h[["a"]] = list( Sys.time() )  
>> h[["b"]] = list( Sys.time() )  
>> h[["c"]] = list( Sys.time() )  
>> values(h)  
>$a  
>[1] "2013-05-21 09:54:03 BRT"  
>  
>$b  
>[1] "2013-05-21 09:54:07 BRT"  
>  
>$c  
>[1] "2013-05-21 09:54:11 BRT"  
>  
>> str(values(h))  
>List of 3  
> $ a: POSIXct[1:1], format: "2013-05-21 09:54:03"  
> $ b: POSIXct[1:1], format: "2013-05-21 09:54:07"  
> $ c: POSIXct[1:1], format: "2013-05-21 09:54:11"  
>  
>  
>However, the next thing I need to do is a min() over that list, so I  
>need to convert the list into a vector again.  
>  
>I agree completely with you that this is horrible for performance, but  
>it is a temporary workaround until values() is "fixed".  
>  
>--   
>Alexandre Sieira  
>CISA, CISSP, ISO 27001 Lead Auditor  
>  
>"The truth is rarely pure and never simple."  
>Oscar Wilde, The Importance of Being Earnest, 1895, Act I  
>On 20 May 2013 at 19:40:14, Jeff Newmiller
>(jdnew...@dcn.davis.ca.us) wrote:
>I don't know what you plan to do with this list, but lists are quite a  
>bit less efficient than fixed-mode vectors, so you are likely losing a  
>lot of computational speed by using this list. I don't hesitate to use  
>simple data frames (lists of vectors), but processing lists is on par  
>with for loops, not vectorized computation. It may still support a  
>simpler model of computation, but that is an analyst comprehension  
>benefit rather than a computational efficiency benefit.  
>---------------------------------------------------------------------------  
>  
>Jeff Newmiller The ..... ..... Go Live...  
>DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...  
>Live: OO#.. Dead: OO#.. Playing  
>Research Engineer (Solar/Batteries O.O#. #.O#. with  
>/Software/Embedded Controllers) .OO#. .OO#. rocks...1k  
>---------------------------------------------------------------------------  
>  
>Sent from my phone. Please excuse my brevity.  
>  
>Alexandre Sieira <alexandre.sie...@gmail.com> wrote:  
>  
>>I was trying to convert a vector of POSIXct into a list of POSIXct.
>>However, I had a problem that I wanted to share with you.
>>  
>>Works fine with, say, numeric:  
>>  
>>  
>>> v = c(1, 2, 3)  
>>> v  
>>[1] 1 2 3  
>>> str(v)  
>> num [1:3] 1 2 3  
>>> l = as.vector(v, mode="list")  
>>> l  
>>[[1]]  
>>[1] 1  
>>  
>>[[2]]  
>>[1] 2  
>>  
>>[[3]]  
>>[1] 3  
>>  
>>> str(l)  
>>List of 3  
>> $ : num 1  
>> $ : num 2  
>> $ : num 3  
>>  
>>If you try it with POSIXct, on the other hand…  
>>  
>>  
>>> v = c(Sys.time(), Sys.time())  
>>> v  
>>[1] "2013-05-20 18:02:07 BRT" "2013-05-20 18:02:07 BRT"  
>>> str(v)  
>> POSIXct[1:2], format: "2013-05-20 18:02:07" "2013-05-20 18:02:07"  
>>> l = as.vector(v, mode="list")  
>>> l  
>>[[1]]  
>>[1] 1369083728  
>>  
>>[[2]]  
>>[1] 1369083728  
>>  
>>> str(l)  
>>List of 2  
>> $ : num 1.37e+09  
>> $ : num 1.37e+09  
>>  
>>The POSIXct values are coerced to numeric, which is unexpected.  
>>  
>>The documentation for as.vector says: "The default method handles 24
>>input types and 12 values of type: the details of most coercions are
>>undocumented and subject to change." It would appear that treatment
>>for POSIXct is either missing or needs adjustment.
>>
>>Unlist (for the reverse) is documented to convert to base types, so
>>I can't complain. Just wanted to share that I ended up giving up on
>>vectorization and writing the two following functions:
>>  
>>  
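>>unlistPOSIXct <- function(x) {
>>  # Preallocate a POSIXct vector, then fill it element by element.
>>  retval = rep(Sys.time(), length(x))
>>  # seq_along() avoids iterating over c(1, 0) when x is empty.
>>  for (i in seq_along(x)) retval[i] = x[[i]]
>>  return(retval)
>>}
>>
>>listPOSIXct <- function(x) {
>>  # Preallocate the list; x[i] keeps the POSIXct class on each element.
>>  retval = vector("list", length(x))
>>  for (i in seq_along(x)) retval[[i]] = x[i]
>>  return(retval)
>>}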
>>  
>>Is there a better way to do this (other than using *apply instead of
>>for above) that better leverages vectorization? Am I missing something
>>here?
>>  
>>Thanks!  
>>  
>>  
>>  
>>  
>>--   
>>Alexandre Sieira  
>>CISA, CISSP, ISO 27001 Lead Auditor  
>>  
>>"The truth is rarely pure and never simple."  
>>Oscar Wilde, The Importance of Being Earnest, 1895, Act I  
>>  
>>------------------------------------------------------------------------
>>
>>______________________________________________  
>>R-help@r-project.org mailing list  
>>https://stat.ethz.ch/mailman/listinfo/r-help  
>>PLEASE do read the posting guide  
>>http://www.R-project.org/posting-guide.html  
>>and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.