You might want to use a data.table then: it will automatically detect that the column is a 64-bit integer. In that case, too, the user will have to install the data.table package (which is a good idea anyway, in my opinion :) ).
It will then, of course, allow you to join tables.

Willem

On 20-01-17 18:47, Nicolas Paris wrote:
> Well, I definitely cannot use them as numeric, because joining is the
> main reason for those identifiers.
>
> As for the int64 and bit64 packages, they are not a solution, because I am
> releasing a dataset for external users. I cannot ask them to install a
> package in order to use it.
>
> I have to be very careful when releasing the data. If a user just uses
> the read.csv functions, they cast the identifiers to numeric by default.
>
> $ more res.csv
> "col1";"col2"
> "-1311071933951566764";"toto"
> "-1311071933951566764";"tata"
>
> > read.table("res.csv",sep=";",header=T)
>            col1 col2
> 1 -1.311072e+18 toto
> 2 -1.311072e+18 tata
>
> > sapply(read.table("res.csv",sep=";",header=T),class)
>      col1      col2
> "numeric"  "factor"
>
> > read.table("res.csv",sep=";",header=T,colClasses="character")
>                   col1 col2
> 1 -1311071933951566764 toto
> 2 -1311071933951566764 tata
>
> Am I condemned to provide an R script with the data so that users can
> exploit the dataset?
>
> On 20 Jan 2017 at 18:29, Murray Stokely wrote:
>> 2^53 == 2^53+1
>> TRUE
>>
>> Which makes joining or grouping data sets with 64-bit identifiers
>> problematic.
>>
>> Murray (mobile)
>>
>> On Jan 20, 2017 9:15 AM, "Nicolas Paris" <nicolas.pa...@aphp.fr> wrote:
>>
>> On 20 Jan 2017 at 18:09, Murray Stokely wrote:
>> > The lack of 64-bit integer support causes lots of problems when dealing
>> > with certain types of data where the loss of precision from coercing to
>> > 53 bits with double is unacceptable.
>>
>> Hello Murray,
>> Do you mean that, e.g., -1311071933951566764 loses precision in the
>> as.numeric(-1311071933951566764) conversion?
>> Thanks,
>> >
>> > Two packages were developed to deal with this: int64 and bit64.
>> >
>> > You may need to find archival versions of these packages if they've
>> > fallen off CRAN.
>> >
>> > Murray (mobile phone)
>> >
>> > On Jan 20, 2017 7:20 AM, "Gabriel Becker" <gmbec...@ucdavis.edu> wrote:
>> >
>> > I am not on R-core, so I cannot speak to future plans to internally
>> > support int8 (though my impression is that there aren't any, at least
>> > none that are close to fruition).
>> >
>> > The standard way of dealing with whole numbers too big to fit in an
>> > integer is to put them in a numeric (a double down in C land). This can
>> > represent integers up to 2^53 without loss of precision (see
>> > http://stackoverflow.com/questions/1848700/biggest-integer-that-can-be-stored-in-a-double).
>> > This is how long vector indices are (currently) implemented in R. If
>> > it's good enough for indices, it's probably good enough for whatever
>> > you need them for.
>> >
>> > Hope that helps.
>> >
>> > ~G
>> >
>> > On Fri, Jan 20, 2017 at 6:33 AM, Nicolas Paris <nicolas.pa...@aphp.fr>
>> > wrote:
>> >
>> > > Hello R users,
>> > >
>> > > I have to deal with int8 data in R. AFAIK, R only handles int4, with
>> > > the `as.integer` function [1]. I wonder:
>> > > 1. What is the best approach to handle int8? `as.character`?
>> > >    `as.numeric`?
>> > > 2. Is there any plan to handle int8 in the future? As you might know,
>> > >    int4 is too small to count the Earth's population right now.
>> > >
>> > > Thanks for your ideas,
>> > >
>> > > int8 example:
>> > >
>> > > human_id
>> > > ----------------------
>> > > -1311071933951566764
>> > > -4708675461424073238
>> > > -6865005668390999818
>> > > 5578000650960353108
>> > > -3219674686933841021
>> > > -6469229889308771589
>> > > -606871692563545028
>> > > -8199987422425699249
>> > > -463287495999648233
>> > > 7675955260644241951
>> > >
>> > > Reference:
>> > > 1. https://www.r-bloggers.com/r-in-a-64-bit-world/
>> > >
>> > > --
>> > > Nicolas PARIS
>> > >
>> > > ______________________________________________
>> > > R-devel@r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >
>> > --
>> > Gabriel Becker, PhD
>> > Associate Scientist (Bioinformatics)
>> > Genentech Research
>>
>> --
>> Nicolas PARIS
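The precision problem Murray and Gabriel describe in the thread can be checked directly in base R. What follows is a minimal sketch, not taken from the thread; `id` and `x` are illustrative names, and the identifier is the first one from Nicolas's example. Around 1.3e18 the value lies between 2^60 and 2^61, and a double carries a 53-bit significand, so consecutive representable doubles there are 2^8 = 256 apart. Most 19-digit identifiers therefore cannot be stored exactly.

```r
# Every integer up to 2^53 in magnitude is exactly representable as a double;
# 2^53 + 1 is not, and rounds back to 2^53.
2^53 == 2^53 + 1              # TRUE

# R's native integer type is 32-bit, far too small for these identifiers.
.Machine$integer.max          # 2147483647

id <- "-1311071933951566764"  # first identifier from the example in the thread

# Coercing to integer overflows: the result is NA (with a coercion warning).
suppressWarnings(as.integer(id))

# Coercing to double rounds to the nearest representable value (a multiple
# of 256 at this magnitude), so the original digits are not preserved.
x <- as.numeric(id)
sprintf("%.0f", x) == id      # FALSE
```

Two distinct identifiers that fall into the same 256-wide bucket would compare equal after this conversion, which is exactly what breaks joins and grouping.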
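Nicolas's own colClasses workaround already covers the join use case in base R, with no extra packages. Here is a minimal sketch. It assumes a hypothetical lookup table (`lookup_txt`, with an invented `country` column) and inlines res.csv through read.table's `text` argument in place of the file.

```r
# Contents of res.csv from the thread, inlined via read.table's `text` argument.
res_txt <- '"col1";"col2"
"-1311071933951566764";"toto"
"-1311071933951566764";"tata"'

# Hypothetical lookup table keyed by the same identifier (invented for illustration).
lookup_txt <- '"col1";"country"
"-1311071933951566764";"FR"'

# A named colClasses forces only col1 to character; other columns keep default detection.
res    <- read.table(text = res_txt,    sep = ";", header = TRUE,
                     colClasses = c(col1 = "character"))
lookup <- read.table(text = lookup_txt, sep = ";", header = TRUE,
                     colClasses = c(col1 = "character"))

# Base R join on the exact character keys: no digits are lost.
joined <- merge(res, lookup, by = "col1")
print(joined)
```

Because the keys are compared as exact strings, the join is precise. The trade-off is that the identifiers cannot be used in arithmetic, which identifiers rarely need.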