Re: [Rd] truncation/rounding bug with write.csv

2018-03-15 Thread Gregory Michaelson
So, I come in this morning, and I also find that the behavior is no
longer happening.  Perhaps it has to do with memory utilization and some
built-in safeguards that avoid memory problems by truncating the
numerics?  It's extremely frustrating that I can no longer make this
happen.

On Wed, Mar 14, 2018 at 8:05 PM, Joris Meys  wrote:

> My apologies for not including sessionInfo(), and I'm a bit angry at
> myself for that. Retrying in a fresh session of R, I get different results.
> More specifically, I get the expected result where accuracy is the same in
> the first and the last line. As I didn't include my sessionInfo() in my
> previous mail, I can't figure out why I now have a different result. So I'm
> positive I've seen the behaviour described by Gregory, but I can't
> reproduce consistently.
>
> Results and session Info below.
>
> Cheers
> Joris
>
> df = data.frame(replicate(100, runif(100, 0,1)))
> write.csv(df, "temp.csv")
>
> > system('head -n2 temp.csv')
> > system('powershell -nologo & Get-Content -Path temp.csv -Tail 1')
> 

Re: [Rd] truncation/rounding bug with write.csv

2018-03-14 Thread Joris Meys
My apologies for not including sessionInfo(), and I'm a bit angry at myself
for that. Retrying in a fresh session of R, I get different results. More
specifically, I get the expected result where accuracy is the same in the
first and the last line. As I didn't include my sessionInfo() in my
previous mail, I can't figure out why I now have a different result. So I'm
positive I've seen the behaviour described by Gregory, but I can't
reproduce consistently.

Results and session Info below.

Cheers
Joris

df = data.frame(replicate(100, runif(100, 0,1)))
write.csv(df, "temp.csv")

> system('head -n2 temp.csv')
"","X1","X2","X3","X4","X5","X6","X7","X8","X9","X10","X11","X12","X13","X14","X15","X16","X17","X18","X19","X20","X21","X22","X23","X24","X25","X26","X27","X28","X29","X30","X31","X32","X33","X34","X35","X36","X37","X38","X39","X40","X41","X42","X43","X44","X45","X46","X47","X48","X49","X50","X51","X52","X53","X54","X55","X56","X57","X58","X59","X60","X61","X62","X63","X64","X65","X66","X67","X68","X69","X70","X71","X72","X73","X74","X75","X76","X77","X78","X79","X80","X81","X82","X83","X84","X85","X86","X87","X88","X89","X90","X91","X92","X93","X94","X95","X96","X97","X98","X99","X100"
"1",0.278388975420967,0.370451691094786,0.717217007186264,0.116161955753341,0.144262576242909,0.937281515449286,0.373484081588686,0.955863541224971,0.826917823404074,0.821003203978762,0.592950115678832,0.0627794633619487,0.815737818833441,0.0805139308795333,0.238502083579078,0.509200588334352,0.73775092815049,0.868772336747497,0.0352788285817951,0.96509046619758,0.403636189643294,0.435718205757439,0.0162769011221826,0.597037401981652,0.504837732296437,0.206882111029699,0.883217994589359,0.548339378088713,0.294472687412053,0.996299823047593,0.84715538774617,0.206719091162086,0.936834576772526,0.439650829415768,0.48171737533994,0.847850588615984,0.168411831371486,0.74452265072614,0.148969533387572,0.410039864480495,0.778313281945884,0.432499173562974,0.512454774230719,0.16644035698846,0.82063413807191,0.978053349768743,0.99700310616754,0.874686364317313,0.796479270327836,0.816980117466301,0.274035695008934,0.00785374757833779,0.678476774599403,0.660274159396067,0.184961069142446,0.681200950173661,0.611048432299867,0.73395977425389,0.209964233217761,0.310086127603427,0.975754244253039,0.125808657845482,0.015794032253325,0.526331929024309,0.531722096726298,0.59097072808072,0.815139955608174,0.529103851644322,0.183188699418679,0.910278890514746,0.237709420500323,0.752752122003585,0.14534721034579,0.00572531204670668,0.222574554383755,0.895228188252077,0.899962505558506,0.987743409816176,0.592631630599499,0.948386731324717,0.86595072131604,0.0715177122037858,0.0426598901394755,0.336731978459284,0.641609625890851,0.949697833275422,0.26424896903336,0.528028564760461,0.562290757661685,0.653207891387865,0.513830083655193,0.818740799557418,0.86044091056101,0.790382120991126,0.227793522411957,0.580261130817235,0.181467723799869,0.295633365400136,0.548259064555168,0.833231552969664
> system('powershell -nologo & Get-Content -Path temp.csv -Tail 1')
"100",0.946863592602313,0.656343327835202,0.627083137864247,0.482342466711998,0.337082419078797,0.424337374512106,0.626660786569118,0.870844106189907,0.78627574048005,0.0107703430112451,0.50574235082604,0.182688802946359,0.29385484661907,0.0441680049989372,0.375604564556852,0.895043386844918,0.510951161850244,0.865806604968384,0.0833957826253027,0.100834607845172,0.139034334337339,0.854574690107256,0.121182460337877,0.86904955166392,0.616418665507808,0.616997531382367,0.325345175806433,0.487117795739323,0.0097313594771,0.30411878527,0.0132197963539511,0.654607841046527,0.896146323531866,0.358923224499449,0.968490360304713,0.757937406655401,0.926832290366292,0.863271801266819,0.325824091676623,0.140821835258976,0.550571520347148,0.645497811725363,0.545551799703389,0.440615838393569,0.296690225601196,0.838868388207629,0.488215223187581,0.512655091006309,0.764586469857022,0.156665422255173,0.109298826660961,0.660329486243427,0.220234925625846,0.192423258908093,0.672684306278825,0.239764124620706,0.754978574579582,0.636799369007349,0.240582759492099,0.458807958755642,0.196174292825162,0.477994701592252,0.725636600283906,0.473409370519221,0.741089153569192,0.906417449470609,0.540478575974703,0.360421892022714,0.933905930491164,0.631188633851707,0.416520888684317,0.485372453462332,0.700725849252194,0.186034456361085,0.903570784721524,0.0693298415280879,0.261779377236962,0.128776200115681,0.0801852298900485,0.665786169003695,0.144309232477099,0.485807131510228,0.0646850543562323,0.909404250094667,0.848976222565398,0.862456669798121,0.949187902035192,0.240288577275351,0.177118748193607,0.0833796421065927,0.0747064722236246,0.107194342184812,0.774909492349252,0.424547733273357,0.848057812545449,0.913047505775467,0.134580536745489,0.904593974584714,0.90503191947937,0.386907825712115

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: 

Re: [Rd] truncation/rounding bug with write.csv

2018-03-14 Thread Ista Zahn
I don't see the issue here. It would be helpful if people would report
their sessionInfo() when reporting whether or not they see this issue.
Mine is

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 rmsfact_0.0.3  cowsay_0.5.0   fortunes_1.5-4
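
For anyone reporting results on this issue, a minimal sketch of collecting the session details next to the number-formatting options might look like the following. The file name and the choice of options (digits, scipen, OutDec) are assumptions about what could be relevant here, not something established in this thread.

```r
# Collect session details and number-formatting options for a bug report.
# Assumption: digits, scipen and OutDec are the options worth recording.
report <- c(
  R.version.string,
  paste("digits =", getOption("digits")),
  paste("scipen =", getOption("scipen")),
  paste("OutDec =", getOption("OutDec")),
  capture.output(print(sessionInfo()))
)

# Hypothetical file name, written to the session's temporary directory.
report_path <- file.path(tempdir(), "session-report.txt")
writeLines(report, report_path)

# Show the same text on the console so it can be pasted into a mail.
cat(report, sep = "\n")
```

Pasting this text together with the example makes it easier to compare runs that do and do not show the problem.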

On Wed, Mar 14, 2018 at 12:02 PM, Gregory Michaelson  wrote:
> I ran this code in RStudio Server on a linux machine, but I don’t know the 
> version offhand.  I will try to get it tomorrow.  Thanks.
>
> Thanks,
> Greg Michaelson
> www.datarobot.com
> 704-981-1118

Re: [Rd] truncation/rounding bug with write.csv

2018-03-14 Thread Gregory Michaelson
I ran this code in RStudio Server on a linux machine, but I don’t know the 
version offhand.  I will try to get it tomorrow.  Thanks.

Thanks,
Greg Michaelson
www.datarobot.com
704-981-1118




> On Mar 14, 2018, at 4:47 PM, Joris Meys  wrote:
> 
> To my surprise, I can confirm on Windows 10 using R 3.4.3. As tail is not 
> recognized by Windows cmd, I replaced it with:
> 
> system('powershell -nologo "& "Get-Content -Path temp.csv -Tail 1')
> 
> The last line shows only 7 digits after the decimal, whereas the first has 
> 15 digits after the decimal. I agree with Dirk though, 1.6Gb csv files are 
> not the best way to work with datasets.
> 
> Cheers
> Joris

Re: [Rd] truncation/rounding bug with write.csv

2018-03-14 Thread Joris Meys
To my surprise, I can confirm on Windows 10 using R 3.4.3. As tail is not
recognized by Windows cmd, I replaced it with:

system('powershell -nologo "& "Get-Content -Path temp.csv -Tail 1')

The last line shows only 7 digits after the decimal, whereas the first has
15 digits after the decimal. I agree with Dirk though, 1.6Gb csv files are
not the best way to work with datasets.

Cheers
Joris
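
A platform-independent way to look at the first and last lines, without relying on head, tail or PowerShell, could be sketched in R itself as below. The small data frame and the temporary file are placeholders for illustration only.

```r
# Write a small example file (placeholder data, not the data from this thread).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(replicate(3, runif(5))), csv_path)

# Read the whole file as text; fine for small files, costly for very large ones.
all_lines <- readLines(csv_path)

first_two <- head(all_lines, 2)  # same lines as `head -n2 temp.csv`
last_one  <- tail(all_lines, 1)  # same line as `tail -n1 temp.csv`

writeLines(c(first_two, last_one))
```

Because the file is read by R directly, the result does not depend on which shell tools the operating system provides.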



On Wed, Mar 14, 2018 at 1:53 PM, Dirk Eddelbuettel  wrote:

>
> What OS are you on?  On Ubuntu 17.10 with R 3.4.3 all seems well (see
> below for your example, I just added a setwd()).
>
> [ That said, I have long held a (apparently minority) view that csv is for all
> intents and purposes a less-than-ideal format.  If you have that much data,
> you do generally not want to serialize it back and forth as that is slow, and
> may drop precision.  The rds format is great for R alone; we now have C code
> to read it from other apps (in the librdata repo by Evan Miller).  Different
> portable serializations work too (protocol buffer, msgpack, ...), there are
> databases and on and on... ]
>
> Dirk
>
>
> R> df <- data.frame(replicate(100, runif(100, 0,1)))
> R> setwd("/tmp")
> R> write.csv(df, "temp.csv")
> R> system('tail -n1 temp.csv')
> R> system('head -n2 temp.csv')

Re: [Rd] truncation/rounding bug with write.csv

2018-03-14 Thread Dirk Eddelbuettel

What OS are you on?  On Ubuntu 17.10 with R 3.4.3 all seems well (see
below for your example, I just added a setwd()).

[ That said, I have long held a (apparently minority) view that csv is for all
intents and purposes a less-than-ideal format.  If you have that much data,
you do generally not want to serialize it back and forth as that is slow, and
may drop precision.  The rds format is great for R alone; we now have C code
to read it from other apps (in the librdata repo by Evan Miller).  Different
portable serializations work too (protocol buffer, msgpack, ...), there are
databases and on and on... ]

Dirk


R> df <- data.frame(replicate(100, runif(100, 0,1)))
R> setwd("/tmp")
R> write.csv(df, "temp.csv")
R> system('tail -n1 temp.csv')
"100",0.11496100993827,0.740764639340341,0.519190795486793,0.736045523779467,0.537115448853001,0.769496953347698,0.102257401449606,0.437617724528536,0.173321532085538,0.351960731903091,0.397348914295435,0.496789071243256,0.463006566744298,0.573105450021103,0.575196429155767,0.821617329493165,0.112913676071912,0.187580146361142,0.121353451395407,0.576333721866831,0.00763232703320682,0.468676633667201,0.451408475637436,0.0172415724955499,0.946199159137905,0.439950440311804,0.109224532730877,0.657066411571577,0.0524766123853624,0.54859598656185,0.94473168021068,0.500153199071065,0.636756601976231,0.221365773351863,0.620196332456544,0.559639401268214,0.198483835440129,0.397874651942402,0.710652963491157,0.317212327616289,0.239299293374643,0.0606942125596106,0.165786643279716,0.667431530542672,0.436631754040718,0.812185280025005,0.374252707697451,0.421187321422622,0.730321826180443,0.904493971262127,0.399387824581936,0.650714065413922,0.594219180056825,0.147960299625993,0.941945064114407,0.357223904458806,0.275038427906111,0.191008436959237,0.957893384154886,0.211530723143369,0.680650093592703,0.503884038887918,0.754094189498574,0.74776051659137,0.673691919771954,0.236221367260441,0.825558929471299,0.21071959589608,0.246618688805029,0.686810691142455,0.0247942050918937,0.572868114337325,0.494058627169579,0.684360746992752,0.0139967589639127,0.626861660508439,0.417218193877488,0.410173830809072,0.390906651504338,0.477168896235526,0.382211019750684,0.597674581920728,0.198329919017851,0.0684413285925984,0.450342149706557,0.133007253985852,0.755873151356354,0.372862737858668,0.762442974606529,0.582133987685665,0.692048883531243,0.259269661735743,0.147847984684631,0.635266482364386,0.320955650880933,0.00151186063885689,0.446474697208032,0.0673662247136235,0.791947861900553,0.0973296447191387
R> system('head -n2 temp.csv')
"","X1","X2","X3","X4","X5","X6","X7","X8","X9","X10","X11","X12","X13","X14","X15","X16","X17","X18","X19","X20","X21","X22","X23","X24","X25","X26","X27","X28","X29","X30","X31","X32","X33","X34","X35","X36","X37","X38","X39","X40","X41","X42","X43","X44","X45","X46","X47","X48","X49","X50","X51","X52","X53","X54","X55","X56","X57","X58","X59","X60","X61","X62","X63","X64","X65","X66","X67","X68","X69","X70","X71","X72","X73","X74","X75","X76","X77","X78","X79","X80","X81","X82","X83","X84","X85","X86","X87","X88","X89","X90","X91","X92","X93","X94","X95","X96","X97","X98","X99","X100"
"1",0.995067856274545,0.0237177284434438,0.839840568602085,0.99880409357138,0.455015312181786,0.967688028467819,0.191194181796163,0.903533136472106,0.570170691236854,0.86230118968524,0.23530788696371,0.30707904486917,0.256274404237047,0.369592409580946,0.989929250674322,0.50812312704511,0.806819133926183,0.536566868191585,0.0863138805143535,0.294523851014674,0.676951135974377,0.195627561537549,0.261776751372963,0.383222601376474,0.578275503357872,0.79082652577199,0.19860127940774,0.0204593606758863,0.659964868798852,0.42379029514268,0.69516694964841,0.0594558380544186,0.124592808773741,0.289328144863248,0.524508266709745,0.84306427766569,0.317027662880719,0.273440480465069,0.111866136547178,0.217484838794917,0.354757327819243,0.973936082562432,0.673076402861625,0.300948366522789,0.219195493729785,0.912278874544427,0.276768424082547,0.959344451315701,0.500720858341083,0.431024399353191,0.81699790329,0.0738761406391859,0.600137831410393,0.639816240407526,0.405302967177704,0.941259450744838,0.190415472723544,0.0382565588224679,0.486769351176918,0.127647049957886,0.55870802059,0.686994878342375,0.176803215174004,0.794697789475322,0.59406904829666,0.0897431457415223,0.196549082174897,0.0750515828840435,0.736311340238899,0.00494878669269383,0.383522965712473,0.960385771468282,0.101023471681401,0.209177070530131,0.798869548132643,0.147874428424984,0.187238642480224,0.148522146046162,0.32379064662382,0.620601811446249,0.201180462958291,0.179565666476265,0.466121524339542,0.245493365218863,0.980698639061302,0.342919659335166,0.387780519668013,0.393966492731124,0.148554262006655,0.521724705817178,0.722740866011009,0.105151653522626,0.461909410310909,0.905382365221158,0.073629385553,0.636923864483833,0.540197744267061,0.425208077067509,0.666353516280651,0.584139186656103
R> 
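
The point about serialization dropping precision can be illustrated with a small sketch that round-trips the same data through rds and through csv. The data size, seed and temporary file names are arbitrary assumptions for the illustration.

```r
set.seed(1)  # arbitrary seed so the example is repeatable
df <- data.frame(replicate(5, runif(10)))

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# Binary serialization: stores the doubles themselves.
saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)

# Text serialization: numbers are converted to decimal strings and parsed back.
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

rds_exact    <- identical(df, from_rds)
csv_max_diff <- max(abs(as.matrix(df) - as.matrix(from_csv)))

cat("rds round trip identical:", rds_exact, "\n")
cat("largest csv round-trip difference:", csv_max_diff, "\n")
```

Any difference on the csv side comes from the decimal conversion, which is the kind of loss the rds format avoids.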

[Rd] truncation/rounding bug with write.csv

2018-03-14 Thread Gregory Michaelson
Hello, I have looked at https://www.r-project.org/bugs.html , and it seems
that posting here is the only way to report this.

The issue is that the precision used by write.csv is not consistent for big
files.  See the following code:

First I create a large dataframe filled with random uniform values.  Then I
write it to .csv and print out the first and last lines.


df = data.frame(replicate(100, runif(100, 0,1)))

write.csv(df, "temp.csv")
system('tail -n1 temp.csv')
system('head -n2 temp.csv')


If you run this, you will note that the precision for the first line is
different from the precision of the last line.  I'm not sure what is
controlling this, but in the code that led me to this bug, I was only
getting 3 decimal places by the end of the file.
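
To make the comparison between the first and last rows measurable rather than visual, a sketch along the following lines could be used. The helper max_sig_digits is hypothetical and written only for this illustration, and the seed is arbitrary. The write.table help page describes numeric conversion as using the internal equivalent of digits = 15, so no field is expected to exceed 15 significant digits.

```r
set.seed(42)  # arbitrary seed so that repeated runs are comparable
df <- data.frame(replicate(100, runif(100, 0, 1)))

csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path)
csv_lines <- readLines(csv_path)

# Hypothetical helper: largest number of significant digits among the
# numeric fields of one CSV line (the leading row-name field is dropped).
max_sig_digits <- function(line) {
  fields   <- strsplit(line, ",", fixed = TRUE)[[1]][-1]
  mantissa <- sub("[eE].*$", "", fields)    # drop any exponent part
  digits   <- gsub("[^0-9]", "", mantissa)  # keep digits only
  digits   <- sub("^0+", "", digits)        # strip leading zeros
  max(nchar(digits))
}

first_row <- max_sig_digits(csv_lines[2])  # line 1 is the header
last_row  <- max_sig_digits(csv_lines[length(csv_lines)])
cat("max significant digits, first row:", first_row, " last row:", last_row, "\n")

# Round trip: how far are the values read back from the originals?
df_back      <- read.csv(csv_path, row.names = 1)
max_abs_diff <- max(abs(as.matrix(df) - as.matrix(df_back)))
cat("largest round-trip difference:", max_abs_diff, "\n")
```

If the last row reports clearly fewer digits than the first, or the round-trip difference is much larger than about 1e-15, that would be concrete evidence to attach to the report.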

If you use the write functionality in readr, then you get consistent
precision:

readr::write_csv(df, "temp2.csv")
system('tail -n1 temp2.csv')
system('head -n2 temp2.csv')
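
If readr is not available, one base R possibility is to format the numbers yourself before writing, which the write.table help page suggests for finer control. This is only a sketch, not a statement about how write.csv or readr work internally. The 15 decimal places and the temporary file are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed
df <- data.frame(replicate(100, runif(100, 0, 1)))

# Convert every column to text with a fixed number of decimal places,
# so each field in the file has the same width.
df_fixed <- data.frame(
  lapply(df, function(col) sprintf("%.15f", col)),
  stringsAsFactors = FALSE
)

fixed_path <- tempfile(fileext = ".csv")
write.csv(df_fixed, fixed_path, quote = FALSE)

# Check: every numeric field should be "0." followed by 15 digits (17 characters).
rows         <- strsplit(readLines(fixed_path)[-1], ",", fixed = TRUE)
body_fields  <- lapply(rows, function(r) r[-1])  # drop the row-name field
field_widths <- nchar(unlist(body_fields))
cat("distinct field widths:", unique(field_widths), "\n")
```

Note that quote = FALSE also leaves the header and row names unquoted, which differs from the default write.csv output.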

I hope that this is helpful.  If this is not the proper way to submit the
bug, please let me know.


-- 

Greg


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel