It is not needed: there is already a large community of developers using SparkR (https://spark.apache.org/docs/latest/sparkr.html), and it does exactly what you want.
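For reference, a minimal SparkR session might look like the sketch below (this assumes a local Spark installation; the use of the built-in `faithful` dataset and the `local[*]` master are illustrative choices, not anything prescribed in the thread):

```r
library(SparkR)

# Start a local Spark session; data is processed outside R's own memory
sparkR.session(master = "local[*]", appName = "disk-backed-df-demo")

# Convert a local data.frame into a distributed SparkDataFrame
df <- as.DataFrame(faithful)

# Familiar data.frame-style operations are translated into Spark jobs
head(select(df, df$eruptions))
head(filter(df, df$waiting < 50))

sparkR.session.stop()
```

The point is that the SparkDataFrame lives outside R's heap, which is essentially the out-of-RAM behaviour the proposal below asks for.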
On 3 September 2017 at 20:38, Juan Telleria <jteller...@gmail.com> wrote:
> Dear R Developers,
>
> I would like to suggest the creation of a new S4 object class for on-disk
> data.frames which do not fit in RAM, which could be called
> disk.data.frame().
>
> It could be based on RSQLite, for example (by translating R syntax to SQL
> syntax), and the syntax and behaviour of the disk.data.frame() class could
> be exactly the same as with data.frame objects.
>
> When the session ends, such disk.data.frames would not be saved, and an
> implicit DROP TABLE could be executed for all the tables created in
> RSQLite.
>
> Nowadays, with SSD drives, such a new data.frame() class could make
> sense, especially when dealing with Big Data.
>
> It is true that this new class might be slower than the regular
> data.frame, data.table, or tibble classes, but we would be able to handle
> much more data, even if at the cost of speed.
>
> With data sampling and a regular ODBC connection we could also do all of
> this work, but for people who do not know how to use an RDBMS, or the
> special-purpose R packages for this job, this class could help.
>
> Another option would be to base this new S4 class on feather files, but
> maybe making it with RSQLite is simply easier.
>
> A GitHub project could be created for this purpose, so that the whole
> community can contribute (myself included :D).
>
> Thank you,
> Juan
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
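For what it is worth, much of the behaviour Juan describes (a data.frame-like object backed by an on-disk SQLite store, with implicit cleanup when the connection goes away) can already be sketched with the DBI and RSQLite packages; the table name and temporary file below are illustrative:

```r
library(DBI)
library(RSQLite)

# An on-disk (here: temporary) SQLite database acts as the backing store
con <- dbConnect(SQLite(), tempfile(fileext = ".sqlite"))

# Write a data.frame to disk instead of keeping it in RAM
dbWriteTable(con, "mtcars", mtcars)

# Queries run against the disk-backed table; only the results enter RAM
res <- dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")
print(res)

# Closing the connection (like ending the R session) discards the handle;
# a wrapper class could issue DROP TABLE here, as the proposal suggests
dbDisconnect(con)
```

A disk.data.frame() class would essentially wrap this pattern behind data.frame semantics, translating subsetting and aggregation calls into SQL.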