Two quick hints: * use simplifyDataFrame = FALSE in fromJSON()
* read https://jennybc.github.io/purrr-tutorial/ls02_map-extraction-advanced.html (and https://jennybc.github.io/purrr-tutorial/) Hadley On Tue, Nov 29, 2016 at 8:06 AM, Daniel Bastos <dbas...@toledo.com> wrote: > Greetings! > > In an SQL table, I have a column that contains a JSON. I'd like easy > access to all (in an ideal world) of these JSON fields. I started out > trying to get all fields from the JSON and so I wrote this function. > > unfold.json <- function (df, column) > { > library(jsonlite) > ret <- data.frame() > > for (i in 1:nrow(df)) { > js <- fromJSON(df[i, ][[column]]) > ret <- rbind(ret, cbind(df[i, ], js)) > } > > ret > } > > It takes a data frame and a column-string where the JSON is to be > found. It produces a new RET data frame with all the rows of DF but > with new columns --- extracted from every field in the JSON. > > (The performance is horrible.) > > fromJSON sometimes produces a list that sometimes contains a data frame. > As a result, I end up getting a RET data frame with duplicated rows. > Here's what happens. > >> nrow(df) > [1] 1 > >> nrow(unfold.json(df, "response")) > [1] 3 > Warning messages: > 1: In data.frame(CreateUTC = "2016-11-29 02:00:43", Payload = list( : > row names were found from a short variable and have been discarded > 2: In data.frame(..., check.names = FALSE) : > row names were found from a short variable and have been discarded >> > > I expected a data frame with 1 row. The reason 3 rows is produced is > because in the JSON there's an array with 3 rows. > >> fromJSON(df$response)$RawPayload > [1] 200 1 128 > > I have also cases where fromJSON(df$response)$Payload$Fields is a data > frame containing various rows. So unfold.json produces a data frame > with these various rows. > > So I gave up on this general approach. > > (*) My humble approach > > For the moment I'm not interested in RawPayload nor Payload$Fields, so I > nullified them in this new approach. To improve performance, I guessed > perhaps merge() would help and I think it did, but this was not at all a > decision thought out. > > unfold.json.fast <- function (df, column) > { > library(jsonlite) > ret <- data.frame() > if (nrow(df) > 0) { > for (i in 1:nrow(df)) { > ls <- fromJSON(df[i, ][[column]]) > ls$RawPayload <- NULL > ls$Payload$Fields <- NULL > js <- data.frame(ls) > ret <- rbind(ret, merge(df[i, ], js)) > } > } > > ret > } > > I'm looking for advice. How would you approach this problem? > > Thank you! > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- http://hadley.nz ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.