Re: [R] Remove duplicate elements in lists via recursive indexing

Janko Thyson Mon, 23 May 2011 07:58:47 -0700

Hi Timothy,

and thanks for the answer. Loops where exactly what I was trying toavoid as much as possible. My initial idea was that that once I hadrecursive indexes at my disposal (which were retrieved over recursiveloops), I could simply use it in a similar manner as we do with indexes(that is 'do-all-at-once', like in 'x <- x[-idx.drop]'). But I thinkeven though recursive indexes are nice, you can't get around looping,which I think in turn means that you constantly have to adapt yourrecursive index set to the most recent 'state' of your list.

In case you're interested, in the attachment you'll find my currentsolution ('listDuplicatesProcess.txt' including the example script'listDuplicatesProcess_examples.txt'). It builds on some other code, soyou'd have to source 'flatten.txt' and 'envirToList' as well.


Regards,
Janko

On 23.05.2011 14:23, Timothy Bates wrote:

Dear Janko,
I think requires a for loop. The approach I took here was mark the dups, then 
dump them all in one hit:

testData = expand.grid(letters[1:4],c(1:3))
testData$keep=F
uniqueIDS = unique(testData$Var1)
for(thisID in uniqueIDS) {
        firstCaseOnly = match(thisID,testData$Var1)
        testData[firstCaseOnly,"keep"]=T
}

(testData = testData[testData$keep==T,])


On 23 May 2011, at 11:59 AM, Janko Thyson wrote:

Dear list,

I'm trying to solve something pretty basic here, but I can't really come up 
with a good solution. Basically, I would just like to remove duplicated named 
elements in lists via a their respective recursive indexes (given that I have a 
routine that identifies these recursive indexes). Here's a little example:

# VECTORS
# Here, it's pretty simple to remove duplicated entries
y<- c(1,2,3,1,1)
idx.dupl<- which(duplicated(y))
y<- y[-idx.dupl]
# /

# LISTS
x<- list(a=list(a.1.1=1, a.1.1=2, a.1.1=3))

x[[c(1,1)]]
x[[c(1,2)]] # Should be removed.
x[[c(1,3)]] # Should be removed.

# Let's say a 'checkDuplicates' routine would give me:
idx.dupl<- list(c(1,2), c(1,3))

# Remove first duplicate:
x[[idx.dupl[[1]]]]<- NULL
x
# Problem:
# Once I remove the first duplicate, my duplicate index would have to be
# updated as well as there is not third element anymore.
x[[idx.dupl[[2]]]]<- NULL

# So something like this would not work:
sapply(idx.dupl, function(x.idx){
    x[[x.idx]]<<- NULL
})
# /

Sorry if I'm missing something totally obvious here, but do you have any idea 
how to solve this?

Thanks a lot,
Janko

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

#' Coerce Environment to List (Recursively).
#'
#' Recursively coerces an \code{environment} to a \code{list}.  
#'
#' @param src A an \code{environment} that should be coerced.
#' @param order.type A \code{character} vector (length: max=1) specifying if
#' the list elements should be ordered and if so, what type of ordering should
#' be applied.
#' @param ... Further args.
#' @return A named \code{list} that corresponds to the recursively coerced
#' initial \code{environment}.
#' @callGraphPrimitives
#' @author Janko Thyson \email{janko.thyson.rstuff@@googlemail.com}
#' @seealso \code{\link{flatten}}
#' @example inst/examples/envirAsList.R
envirToList <- function(
    src, 
    order.type=c("increasing", "decreasing", "none"),
    ...
){
    if(length(order.type) > 1){
        order.type <- order.type[1]
    }
    if(class(src) == "environment"){
        envir   <- new.env()  
        src     <- as.list(src)
        # LOOP OVER ELEMENTS
        out <- lapply(seq(along=src), function(x.src){
            envir$names <- c(envir$names, names(src[x.src]))
            # RECURSIVE FLATTENING
            out <- envirToList(src[[x.src]])
            return(out)
        })
        names(out) <- envir$names
        if(order.type == "increasing"){
            idx.order <- order(names(out))
            out <- out[idx.order]
        }
        if(order.type == "decreasing"){
            idx.order <- order(names(out), decreasing=TRUE)
            out <- out[idx.order]
        }
        # /
    } else {
        out <- src 
    }
    return(out)
}

#' Flatten (Nested) Lists or Environments.
#'
#' Flatten \code{lists} or \code{environments} according to specifications
#' made via arg \code{start.after} and/or arg \code{stop.at}. When keeping 
#' the defaults, the function will traverse arg \code{src} (if \code{src} is 
#' an \code{environment}, it is coerced to a \code{list} 
#' via \code{\link{envirToList}} first) to retrieve the values at the 
#' respective bottom layers/bottom elements. These values are arranged in a 
#' named \code{list} where the respective names can be interpreted as the
#' the paths to the retrieved values. See examples.   
#'
#' @param src A named (arbitrary deeply nested) \code{list} or an 
#' \code{environment} that should be flattened.
#' @param start.after An \code{integer} specifying the layer after which to 
#' start the flattening. \code{NULL} means to start at the very top. See
#' examples.
#' @param stop.at An \code{integer} specifying the layer at which to stop
#' the flattening. \code{NULL} means there is no stop criterion.
#' @param delim.path A \code{character} (length: 1) specifying how the names of 
#' the resulting flattened list should be pasted.
#' @param .do.debug If \code{TRUE}, print information that might be helpful
#' for debugging.
#' @param ... Further args.
#' @return A named \code{list} that features the desired degree of flattening.
#' @callGraphPrimitives
#' @author Janko Thyson \email{janko.thyson.rstuff@@googlemail.com}
#' @seealso \code{\link{envirToList}}
#' @example inst/examples/flatten.R
flatten <- function(
    src, 
    start.after=NULL, 
    stop.at=NULL, 
    .delim.path="/",
    .delim.index="-",
    do.index=FALSE,
    do.original=TRUE,
    do.warn=TRUE,
    .do.debug=FALSE,
    ...
){
    #---------------------------------------------------------------------------
    # VALIDATE
    #---------------------------------------------------------------------------
    
    if(!is.list(src) & !is.environment(src)){
        stop("Arg 'src' must be a 'list' or an 'environment'.")
    }
    if(!is.null(start.after) & !is.null(stop.at)){
        if(start.after == 1& stop.at == 1){
            msg <- c(
                "Invalid specification:",
                paste("* start.after: ", start.after, sep=""),
                paste("* stop.at:     ", stop.at, sep="")
            )
            stop(cat(msg, sep="\n"))
        }
    }
    # /VALIDATE ----------
    
    #---------------------------------------------------------------------------
    # INNER FUNCTIONS
    #---------------------------------------------------------------------------
    
    .startAfterInner <- function(
        envir,
        nms,
        out.1,
        do.reset=FALSE,
        ...
    ){
        .do.debug <- envir$.do.debug
        idx.diff <- diff(c(envir$start.after, length(envir$counter)))
        if(.do.debug){
            cat(c("", "+++", ""), sep="\n")
            #                print("+++")
            cat("names:", sep="\n")
            print(names(out.1))
            cat("envir$counter:", sep="\n")
            print(envir$counter)
            cat("idx.diff:", sep="\n")
            print(idx.diff)            
        }
        # UPDATE IF DEGREE OF NESTEDNESS EXCEEDS START CRITERION
        if(idx.diff > 0){
            idx.cutoff      <- (
                length(envir$counter)-idx.diff+1):length(envir$counter
            ) 
            idx.left        <- envir$counter[-idx.cutoff]
            nms.1           <- nms[idx.cutoff]
            names(out.1)    <- paste(nms.1, collapse="/") 
            # UPDATE SRC
            idx.append <- sapply(envir$history, function(x.hist){
                all(idx.left == x.hist)        
            })
            if(.do.debug){
                cat("idx.cutoff:", sep="\n")
                print(idx.cutoff)
                cat("idx.left:", sep="\n")
                print(idx.left)
                cat("idx.append:", sep="\n")
                print(idx.append)
                cat("names remaining:", sep="\n")
                print(names(out.1))
            }
            if(any(idx.append)){                                          
                envir$src[[idx.left]] <- append(envir$src[[idx.left]], 
                    values=out.1)                    
            } else {
                envir$src[[idx.left]] <- out.1
                # UPDATE HISTORY
                envir$history <- c(envir$history, list(idx.left))
            }
            envir$out <- envir$src
            # /             
        } 
        if(idx.diff < 0){
            envir$out <- envir$src
        }
        # /
        # RESET
        if(do.reset){
            envir$nms       <- envir$nms[-length(envir$nms)]
            envir$counter   <- envir$counter[-length(envir$counter)]
        }
        # /
        return(TRUE)
    }
    
    .updateOutInner <- function(
        envir,
        out.1,
        do.reset=FALSE,
        ...
    ){
        .do.debug <- envir$.do.debug
        # UPDATE OUT
        out.0           <- get("out", envir = envir)
        out             <- c(out.0, out.1)
        envir$out       <- out
        # /
        # RESET
        if(do.reset){
            envir$nms       <- envir$nms[-length(envir$nms)]
            envir$counter   <- envir$counter[-length(envir$counter)]
        }
        # /
        return(TRUE)
    }
    
    .flattenInner <- function(
        src, 
        envir, 
        ...
    ){
        .do.debug <- envir$.do.debug
        if( (class(src)=="list" & length(src) != 0) |
            (class(src) == "environment" & length(src) != 0)
            ){
            if(class(src) == "environment"){
                src <- as.list(src)
            }
            # UPDATE
            envir$counter.history <- c(envir$counter.history, 
list(envir$counter))
            # EXIT IF DEGREE EXCEEDS CUTOFF
            if(!is.null(envir$stop.at)){
                if(length(envir$counter) > envir$stop.at){ 
                    # THIS
                    nms         <- get("nms", envir=envir)
                    path.nms    <- paste(envir$nms, collapse=envir$.delim.path)
                    if(.do.debug){
                        cat("names:", sep="\n")
                        print(path.nms)
                    }
                    out.1           <- list(src)
                    names(out.1)    <- path.nms
                    # /
                    # DECISION ON FLATTENING
                    if(!is.null(envir$start.after)){
                        .startAfterInner(envir=envir, nms=nms, out.1=out.1, 
                            do.reset=TRUE)
                        return(NULL)
                        #                    }
                        # /
                    } else {
                        .updateOutInner(envir=envir, out.1=out.1, do.reset=TRUE)
                        return(NULL)
                    }
                }
            }
            # /
            # LOOP OVER ELEMENTS
            for(i in seq(along=src)){                
                # UPDATE COUNTER
                envir$counter <- c(envir$counter, i)
                # UPDATE NAMES
#                assign("nms", c(get("nms", envir=envir), names(src[i])), 
envir=envir)
                envir$nms   <- c(get("nms", envir=envir), names(src[i]))
                path.nms    <- paste(envir$nms, collapse=envir$.delim.path)
                # UPDATE INDEX
                idx.append <- !(path.nms %in% names(envir$index))            
                index.1 <- list(data.frame(
                    name=path.nms,
                    index=paste(envir$counter, collapse=envir$.delim.index),
                    is.top=length(envir$counter) == 1,
                    is.bottom=FALSE,
                    degree=length(envir$counter),
                    duplicate=!idx.append,
                    class=class(src[[i]]),
                    type=typeof(src[[i]]),
                    length=length(src[[i]]),
                    dim={
                        if(is.null(dim(src[[i]]))){
                            NA
                        } else {
                            paste(dim(src[[i]]), collapse=" ")
                        }
                    },
                    stringsAsFactors=FALSE
                ))
#                index.1$is.bottom   <- rslt
                names(index.1)      <- path.nms 
                envir$index         <- c(envir$index, index.1)
                # /
                # RECURSIVE FLATTENING
                rslt <- .flattenInner(src[[i]], envir) # call  recursively
                envir$index[[length(envir$index)]]$is.bottom <- rslt
                # RESET COUNTER
                if(i == length(src)){
                    envir$nms       <- envir$nms[-length(envir$nms)]
                    envir$counter   <- envir$counter[-length(envir$counter)]
                }
                # /
#                return(FALSE)
            }
            # /
            return(TRUE)
        } else {
            # THIS
            nms         <- get("nms", envir=envir)
            path.nms    <- paste(envir$nms, collapse=envir$.delim.path)
            if(.do.debug){
                cat("names:", sep="\n")
                print(path.nms)
            }
            out.1           <- list(src)
            names(out.1)    <- path.nms
            # /
            # DECISION ON FLATTENING
            if(!is.null(envir$start.after)){
                .startAfterInner(envir=envir, nms=nms, out.1=out.1)
            } else {
                .updateOutInner(envir=envir, out.1=out.1)
            }
            # /
            if(.do.debug){
                cat("out.1:", sep="\n")
                print(out.1)
            }
            # UPDATE INDEX
            idx.append <- !(path.nms %in% names(envir$index))
            if(idx.append){
                index.1 <- list(data.frame(
                    name=path.nms,
                    index=paste(envir$counter, collapse=envir$.delim.index),
                    is.top=FALSE,
                    is.bottom=TRUE,
                    degree=length(envir$counter),
                    class=class(src[[i]]),
                    type=typeof(src[[i]]),
                    length=length(src[[i]]),
                    dim={
                        if(is.null(dim(src[[i]]))){
                            NA
                        } else {
                            paste(dim(src[[i]]), collapse=" ")
                        }
                    },
                    stringsAsFactors=FALSE
                ))
                names(index.1) <- path.nms 
                envir$index <- c(envir$index, index.1)
            }
            # /
            # RESET
            envir$nms       <- envir$nms[-length(envir$nms)]
            envir$counter   <- envir$counter[-length(envir$counter)]
            # /
            return(TRUE)
        }
#        return(TRUE)
    }
    
    # /INNER FUNCTIONS ----------
    
    #---------------------------------------------------------------------------
    # ACTUAL PROCESSING
    #---------------------------------------------------------------------------
    
    # COERCE TO LIST
    if(class(src) == "environment"){
        src <- envirToList(src=src)
    }
    # /
    # PRESERVE ORIGINAL (just in case)
    src.0               <- src
    out                 <- list()
    # ENVIR
    envir               <- new.env()
    envir$.do.debug     <- .do.debug
    envir$counter       <- NULL
    envir$counter.history <- NULL
    envir$.delim.index   <- .delim.index
    envir$.delim.path    <- .delim.path
    envir$do.warn       <- do.warn
    envir$history       <- NULL
    envir$index         <- NULL
    envir$nms           <- NULL
    envir$out           <- list()
    envir$src           <- src
    envir$start.after   <- start.after
    stop.at.0           <- stop.at
    if(!is.null(stop.at)){
        stop.at.0 <- stop.at
        if(stop.at == 1){
            # OUT VALUE
            out <- envir$out
            out <- list(
                original=NULL,
                flat=src,
                index=list(raw=NULL, table=NULL),
                degree={
                    c(
                        if(is.null(start.after)){
                                0
                            } else {
                                start.after
                            },
                        if(is.null(stop.at.0)){
                                0
                            } else {
                                stop.at.0
                            }
                    )
                }
            )
            # /
            if(do.original){
                out$original <- src
            }
            # /
            # CLASS
            class(out) <- c("FlatList", class(out))
            return(out)
        } else {
            stop.at <- stop.at - 1
        }
    }
    envir$stop.at       <- stop.at
    # /
    # APPLY INNER
    .flattenInner(src, envir)
    
    # WARNINGS
    if(envir$do.warn){
        max.length <- max(sapply(envir$counter.history, length))
#        if(!envir$do.block.warning){
        if(!is.null(start.after)){            
            if(start.after > max.length){                        
                warning(paste("Argument 'start.after=", start.after, 
                    "' exceeds maximum degree of sublayer nestedness (=", 
                    max.length, ").", sep=""))
            }
        }
        if(!is.null(stop.at)){
            if(stop.at.0 > max.length){
                warning(paste("Argument 'stop.at=", stop.at.0, 
                    "' exceeds maximum degree of sublayer nestedness (=", 
                    max.length, ").", sep=""))    
            }
        }
    }
    # /
    
    # OUT VALUE
    out <- envir$out
    out <- list(
        original=NULL,
        flat=out,
        index=list(raw=NULL, table=NULL),
        degree={
            c(
                if(is.null(start.after)){
                0
                } else {
                    start.after
                },
                if(is.null(stop.at.0)){
                    0
                } else {
                    stop.at.0
                }
            )
        }
    )
    # /
    # PROCESS OUT
    if(do.index){
        index.raw   <- envir$index
        index.table <- do.call("rbind", index.raw)
        rownames(index.table) <- NULL
        out$index$raw <- index.raw
        out$index$table <- index.table
    }
    if(do.original){
        out$original <- src
    }
    # /
    # CLASS
    class(out) <- c("FlatList", class(out))
    # /

    # /ACTUAL PROCESSING ----------
   
    return(out)    
}

#-------------------------------------------------------------------------------
# SIMPLE LISTS
#-------------------------------------------------------------------------------

src <- list(DF=data.frame(A=c(1,2)), vec=c("a", "b")) 
src <- list(src,src)

flatten(src) 

# /SIMPLE LISTS ----------

#-------------------------------------------------------------------------------
# NESTED LISTS
#-------------------------------------------------------------------------------

src <- list(a=list(a.1=list(a.1.1=list(a.1.1.1=1, a.1.1.1=2, a.1.1.1=3), 
    a.1.2=1, a.1.2=2, a.1.2=3), a.2=list(a.2.1=list())), b=NULL, 
    c=data.frame(a=1:3, b=3:5)
)

flatten(src)
flatten(src, start.after=1)
flatten(src, start.after=1, .do.debug=TRUE)
flatten(src, start.after=2)
flatten(src, start.after=3)
flatten(src, start.after=4)

flatten(src, stop.at=1)
flatten(src, stop.at=2)
flatten(src, stop.at=3)
flatten(src, stop.at=4)

flatten(src, start.after=1, stop.at=1)
flatten(src, start.after=1, stop.at=2)
flatten(src, start.after=1, stop.at=3)
flatten(src, start.after=1, stop.at=4)
flatten(src, start.after=2, stop.at=4)

flatten(src, do.index=TRUE)
flatten(src, do.index=TRUE, do.return.index=TRUE)

# PROFILE
require(microbenchmark)
prfl <- microbenchmark(flatten(src))
median(prfl$time)/1000000000
prfl <- microbenchmark(flatten(src, do.index=TRUE))
median(prfl$time)/1000000000
prfl <- microbenchmark(flatten(src, do.index=TRUE, do.return.index=TRUE))
median(prfl$time)/1000000000
# /

# /NESTED LISTS ----------

#-------------------------------------------------------------------------------
# ENVIRONMENTS
#-------------------------------------------------------------------------------

envir <- new.env()
envir$a <- new.env()
envir$a$a.1 <- new.env()
envir$a$a.1$a.1.1 <- new.env()
envir$a$a.1$a.1.1$a.1.1.1 <- NA
envir$a$a.1$a.1.2 <- 5
envir$a$a.2 <- new.env()
envir$a$a.2$a.2.1 <- list()
envir$b <- NULL
envir$c <- data.frame(a=1:3, b=3:5)

envirToList(src=envir)

flatten(envir)
flatten(envir, start.after=1)
flatten(envir, start.after=1, .do.debug=TRUE)
flatten(envir, start.after=2)
flatten(envir, start.after=3)
flatten(envir, start.after=4)

flatten(envir, stop.at=1)
flatten(envir, stop.at=2)
flatten(envir, stop.at=3)
flatten(envir, stop.at=4)

flatten(envir, start.after=1, stop.at=1)
flatten(envir, start.after=1, stop.at=2)
flatten(envir, start.after=1, stop.at=3)
flatten(envir, start.after=1, stop.at=4)

flatten(envir, do.index=TRUE)
flatten(envir, do.index=TRUE, do.return.index=TRUE)

# PROFILE
require(microbenchmark)
prfl <- microbenchmark(flatten(envir))
median(prfl$time)/1000000000
prfl <- microbenchmark(flatten(envir, do.index=TRUE))
median(prfl$time)/1000000000
prfl <- microbenchmark(flatten(envir, do.index=TRUE, do.return.index=TRUE))
median(prfl$time)/1000000000
# /

# /ENVIRONMENTS ----------

listDuplicatesProcess <- function(
    src,
    handle.duplicates=c("stop", "keep.original", "keep.first", "keep.last",
        "index.duplicates", "index.all"),
    do.return.flat=TRUE,
    .delim.index="-",
    .delim.path="/",
    ...
){
    .buffer <- new.env()
    
    if(length(handle.duplicates) > 1){
        handle.duplicates <- handle.duplicates[1]
    }
    # FLATTEN
    if(!("FlatList" %in% class(src))){
        src.flat <- flatten(
            src=src, 
            do.index=TRUE, 
            .delim.index=.delim.index,
            .delim.path=.delim.path
        )
    } else {
        src.flat <- src
    }
    # /
    if(!all(src.flat$degree == c(0,0))){
        stop("Flatlist of degree c(0,0) required.")
    }
    .buffer$src.flat <- src.flat
    .buffer$index    <- NULL
    
    idx.dupl <- which(src.flat$index$table$duplicate)
    src.flat.split  <- split(
        .buffer$src.flat$index$table[idx.dupl,], 
f=.buffer$src.flat$index$table$name[idx.dupl]
    )
    if(length(idx.dupl)){
        if( !is.expression(handle.duplicates)){
            if(handle.duplicates == "stop"){
                stop("Duplicates identified.")
            }
        }     
        JNK <- sapply(src.flat.split, function(x.spl){
            if(!is.expression(handle.duplicates)){
                idx.dupl <- which(.buffer$src.flat$index$table$name %in% 
                    x.spl$name)
                if(handle.duplicates == "index.duplicates"){
                    index.new <- 
list(.buffer$src.flat$index$table[idx.dupl[-1],])
                    names(index.new) <- unique(x.spl$name)
                    .buffer$index <- c(.buffer$index, index.new)
                    return(NULL)
                }
                if(handle.duplicates == "index.all"){
                    index.new <- list(.buffer$src.flat$index$table[idx.dupl,])
                    names(index.new) <- unique(x.spl$name)
                    .buffer$index <- c(.buffer$index, index.new)
                    return(NULL)
                }
                if(handle.duplicates == "keep.original"){
                    idx.keep <- 1
                    idx.dupl <- idx.dupl[-idx.keep]
                    idx.drop <- which(
                        names(.buffer$src.flat$flat) %in% 
                            .buffer$src.flat$index$table$name[idx.dupl]
                    )[-idx.keep]
                }
                if(handle.duplicates == "keep.first"){
                    idx.keep <- 2
                    idx.dupl <- idx.dupl[-idx.keep]
                    idx.drop <- which(
                        names(.buffer$src.flat$flat) %in% 
.buffer$src.flat$index$table$name[idx.dupl]
                    )[-idx.keep]
                }
                if(handle.duplicates == "keep.last"){
                    idx.keep <- length(idx.dupl)
                    idx.dupl <- idx.dupl[-idx.keep]
                    idx.drop <- which(
                        names(.buffer$src.flat$flat) %in% 
.buffer$src.flat$index$table$name[idx.dupl]
                    )
                    idx.drop <- idx.drop[-idx.keep]
                }
            } else {
#handle.duplicates <- expression(.buffer$src.flat$index$table$class == 
"numeric")
# TODO
# Test more thoroughly.    
                idx.dupl <- which(.buffer$src.flat$index$table$name %in% 
x.spl$name)
                idx.drop <- which(
                    names(.buffer$src.flat$flat) %in% 
.buffer$src.flat$index$table$name[idx.dupl]
                )
                idx.expr <- which(eval(handle.duplicates))
                if(!length(idx.expr)){
                    msg <- c(
                        "Duplicate processing criterion could not be matched:",
                        paste("* handle.duplicates: '", handle.duplicates, "'", 
sep="")
                    )
                    stop(cat(msg, sep="\n"))
                }
                idx.keep <- which(idx.dupl %in% idx.expr)                
                if(!length(idx.keep)){
                    msg <- c(
                        "Duplicate processing criterion could not be matched:",
                        paste("* handle.duplicates: '", handle.duplicates, "'", 
sep="")
                    )
                    stop(cat(msg, sep="\n"))
                }
                if(length(idx.keep) > 1){
                    msg <- c(
                        "Duplicate processing criterion resulted in multiple 
matches:",
                        paste("* handle.duplicates: '", handle.duplicates, "'", 
sep=""),
                        paste("* Matches: ", paste(idx.dupl[idx.keep], 
collapse=" "), sep="")
                    )
                    stop(cat(msg, sep="\n"))
                }
                idx.dupl <- idx.dupl[-idx.keep] 
                idx.drop <- idx.drop[-idx.keep]
            }
            idx.dupl.1  <- strsplit(
                .buffer$src.flat$index$table$index[idx.dupl], split=.delim.index
            )
            .buffer$counter <- 0            
            idx.dupl.1 <- lapply(idx.dupl.1, function(x){
                idx <- as.numeric(x)
                idx[length(idx)]    <- idx[length(idx)] - .buffer$counter
                .buffer$counter     <- .buffer$counter + 1
                return(idx)
            })   
            JNK <- sapply(idx.dupl.1, function(x.idx){
                .buffer$src.flat$original[[x.idx]] <- NULL
            })
            
            if(length(idx.drop)){
                .buffer$src.flat$flat <- .buffer$src.flat$flat[-idx.drop]
            }
            .buffer$src.flat$index$raw      <- 
.buffer$src.flat$index$raw[-idx.dupl]
            .buffer$src.flat$index$table    <- 
.buffer$src.flat$index$table[-idx.dupl,]
            .buffer$src.flat$index$table$duplicate <- FALSE
            rownames(.buffer$src.flat$index$table) <- NULL
            
            # FIND RIGHT BOTTOM LAYER VALUES
            idx.bottom <- which(
                .buffer$src.flat$index$table$is.bottom &
                .buffer$src.flat$index$table$name == unique(x.spl$name)
            )          
            # /
            # UPDATE INDEX IN TABLE 
            idx.index  <- strsplit(
                .buffer$src.flat$index$table$index[idx.bottom], split="-"
            )
            index.new <- sapply(idx.index, function(x){
                x[length(x)] <- 1
                paste(x, collapse=.delim.index)
            })
            .buffer$src.flat$index$table$index[idx.bottom] <<- index.new
            # /
            # UPDATE INDEX IN RAW
            tmp.paths <- unique(.buffer$src.flat$index$table$name[idx.bottom])
            
            JNK <- sapply(seq(along=tmp.paths), function(x){                 
                .buffer$src.flat$index$raw[[tmp.paths[x]]]$index <<- 
index.new[x] 
            })
            # /
        })
    }  
    
    # RETURN VALUE
    if( handle.duplicates == "index.duplicates" |
        handle.duplicates == "index.all" 
    ){
        out <- .buffer$index
    } else {
        if(do.return.flat){
            out <- .buffer$src.flat       
        } else {
            out <- .buffer$src.flat$original
        }
    }
    # /
    return(out)
}

src <- list(a=list(a.1=list(a.1.1=list(a.1.1.1=1, a.1.1.1=2, a.1.1.1=3), 
    a.1.2=1, a.1.2=2, a.1.2=3), a.2=list(a.2.1=list())), b=NULL, 
    c=data.frame(a=1:3, b=3:5))

listDuplicatesProcess(
    src=src,
    handle.duplicates="stop",
    do.return.flat=TRUE,
    .delim.index="-",
    .delim.path="/"
)

#+++++

listDuplicatesProcess(
    src=src,
    handle.duplicates="keep.original",
    do.return.flat=TRUE,
    .delim.index="-",
    .delim.path="/"
)
listDuplicatesProcess(
    src=src,
    handle.duplicates="keep.original",
    do.return.flat=FALSE,
    .delim.index="-",
    .delim.path="/"
)

#+++++

listDuplicatesProcess(
    src=src,
    handle.duplicates="keep.first",
    do.return.flat=FALSE,
    .delim.index="-",
    .delim.path="/"
)

#+++++

listDuplicatesProcess(
    src=src,
    handle.duplicates="keep.last",
    do.return.flat=FALSE,
    .delim.index="-",
    .delim.path="/"
)

#+++++

listDuplicatesProcess(
    src=src,
    handle.duplicates="index.duplicates",
    do.return.flat=FALSE,
    .delim.index="-",
    .delim.path="/"
)
listDuplicatesProcess(
    src=src,
    handle.duplicates="index.all",
    do.return.flat=FALSE,
    .delim.index="-",
    .delim.path="/"
)

#+++++

src.flat <- flatten(src, do.index=TRUE)

listDuplicatesProcess(
    src=src.flat,
    handle.duplicates="keep.original",
    do.return.flat=FALSE,
    .delim.index="-",
    .delim.path="/"
)

#+++++

testData = expand.grid(letters[1:4],c(1:3))
testData$keep=F
uniqueIDS = unique(testData$Var1)
for(thisID in uniqueIDS) {
    firstCaseOnly = match(thisID,testData$Var1)
    testData[firstCaseOnly,"keep"]=T
}

(testData = testData[testData$keep==T,])

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Remove duplicate elements in lists via recursive indexing

Reply via email to