Re: [R] processing time too long

2016-06-14 Thread Jim Lemon
I'm still unsure of what you are attempting to do with this data.
First, it is very sparse, appearing to be the counts of occurrences of
2567 strings, some of which are recognizable English words. I suspect
that you are trying to get something very simple like the frequency of
these strings within whatever corpus they inhabit. The code you sent
does some manipulations I can understand, but others seem to be redundant
or are even discarded after they are performed. For instance, you write
the result file twice, line by line. You also try to access the
element "matrixdata$ID" when, as far as I can see, it doesn't exist.
That would certainly stop the script. Without knowing what is supposed
to be the result of this, it is impossible to even analyze code that
runs (for quite a few minutes) and does not appear to produce any
output.
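
For what it is worth, the inner loop in the quoted script appears to
accumulate a dot product and two sums of squares, which would be the
cosine similarity between the first row and each later row. If that is
the intent (an assumption on my part, the thread never states it), a
vectorized sketch on made-up data might look like the following. The
names "m", "first", "others" and "cosine" are illustrative and do not
come from the original script.

```r
# Made-up stand-in for the real document-term matrix: 5 documents, 4 terms.
# Column 1 of the original file holds a document number, so it is left out.
m <- matrix(c(1, 0, 2, 0,
              0, 1, 0, 3,
              2, 0, 4, 0,
              0, 0, 0, 0,
              1, 1, 1, 1),
            nrow = 5, byrow = TRUE)

first  <- m[1, ]                      # reference row (rindex = 1 in the script)
others <- m[-1, , drop = FALSE]       # every later row

dots  <- as.vector(others %*% first)  # dot product of each row with the first
norms <- sqrt(rowSums(others^2)) * sqrt(sum(first^2))

# All-zero rows would give 0/0, so they are set to 0,
# mirroring the else branch of the quoted script.
cosine <- ifelse(norms > 0, dots / norms, 0)
print(round(cosine, 3))
```

Working on a numeric matrix rather than indexing a data frame one cell
at a time is usually where most of the time is saved in code like this.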

Jim

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] processing time too long

2016-06-14 Thread Jim Lemon
Hi Shashi,
Since I do not want to create a large fake data set and then
painstakingly test and debug your code, why not try your code with a
subset of the data, maybe only 400 rows? If that runs slowly, your
code is very inefficient (it looks as though it is). You can then
begin to identify where the efficiency of the code can be improved.
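
As a rough sketch of what timing a subset could look like, assuming
made-up data in place of the real file: "full", "small" and "process"
are hypothetical names, and "process" is only a placeholder for
whatever the slow step really is.

```r
# Made-up data standing in for the real file: 2000 rows, 50 columns.
set.seed(1)
full <- as.data.frame(matrix(rpois(2000 * 50, 0.2), nrow = 2000))

# Hypothetical placeholder for the slow step; replace it with the real code.
process <- function(d) {
  total <- 0
  for (i in seq_len(nrow(d))) {
    for (j in seq_len(ncol(d))) {
      total <- total + d[i, j]^2      # cell-by-cell data frame access
    }
  }
  total
}

small  <- head(full, 400)             # the 400-row subset
timing <- system.time(result <- process(small))
print(timing["elapsed"])

# A vectorized equivalent of the placeholder, for comparison.
fast <- sum(as.matrix(small)^2)
```

If 400 rows already take a noticeable time, the run on all 40309 rows
will take at least proportionally longer, and Rprof() can then help
show which lines account for most of it.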

Jim


On Tue, Jun 14, 2016 at 10:41 PM, SHASHI SETH wrote:
> Dear Jim,
>
> Thanks for your suggestion. The earlier problem is solved with your
> advice. My code is taking too long to execute, more than 30 hours.
> There are 40309 rows and 26952 columns, and the file size is 110 MB.
> Please guide me on what is wrong.
>
> Shashi
> On Thu, 09 Jun 2016 14:27:17 +0530 Jim Lemon wrote
>>Hi Shashi,
>
> Without trying to go through all that code, your error is something
> simple. When you read in "matrixdata" right in the beginning, you are
> getting a data frame, not a vector or a matrix (which in some cases
> can be treated like a vector). That will cause trouble at some point.
> Another thing is that when you call this:
>
> if((sum > 0 && sums1 > 0 && sums2 > 0) != NA)
>
> you seem to be asking for the union of three multi-valued vectors (?)
> which will probably cause at least a warning, but the error suggests
> that at least one of these objects has an NA value somewhere. This
> might be because "dtm_500_1.CSV" (whatever that is) has NA values in
> it. The code is fairly obscure and I can only say that your best bet
> is probably to check the initial data frame for NA values and then
> print out the results of each step, or at least
>
> cat(sum(is.na(x)),"\n")
>
> where x is the object you have just created. That should allow you to
> find where in the tangle of code the NAs are appearing.
>
> Jim
>
> On Thu, Jun 9, 2016 at 4:49 PM, SHASHI SETH wrote:
>> Hi Jim,
>>
>> I am getting the following error:
>> Error in if ((sum > 0 && sums1 > 0 && sums2 > 0) != NA) { :
>> missing value where TRUE/FALSE needed
>>
>> I have included my code below for your review:
>>
>> fitness_1_data <- c();
>>
>> src="dtm_500_1.CSV"
>> matrixdata <- read.csv(src)
>>
>> #get no vector/column from file/matrix
>> noofvec <- length(matrixdata)
>>
>> #set no of records/rows/document
>> noofrecords <- length(matrixdata[,1])
>>
>> #set row index
>> rindex<-1;
>>
>> #prepare header
>> colindex<-1;
>> colList <- colnames(matrixdata)
>>
>> combine<-"";
>>
>> vec_fitness_data<- c();
>>
>> while(colindex <= length(colList))
>> {
>>   fitness_1_data <- append(fitness_1_data,colList[colindex])
>>
>>   colindex<- colindex+1
>> }
>>
>> #add two additional vector for percentage and cluster
>> fitness_1_data <- append(fitness_1_data,"percentage")
>> fitness_1_data <- append(fitness_1_data,"Cluster")
>>
>> write.table(as.list(fitness_1_data), file ="Result_500_cycle1.csv",
>>             append = TRUE, row.names=FALSE, col.names=FALSE, sep=",")
>>
>> #end header record
>>
>> nestedloopindex <- 2
>>
>> while( nestedloopindex <= noofrecords )
>> {
>>
>>   #init of temporary variables
>>   sums1 <- 0;
>>   sums2 <- 0;
>>   sum <- 0;
>>
>>   #set initial index of column 2, column one hold document no not actual data
>>   colindex <- 2;
>>
>>   # combine <-"";
>>
>>   vec1 <- c();
>>   vec2 <- c();
>>
>>   #add document number in vector
>>   vec1 <- append(vec1,matrixdata[rindex,1]);
>>   vec2 <- append(vec2,matrixdata[nestedloopindex,1]);
>>
>>   #declaration of temp -out variable for calculation
>>   #out <- 0;
>>
>>   while(colindex <= noofvec )
>>   {
>>
>>     vec1 <- append(vec1,matrixdata[rindex,colindex]);
>>     vec2 <- append(vec2,matrixdata[nestedloopindex,colindex]);
>>
>>     sum = sum + matrixdata[rindex,colindex]*matrixdata[nestedloopindex,colindex]
>>
>>     sums1 <- sums1 + matrixdata[rindex,colindex]^2;
>>
>>     sums2 <- sums2 + matrixdata[nestedloopindex,colindex]^2;
>>
>>     colindex <- colindex+1
>>   }
>>
>>   if((sum > 0 && sums1 > 0 && sums2 > 0) != NA)
>>   {
>>
>>     out <- sum / ((sqrt(sums1) * sqrt(sums2)))
>>   }else
>>   {
>>     out <-0
>>   }
>>
>>   vec1 <- append(vec1,out);
>>   vec1 <-append(vec1, "1")
>>   vec2 <- append(vec2, out);
>>
>>   if(nestedloopindex==2)
>>   {
>>     write.table(as.list(vec1), file ="Result_500_cycle1.csv",
>>                 append = TRUE, row.names=FALSE, col.names=FALSE, sep=",")
>>     write.table(as.list(vec2), file ="Result_500_cycle1.csv",
>>                 append = TRUE, row.names=FALSE, col.names=FALSE, sep=",")
>>     nestedloopindex<- nestedloopindex+1
>>   } else
>>   {
>>     write.table(as.list(vec2), file ="Result_500_cycle1.csv",
>>                 append = TRUE, row.names=FALSE, col.names=FALSE, sep=",")
>>     nestedloopindex<- nestedloopindex+1
>>   }
>>
>> }
>>
>> With Best Regards,
>> Shashi
>>
>> On Thu, 09 Jun 2016 04:45:09 +0530 Jim Lemon wrote
>>>Hi John,