Re: [R-sig-phylo] read.nexus.data parser

2013-04-09 Thread Johan Nylander
Dear All,

Just to avoid confusion, the readNexus function is in the phylobase
package. And as Ben pointed out, other packages have their own functions
for reading the data part from a nexus-formatted file, see e.g., read.nex
in phyloch.

On a related note, I wrote read.nexus.data as a temporary, crude parsing
function while waiting for the phylobase project to take off (phylobase
uses NCL by Lewis & Holder - _the_ nexus parser), so expect
read.nexus.data to have its limitations.

Furthermore, if speed is the concern, it would perhaps be preferable to
first convert the Nexus data to Fasta, and then use one of the many
fast(er) parsers implemented in numerous R packages.
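
To illustrate why the FASTA route tends to be cheap, here is a minimal
sketch of a vectorised FASTA reader in base R. The function name
read_fasta_simple is hypothetical and not part of any package; it assumes
a well-formed file and returns a named list of lower-case character
vectors, one per taxon. It is not a substitute for the tested parsers in
existing packages.

```r
# Hypothetical helper (not from any package): minimal FASTA reader in base R.
read_fasta_simple <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]                      # drop empty lines
  is_header <- grepl("^>", lines)
  if (length(lines) == 0 || !is_header[1]) {
    stop("file does not start with a '>' header line")
  }
  record <- cumsum(is_header)                        # record index of every line
  # Group the sequence lines by record; records without sequence stay empty.
  groups <- split(lines[!is_header],
                  factor(record[!is_header], levels = seq_len(sum(is_header))))
  seqs <- vapply(groups, paste, character(1), collapse = "")
  taxa <- gsub("^>\\s*|\\s+$", "", lines[is_header]) # strip ">" and whitespace
  # One character vector per taxon, lower-cased.
  setNames(strsplit(tolower(unname(seqs)), ""), taxa)
}

# Usage on a small temporary file
tmp <- tempfile(fileext = ".fas")
writeLines(c(">t1", "ACGT", "AC", "", ">t2 ", "GGTTAA"), tmp)
aln <- read_fasta_simple(tmp)
str(aln)
```

The whole file is read once and grouped with cumsum() and split(), so
there is no per-line loop in R code.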

Cheers
Johan


On 04/07/2013 02:59 PM, Ben Bolker wrote:
> On 13-04-05 01:29 PM, Jessica Sabo wrote:
>> Hi All,
>>
>> I am wondering if there is any way to increase the speed of the
>> read.nexus.data parser. Or if there is an alternative that is a
>> faster nexus file data parser.
>>
>> Thanks, Jess
>
> I don't know if it's faster or not, but there is ?readNexus in the
> 'ape' package.  Also see library(sos); findFn("read {nexus format}")

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] read.nexus.data parser

2013-04-09 Thread Klaus Schliep
Hi all,

I had a quick look at the code and found a few places where the
read.nexus.data function can be sped up. I have added Emmanuel to the
list so that he can put the change into the next ape release if it works.
Generally I agree with Johan that if speed matters, FASTA files are the
way to go. Nexus files are ugly to parse and contain many
inconsistencies, like parameters inside comments [].
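
A small sketch of the comment problem, using base R only. The attached
function reads the file with scan(..., comment.char = "["), which drops
everything from the first "[" to the end of that line. The helper
strip_nexus_comments below is hypothetical and only handles non-nested
comments that open and close on the same line; the example line is made
up.

```r
line <- "taxonA [a note] ACGT"

# With comment.char = "[", scan() ignores the rest of the line after "[",
# so the sequence that follows the comment is lost.
read_with_scan <- scan(text = line, what = character(), sep = "\n",
                       quiet = TRUE, comment.char = "[", strip.white = TRUE)

# Hypothetical helper: remove single-line, non-nested [ ... ] comments
# before any further parsing.
strip_nexus_comments <- function(x) gsub("\\[[^]]*\\]", "", x)
cleaned <- strip_nexus_comments(line)

print(read_with_scan)
print(cleaned)
```

Comments that span several lines or are nested would need a real
tokenizer, which is one reason to prefer NCL for general Nexus input.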

Regards,
Klaus



-- 
Klaus Schliep
Phylogenomics Lab at the University of Vigo, Spain

read.nexus.data <- function (file) 
{
  find.ntax <- function(x) {
    for (i in 1:NROW(x)) {
      if (any(f <- grep("\\bntax", x[i], ignore.case = TRUE))) {
        ntax <- as.numeric(sub("(.+?)(ntax\\s*\\=\\s*)(\\d+)(.+)", 
                               "\\3", x[i], perl = TRUE, ignore.case = TRUE))
        break
      }
    }
    ntax
  }
  find.nchar <- function(x) {
    for (i in 1:NROW(x)) {
      if (any(f <- grep("\\bnchar", x[i], ignore.case = TRUE))) {
        nchar <- as.numeric(sub("(.+?)(nchar\\s*\\=\\s*)(\\d+)(.+)", 
                                "\\3", x[i], perl = TRUE, ignore.case = TRUE))
        break
      }
    }
    nchar
  }
  find.matrix.line <- function(x) {
    for (i in 1:NROW(x)) {
      if (any(f <- grep("\\bmatrix\\b", x[i], ignore.case = TRUE))) {
        matrix.line <- as.numeric(i)
        break
      }
    }
    matrix.line
  }
  trim.whitespace <- function(x) {
    gsub("\\s+", "", x)
  }
  trim.semicolon <- function(x) {
    gsub(";", "", x)
  }
  X <- scan(file = file, what = character(), sep = "\n", quiet = TRUE, 
            comment.char = "[", strip.white = TRUE)
  ntax <- find.ntax(X)
  nchar <- find.nchar(X)
  matrix.line <- find.matrix.line(X)
  start.reading <- matrix.line + 1
  Obj <- vector("list", ntax)
  for (i in 1:ntax) Obj[[i]] <- rep(NA, nchar)

  i <- 1
  pos <- 0
  tot.nchar <- 0
  tot.ntax <- 0
  for (j in start.reading:NROW(X)) {
    Xj <- trim.semicolon(X[j])
    if (Xj == "") {
      break
    }
    if (any(jtmp <- grep("\\bend\\b", X[j], perl = TRUE, ignore.case = TRUE))) {
      break
    }
    ts <- unlist(strsplit(Xj, "(?<=\\S)(\\s+)(?=\\S)", perl = TRUE))
    #browser()
    if (length(ts) > 2) {
      stop("nexus parser does not handle spaces in sequences or taxon names (ts>2)")
    }
    if (length(ts) != 2) {
      stop("nexus parser failed to read the sequences (ts!=2)")
    }
    Seq <- trim.whitespace(ts[2])
    Name <- trim.whitespace(ts[1])
    nAME <- paste(c("\\b", Name, "\\b"), collapse = "")

    if (any(l <- grep(nAME, names(Obj)))) {
      tsp <- strsplit(Seq, NULL)[[1]]

      Obj[[l]][pos + c(1:length(tsp))] <- tsp
      chars.done <- length(tsp)

    }
    else {
      names(Obj)[i] <- Name
      tsp <- strsplit(Seq, NULL)[[1]]

      Obj[[i]][pos + c(1:length(tsp))] <- tsp
      chars.done <- length(tsp)

    }
    tot.ntax <- tot.ntax + 1
    if (tot.ntax == ntax) {
      i <- 1
      tot.ntax <- 0
      tot.nchar <- tot.nchar + chars.done
      if (tot.nchar == nchar * ntax) {
        print("ntot was more than nchar*ntax")
        break
      }
      pos <- tot.nchar
    }
    else {
      i <- i + 1
    }
  }
  if (tot.ntax != 0) {
    cat("ntax:", ntax, "differ from actual number of taxa in file?\n")
    stop("nexus parser did not read names correctly (tot.ntax!=0)")
  }
  for (i in 1:length(Obj)) {
    if (length(Obj[[i]]) != nchar) {
      cat(names(Obj[i]), "has", length(Obj[[i]]), "characters\n")
      # [the rest of the attachment is cut off in the archive]
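
The attached function allocates every taxon's vector up front
(rep(NA, nchar)) and then writes each interleaved block into place by
position. Assuming that this preallocation is where the speed-up comes
from, the general technique can be sketched as follows; both helper names
are hypothetical and the data are simulated.

```r
# Hypothetical helpers contrasting two ways of assembling one sequence of
# known final length n from a list of blocks.
grow_by_append <- function(blocks) {
  out <- character(0)
  for (b in blocks) out <- c(out, b)     # copies the whole vector each time
  out
}

fill_preallocated <- function(blocks, n) {
  out <- rep(NA_character_, n)           # allocate once
  pos <- 0
  for (b in blocks) {
    out[pos + seq_along(b)] <- b         # write the block at its position
    pos <- pos + length(b)
  }
  out
}

# Simulated interleaved blocks: 200 blocks of 50 characters each
blocks <- replicate(200, sample(c("a", "c", "g", "t"), 50, replace = TRUE),
                    simplify = FALSE)
stopifnot(identical(grow_by_append(blocks),
                    fill_preallocated(blocks, 200 * 50)))
```

Both functions return the same vector; wrapping each call in
system.time() on larger input is a simple way to compare them.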

Re: [R-sig-phylo] read.nexus.data parser

2013-04-07 Thread Ben Bolker
On 13-04-05 01:29 PM, Jessica Sabo wrote:
> Hi All,
>
> I am wondering if there is any way to increase the speed of the
> read.nexus.data parser. Or if there is an alternative that is a
> faster nexus file data parser.
>
> Thanks, Jess

  I don't know if it's faster or not, but there is ?readNexus in the
'ape' package.  Also see library(sos); findFn("read {nexus format}")


[R-sig-phylo] read.nexus.data parser

2013-04-05 Thread Jessica Sabo
Hi All,

I am wondering if there is any way to increase the speed of the
read.nexus.data parser, or if there is an alternative Nexus data parser
that is faster.

Thanks,
Jess

-- 

Jessica L. Sabo
js...@ufl.edu
sabo.j...@gmail.com

PhD Student, The Edward Braun Lab
Department of Biology, Zoology Program
University of Florida
P.O. Box 118525
Gainesville, FL 32611
