Everything works fine for me with quote="":

> system.time(gwas <-read.delim("gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv", quote=""))
   user  system elapsed
  4.435   0.052   4.487

> dim(gwas)
[1] 179364     38

> sessionInfo()
R version 4.0.0 Patched (2020-04-27 r78316)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /home/hpages/R/R-4.0.r78316/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.0.r78316/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.0
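
For anyone who wants to see the effect without downloading the catalog, here is a minimal sketch on a made-up file. The toy records and the names tmp, records, bad and good are illustrative assumptions, not taken from the catalog. One field carries a stray double quote, similar in spirit to the one reported further down in this thread, and the file is read once with the default quote handling and once with quote="":

```r
# Toy tab-delimited file: header plus 8 records (made-up data).
tmp <- tempfile(fileext = ".tsv")
records <- sprintf("%d\trs%d\t0.0%d", 1:8, 1:8, 1:8)
# Record 6 gets a field with an unbalanced double quote.
records[6] <- "6\tchr11:102751102\"-?\t0.06"
writeLines(c("id\tsnp\tp", records), tmp)

# Default quote = "\"": the stray quote opens a string that never closes,
# so the rest of the file is absorbed into a single field
# (warning suppressed here; it is "EOF within quoted string").
bad <- suppressWarnings(read.delim(tmp))

# quote = "": double quotes are treated as ordinary characters.
good <- read.delim(tmp, quote = "")

c(default = nrow(bad), no_quoting = nrow(good))
```

With quote="" all 8 records should come back, while the default call should stop short, which is the same pattern as the 35264 vs 179364 rows seen on the real file.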



On 4/30/20 04:48, Vincent Carey wrote:
This file trips up fread around record 170349, inconsistently ... I haven't
figured that out yet.
readLines, strsplit may be the ultimate solution.
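
A rough sketch of what the readLines/strsplit route could look like, in case it is useful. The helper name read_tsv_lines is hypothetical, and the sketch assumes a header line, at least one record, and no record with more fields than the header (extra fields would be silently truncated). Every column comes back as character:

```r
# Hypothetical helper: parse a tab-delimited file with no quote handling at all.
read_tsv_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # fixed = TRUE splits on literal tabs; quotes are never interpreted.
  fields <- strsplit(lines, "\t", fixed = TRUE)
  header <- fields[[1]]
  # strsplit() drops a trailing empty field, so pad short records with "".
  body <- lapply(fields[-1], function(x) {
    length(x) <- length(header)
    x[is.na(x)] <- ""
    x
  })
  mat <- matrix(unlist(body), ncol = length(header), byrow = TRUE)
  df <- as.data.frame(mat, stringsAsFactors = FALSE)
  names(df) <- header
  df
}

# Usage on a made-up file with a stray quote and a trailing empty field.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tsnp\tnote",
             "1\trs1\tok",
             "2\tchr11:102751102\"-?\t"), tmp)
df <- read_tsv_lines(tmp)
dim(df)
```

Columns that should be numeric would still need an explicit conversion afterwards, e.g. with type.convert() or as.numeric().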

On Thu, Apr 30, 2020 at 7:15 AM Vincent Carey <st...@channing.harvard.edu>
wrote:

right, line 35265 of
http://www.ebi.ac.uk/gwas/api/search/downloads/alternative has an
unclosed quote in a field.

  35265 2019-04-10      30804558        Grove J 2019-02-25      Nat Genet
     http://www.ncbi.nlm.nih.gov/pubmed/30804558     Identification of
common genetic risk variants for autism spectrum disorder.    Autism
spectrum disorder        18,381 European ancestry cases, 27,969
European ancestry controls       2,119 European ancestry cases, 142,379
European ancestry controls                               Intergenic
chr11:102751102"-?      chr11:102751102 0                       1       0.037
   8E-6    5.096910013008056                      1.1641443       [NR]
Illumina [9112387] (imputed)    N       autism spectrum disorder
http://www.ebi.ac.uk/efo/EFO_0003756     GCST007556      Genome-wide
genotyping array

On Thu, Apr 30, 2020 at 6:59 AM Martin Morgan <mtmorgan.b...@gmail.com>
wrote:

I'd look instead at or around line 35264 for use of quotes, e.g., "3'
DNA", and change the quote argument, e.g., read.delim(quote = "") (though I
never get that right so probably wrong again...). A comment character might
also be a problem.
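
One way to hunt for the offending line, sketched under the assumption that the culprit is an unbalanced double quote: count the double quotes on each line and report the lines where the count is odd. The helper name find_unbalanced_quotes is hypothetical. (As far as I can tell read.delim() already defaults to comment.char = "", so comment characters should only matter when going through read.table() directly.)

```r
# Hypothetical helper: line numbers that contain an odd number of double quotes.
find_unbalanced_quotes <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # gregexpr() finds every literal '"'; regmatches() returns character(0)
  # for lines without a match, so lengths() gives 0 there.
  n_quotes <- lengths(regmatches(lines, gregexpr("\"", lines, fixed = TRUE)))
  which(n_quotes %% 2L == 1L)
}

# Usage on a made-up file: line 3 carries a single stray quote,
# line 4 a balanced pair.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tsnp",
             "1\trs1",
             "2\tchr11:102751102\"-?",
             "3\t\"rs3\""), tmp)
find_unbalanced_quotes(tmp)
```

The line numbers returned count the header as line 1, so they can be compared directly with what an editor such as vi shows.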

If you point to the location of the file I could investigate further...

Martin

On 4/30/20, 6:55 AM, "Bioc-devel on behalf of Vincent Carey" <
bioc-devel-boun...@r-project.org on behalf of st...@channing.harvard.edu>
wrote:

     The EBI GWAS catalog is large -- now the download is over 100MB for 179K
     associations.  A "bug" in the package was reported, so I acquired the
     file by hand.

     > nn = read.delim("gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv", sep="\t")

     Warning message:
     In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
       EOF within quoted string

     > dim(nn)

     [1] 35264    38


     The "bug" is the number 35264 ...


     >

     [1]+  Stopped                 R

     %vjcair> wc gwas_cat*tsv

       179365 13243516 120140148 gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv

     %vjcair> vi gwas_cat*tsv

     %vjcair> fg

     R


     > tail(nn)

     Error: C stack usage  98161262 is too close to the limit


     Maybe my R needs to be updated.

     If I use data.table::fread to consume the tsv over HTTP all seems well,
     and perhaps I will switch to that.


     _______________________________________________
     Bioc-devel@r-project.org mailing list
     
     https://stat.ethz.ch/mailman/listinfo/bioc-devel




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

