Everything works fine for me with quote="":
> system.time(gwas <-
+   read.delim("gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv",
+              quote=""))
   user  system elapsed
  4.435   0.052   4.487
> dim(gwas)
[1] 179364 38
> sessionInfo()
R version 4.0.0 Patched (2020-04-27 r78316)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS
Matrix products: default
BLAS: /home/hpages/R/R-4.0.r78316/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.0.r78316/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.0.0
On 4/30/20 04:48, Vincent Carey wrote:
This file trips up fread around record 170349, inconsistently ... I haven't
figured that out yet.
readLines, strsplit may be the ultimate solution.
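The readLines/strsplit route mentioned above could look roughly like the sketch below. It is only an illustration, not code from the thread: the helper name read_tsv_raw and the sample file are made up, and it assumes one record per line with the header on the first line. Since no quote processing happens at all, a stray " stays ordinary data.

```r
# Hypothetical helper (not from the thread): parse a TSV without any quote
# handling by splitting each line on tabs.
read_tsv_raw <- function(path) {
  lines <- readLines(path, warn = FALSE)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  header <- fields[[1]]
  n <- length(header)
  records <- fields[-1]
  # strsplit() drops a trailing empty field, so short records are padded.
  stopifnot(n > 1, length(records) > 0, all(lengths(records) <= n))
  padded <- vapply(records,
                   function(x) c(x, rep("", n - length(x))),
                   character(n))
  df <- as.data.frame(t(padded), stringsAsFactors = FALSE)
  names(df) <- header
  df
}

# Made-up sample: the first record contains a lone double quote, the second
# ends in an empty field.
tsv <- tempfile(fileext = ".tsv")
writeLines(c("id\tsnp\tcontext",
             "1\tchr11:102751102\"-?\tIntergenic",
             "2\trs123\t"), tsv)
demo <- read_tsv_raw(tsv)
dim(demo)
```

Every column comes back as character, so a type conversion step (for example type.convert()) would still be needed afterwards.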
On Thu, Apr 30, 2020 at 7:15 AM Vincent Carey <st...@channing.harvard.edu>
wrote:
right, line 35265 of
http://www.ebi.ac.uk/gwas/api/search/downloads/alternative
has an unclosed quote in a field.
35265 2019-04-10 30804558 Grove J 2019-02-25 Nat Genet
http://www.ncbi.nlm.nih.gov/pubmed/30804558 Identification of
common genetic risk variants for autism spectrum disorder. Autism
spectrum disorder 18,381 European ancestry cases, 27,969
European ancestry controls 2,119 European ancestry cases, 142,379
European ancestry controls Intergenic
chr11:102751102"-? chr11:102751102 0 1 0.037
8E-6 5.096910013008056 1.1641443 [NR] Illumina
[9112387] (imputed) N autism spectrum disorder
http://www.ebi.ac.uk/efo/EFO_0003756 GCST007556 Genome-wide
genotyping array
On Thu, Apr 30, 2020 at 6:59 AM Martin Morgan <mtmorgan.b...@gmail.com>
wrote:
I'd look instead at or around line 35264 for use of quotes, e.g., "3'
DNA", and change the argument to read.delim(quote = "") (though I never get
that right, so I'm probably wrong again...). A comment character might also
be a problem.
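To make the quote suggestion concrete, here is a small sketch on a made-up file (the contents and variable names are invented for illustration). It flags lines that hold an odd number of double quotes, which is where an unclosed quoted string would start, and then reads the file with quote processing switched off. Note that read.delim() already defaults to comment.char = "", so comment characters are more of a concern with read.table().

```r
# Made-up sample file: record 1 has a balanced pair of quotes,
# record 2 has a lone double quote.
tsv <- tempfile(fileext = ".tsv")
writeLines(c("id\tnote",
             "1\t\"3' DNA\"",
             "2\tchr11:102751102\"-?",
             "3\tplain"), tsv)

# Count double quotes per line; an odd count marks a suspect line.
lines <- readLines(tsv)
n_quotes <- lengths(regmatches(lines, gregexpr("\"", lines, fixed = TRUE)))
suspect <- which(n_quotes %% 2 == 1)
suspect  # file line numbers, header included

# With quote = "" the quote characters are kept as ordinary data.
dat <- read.delim(tsv, quote = "")
dim(dat)
```

An odd count is only a heuristic: a quoted field that legitimately spans several lines would be flagged as well.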
If you point to the location of the file I could investigate further...
Martin
On 4/30/20, 6:55 AM, "Bioc-devel on behalf of Vincent Carey" <
bioc-devel-boun...@r-project.org on behalf of st...@channing.harvard.edu>
wrote:
The EBI GWAS catalog is large -- now the download is over 100MB for 179K
associations. A "bug" in the package was reported, so I acquired the file
by hand.
> nn = read.delim("gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv",
+                 sep="\t")
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
  EOF within quoted string
> dim(nn)
[1] 35264 38
The "bug" is the number 35264 ...
>
[1]+ Stopped R
%vjcair> wc gwas_cat*tsv
179365 13243516 120140148 gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv
%vjcair> vi gwas_cat*tsv
%vjcair> fg
R
> tail(nn)
Error: C stack usage 98161262 is too close to the limit
Maybe my R needs to be updated.
If I use data.table::fread to consume the tsv over HTTP all seems well,
and perhaps I will switch to that.
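One way to catch this kind of silent truncation is to compare the parsed row count with the number of lines in the file, much like comparing dim() with the wc output above (179365 lines, one of them the header). A minimal sketch, with a hypothetical helper name and a made-up file, assuming one record per line and no blank lines:

```r
# Hypothetical check (not from the thread): TRUE when the data frame has one
# row per non-header line of the file.
rows_match_lines <- function(path, df) {
  n_lines <- length(readLines(path, warn = FALSE))
  nrow(df) == n_lines - 1L
}

# Made-up sample with a lone double quote in the second record.
tsv <- tempfile(fileext = ".tsv")
writeLines(c("id\tsnp",
             "1\trs1",
             "2\tchr11:102751102\"-?",
             "3\trs3"), tsv)
ok <- rows_match_lines(tsv, read.delim(tsv, quote = ""))
ok
```

Applied to the real file, the 35264-row result would fail such a check against the 179365 lines reported by wc.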
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319