[Bioc-devel] Non-ASCII in datase from Biomart EMBL via Gviz package

2014-10-12 Thread Martin, Tiphaine
Hi,


I need to create a BiomartGeneRegionTrack dataset with the Gviz package to run 
the examples in my package. But when I run "R CMD check coMET", I get the 
following warning from the data check:


 checking data for non-ASCII characters ... WARNING
  Warning: found non-ASCII strings
  '[alpha cell,acidophil cell,acinar cell,adipoblast,adipocyte,amacrine 
cell,beta cell,capsular cell,cementocyte,chief 
cell,chondroblast,chondrocyte,chromaffin cell,chromophobic 
cell,corticotroph,delta cell,dendritic cell,enterochromaffin 
cell,ependymocyte,epithelium,erythroblast,erythrocyte,fibroblast,fibrocyte,follicular
 cell,germ cell,germinal epithelium,giant cell,glial cell,glioblast,goblet 
cell,gonadotroph,granulosa cell,haemocytoblast,hair 
cell,hepatoblast,hepatocyte,hyalocyte,interstitial cell,juxtaglomerular 
cell,keratinocyte,keratocyte,lemmal cell,leukocyte,luteal cell,lymphocytic stem 
cell,lymphoid cell,lymphoid stem cell,macroglial cell,mammotroph,mast 
cell,medulloblast,megakaryoblast,megakaryocyte,melanoblast,melanocyte,mesangial 
cell,mesothelium,metamyelocyte,monoblast,monocyte,mucous neck cell,muscle 
cell,myelocyte,myeloid cell,myeloid stem cell,myoblast,myoepithelial 
cell,myofibrobast,neuroblast,neuroepithelium,neuron,odontoblast,osteoblast,osteoclast,osteocyte,
oxyntic cell,parafollicular cell,paraluteal cell,peptic 
cell,pericyte,phaeochromocyte,phalangeal cell,pinealocyte,pituicyte,plasma 
cell,platelet,podocyte,proerythroblast,promonocyte,promyeloblast,promyelocyte,pronormoblast,reticulocyte,retinal
 pigment epithelium,retinoblast,somatotroph,stem cell,sustentacular 
cell,teloglial cell,zymogenic cell,small cell,Th1,Cell Type,Müller 
cell,primary oocyte,Claudius' cell,Th2,follicular dendritic 
cell,astrocyte,white,T-lymphoblast,basal cell,T-lymphocyte,helper induced 
T-lymphocyte:Th2,B-lymphocyte,neutrophil,oocyte,unclassifiable (Cell 
Type),natural killer cell,helper induced T-lymphocyte,brown,CD4+,Hensen 
cell,lymphocyte,cardiac muscle cell,lymphoblast,Paneth cell,alveolar 
macrophage,macrophage,squamous cell,oligodendrocyte,smooth muscle 
cell,gamete,spermatid,Schwann cell,CD34+,spermatocyte,helper induced 
T-lymphocyte:Th1,astroblast,eosinophil,oligodendroblast,basophil,peripheral 
blood mononuclear cell,histiocyte,Sertoli cell,endothelium,granulocyte,
spermatozoon,Merkel cell,skeletal muscle cell,thymocyte,foam cell,ovum,
secondary spermatocyte,Langerhans cell,primary 
spermatocyte,transitional,Purkinje cell,Kupffer cell,secondary 
oocyte,B-lymphoblast]' in object 'biomTrack'


library(Gviz)

chrom <- "chr2"
start <- 38290160
end <- 38303219
gen <- "hg19"
showId <- TRUE  # assumed value; showId is not defined in this message

biomTrack <- BiomartGeneRegionTrack(genome = gen,
  chromosome = chrom, start = start,
  end = end, name = "ENSEMBL",
  fontcolor = "black", groupAnnotation = "group",
  just.group = "above", showId = showId)


Do you have any idea how to fix this warning? I think we may need to ask 
EMBL to correct the data at their end, don't we?


Tiphaine
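
One way to find which entry in such a list triggers the check is sketched below in base R. The string is a shortened, made-up stand-in for the one reported above, and it assumes that the "Mller cell" shown in the archived warning was originally "Müller cell"; the names `cell_types`, `terms` and `non_ascii` are only illustrative.

```r
# Shortened stand-in for the string reported by R CMD check (assumption:
# the archived "Mller cell" was originally "M\u00fcller cell").
cell_types <- "[alpha cell,Th1,Cell Type,M\u00fcller cell,primary oocyte]"

# Drop the surrounding brackets and split on commas.
inner <- substr(cell_types, 2, nchar(cell_types) - 1)
terms <- strsplit(inner, ",", fixed = TRUE)[[1]]

# A term is non-ASCII if any of its code points lies above 127.
non_ascii <- vapply(terms, function(s) any(utf8ToInt(s) > 127L),
                    logical(1), USE.NAMES = FALSE)

# Show only the offending terms.
print(terms[non_ascii])
```

Checking code points rather than bytes keeps the test independent of the current locale, as long as the strings are valid UTF-8.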



Tiphaine Martin
PhD Research Student | King's College
The Department of Twin Research & Genetic Epidemiology | Genetics & Molecular 
Medicine Division
St Thomas' Hospital
4th Floor, Block D, South Wing
SE1 7EH, London
United Kingdom

email : tiphaine.mar...@kcl.ac.uk
Fax: +44 (0) 207 188 6761

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


Re: [Bioc-devel] Non-ASCII in datase from Biomart EMBL via Gviz package

2014-10-12 Thread Vincent Carey
I don't know exactly how you are triggering this warning.  If you have the
ability to prefilter your content before serializing, that may be best.
The following is from the gwascat package.  You have very little chance, I
believe, of getting an institutional guarantee that only ASCII will go into
their emissions.

fixNonASCII = function(df) {
 # TRUE if any element of x fails to convert cleanly to ASCII
 hasNonASCII = function(x) {
   asc = iconv(x, "latin1", "ASCII")
   any(asc != x | is.na(asc))
   }
 havebad = sapply(df, function(x) hasNonASCII(x))
 if (!(any(havebad))) return(df)
 message("NOTE: input data had non-ASCII characters replaced by '*'.")
 badinds = which(havebad)
 # replace every non-convertible byte in the affected columns with '*'
 for (i in seq_along(badinds))
   df[,badinds[i]] = iconv(df[,badinds[i]], to="ASCII", sub="*")
 df
}
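
As a usage sketch of the same idea, the variant below only touches character columns and assumes the input is UTF-8 rather than latin1. The function name `replaceNonASCII` and the example data frame are hypothetical and do not come from gwascat.

```r
# Hypothetical variant of the helper above: only character columns are
# touched, and the input is assumed to be UTF-8 rather than latin1.
replaceNonASCII <- function(df, sub = "*") {
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.character(x)) next
    # Elements that cannot be converted come back as NA (or changed).
    asc <- iconv(enc2utf8(x), from = "UTF-8", to = "ASCII")
    bad <- !is.na(x) & (is.na(asc) | asc != x)
    if (any(bad)) {
      x[bad] <- iconv(enc2utf8(x[bad]), from = "UTF-8", to = "ASCII", sub = sub)
      df[[col]] <- x
    }
  }
  df
}

# Made-up example data; "\u00fc" is the u-umlaut.
cells <- data.frame(
  id = 1:3,
  cell = c("alpha cell", "M\u00fcller cell", "Schwann cell"),
  stringsAsFactors = FALSE
)
clean <- replaceNonASCII(cells)
print(clean$cell)
```

Skipping non-character columns avoids turning a numeric column that merely contains NA into a character column.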





Re: [Bioc-devel] Non-ASCII in datase from Biomart EMBL via Gviz package

2014-10-13 Thread Hahne, Florian
Hi Tiphaine,
You can follow Vince's advice and transform all the data into proper ASCII
characters. Or you can just get rid of the culprit (being the @biomart slot
of the object) before serialising. The easiest way to do that is:
foo@biomart <- NULL
The slot is only present to cache the BiomaRt connection, which is lost
anyways when serialising. The object is smart enough to realise that and
just reconnects the next time it is plotted. That is how I handled things
for the serialised BiomartGeneRegionTracks in Gviz.
Florian
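
The slot-dropping approach can be tried without Gviz or a BiomaRt connection using a toy S4 class, as in the minimal sketch below. `ToyTrack`, `ListOrNULL` and `has_non_ascii` are hypothetical names, and the class is not the real Gviz class definition; on the real object the step would be `biomTrack@biomart <- NULL`, as described above.

```r
library(methods)

# Hypothetical stand-in for a track class with a cache slot that may be NULL.
setClassUnion("ListOrNULL", c("list", "NULL"))
setClass("ToyTrack", representation(name = "character", cache = "ListOrNULL"))

# The cached content carries a non-ASCII string ("\u00fc" is the u-umlaut).
trk <- new("ToyTrack", name = "ENSEMBL",
           cache = list(cellTypes = c("alpha cell", "M\u00fcller cell")))

# TRUE if any string stored in any slot has a code point above 127.
has_non_ascii <- function(obj) {
  values <- lapply(slotNames(obj), function(s) slot(obj, s))
  strings <- as.character(unlist(values))
  any(vapply(strings, function(s) any(utf8ToInt(s) > 127L), logical(1)))
}

# Drop the cache before serialising, then save and reload the object.
trk_clean <- trk
trk_clean@cache <- NULL

path <- tempfile(fileext = ".rds")
saveRDS(trk_clean, path)
restored <- readRDS(path)

print(has_non_ascii(trk))
print(has_non_ascii(restored))
```

The class union is what allows NULL to be assigned to the slot; without it the assignment would fail the S4 validity check.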




Re: [Bioc-devel] Non-ASCII in datase from Biomart EMBL via Gviz package

2014-10-13 Thread Martin, Tiphaine
Both methods work well.
Thanks,
Tiphaine

