
[R] WG: Fw: Re: rmarkdown and font size

2017-06-13 Thread G . Maubach

Hi Dan,
Hi All,

I read the post below. I am wondering how I can know which "keys" are
available, e.g. "code.r" and "pre". Where can I find the definition of
what can be adjusted and which "words" to use?

Kind regards

Georg


> Sent: Thursday, 8 June 2017, 16:16
> From: "Nordlund, Dan (DSHS/RDA)"
> To: "MacQueen, Don", "r-help@r-project.org"
> Subject: Re: [R] rmarkdown and font size
>
> You can change the style, modifying a variety of things.  E.g,
> 
> ---
> title: Test
> ---
> 
> <style type="text/css">
> 
> body { /* Normal */
>   font-size: 12px;
> }
> td { /* Table */
>   font-size: 8px;
> }
> h1.title {
>   font-size: 38px;
>   color: DarkRed;
> }
> h1 { /* Header 1 */
>   font-size: 28px;
>   color: DarkBlue;
> }
> h2 { /* Header 2 */
>   font-size: 22px;
>   color: DarkBlue;
> }
> h3 { /* Header 3 */
>   font-size: 18px;
>   font-family: "Times New Roman", Times, serif;
>   color: DarkBlue;
> }
> code.r { /* Code block */
>   font-size: 12px;
> }
> pre { /* Code block - determines code spacing between lines */
>   font-size: 14px;
> }
> </style>
> 
> 
> Here is some normal text.  It is a 12-point font.  The table is in 8-point.
> 
> ```{r example, echo=FALSE, results='asis'}
> tmp <- data.frame(a=1:5, b=letters[1:5])
> print( knitr::kable(tmp, row.names=FALSE))
> ```
> 
> 
> Hope this is helpful,
> 
> Dan
> 
> Daniel Nordlund, PhD
> Research and Data Analysis Division
> Services & Enterprise Support Administration
> Washington State Department of Social and Health Services
> 
> 
> > -Original Message-
> > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> > MacQueen, Don
> > Sent: Wednesday, June 07, 2017 4:58 PM
> > To: r-help@r-project.org
> > Subject: [R] rmarkdown and font size
> > 
> > Suppose I have a file (named "tmp.rmd") containing:
> > 
> > 
> > ---
> > title: Test
> > ---
> > 
> > ```{r example, echo=FALSE, results='asis'}
> > tmp <- data.frame(a=1:5, b=letters[1:5])
> > print( knitr::kable(tmp, row.names=FALSE))
> > ```
> > 
> > 
> > 
> > And I render it with:
> > 
> > rmarkdown::render('tmp.rmd',
> > output_format=c('html_document','pdf_document'))
> > 
> > I get two files:
> >   tmp.pdf
> >   tmp.html
> > 
> > Is there a way to control (change or specify) the font size of the
> > table in the pdf output?
> > (or of the entire document, if it can't be changed for just the table)
> > 
> > With my actual data, the table is too wide to fit on a page in the pdf
> > output; perhaps if I reduce the font size I can get it to fit.
> > 
> > I would like the html version to still look decent, but I don't care
> > very much what happens to its font size.
> > 
> > Thanks!
> > -Don
> > 
> > --
> > Don MacQueen
> > 
> > Lawrence Livermore National Laboratory
> > 7000 East Ave., L-627
> > Livermore, CA 94550
> > 925-423-1062
> > 
> > 
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 

[R] Antwort: Re: Re: Paths in knitr

2017-06-13 Thread G . Maubach

Hi Yihui,

I took root.dir and base.dir out. Everything still works fine after the
change.

I have implemented the solution Duncan suggested. I am having difficulties
with the scaling / image size in my report. Some plots are too big, some are
too small. I need to adjust each plot individually. Steep learning curve :)
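
One possible way to handle the sizing is the `out.width` chunk option combined with `knitr::include_graphics()`. The following is a minimal sketch, not a tested recipe for this report: it assumes a relative `results/graphics` directory (a hypothetical layout) and that the code runs inside a chunk declared as, say, `{r pie, echo = FALSE, out.width = "60%"}`.

```r
# Hypothetical project layout; replace with your own directories.
plot_dir  <- file.path("results", "graphics")
plot_file <- file.path(plot_dir, "email_distribution_pie.png")

# Inside a chunk with out.width = "60%", the image is scaled to 60% of the
# available width. The guards skip the call when knitr or the file is missing.
if (requireNamespace("knitr", quietly = TRUE) && file.exists(plot_file)) {
  knitr::include_graphics(plot_file)
}
```

Setting `out.width` per chunk keeps the ready-made PNG files untouched, so each plot can be scaled independently in the report.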

Kind regards

Georg




From: Yihui Xie
To: g.maub...@weinwolf.de,
Cc: Duncan Murdoch, R Help
Date: 12.06.2017 18:29
Subject: Re: Re: [R] Paths in knitr
Sent by: xieyi...@gmail.com



Will there be anything wrong if you do not set these options?

Regards,
Yihui
--
https://yihui.name


On Mon, Jun 12, 2017 at 2:24 AM,   wrote:
> Hi Yihui,
> Hi Duncan,
>
> I corrected my typo. Unfortunately knitr did not find my plots in the
> directory where they reside, which is different from that of the Rmd
> document.
>
> The documentation of knitr says:
>
> base.dir: (NULL) an absolute directory under which the plots are generated
> root.dir: (NULL) the root directory when evaluating code chunks; if NULL,
> the directory of the input document will be used
>
> From that description I thought: if base.dir can be used for writing
> plots, is it then also used for reading plots if set? No, it is not.
> If I set the root directory to the plots/graphics directory, will knitr
> then find my plots? No, it does not.
>
> Judging from blog posts, my thoughts did not seem so strange to me, e.g.
> https://philmikejones.wordpress.com/2015/05/20/set-root-directory-knitr/.
> Unfortunately, it does not work for me.
>
> I am using an RStudio project file. Could it be that this interferes with
> the knitr options?
>
> I tried the solution that Duncan suggested:
>
> c_path_plots <-
> "H:/2017/Analysen/Kundenzufriedenheit/Auswertung/results/graphics"
>
> `r knitr::include_graphics(file.path(c_path_plots,
> "email_distribution_pie.png"))`
>
> This solution works fine. I will go with it for this project as I have to
> finish my report soon.
>
> I read Hadley's book on building R packages (
> https://www.amazon.de/R-Packages-Hadley-Wickham/dp/1491910593) and found
> it quite complicated and time-consuming to build one. Thus I have not yet
> tried to build my own packages. At the end of last week I heard of another
> library (http://reaktanz.de/R/pckg/roxyPackage/) which is supposed to make
> building packages much easier. I plan to try that shortly.
>
> On my path to becoming better at analytics using R, I will try to use
> modules of Rmd files which can then easily be integrated into an Rmd
> report. I have yet to see how I can include these files in a complete
> report.
>
> Kind regards
>
> Georg
>
>
> ----- Forwarded by Georg Maubach/WWBO/WW/HAW on 12.06.2017 08:47 -----
>
> From: Yihui Xie
> To: g.maub...@gmx.de,
> Cc: R Help
> Date: 09.06.2017 20:53
> Subject: Re: [R] Paths in knitr
> Sent by: "R-help"
>
>
>
> I'd say it is an expert-only option. If you do not understand what it
> means, I strongly recommend you not to set it.
>
> Similarly, you set the root_dir option and I don't know why you did it,
> but it is a typo anyway (should be root.dir).
>
> Regards,
> Yihui
> --
> https://yihui.name
>
> On Fri, Jun 9, 2017 at 4:50 AM,  wrote:
>
>> Hi Yi,
>>
>> many thanks for your reply.
>>
>> Why do I have to set the base.dir option? Because, to me, it is not clear
>> from the documentation where knitr looks for data files and how I can
>> tell knitr where to look. base.dir was an attempt, but it did not work.
>>
>> Can you give me a hint where I can find information/documentation on
>> this path issue?
>>
>> Kind regards
>>
>> Georg
>>
>>
>> > Sent: Thursday, 8 June 2017, 15:05
>> > From: "Yihui Xie"
>> > To: g.maub...@weinwolf.de
>> > Cc: "R Help"
>> > Subject: Re: [R] Paths in knitr
>> >
>> > Why do you have to set the base.dir option?
>> >
>> > Regards,
>> > Yihui
>> > --
>> > https://yihui.name
>> >
>> >
>> > On Thu, Jun 8, 2017 at 6:15 AM,   wrote:
>> > > Hi All,
>> > >
>> > > I have to compile a report for the management and decided to use
>> > > RMarkdown and knitr. I compiled all needed plots (using separate R
>> > > scripts) before compiling the report, thus all plots reside in my
>> > > graphics directory. The RMarkdown report needs to access these files.
>> > > I have defined
>> > >
>> > > ```{r setup, include = FALSE}
>> > > knitr::opts_knit$set(
>> > >   echo = FALSE,
>> > >   xtable.type = "html",
>> > >   base.dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung",
>> > >   root_dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung",
>> > >   fig.path = "results/graphics")  # relative path required, see http://yihui.name/knitr/options
>> > > ```
>> > >
>> > > and then referenced my plot using
>> > >
>> > > 
>> > >
>> >

Re: [R] Paths in knitr

2017-06-12 Thread G . Maubach

Hi Yihui,
Hi Duncan,

I corrected my typo. Unfortunately knitr did not find my plots in the
directory where they reside, which is different from that of the Rmd
document.

The documentation of knitr says:

base.dir: (NULL) an absolute directory under which the plots are generated
root.dir: (NULL) the root directory when evaluating code chunks; if NULL,
the directory of the input document will be used

From that description I thought: if base.dir can be used for writing
plots, is it then also used for reading plots if set? No, it is not.
If I set the root directory to the plots/graphics directory, will knitr
then find my plots? No, it does not.

Judging from blog posts, my thoughts did not seem so strange to me, e.g.
https://philmikejones.wordpress.com/2015/05/20/set-root-directory-knitr/.
Unfortunately, it does not work for me.

I am using an RStudio project file. Could it be that this interferes with
the knitr options?

I tried the solution that Duncan suggested:

c_path_plots <-
"H:/2017/Analysen/Kundenzufriedenheit/Auswertung/results/graphics"

`r knitr::include_graphics(file.path(c_path_plots,
"email_distribution_pie.png"))`

This solution works fine. I will go with it for this project as I have to
finish my report soon.

I read Hadley's book on building R packages (
https://www.amazon.de/R-Packages-Hadley-Wickham/dp/1491910593) and found
it quite complicated and time-consuming to build one. Thus I have not yet
tried to build my own packages. At the end of last week I heard of another
library (http://reaktanz.de/R/pckg/roxyPackage/) which is supposed to make
building packages much easier. I plan to try that shortly.

On my path to becoming better at analytics using R, I will try to use
modules of Rmd files which can then easily be integrated into an Rmd
report. I have yet to see how I can include these files in a complete
report.

Kind regards

Georg

----- Forwarded by Georg Maubach/WWBO/WW/HAW on 12.06.2017 08:47 -----

From: Yihui Xie
To: g.maub...@gmx.de,
Cc: R Help
Date: 09.06.2017 20:53
Subject: Re: [R] Paths in knitr
Sent by: "R-help"

I'd say it is an expert-only option. If you do not understand what it
means, I strongly recommend you not to set it.

Similarly, you set the root_dir option and I don't know why you did it,
but it is a typo anyway (should be root.dir).

Regards,
Yihui
--
https://yihui.name

On Fri, Jun 9, 2017 at 4:50 AM,  wrote:

> Hi Yi,
>
> many thanks for your reply.
>
> Why do I have to set the base.dir option? Because, to me, it is not clear
> from the documentation where knitr looks for data files and how I can
> tell knitr where to look. base.dir was an attempt, but it did not work.
>
> Can you give me a hint where I can find information/documentation on
> this path issue?
>
> Kind regards
>
> Georg
>
>
> > Sent: Thursday, 8 June 2017, 15:05
> > From: "Yihui Xie"
> > To: g.maub...@weinwolf.de
> > Cc: "R Help"
> > Subject: Re: [R] Paths in knitr
> >
> > Why do you have to set the base.dir option?
> >
> > Regards,
> > Yihui
> > --
> > https://yihui.name
> >
> >
> > On Thu, Jun 8, 2017 at 6:15 AM,   wrote:
> > > Hi All,
> > >
> > > I have to compile a report for the management and decided to use
> > > RMarkdown and knitr. I compiled all needed plots (using separate R
> > > scripts) before compiling the report, thus all plots reside in my
> > > graphics directory. The RMarkdown report needs to access these files.
> > > I have defined
> > >
> > > ```{r setup, include = FALSE}
> > > knitr::opts_knit$set(
> > >   echo = FALSE,
> > >   xtable.type = "html",
> > >   base.dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung",
> > >   root_dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung",
> > >   fig.path = "results/graphics")  # relative path required, see http://yihui.name/knitr/options
> > > ```
> > >
> > > and then referenced my plot using
> > >
> > > 
> > >
> > > because I want to be able to customize the plotting attributes.
> > >
> > > But that fails with the message "pandoc.exe: Could not fetch
> > > email_distribution_pie.png".
> > >
> > > If I give it the absolute path
> > > "H:/2017/Analysen/Kundenzufriedenheit/Auswertung/results/graphics/email_distribution_pie.png"
> > > it works fine, as it does if I copy the plot into the directory where
> > > the report.RMD file resides.
> > >
> > > How can I tell knitr to fetch the ready-made plots from the graphics
> > > directory?
> > >
> > > Kind regards
> > >
> > > Georg
>


[R] Paths in knitr

2017-06-08 Thread G . Maubach

Hi All,

I have to compile a report for the management and decided to use RMarkdown 
and knitr. I compiled all needed plots (using separate R scripts) before 
compiling the report, thus all plots reside in my graphics directory. The 
RMarkdown report needs to access these files. I have defined

```{r setup, include = FALSE}
knitr::opts_knit$set(
  echo = FALSE,
  xtable.type = "html",
  base.dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung",
  root_dir = "H:/2017/Analysen/Kundenzufriedenheit/Auswertung",
  fig.path = "results/graphics")  # relative path required, see http://yihui.name/knitr/options
```

and then referenced my plot using



because I want to be able to customize the plotting attributes.

But that fails with the message "pandoc.exe: Could not fetch 
email_distribution_pie.png".

If I give it the absolute path
"H:/2017/Analysen/Kundenzufriedenheit/Auswertung/results/graphics/email_distribution_pie.png"
it works fine, as it does if I copy the plot into the directory where the
report.RMD file resides.

How can I tell knitr to fetch the ready-made plots from the graphics 
directory?

Kind regards

Georg


[R] purrr::pmap does not work

2017-06-07 Thread G . Maubach

Hi All,

I try to do a scatterplot for a bunch of variables. I plot a dependent 
variable against a bunch of independent variables:

-- cut --
graphics::plot(
  v01_r01 ~ v08_01_up11,
  data = dataset,
  xlab = "Dependent",
  ylab = "Independent #1"
)

-- cut --

It is tedious to repeat the statement for all independent variables. I found
an approach that looked like a possible alternative:

-- cut --

mu <- list(5, 10, -3)
sigma <- list(1, 5, 10)
n <- list(1, 3, 5)
fargs <- list(mean = mu, sd = sigma, n = n)
fargs %>%
  purrr::pmap(rnorm) %>%
  str()

-- cut --

I tried to use this for my scatterplot task:

-- cut --

library(purrr)  # attaches the %>% pipe as well

var_battery <- list()  # container for the variable names of each battery
var_battery$v08 <- paste0("v08_", formatC(1:8, width = 2, format = "d",
                                          flag = "0"))
v08_var_labs <- paste0("Label_", 1:8)

dataset <- as.data.frame(
  matrix(
    data = sample(
      x = 1:11,
      size = 90,
      replace = TRUE),
    nrow = 10,
    ncol = 9))
names(dataset) <- c("v01_r01", var_battery$v08)

independent <- as.list(dataset$v01_r01)
dependent <- as.list(dataset[var_battery$v08])

fargs <- list(
  x = independent,
  y = dependent,
  ylab = v08_var_labs)

fargs %>%
  purrr::pmap(
    function(d = dataset, xvalue = x, yvalue = y,
             xlab = "Label for x variable",
             ylab = ylab) {
      graphics::plot(
        xvalue ~ yvalue,
        data = d,
        xlab = xlab,
        ylab = ylab)
    }
  )

-- cut --

The last statement comes back with

Error: Element 2 has length 8, not 1 or 10.

How can I get it up n running? Do you suggest a better solution for the 
task described?
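
The error message reflects that `pmap()` iterates over its list elements in parallel and therefore needs them to share one length (or have length 1), whereas `x` has 10 elements and `y` and `ylab` have 8. A minimal sketch of one possible alternative follows. It swaps in base R's `Map()` for purrr and iterates over column names and labels; the helper `plot_one` and the shortened names `vars` and `labs` are hypothetical, and the data are simulated as in the post.

```r
set.seed(1)
vars <- paste0("v08_", formatC(1:8, width = 2, format = "d", flag = "0"))
labs <- paste0("Label_", 1:8)

dataset <- as.data.frame(
  matrix(sample(1:11, 90, replace = TRUE), nrow = 10, ncol = 9))
names(dataset) <- c("v01_r01", vars)

# Hypothetical helper: one scatterplot of the dependent variable against
# a single independent variable, selected by column name.
plot_one <- function(var, lab) {
  graphics::plot(
    x = dataset[[var]],
    y = dataset[["v01_r01"]],
    xlab = lab,
    ylab = "v01_r01")
  var  # return the name so the result documents what was plotted
}

grDevices::pdf(NULL)  # null device for this sketch; drop it to see the plots
plotted <- Map(plot_one, vars, labs)
invisible(grDevices::dev.off())
```

With purrr attached, `purrr::walk2(vars, labs, plot_one)` should behave the same way, since both arguments have length 8.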

Kind regards

Georg


[R] ggplot: Pie Chart with correct labels

2017-05-30 Thread G . Maubach

Hi All,

I would like to do the following pie chart using ggplot from an official 
data source (
http://www.deutscheweine.de/fileadmin/user_upload/Website/Service/Downloads/Statistik_2016-2017-neu.pdf
, Tab 8, Page 14):

-- cut --

cat("# weinimport_piechart.R\n")

library(dplyr)    # %>%, mutate(), arrange()
library(ggplot2)  # ggplot(), geom_bar(), coord_polar(), ...

# -- Input

d_wine_import_DE <- structure(
  list(
    Land = structure(
      1:24,
      .Label = c(
        "Italien", "Frankreich", "Spanien", "USA",
        "Südafrika", "Chile", "Österreich", "Australien",
        "Portugal", "Griechenland", "Argentinien", "Neuseeland",
        "Ungarn", "Mazedonien", "Schweiz", "Dänemark",
        "Moldawien", "Türkei", "Belgien/Luxemburg", "Rumänien",
        "Ukraine", "Kroatien", "Israel", "Georgien"),
      class = "factor"),
    Menge_hl_2015 = c(
      5481000, 2248000, 3824000, 493000,
      845000, 539000, 308000, 446000,
      153000, 99000, 64000, 43000,
      123000, 186000, 5000, 9000,
      28000, 7000, 1, 15000,
      4000, 4000, 2000, 2000)),
  .Names = c("Land", "Menge_hl_2015"),
  class = "data.frame",
  row.names = c(NA, -24L))
names(d_wine_import_DE)

# -- Data -

d_result <- data.frame(
  country = d_wine_import_DE$Land,
  abs = d_wine_import_DE$Menge_hl_2015) %>%
  mutate(rel = round(abs / sum(abs) * 100, 1)) %>%
  dplyr::arrange(desc(abs)) %>%
  dplyr::mutate(rel_labs = paste(rel, "%")) %>%  # rev() does not work
  dplyr::mutate(breaks = cumsum(abs) - (abs / 2))  # rev() does not work

# -- Plot -

d_result %>%
  ggplot() +
  geom_bar(
    aes(x = "", y = abs, fill = country),
    stat = "identity") +
  # %SOURCE%
  # coord_polar(): Wickham: ggplot2, Springer, 2nd Ed., p. 166
  coord_polar(theta = "y", start = 0) +
  guides(
    fill = guide_legend(
      title = "Länder",
      reverse = FALSE)
  ) +
  scale_y_continuous(
    breaks = d_result$breaks,    # simply "breaks" does not work
    labels = d_result$rel_labs,  # simply "rel_labs" does not work
    trans = "reverse"
  ) +
  # %SOURCE%
  # Kassambara: Guide to Create Beautiful Graphics
  # in R, sthda.com, 2nd Ed., 2013, p. 136ff
  theme_minimal() +
  theme(
    panel.border = element_blank(),
    panel.grid = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank()
    # axis.text.x = element_text(size = 15)
  ) +
  labs(
    title = paste0("Weinimport nach Deutschland 2015"))

-- cut --

I cannot figure out how to align the labels (values in %) with the
countries, which are printed in reverse order. Also, the breaks and labels
need the dataset name, although I thought "breaks" and "rel_labs" would be
sufficient because of the pipe operator.

Can you help me by telling me

1. how to get the order of the labels right, and
2. why I need to reference "breaks" and "labels" with the dataset name?
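
On the second point, a likely explanation is that the pipe hands `d_result` only to `ggplot()`, and only `aes()` looks names up in the data, so `scale_y_continuous()` cannot see `breaks` or `rel_labs` by bare name. On the first point, one option that avoids manual breaks is to let ggplot2 place the labels itself via `geom_text()` and `position_stack()`. The following is a minimal sketch under stated assumptions: it uses only the first three countries of the table, and it runs the plotting part only if ggplot2 is installed.

```r
# First three rows of the import table, in hl (2015).
d <- data.frame(
  country = c("Italien", "Frankreich", "Spanien"),
  abs = c(5481000, 2248000, 3824000))
d$rel <- round(d$abs / sum(d$abs) * 100, 1)
d$rel_labs <- paste(d$rel, "%")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(d, aes(x = "", y = abs, fill = country)) +
    geom_col() +
    # position_stack(vjust = 0.5) centres each label within its own slice,
    # using the same stacking order as the bars.
    geom_text(aes(label = rel_labs),
              position = position_stack(vjust = 0.5)) +
    coord_polar(theta = "y") +
    theme_void() +
    labs(title = "Weinimport nach Deutschland 2015", fill = "Länder")
  print(p)
}
```

Because labels and slices share one stacking rule here, their order cannot drift apart. With all 24 countries the small slices would still overlap, so labelling only the largest ones may be needed.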

Kind regards

Georg


[R] Off-Topic: Project Organisation

2017-05-11 Thread G . Maubach

Hi All,

this post is somewhat off-topic because it deals with a meta issue related to
project organisation instead of real R code.

I have updated my blog concerning a possible directory and file structure for
marketing research projects and data mining projects alike:

https://github.com/gmaubach/R-Know-How/wiki/R-Blog

There I condensed best practices already communicated in articles, books,
packages and guidelines into a new universal structure. It is meant to serve
as a template and construction kit which you can use to create a structure
that suits your project best.

Comments and suggestions are welcome.

Kind regards

Georg


[R] Antwort: RE: Antwort: Re: Factors and Alternatives (SOLVED)

2017-05-09 Thread G . Maubach

Hi David,
Hi Bob,

many thanks for your help.

Your solution - just to use all levels instead of just the ones found in
the data - helped.

The original code looked like this:

-- cut --

c_v10_val_labs <- c(
  "1 = sehr gut",
  "2", "3", "4", "5",
  "6 = sehr schlecht"
)

# where c_v10_val_labs is handed over to my function as "val_labs".

  ds_results$value <- factor(ds_results$value,
                             levels = sort(unique(ds_results$value)),  # old code
                             labels = sort(unique(val_labs)))

-- cut --

If I write instead

-- cut --

  ds_results$value <- factor(ds_results$value,
                             levels = seq_along(val_labs),  # new code, 1st version
                             labels = sort(unique(val_labs)))

-- cut --

Your solution builds a factor with all factor levels, even if a value of the
factor is not present (not NA, it just does not occur in the data, i.e. it
was not chosen by any respondent).
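
The effect can be seen in a small self-contained sketch. The answers below are made up for illustration, and `labels = val_labs` is used directly rather than `sort(unique(val_labs))`, on the assumption that the labels are already in scale order.

```r
val_labs <- c("1 = sehr gut", "2", "3", "4", "5", "6 = sehr schlecht")

# Simulated answers in which the value 4 never occurs.
answers <- c(1, 2, 2, 3, 5, 6, 6, 1)

# Levels come from the full scale, not from the observed data.
f <- factor(answers, levels = seq_along(val_labs), labels = val_labs)

table(f)    # the label "4" is listed with a count of 0
nlevels(f)  # number of levels equals the length of the scale
```

Defining the levels from the scale keeps the factor the same shape for every item battery, whichever values respondents happened to choose.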

In Zumel's book "Practical Data Science with R" (
https://www.amazon.de/Practical-Data-Science-Nina-Zumel/dp/1617291560), 
Shelter Island: Manning, 2014, p. 23-24, Listing 2-5, a mapping using 
subscripts is described:

-- cut --

mapping <- list(
'A40'='car (new)',
'A41'='car (used)',
'A42'='furniture/equipment',
'A43'='radio/television',
'A44'='domestic appliances',
...
)

for(i in 1:(dim(d))[2]) {
if(class(d[,i])=='character') {
d[,i] <- as.factor(as.character(mapping[d[,i]]))
}
}

-- cut --

Simply stated, this would mean:

-- cut --

val_labs <- list(
  "1" = "1 = sehr gut",
  "2" = "2",
  "3" = "3",
  "4" = "4",
  "5" = "5",
  "6" = "6 = sehr schlecht"
)

set.seed(12345)
answers = c(sample(1:5, 10, replace = TRUE))

test <- factor(unlist(val_labs[answers]))

# or just

val_labs <- c(
  "1 = sehr gut",
  "2",
  "3",
  "4",
  "5",
  "6 = sehr schlecht"
)

set.seed(12345)
answers = c(sample(1:5, 10, replace = TRUE))

test <- val_labs[answers]

-- cut --

Adapting this to my code would give:

-- cut --

  ds_results$value <- factor(ds_results$value,
                             levels = sort(unique(ds_results$value)),
                             labels = val_labs[sort(unique(ds_results$value))])  # new code, 2nd version

-- cut --

This results in a factor with just as many levels as there are unique values
in the results.

Both solutions work. Which version is best depends on the overall process
and the purpose of the code. I am documenting all this for readers who
later refer to the list archives.

Using your version and running my code reveals that ggplot runs into
difficulties because the legend lacks values and the sequence and coloring
of the legend are wrong. But that's another story.

Many thanks again for your help.

Kind regards

Georg




From: David L Carlson
To: "g.maub...@weinwolf.de", "Bob O'Hara"
Cc: r-help
Date: 09.05.2017 14:37
Subject: RE: [R] Antwort: Re:  Factors and Alternatives



I'm not sure I understand your question, but you can easily include all 
possible answers when you create the factor by using the levels= argument 
as Bob pointed out. Here is an example of values that range from 1 to 6, 
but value 3 is not represented. Notice that a factor level 3 is created 
even though it does not appear in the data:

> set.seed(42)
> x <- sample.int(6, 10, replace=TRUE)
> table(x)
x
1 2 4 5 6 
1 1 3 3 2 
> y <- factor(x, levels=1:6)
> y
 [1] 6 6 2 5 4 4 5 1 4 5
Levels: 1 2 3 4 5 6

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

From: "Bob O'Hara"
To: g.maub...@weinwolf.de,
Cc: r-help
Date: 09.05.2017 13:58
Subject: Re: Re: [R] Factors and Alternatives



For the problem you state, would it be enough to explicitly define your 
levels?

fac <- rep(c("a", "b", "d"), each=4)
fac.f <- factor(fac, levels=c("a", "b", "c", "d"))
table(fac.f)

# but be warned...
fac.f2 <- factor(fac.f)
table(fac.f2)

This has the advantage that the code explicitly documents what the
possible values are, so if something goes wrong down-stream, you know
it is a real problem (well, unless you have some type conversions
screwing things up). You might also want to do some defensive
programming, and put some checks in the code, to make sure your
factors have the right number of levels.

Bob

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of 
g.maub...@weinwolf.de
Sent: Tuesday, May 9, 2017 6:37 AM
To: Bob O'Hara 
Cc: r-help 
Subject: [R] Antwort: Re: Factors and Alternatives

Hi Bob,

many thanks for your reply.

I have read the documentation. In my current project I use "item 
batteries" for dimensions of touchpoints which are rated by our customers. 

I wrote functions to analyse them. If I create a factor before filtering 
and analysing I lose

[R] Antwort: Re: Factors and Alternatives

2017-05-09 Thread G . Maubach

Hi Bob,

many thanks for your reply.

I have read the documentation. In my current project I use "item
batteries" for dimensions of touchpoints which are rated by our customers.
I wrote functions to analyse them. If I create a factor before filtering
and analysing, I lose the original values of the variable. If I use the
original variable for filtering and analysis, it might happen that for some
dimensions certain values were not selected. This means they are not NA, but
none of the respondents chose "4", for instance, on a scale from 1 to 6.
Creating a factor from the analysed data with the complete scale (1:6) then
fails due to the different vector lengths (number of remaining unique values
in the analysis vs. values in the scale). As I have a function doing the
analysis, I am looking for a way to make it robust to such circumstances so
that I can use it to analyse all "item batteries". Thus my question. I
believe my findings are not odd. Maybe there is a way of dealing with that
kind of problem in R, and I am eager to learn how it can be solved.

What would you suggest?

Kind regards

Georg

From: "Bob O'Hara"
To: g.maub...@weinwolf.de,
Cc: r-help
Date: 09.05.2017 12:26
Subject: Re: [R] Factors and Alternatives

That's easy! First
> str(test3)
 Factor w/ 2 levels "WITHOUT Contact",..: 2 2 2 2 1 1 1 1 1 1

tells you that the internal values are 1 and 2, and the labels are
"WITHOUT Contact" and "WITH Contact". If you read the help page for
factor() you'll see this:

levels: an optional vector of the values (as character strings) that
  ‘x’ might have taken.  The default is the unique set of
  values taken by ‘as.character(x)’, sorted into increasing
  order _of ‘x’_.  Note that this set can be specified as
  smaller than ‘sort(unique(x))’.

  labels: _either_ an optional character vector of (unique) labels for
  the levels (in the same order as ‘levels’ after removing
  those in ‘exclude’), _or_ a character string of length 1.

So, when you create test3 you say that test can take values 0 and 1,
and these should be labelled as "WITHOUT Contact" and "WITH Contact".
So R internally codes "0" as 1 and "1" as 2 (internally R codes
factors as integers, which can be both useful and dangerous), and then
gives them labels "WITHOUT Contact" and "WITH Contact". It now doesn't
care that they were 0 and 1, because you've told it to change the
labels.

If you want to filter by the original values, then don't change the
labels (or at least not until after you've filtered by the original
labels), or convert the filter to the new labels. You're asking for a
data structure with two sets of labels, which sounds odd in general.
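
A small base R sketch of the two filtering routes described above. The data frame `d` and its column names are hypothetical; the point is only that the original codes are kept in a separate column next to the labelled factor.

```r
test1 <- c(rep(1, 4), rep(0, 6))
test3 <- factor(test1, levels = c(0, 1),
                labels = c("WITHOUT Contact", "WITH Contact"))

# Keep the original codes in their own column next to the labelled factor.
d <- data.frame(code = test1, contact = test3)

by_code  <- d[d$code == 0, ]                     # filter on the original value
by_label <- d[d$contact == "WITHOUT Contact", ]  # filter on the label
none     <- d[d$contact == 0, ]                  # no label equals "0"
```

Comparing the factor with `0` matches nothing because the comparison is made against the labels, which no longer contain "0".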

Bob

On 9 May 2017 at 12:12,   wrote:
> Hi All,
>
> I am using factors in a study for the social sciences.
>
> I discovered the following:
>
> -- cut --
>
> library(dplyr)
>
> test1 <- c(rep(1, 4), rep(0, 6))
> d_test1 <- data.frame(test1)
>
> test2 <- factor(test1)
> d_test2 <- data.frame(test2)
>
> test3 <- factor(test1,
> levels = c(0, 1),
> labels = c("WITHOUT Contact", "WITH Contact"))
> d_test3 <- data.frame(test3)
>
> d_test1 %>% filter(test1 == 0)  # works OK
> d_test2 %>% filter(test2 == 0)  # works OK
> d_test3 %>% filter(test3 == 0)  # does not work, why?
>
> myf <- function(ds) {
>   print(levels(ds$test3))
>   print(labels(ds$test3))
>   print(as.numeric(ds$test3))
>   print(as.character(ds$test3))
> }
>
> # This shows that it is not possible to access the original
> # values which were the basis to build the factor:
> myf(d_test3)
>
> -- cut --
>
> Why is it not possible to use a factor with labels for filtering with
> the original values?
> Is there a data structure that works like a factor but gives also access
> to the original values?
>
> Kind regards
>
> Georg
>

-- 
Bob O'Hara
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org


[R] Factors and Alternatives

2017-05-09 Thread G . Maubach

Hi All,

I am using factors in a study for the social sciences.

I discovered the following:

-- cut --

library(dplyr)

test1 <- c(rep(1, 4), rep(0, 6))
d_test1 <- data.frame(test1)

test2 <- factor(test1)
d_test2 <- data.frame(test2)

test3 <- factor(test1, 
levels = c(0, 1),
labels = c("WITHOUT Contact", "WITH Contact"))
d_test3 <- data.frame(test3)

d_test1 %>% filter(test1 == 0)  # works OK
d_test2 %>% filter(test2 == 0)  # works OK
d_test3 %>% filter(test3 == 0)  # does not work, why?

myf <- function(ds) {
  print(levels(ds$test3))
  print(labels(ds$test3))
  print(as.numeric(ds$test3))
  print(as.character(ds$test3))
}

# This shows that it is not possible to access the original
# values which were the basis to build the factor:
myf(d_test3)

-- cut --

Why is it not possible to use a factor with labels for filtering with the 
original values?
Is there a data structure that works like a factor but gives also access 
to the original values?

Kind regards

Georg


[R] Antwort: Re: Multiple-Response Analysis: Cleaning of Duplicate Codes (SOLVED)

2017-04-26 Thread G . Maubach

Hi Bert,

many thanks for your reply. I appreciate your help a lot.

I would like to do the operation (= finding the duplicates) row-wise.

During this night a solution showed up in my dreams :) Instead of using
duplicated() to flag and filter the values, I could use unique() with
the same result. I tested:

# -- cut --

apply(X = c05_xx_r01, MARGIN = 1, unique)

# -- cut --

This finds the unique values for each row. That is nice, but it does not
meet the requirement that I get back a data frame whose number of variables
equals either the total number of unique values in the complete
data.frame/matrix or the number of variables of the original data.frame.

The above operation returns a list instead of a data.frame because the
number of resulting values varies from 1 to 7 per row.

I searched the web for a solution and found:

http://stackoverflow.com/questions/15753091/convert-mixed-length-named-list-to-data-frame

The complete solution would then look like:

# -- cut --

library(stringi)
library(tidyverse)
my_list <- apply(c05_xx_r01, MARGIN = 1, unique)
my_tibble <- as_tibble(stringi::stri_list2matrix(my_list, byrow = TRUE))
# DONE !

# -- cut --

All-in-all thanks again for your help.

Kind regards

Georg

P.S.: I had a look into ?unique. The statement "unique(c05_xx_r01, MARGIN = 
1)" does not do the job, because it looks for unique combinations of values 
across all columns. That is not the desired outcome.
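
To make the difference concrete, a small sketch on made-up data (the matrix `m` is an assumption for illustration): unique() with MARGIN = 1 drops duplicated rows, while apply() with unique works inside each row.

```r
# Hypothetical data: rows 1 and 2 are identical, row 3 differs
m <- matrix(c(100, 100,
              100, 100,
              100, 400),
            nrow = 3, byrow = TRUE)

# Removes duplicated rows, values inside a row stay untouched
by_rows <- unique(m, MARGIN = 1)

# Removes duplicated values inside each row
within_rows <- apply(m, 1, unique)
```

Here `by_rows` keeps two rows, and `within_rows` is a list because the rows yield different numbers of values.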




From:    Bert Gunter 
To:      g.maub...@weinwolf.de, 
Cc:      R-help 
Date:    25.04.2017 19:10
Subject: Re: [R] Multiple-Response Analysis: Cleaning of Duplicate Codes



If I understand you correctly, one way is:

> z <- rep(LETTERS[1:3],4)
> z
 [1] "A" "B" "C" "A" "B" "C" "A" "B" "C" "A" "B" "C"
> z[!duplicated(z)]
[1] "A" "B" "C"


?duplicated

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Apr 25, 2017 at 9:36 AM,   wrote:
> Hi All,
>
> in my current project I am working with multiple-response questions
> (MRSets):
>
> -- Coding --
> 100 Main Code 1
> 110 Sub Code 1.1
> 120 Sub Code 1.2
> 130 Sub Code 1.3
>
> 200 Main Code 2
> 210 Sub Code 2.1
> 220 Sub Code 2.2
> 230 Sub Code 2.3
>
> 300 Main Code 3
> 310 Sub Code 3.1
> 320 Sub Code 3.2
>
> The coding for the variables is too detailed. Therefore I have recoded all
> sub codes to the respective main code, e.g. all 110, 120 and 130 to 100,
> all 210, 220 and 230 to 200 and all 310, 320 and 330 to 300.
>
> Now it happens that some respondents get the same main code several times.
> If the coding was done for respondent 1 with 120 and 130, the values after
> recoding are 100 and 100. If I count this, it would mean that I weight
> the multiple values of this respondent by factor 2. This is not my aim. I
> would like to count the 100 for the respective respondent only once.
>
> Here is my script so far:
>
> # -- cut --
>
> library(expss)
>
> d_sample <-
>   structure(
> list(
>   c05_01 = c(
> 110,
> 110,
> 130,
> 110,
> 110,
> 110,
> 110,
> 110,
> 110,
> 110,
> 110,
> 999,
> 110,
> 495,
> 160,
> 110,
> 410
>   ),
>   c05_02 = c(NA,
>  NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA,
> 170,
>  NA, 130),
>   c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
>  NA, NA, NA, NA, NA, NA, NA),
>   c05_04 = c(
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_
>   ),
>   c05_05 = c(
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_,
> NA_real_
>   )
> ),
> .Names = c("c05_01",
>"c05_02", "c05_03", "c05_04", "c05_05"),
> row.names = c(
>   "1",
>   "2",
>   "3",
>   "4",
>   "5",
>   "10",
>   "11",
>   "12",
>   "13",
>   "14",
>   "15",
>   "20",
>   "21",
>   "22",
>   "23",
>   "24",
>   "25"
> ),
> class = "data.frame"
>   )
>
> c05_xx_r01 <- d_sample %>%
>   select(starts_with("c05_")) %>%
>   recode(c(
> 110 %thru% 195 ~ 100,
> 210 %thru% 295 ~ 200,
>

[R] Multiple-Response Analysis: Cleaning of Duplicate Codes

2017-04-25 Thread G . Maubach

Hi All,

in my current project I am working with multiple-response questions 
(MRSets):

-- Coding --
100 Main Code 1
110 Sub Code 1.1
120 Sub Code 1.2
130 Sub Code 1.3

200 Main Code 2
210 Sub Code 2.1
220 Sub Code 2.2
230 Sub Code 2.3

300 Main Code 3
310 Sub Code 3.1
320 Sub Code 3.2

The coding for the variables is too detailed. Therefore I have recoded all 
sub codes to the respective main code, e.g. all 110, 120 and 130 to 100, 
all 210, 220 and 230 to 200 and all 310, 320 and 330 to 300.
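
As a rough illustration of this recoding step, here is a base R sketch on made-up codes. It is not the expss recode() call used in the script below, and it is slightly broader than that call: integer division maps every code to its hundreds block, including ranges the script does not list.

```r
# Hypothetical detailed codes
codes <- c(110, 120, 130, 210, 495, 999, NA)

# Integer division by 100 drops the sub code, NA stays NA
main_codes <- (codes %/% 100) * 100
```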

Now it happens that some respondents get the same main code several times. 
If the coding was done for respondent 1 with 120 and 130, the values after 
recoding are 100 and 100. If I count this, it would mean that I weight 
the multiple values of this respondent by factor 2. This is not my aim. I 
would like to count the 100 for the respective respondent only once.
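
The counting rule described above can be sketched in base R as follows. The data frame `recoded` and its column names are made up for illustration; each main code is counted at most once per respondent.

```r
# Hypothetical recoded answers: respondent 1 has main code 100 twice
recoded <- data.frame(r01 = c(100, 100, 400),
                      r02 = c(100,  NA, 100))

# Unique, non-missing codes per respondent
codes_per_respondent <- lapply(seq_len(nrow(recoded)), function(i) {
  x <- unlist(recoded[i, ], use.names = FALSE)
  unique(x[!is.na(x)])
})

# Number of respondents mentioning each main code
counts <- table(unlist(codes_per_respondent))
```

With this toy data, code 100 is counted for three respondents rather than four mentions.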

Here is my script so far:

# -- cut --

library(expss)

d_sample <-
  structure(
list(
  c05_01 = c(
110,
110,
130,
110,
110,
110,
110,
110,
110,
110,
110,
999,
110,
495,
160,
110,
410
  ),
  c05_02 = c(NA,
 NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA, 
170,
 NA, 130),
  c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
 NA, NA, NA, NA, NA, NA, NA),
  c05_04 = c(
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_
  ),
  c05_05 = c(
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_
  )
),
.Names = c("c05_01",
   "c05_02", "c05_03", "c05_04", "c05_05"),
row.names = c(
  "1",
  "2",
  "3",
  "4",
  "5",
  "10",
  "11",
  "12",
  "13",
  "14",
  "15",
  "20",
  "21",
  "22",
  "23",
  "24",
  "25"
),
class = "data.frame"
  )

c05_xx_r01 <- d_sample %>%
  select(starts_with("c05_")) %>%
  recode(c(
110 %thru% 195 ~ 100,
210 %thru% 295 ~ 200,
310 %thru% 395 ~ 300,
410 %thru% 495 ~ 400,
510 %thru% 595 ~ 500,
810 %thru% 895 ~ 800,
910 %thru% 999 ~ 900))
names(c05_xx_r01) <- paste0("c05_0", 1:5, "_r01")
d_sample <- cbind(d_sample, c05_xx_r01)

# -- cut --

I would like to eliminate all duplicate codes, e.g. reduce 100 and 100 for 
the respondents in rows 3, 6, 13, 14 and 15 to a single 100:

# -- cut --
d_sample_1 <-
  structure(
list(
  c05_01 = c(
110,
110,
130,
110,
110,
110,
110,
110,
110,
110,
110,
999,
110,
495,
160,
110,
410
  ),
  c05_02 = c(NA,
 NA, 120, NA, NA, 150, NA, NA, 170, 160, NA, NA, NA, NA, 
170,
 NA, 130),
  c05_03 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 410,
 NA, NA, NA, NA, NA, NA, NA),
  c05_04 = c(
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_
  ),
  c05_05 = c(
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_,
NA_real_
  ),
  c05_01_r01 = c(
100,
100,
100,
100,
100,
100,
100,
100,
100,
100,
100,
900,
100,
400,
100,
100,
400
  ),
  c05_02_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA,
 NA, NA, NA, NA, NA, NA, NA, NA, 100),
  c05_03_r01 = c(NA, NA,
 NA, NA, NA, NA, NA, NA, NA, 400, NA, NA, NA, NA, NA, 
NA, NA),
  c05_04_r01 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 NA, NA, NA, NA, NA, NA),
  c05_05_r01 = c(NA, NA, NA, NA, NA,
 NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)
),
.Names = c(
  "c05_01",
  "c05_02",
  "c05_03",
  "c05_04",
  "c05_05",
  "c05_01_r01",
  "c05_02_r01",
  "c05_03_r01",

[R] Follow-up: RStudio: Place for Storing Options (as plain text)

2017-04-19 Thread G . Maubach

Hi All,

some time ago I asked a question about the places where RStudio stores its 
configuration information. I came across this posting

https://support.rstudio.com/hc/en-us/articles/206382178?version=1.0.136=desktop

explaining RStudio keybindings (predefined and customized). At the end of 
the article is the information that RStudio stores keybindings in

~/.R/rstudio/keybindings/rstudio_commands.json
~/.R/rstudio/keybindings/editor_commands.json
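
A small sketch for checking from within R whether these two files exist on a given machine. The paths are the ones quoted above; whether they apply to a particular RStudio version is an assumption.

```r
# Paths as quoted from the RStudio support article
keybinding_files <- file.path("~", ".R", "rstudio", "keybindings",
                              c("rstudio_commands.json",
                                "editor_commands.json"))

# TRUE/FALSE per file, depending on whether custom keybindings exist
found <- file.exists(path.expand(keybinding_files))
```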

I want to share this with you.

Kind regards

Georg

- Weitergeleitet von Georg Maubach/WWBO/WW/HAW am 19.04.2017 10:10 
-

Von:Georg Maubach/WWBO/WW/HAW
An: R-help mailing list , 
Kopie:  Martin Maechler , Jeff Newmiller 

Datum:  08.03.2017 08:59
Betreff:Follow-up: [R] RStudio: Place for Storing Options (as 
plain text)

Hi All,

I got a late reply from RStudio Support concerning the question where 
RStudio stores options and configurations:

-- cut --

The post RStudio Config Files has a new comment. 
. . .
Unfortunately, it's unlikely that we'll be able to provide a programmatic 
R interface in the near future -- the way we lay out and store RStudio's 
client state does not make it as amenable to public consumption as we 
might hope.
That said, you can generally copy everything within that folder to a new 
machine (at the same relative path from the user home directory), and 
expect preferences to be respected + restored as you might expect.
. . .
-- cut --

The result of the discussion is:

We can take the complete RStudio directory that stores options and 
configurations, located under

%localappdata%\RStudio-Desktop or 
C:\Users\\AppData\Local\RStudio-Desktop

and copy it in full to a new installation of RStudio.

A programmatic approach to editing RStudio options and configurations is 
not possible due to design decisions.

The purpose of the initial question was to find a way to save RStudio 
options and configurations, e.g. on git/github or similar. This is 
possible by initialising the directory given above with git or similar.
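
A minimal sketch of such a backup from within R. The helper name `backup_config_dir` is hypothetical, and the demonstration runs on a throw-away directory instead of the real RStudio folder:

```r
# Hypothetical helper: copy a configuration directory into a backup location
backup_config_dir <- function(src, dest) {
  if (!dir.exists(src)) {
    stop("source directory not found: ", src)
  }
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  # Copies the directory itself (with its contents) into 'dest'
  file.copy(from = src, to = dest, recursive = TRUE)
}

# Demonstration on a temporary directory
src <- file.path(tempdir(), "RStudio-Desktop-demo")
dir.create(src, showWarnings = FALSE)
writeLines("demo", file.path(src, "settings.txt"))
dest <- file.path(tempdir(), "backup-demo")
ok <- backup_config_dir(src, dest)
```

On Windows the source would presumably be file.path(Sys.getenv("LOCALAPPDATA"), "RStudio-Desktop"), following the location given above. The backup folder could then be put under version control.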

An open question is what happens if a new RStudio release changes the 
options and configurations. Whether the stored directory can still be used 
in full would need additional clarification, i.e. for each new version.

Kind regards

Georg

From:    Martin Maechler 
To:      
Cc:       ,
Date:    23.02.2017 08:37
Subject: Re: [R] RStudio: Place for Storing Options

> Jeff Newmiller 
> on Sat, 11 Feb 2017 08:09:36 -0800 writes:

> For the record, then, Google listened to my incantation of
> "rstudio configuration file" and the second result was:

> 
https://support.rstudio.com/hc/en-us/articles/200534577-Resetting-RStudio-Desktop-s-State

> RStudio Desktop is also open source, so you can download
> the source code and look at the operating-system-specific
> bits (for "where") if the above link goes out of date or
> disappears.

Thanks a lot, Jeff!

And for the archives:  On reasonable OS's,  the hidden
directory/folder containing all the info is
  ~/.rstudio-desktop/
and if "things are broken" the recommendation is to rename that
   mv ~/.rstudio-desktop  ~/backup-rstudio-desktop
and (zip and) send along with your e-mail to the experts for diagnosis.

> On Thu, 9 Feb 2017, Martin Maechler wrote:

>> 
>>> Ulrik Stervbo  on Thu, 9
>>> Feb 2017 14:37:57 + writes:
>> 
>> > Hi Georg, > maybe someone here knows, but I think you
>> are more likely to get answers to > Rstudio related
>> questions with RStudio support: >
>> https://support.rstudio.com/hc/en-us
>> 
>> > Best, > Ulrik
>> 
>> Indeed, thank you, Ulrik.
>> 
>> In this special case, however, I'm quite sure many
>> readers of R-help would be interested in the answer; so
>> once you receive an answer, please post it (or a link to
>> a public URL with it) here on R-help, thank you in
>> advance.
>> 
>> We would like to be able to *save*, or sometimes *set* /
>> *reset* such options "in a scripted manner", e.g. for
>> controlled exam sessions.
>> 
>> Martin Maechler, ETH Zurich
>> 
>> > On Thu, 9 Feb 2017 at 12:35 
>> wrote:
>> 
>> >> Hi All, >> I would like to make a backup of my RStudio
>> IDE options I configure using >> "Tools/Global Options"
>> from the menu bar. Searching the >> web did not reveal
>> anything.
>> 
>> >> Can you tell me where RStudio IDE does store its
>> configuration?
>> 
>> >> Kind regards >> Georg
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and
>> more, see https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do

[R] ggplot2: ..n.. and ..count.. in geom_text

2017-04-18 Thread G . Maubach

Hi All,

I have the following code:

-- cut --

(g03_02_p02 <- ggplot(data = d_kzb_input) +
  geom_bar(
    mapping = aes(x = v03_02_r01, y = round(..prop.. * 100, 0)),
    fill = c_ww_palette["blue"]) +
  scale_y_continuous(limits = c(0, c_y_limit)) +
  theme_classic() +
  # How can I refer to the number of cases for this plot?
  # Is there something like "..n.."?
  ggtitle(paste0("Question 3",
                 "(n = ", <>, ")")) +
  xlab("Orders") +
  ylab("Percent") +
  geom_text(
    # How can I refer to the counts for the labels of the columns?
    aes(label = ..count..),
    color = "white",
    position = position_stack(vjust = 0.5)))

-- cut --

I would like to refer to the internal statistics of the geom_bar():

How can I refer to the number of cases for this plot? Is there something 
like "..n.."?
How can I refer to the counts for the labels of the columns?
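
One possible approach, sketched on made-up data (the data frame `d` stands in for d_kzb_input and is an assumption). The text layer is given stat = "count" so that it computes the same counts as the bars, and the number of cases is taken from the data itself. after_stat(count) is the current spelling of ..count.. and assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical data standing in for d_kzb_input
d <- data.frame(orders = c("a", "a", "b"))

p <- ggplot(d, aes(x = orders)) +
  geom_bar() +
  # stat = "count" lets the text layer compute the same counts as the bars
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    color = "white",
    position = position_stack(vjust = 0.5)) +
  # the number of cases is taken from the data, not from the plot
  ggtitle(paste0("Question 3 (n = ", nrow(d), ")"))
```

As far as I know there is no "..n.." variable, which is why the sketch falls back on nrow(). If missing values should not count as cases, sum(!is.na(d$orders)) may be the better choice.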

Kind regards

Georg

[R] Antwort: Re: Antwort: Re: Antwort: Re: Way to Plot Multiple Variables and Change Color (SOLVED)

2017-04-11 Thread G . Maubach

Hi David,

many thanks for your answer.

I followed your suggestion and came up with the following code:

-- cut --

ggplot(
  d_result,
  aes(x = variable, y = n, fill = value)) +
  geom_bar(
stat = "identity") +
  coord_cartesian(ylim = c(0,100)) +
  coord_flip() +
  scale_y_continuous(name = "Percent") +
  scale_fill_manual(
values = rev(
  c(
"forestgreen", "limegreen",
"gold", "orange1",
"tomato3", "darkred"))) +
  ggtitle(
paste(
  "Question 8: Some Text")) +
  labs(fill = "Rating") +
  scale_x_discrete(
name = element_blank(),
drop = FALSE) +  # keep factor levels if no value exists
  geom_text(
aes(label = n),
color = "white",
position = position_stack(vjust = 0.5)) +
  theme_minimal() +
  theme(
legend.position = "right") +
  guides(fill = guide_legend(reverse = TRUE))

-- cut --

In addition to your suggestion I changed "fill = rev(factor(value))" to 
"fill = value" and I added

guides(fill = guide_legend(reverse = TRUE))

to get the legend in the order from 1 .. 6 instead of 6 .. 1.

In my data I added the counts (n) before the mean value in the labels on 
the left hand side. Now it looks to me like a version that conforms to the 
ESOMAR and BVM standards.

Many thanks again for your help.

Kind regards

Georg




From:    David Winsemius 
To:      g.maub...@weinwolf.de, 
Cc:      r-help@r-project.org
Date:    10.04.2017 22:21
Subject: Re: [R] Antwort: Re: Antwort: Re: Way to Plot Multiple Variables and Change Color




> On Apr 10, 2017, at 1:06 PM, David Winsemius  
wrote:
> 
> 
>> On Apr 10, 2017, at 7:45 AM, g.maub...@weinwolf.de wrote:
>> 
>> Hi Ulrik,
>> 
>> many thanks for your reply. I had to take an unplanned break and was 
not 
>> in the office during the last two weeks. Thus my late reply.
>> 
>> I followed your advice and converted the variable in argument "fill" to 

>> factor. Now the color change works:
>> 
>> -- cut --
>> 
>> d_result <- structure(list("variable" = c("Item 1 (ø = 3.3) ", "Item 1 
(ø = 3.3) ",
>>   "Item 1 (ø = 3.3) ", "Item 1 (ø = 
3.3) ", "Item 1 (ø = 3.3) ",
>>   "Item 1 (ø = 3.3) ", "Item 2 (ø = 
3.8) ", "Item 2 (ø = 3.8) ",
>>   "Item 2 (ø = 3.8) ", "Item 2 (ø = 
3.8) ", "Item 2 (ø = 3.8) ",
>>   "Item 2 (ø = 3.8) ", "Item 3 (ø = 
3.4) ", "Item 3 (ø = 3.4) ",
>>   "Item 3 (ø = 3.4) ", "Item 3 (ø = 
3.4) ", "Item 3 (ø = 3.4) ",
>>   "Item 3 (ø = 3.4) ", "Item 4 (ø = 
3.4) ", "Item 4 (ø = 3.4) ",
>>   "Item 4 (ø = 3.4) ", "Item 4 (ø = 
3.4) ", "Item 4 (ø = 3.4) ",
>>   "Item 4 (ø = 3.4) ", "Item 5 (ø = 
3.5) ", "Item 5 (ø = 3.5) ",
>>   "Item 5 (ø = 3.5) ", "Item 5 (ø = 
3.5) ", "Item 5 (ø = 3.5) ",
>>   "Item 5 (ø = 3.5) ", "Item 6 (ø = 
3.5) ", "Item 6 (ø = 3.5) ",
>>   "Item 6 (ø = 3.5) ", "Item 6 (ø = 
3.5) ", "Item 6 (ø = 3.5) ",
>>   "Item 6 (ø = 3.5) ", "Item 7 (ø = 
3.4) ", "Item 7 (ø = 3.4) ",
>>   "Item 7 (ø = 3.4) ", "Item 7 (ø = 
3.4) ", "Item 7 (ø = 3.4) ",
>>   "Item 7 (ø = 3.4) ", "Item 8 (ø = 
3.3) ", "Item 8 (ø = 3.3) ",
>>   "Item 8 (ø = 3.3) ", "Item 8 (ø = 
3.3) ", "Item 8 (ø = 3.3) ",
>>   "Item 8 (ø = 3.3) "), value = 
>> structure(c(1L, 2L, 3L, 4L, 5L,
>>   6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L,
>>   4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L,
>>   2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("1 = very 

>> satisfied",
>>   "2", "3", 

>> "4", "5", "6 = very dissatified"), class = "factor"),
>>  n = c(14L, 20L, 24L, 14L, 16L, 14L, 9L, 15L, 
>> 21L, 20L, 14L,
>>23L, 19L, 17L, 16L, 14L, 16L, 20L, 22L, 
>> 17L, 15L, 16L, 20L,
>>12L, 19L, 15L, 16L, 15L, 18L, 19L, 18L, 
>> 15L, 18L, 18L, 16L,
>>17L, 17L, 20L, 17L, 17L, 14L, 16L, 16L, 
>> 25L, 16L, 17L, 8L,
>>20L)), .Names = c("variable", "value", 
>> "n"), row.names =
>>   c(NA,
>> -48L), vars = list("variable"), drop = TRUE, 
>> indices =
>>   list(0:5,
>>6:11, 12:17, 18:23, 24:29, 30:35, 36:41, 
>> 42:47),
>> group_sizes = c(6L,
>> 6L, 6L, 6L, 6L, 6L, 6L, 6L),
>>

[R] Antwort: Re: Antwort: Re: Way to Plot Multiple Variables and Change Color

2017-04-10 Thread G . Maubach

Hi Ulrik,

many thanks for your reply. I had to take an unplanned break and was not 
in the office during the last two weeks. Thus my late reply.

I followed your advice and converted the variable in argument "fill" to 
factor. Now the color change works:

-- cut --

d_result <- structure(
  list(
    "variable" = c(
      "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ",
      "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ", "Item 1 (ø = 3.3) ",
      "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ",
      "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ", "Item 2 (ø = 3.8) ",
      "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ",
      "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ", "Item 3 (ø = 3.4) ",
      "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ",
      "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ", "Item 4 (ø = 3.4) ",
      "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ",
      "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ", "Item 5 (ø = 3.5) ",
      "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ",
      "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ", "Item 6 (ø = 3.5) ",
      "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ",
      "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ", "Item 7 (ø = 3.4) ",
      "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ",
      "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) ", "Item 8 (ø = 3.3) "),
    value = structure(
      c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L,
        1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L,
        1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L,
        1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L),
      .Label = c("1 = very satisfied", "2", "3", "4", "5",
                 "6 = very dissatified"),
      class = "factor"),
    n = c(14L, 20L, 24L, 14L, 16L, 14L, 9L, 15L, 21L, 20L, 14L,
          23L, 19L, 17L, 16L, 14L, 16L, 20L, 22L, 17L, 15L, 16L, 20L,
          12L, 19L, 15L, 16L, 15L, 18L, 19L, 18L, 15L, 18L, 18L, 16L,
          17L, 17L, 20L, 17L, 17L, 14L, 16L, 16L, 25L, 16L, 17L, 8L,
          20L)),
  .Names = c("variable", "value", "n"),
  row.names = c(NA, -48L),
  vars = list("variable"),
  drop = TRUE,
  indices = list(0:5, 6:11, 12:17, 18:23, 24:29, 30:35, 36:41, 42:47),
  group_sizes = c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L),
  biggest_group_size = 6L,
  labels = structure(
    list(
      "variable" = structure(
        1:8,
        .Label = c(
          "Item 1 (ø = 3.3) ", "Item 2 (ø = 3.8) ", "Item 3 (ø = 3.4) ",
          "Item 4 (ø = 3.4) ", "Item 5 (ø = 3.5) ", "Item 6 (ø = 3.5) ",
          "Item 7 (ø = 3.4) ", "Item 8 (ø = 3.3) "),
        class = "factor")),
    row.names = c(NA, -8L),
    class = "data.frame",
    vars = list("variable"),
    drop = TRUE,
    .Names = "variable"),
  class = c("grouped_df", "tbl_df", "tbl", "data.frame"))

ggplot(
  d_result,
  aes(x = variable, y = n, fill = rev(factor(value)))) +
  geom_bar(
stat = "identity") +
  coord_cartesian(ylim = c(0,100)) +
  coord_flip() +
  scale_y_continuous(name = "Percent") +
  scale_fill_manual(
values = rev(
  c(
"forestgreen", "limegreen",
"gold", "orange1",
"tomato3", "darkred"))) +
  ggtitle(
paste(
  "Question 8: Satisfaction?")) +
  labs(fill = "Rating") +
  scale_x_discrete(
name = element_blank()) +
  # scale_color_manual(
  #   values = rev(
  # c(
  #   "forestgreen", "limegreen",
  #   "gold", "orange1",
  #   "tomato3", "darkred"))) +
  geom_text(
aes(label = n),
color = "white",
position = position_stack(vjust = 0.5)) +
  theme_minimal() +
  theme(
legend.position = "right")

-- cut --

I tried to change the order of the items on the y-axis,  e.g. Item 8 
should be last and Item 1 first. I tried to reverse the order of the items 
within ggplot using rev()

Re: [R] Archive format

2017-04-08 Thread G . Maubach

Hi Joe,

I have read your question with great interest. I am a little astonished to 
read about your project. There is a big national institute in Germany called 
GESIS 
(https://de.wikipedia.org/wiki/GESIS_%E2%80%93_Leibniz-Institut_f%C3%BCr_Sozialwissenschaften)
 which has been doing the job you are trying to set up since 1986. You could 
try to exchange ideas with them.

Your subject is very complex with regard to reproducible research. You might 
want to have a look at

(1) https://cran.r-project.org/web/views/ReproducibleResearch.html
(2) Gandrud, Christopher: Reproducible Research with R and R Studio 
(https://www.amazon.com/Reproducible-Research-Studio-Second-Chapman/dp/1498715370)

Kind regards

Georg

> Sent: Wednesday, 29 March 2017, 10:44
> From: "Joe Gain" 
> To: R-help@r-project.org
> Cc: bwfdm-i...@lists.kit.edu
> Subject: [R] Archive format
>
> Hello,
> 
> we are collecting information on the subject of research data management 
> in German on the webplatform:
> 
> www.forschungsdaten.info
> 
> One of the topics, which we are writing about, is how to *archive* data. 
> Unfortunately, none of us in the project is an expert with respect to R 
> and so I would like to ask the list, what they recommend? A related 
> question is to do with the sharing of data. We have already asked some 
> academics, who have basically replied that they don't really know other 
> than to strongly recommend a plain text format.
> 
> We would also like to know, if members of the list recommend converting 
> formats from commercial software such as S-Plus, Terr, SPSS etc. to an 
> R-compatible format for long term archivation? Are there any general 
> rules and best practices, when it comes to archiving (and sharing) 
> statistical data and statistical programs?
> 
> Any comments would be much appreciated!
> Joe
> 
> -- 
> B 1003
> Kommunikations-, Informations-, Medienzentrum (KIM)
> Universitaet Konstanz
> 
> t: ++49-7531-883234
> e: joe.g...@uni-konstanz.de
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[R] Antwort: Re: Way to Plot Multiple Variables and Change Color

2017-03-28 Thread G . Maubach

Hi Ulrik,

your answer is very valuable to me. If you cannot tell what I am doing, 
others cannot either. So I should definitely adapt my code.

The result of your code and my code is the same. Thus, I use your code 
because it is more readable.

My other question was how I can change the color palette for the stacked 
bars. Could you give me a hint where I need to look in ggplot2 
documentation?
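
A possible starting point, sketched on made-up counts (the data frame `d` and the colors in `bar_colors` are assumptions for illustration). scale_fill_manual() only applies to a discrete fill, so the numeric rating is converted with factor() inside aes():

```r
library(ggplot2)

# Hypothetical counts per variable and rating (rating stored as a number)
d <- data.frame(
  variable = rep(c("v07_01", "v07_02"), each = 4),
  value    = rep(1:4, times = 2),
  n        = c(5, 3, 6, 6, 8, 3, 4, 5))

bar_colors <- c("forestgreen", "gold", "orange1", "darkred")

p <- ggplot(d, aes(x = variable, y = n, fill = factor(value))) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = bar_colors, name = "Rating")
```

The relevant help pages would be ?scale_fill_manual and, for ready-made palettes, ?scale_fill_brewer.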

Kind regards

Georg




From:    Ulrik Stervbo 
To:      g.maub...@weinwolf.de, r-help@r-project.org, 
Date:    28.03.2017 16:35
Subject: Re: [R] Way to Plot Multiple Variables and Change Color



Hi Georg,

I am a little unsure of what you want to do, but maybe this:

mdf <- melt(dfr)
d_result <- mdf  %>%
  dplyr::group_by(variable, value) %>%
  summarise(n = n())

ggplot(
  d_result,
  aes(variable, y = n, fill = value)) +
  geom_bar(stat = "identity") 

HTH
Ulrik

On Tue, 28 Mar 2017 at 15:11  wrote:
Hi All,

in my current project I have to plot a whole bunch of related variables
(item batteries, e.g. How do you rate ... a) Acceleration, b) Horse Power,
c) Color Palette, etc.) which are all rated on a scale from 1 .. 4.

I need to present the results as stacked bar charts where the variables
are columns and the percentages of the scales values (1 .. 4) are the
chunks of the stacked bar for each variable. To do this I have transformed
my data from wide to long and calculated the percentage for each variable
and value. The code for this is as follows:

-- cut --

dfr <- structure(
  list(
v07_01 = c(3, 1, 1, 4, 3, 4, 4, 1, 3, 2, 2, 3,
   4, 4, 4, 1, 1, 3, 3, 4),
v07_02 = c(1, 2, 1, 1, 2, 1, 4, 1, 1,
   4, 4, 1, 4, 4, 1, 3, 2, 3, 3, 1),
v07_03 = c(3, 2, 2, 1, 4, 1,
   2, 3, 3, 1, 4, 2, 3, 1, 4, 1, 4, 2, 2, 3),
v07_04 = c(3, 1, 1,
   4, 2, 4, 4, 2, 2, 2, 4, 1, 2, 1, 3, 1, 2, 4, 1, 4),
v07_05 = c(1,
   2, 2, 2, 4, 4, 1, 1, 4, 4, 2, 1, 2, 1, 4, 1, 2, 4, 1, 4),
v07_06 = c(1,
   2, 1, 2, 1, 1, 3, 4, 3, 2, 2, 3, 3, 2, 4, 2, 3, 1, 4, 3),
v07_07 = c(3,
   2, 3, 3, 1, 1, 3, 3, 4, 4, 1, 3, 1, 3, 2, 4, 1, 2, 3, 4),
v07_08 = c(3,
   2, 1, 2, 2, 2, 3, 3, 4, 4, 1, 1, 1, 2, 3, 1, 4, 2, 2, 4),
cased_id = structure(
  1:20,
  .Label = c(
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9",
"10",
"11",
"12",
"13",
"14",
"15",
"16",
"17",
"18",
"19",
"20"
  ),
  class = "factor"
)
  ),
  .Names = c(
"v07_01",
"v07_02",
"v07_03",
"v07_04",
"v07_05",
"v07_06",
"v07_07",
"v07_08",
"cased_id"
  ),
  row.names = c(NA, -20L),
  class = c("tbl_df", "tbl",
"data.frame")
)

mdf <- melt(dfr)
d_result <- mdf  %>%
  dplyr::group_by(variable) %>%
  count(value)

ggplot(
  d_result,
  aes(variable, y = n, fill = value)) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = c(0,100))

-- cut --

Is there an easier way of doing this, i. e. a way without need to
transform the data?
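
For the tabulation itself, a base R sketch that works on the wide data directly (the small data frame `wide` is made up for illustration). Note that ggplot2 would still need the result in long format for plotting.

```r
# Hypothetical wide data: one column per item, ratings from 1 to 4
wide <- data.frame(v07_01 = c(1, 2, 2, 4),
                   v07_02 = c(4, 4, 1, 3))

# One table per column; fixed levels keep ratings that never occur
counts <- sapply(wide, function(x) table(factor(x, levels = 1:4)))

# Column-wise percentages
pct <- prop.table(counts, margin = 2) * 100
```

The result is a matrix with one row per rating and one column per item, which could also be passed to barplot() for a quick stacked chart.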

How can I change the colors for the data points 1 .. 4?

I tried

-- cut --

ggplot(
  d_result,
  aes(variable, y = n, fill = value)) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = c(0,100)) +
  scale_fill_manual(values = RColorBrewer::brewer.pal(4, "Blues"))

-- cut --

but this does not work because I am mixing continuous and discrete values.

How can I change the colors for the bars?

Kind regards

Georg


[R] Antwort: Re: Way to Plot Multiple Variables and Change Color

2017-03-28 Thread G . Maubach

Hi Richard,

many thanks for your reply.

Your solution is not exactly what I was looking for. I would like to know 
how I can change the colors of the stacked bars in my plot and not use the 
default values. How can this be done?

Kind regards

Georg




From:    "Richard M. Heiberger" 
To:      g.maub...@weinwolf.de, 
Cc:      r-help 
Date:    28.03.2017 17:40
Subject: Re: [R] Way to Plot Multiple Variables and Change Color



I think you are looking for the likert function in the HH package.
From ?likert


Diverging stacked barcharts for Likert, semantic differential, rating
scale data, and population pyramids.


This will get you started.  Much more fine control is available.  See
the examples and demo.

## install.packages("HH") ## if not yet on your system.

library(HH)

AA <- dfr[,-9]

labels <- sort(unique(as.vector(data.matrix(AA))))
result.template <- integer(length(labels))
names(result.template) <- labels

BB <- apply(AA, 2, function(x, result=result.template) {
  tx <- table(x)
  result[names(tx)] <- tx
  result
}
)

BB

likert(t(BB), ReferenceZero=0, horizontal=FALSE)


[R] Way to Plot Multiple Variables and Change Color

2017-03-28 Thread G . Maubach

Hi All,

in my current project I have to plot a whole bunch of related variables
(item batteries, e.g. How do you rate ... a) Acceleration, b) Horse Power,
c) Color Palette, etc.) which are all rated on a scale from 1 .. 4.

I need to present the results as stacked bar charts where the variables
are columns and the percentages of the scale values (1 .. 4) are the
chunks of the stacked bar for each variable. To do this I have transformed
my data from wide to long and calculated the percentage for each variable
and value. The code for this is as follows:

-- cut --

dfr <- structure(
  list(
v07_01 = c(3, 1, 1, 4, 3, 4, 4, 1, 3, 2, 2, 3,
   4, 4, 4, 1, 1, 3, 3, 4),
v07_02 = c(1, 2, 1, 1, 2, 1, 4, 1, 1,
   4, 4, 1, 4, 4, 1, 3, 2, 3, 3, 1),
v07_03 = c(3, 2, 2, 1, 4, 1,
   2, 3, 3, 1, 4, 2, 3, 1, 4, 1, 4, 2, 2, 3),
v07_04 = c(3, 1, 1,
   4, 2, 4, 4, 2, 2, 2, 4, 1, 2, 1, 3, 1, 2, 4, 1, 4),
v07_05 = c(1,
   2, 2, 2, 4, 4, 1, 1, 4, 4, 2, 1, 2, 1, 4, 1, 2, 4, 1, 4),
v07_06 = c(1,
   2, 1, 2, 1, 1, 3, 4, 3, 2, 2, 3, 3, 2, 4, 2, 3, 1, 4, 3),
v07_07 = c(3,
   2, 3, 3, 1, 1, 3, 3, 4, 4, 1, 3, 1, 3, 2, 4, 1, 2, 3, 4),
v07_08 = c(3,
   2, 1, 2, 2, 2, 3, 3, 4, 4, 1, 1, 1, 2, 3, 1, 4, 2, 2, 4),
cased_id = structure(
  1:20,
  .Label = c(
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9",
"10",
"11",
"12",
"13",
"14",
"15",
"16",
"17",
"18",
"19",
"20"
  ),
  class = "factor"
)
  ),
  .Names = c(
"v07_01",
"v07_02",
"v07_03",
"v07_04",
"v07_05",
"v07_06",
"v07_07",
"v07_08",
"cased_id"
  ),
  row.names = c(NA, -20L),
  class = c("tbl_df", "tbl",
"data.frame")
)

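library(reshape2)
library(dplyr)
library(ggplot2)

# Reshape the data frame defined above (dfr) from wide to long;
# cased_id identifies the respondent and is kept as id variable
mdf <- melt(dfr, id.vars = "cased_id")
d_result <- mdf %>%
  dplyr::group_by(variable) %>%
  count(value)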

ggplot(
  d_result,
  aes(variable, y = n, fill = value)) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = c(0,100))

-- cut --

Is there an easier way of doing this, i. e. a way without need to 
transform the data?
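
A possible shortcut, sketched with base R only and not necessarily what the poster had in mind: `table()` and `prop.table()` can be applied to each column directly, and the resulting matrix can be handed to `barplot()`, which stacks by default. The data frame `items` copies two columns of `dfr` so that the sketch is self-contained; the four colors are arbitrary shades of blue.

```r
# Two of the items from dfr, copied here so the sketch runs on its own
items <- data.frame(
  v07_01 = c(3, 1, 1, 4, 3, 4, 4, 1, 3, 2, 2, 3, 4, 4, 4, 1, 1, 3, 3, 4),
  v07_02 = c(1, 2, 1, 1, 2, 1, 4, 1, 1, 4, 4, 1, 4, 4, 1, 3, 2, 3, 3, 1)
)

# Percentage of each rating (rows) per item (columns);
# factor(..., levels = 1:4) keeps ratings that do not occur as zero rows
pct <- sapply(items, function(x) {
  100 * prop.table(table(factor(x, levels = 1:4)))
})

# Stacked bars, one per item, with one color per rating
barplot(pct,
        col = c("#EFF3FF", "#BDD7E7", "#6BAED6", "#2171B5"),
        ylim = c(0, 100),
        legend.text = rownames(pct))
```

The matrix `pct` has one row per rating and one column per item, so the reshaping step from wide to long is not needed for this kind of plot.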

How can I change the colors for the data points 1 .. 4?

I tried

-- cut --

ggplot(
  d_result,
  aes(variable, y = n, fill = value)) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = c(0,100)) +
  scale_fill_manual(values = RColorBrewer::brewer.pal(4, "Blues"))

-- cut --

but this does not work because I am mixing continuous and discrete values.

How can I change the colors for the bars?

Kind regards

Georg


[R] ggplot2: Adjusting title and labels

2017-03-16 Thread G . Maubach

Hi All,

I have a question about ggplot2. My code is the following:

-- cut --

library(ggplot2)
library(scales)

df <-
  data.frame(group = c("Male", "Female", "Child"),
 value = c(25, 25, 50))

blank_theme <- theme_minimal() + theme(
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  axis.text.x = element_blank(),
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.ticks = element_blank(),
  plot.title = element_text(size = 4, face = "bold"))

ggplot(df, aes(x = "", y = value, fill = group)) +
  geom_bar(
width = 1,
stat = "identity") +
  coord_polar("y", start = 0) +
  scale_fill_brewer(
name = "Gruppe",
palette = "Blues") +
  blank_theme +
  geom_text(
aes(
  y = c(10, 40, 75),
  label = scales::percent(value/100)),
size = 5) +
  labs(title = "Pie Title")

-- cut --

Is there a way to give the position of the labels for the chunks of the pie
in a generalized form instead of finding the values iteratively by
trial and error?

How can I adjust the title of the graph concerning font size and position
(e. g. center)?
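
A sketch of one way to handle both points, assuming a ggplot2 version that provides `position_stack(vjust = ...)`: the labels are placed in the middle of each stacked segment instead of at hand-picked `y` values, and the title is controlled through `plot.title` in `theme()`. The title size of 16 is an arbitrary choice.

```r
library(ggplot2)

df <- data.frame(group = c("Male", "Female", "Child"),
                 value = c(25, 25, 50))

p <- ggplot(df, aes(x = "", y = value, fill = group)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0) +
  # vjust = 0.5 puts each label in the middle of its slice
  geom_text(aes(label = scales::percent(value / 100)),
            position = position_stack(vjust = 0.5), size = 5) +
  labs(title = "Pie Title") +
  theme_void() +
  # hjust = 0.5 centers the title; size is the font size in points
  theme(plot.title = element_text(size = 16, face = "bold", hjust = 0.5))

print(p)
```

Because the label positions follow the stacking, they stay correct when the values or the number of groups change.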

Kind regards

Georg



[R] Antwort: Re: Approach for Storing Result Data

2017-03-09 Thread G . Maubach

Hi Gunter,
Hi Jeff,
Hi Readers,

many thanks for your reply.

My question seems to be a little off topic because it is not about using
the programming language itself but about how to use it in an analytics
context. It is about processes and approaches of how to do things in R from
a conceptual point of view. That is a subject I don't see discussed in the
community, but it would help me a lot to enhance my work.

Do you know a place where these things are discussed?

Kind regards

Georg



From:    Jeff Newmiller
To:      r-help@r-project.org, g.maub...@weinwolf.de,
Date:    08.03.2017 17:54
Subject: Re: [R] Approach for Storing Result Data



Seems pretty normal except that your one-by-one lookup process usually 
gets old eventually, and comparing results is much easier if you merge the 
study data with the lookup data all at once and then use aggregate() (or 
any of numerous equivalents from contributed packages) to collect results 
or color/linetype/panel/etc plotted graphical presentations with lattice 
or ggplot2.
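
The merge-then-aggregate idea might look roughly like this in base R. The data frames `study` and `lookup`, and all column names in them, are hypothetical; the three universe shares are borrowed from the `d_test` example in the original question.

```r
# Hypothetical study data: one row per respondent
study <- data.frame(
  respondent = 1:6,
  state_code = c("BY", "BY", "BE", "HH", "BE", "BY")
)

# Hypothetical lookup table with labels and universe shares per state
lookup <- data.frame(
  state_code     = c("BY", "BE", "HH"),
  label          = c("Bayern", "Berlin", "Hamburg"),
  universe_share = c(12.7, 3.3, 8.1)
)

# Merge once instead of looking values up one by one
merged <- merge(study, lookup, by = "state_code")

# Count respondents per state, then convert the counts to percentages
counts <- aggregate(respondent ~ label + universe_share,
                    data = merged, FUN = length)
names(counts)[names(counts) == "respondent"] <- "n"
counts$respondent_share <- 100 * counts$n / sum(counts$n)

print(counts)
```

The resulting table holds universe and respondent shares side by side, which is a convenient starting point for a comparison plot.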



From:    Bert Gunter
To:      g.maub...@weinwolf.de,
Cc:      R-help
Date:    08.03.2017 17:43
Subject: Re: [R] Approach for Storing Result Data



This does not appear to be a legitimate topic for r-help: it is
not a consulting service. Please see the posting guide.

Of course, others may disagree and reply. Wouldn't be the first time I'm 
wrong.

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


[R] Approach for Storing Result Data

2017-03-08 Thread G . Maubach

Hi All,

today I have a more general question concerning the approach of storing 
different values from the analysis of multiple variables.

My task is to compare distributions in a universe with distributions from 
the respondents using a whole bunch of variables. Comparison shall be done 
on relative frequencies (proportions).

I was thinking about the structure I should store the results in and came 
up with the following:

-- cut --

library(ggplot2)
library(stringi)

# Result data frame
# Some sort of tidytidy data set where
# each value is stored as an identity.
# This way all values for all variables could be stored in
# one unique data structure.
# If an additional variable added for the name of the
# research one could also build result data set across
# surveys.
# Values for measure could be "number" for 'raw' values or
# "freq" for frequencies/counts.
# Values for unit could be "n" for 'numbers' and
# "%" for percentages.
d_test <- data.frame(
group = rep(c("Universe", "Respondents"), each = 16),
variable = rep("State", 32),
value = rep(c(11.3,
12.7,
3.3,
5,
0.6,
8.1,
6.2,
5.8,
6.4,
14.5,
8.3,
0.3,
3.8,
2.5,
8.1,
3), 2),
label = rep(c("Baden-Wuerttemberg",
"Bayern",
"Berlin",
"Brandenburg",
"Bremen",
"Hamburg",
"Hessen",
"Mecklenburg-Vorpommern",
"Niedersachsen",
"Nordrhein-Westfalen",
"Rheinland-Pfalz",
"Saarland",
"Sachsen",
"Sachsen-Anhalt",
"Schleswig-Holstein",
"Thueringen"),2),
measure = rep("freq", 32),
unit = rep("%", 32),
stringsAsFactors = FALSE
)

# This way the variables can be selected using simple
# value selection from Base R functionality.
data <- d_test[d_test$variable == "State" ,]

# And plot results for every variable.
ggplot(
  data = data,
  aes(
x = label,
y = value,
fill = group)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
  scale_x_discrete(name = data$variable[1]) +
  # value is numeric, so the y axis needs a continuous scale
  scale_y_continuous(name = data$unit[1])

-- cut --

The reporting / presentation is done in R Markdown. I would load the
result data set once at the beginning and run the comparisons as plots
on each variable named in the results data set under "variable".
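
A minimal sketch of that workflow, assuming the result data set keeps the layout of `d_test`. The data frame `d_results`, its values for "Gender", and the helper `plot_variable()` are made up for illustration.

```r
library(ggplot2)

# Small hypothetical result set in the same layout as d_test
d_results <- data.frame(
  group    = rep(c("Universe", "Respondents"), times = 4),
  variable = rep(c("State", "Gender"), each = 4),
  value    = c(12.7, 11.9, 3.3, 4.1, 51.0, 48.5, 49.0, 51.5),
  label    = c("Bayern", "Bayern", "Berlin", "Berlin",
               "Female", "Female", "Male", "Male"),
  unit     = "%",
  stringsAsFactors = FALSE
)

# Build the comparison plot for the rows of one variable
plot_variable <- function(d) {
  ggplot(d, aes(x = label, y = value, fill = group)) +
    geom_bar(stat = "identity", position = "dodge") +
    labs(x = d$variable[1], y = d$unit[1], fill = "Group")
}

# One plot per entry found under "variable"
plots <- lapply(split(d_results, d_results$variable), plot_variable)
```

In an R Markdown chunk the plots could then be emitted with `for (p in plots) print(p)`.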

If I follow this approach for my customer relationship survey, do you think
I would face drawbacks or run into serious trouble?

I am interested in your opinion and open for other approaches and 
suggestions.

Kind regards

Georg


[R] Follow-up: RStudio: Place for Storing Options (as plain text)

2017-03-08 Thread G . Maubach

Hi All,

I got a late reply from RStudio Support concerning the question where
RStudio stores its options and configurations:

-- cut --

The post RStudio Config Files has a new comment. 
. . .
Unfortunately, it's unlikely that we'll be able to provide a programmatic 
R interface in the near future -- the way we lay out and store RStudio's 
client state does not make it as amenable to public consumption as we 
might hope.
That said, you can generally copy everything within that folder to a new 
machine (at the same relative path from the user home directory), and 
expect preferences to be respected + restored as you might expect.
. . .
-- cut --

The result of the discussion is:

We can copy the complete RStudio directory for storing options and 
configurations under

%localappdata%\RStudio-Desktop or 
C:\Users\\AppData\Local\RStudio-Desktop

and copy it completely to a new installation of RStudio.

A programmatic approach to edit RStudio options and configurations is not 
possible due to design decisions.

The purpose of the initial question was to find a way to save RStudio
options and configurations, e.g. on git/GitHub or similar. This is
possible by initialising the directory given above with git or similar.
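
The copy step could be scripted from R along these lines. This is only a sketch: the function name `backup_config_dir()` and the backup target are hypothetical, and the configuration path is the one given above.

```r
# Copy a configuration directory into a backup location and
# return the path of the copy
backup_config_dir <- function(config_dir, backup_root) {
  stopifnot(dir.exists(config_dir))
  dir.create(backup_root, recursive = TRUE, showWarnings = FALSE)
  # recursive = TRUE copies the directory with everything below it
  ok <- file.copy(config_dir, backup_root, recursive = TRUE)
  if (!all(ok)) stop("copy failed: ", config_dir)
  file.path(backup_root, basename(config_dir))
}

# Example call on Windows (the backup target is a hypothetical path):
# backup_config_dir(
#   file.path(Sys.getenv("LOCALAPPDATA"), "RStudio-Desktop"),
#   "D:/backup/rstudio-config"
# )
```

The backup directory can then be put under version control like any other folder.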

An open question is what happens if a new RStudio release makes changes to
the options and configurations. Whether the stored directory can still be
used in full would need additional clarification, i.e. for each new version.

Kind regards

Georg

From:    Martin Maechler
To:
Cc:      ,
Date:    23.02.2017 08:37
Subject: Re: [R] RStudio: Place for Storing Options

> Jeff Newmiller 
> on Sat, 11 Feb 2017 08:09:36 -0800 writes:

> For the record, then, Google listened to my incantation of
> "rstudio configuration file" and the second result was:

> https://support.rstudio.com/hc/en-us/articles/200534577-Resetting-RStudio-Desktop-s-State

> RStudio Desktop is also open source, so you can download
> the source code and look at the operating-system-specific
> bits (for "where") if the above link goes out of date or
> disappears.

Thanks a lot, Jeff!

And for the archives:  On reasonable OS's,  the hidden
directory/folder containing all the info is
  ~/.rstudio-desktop/
and if "things are broken" the recommendation is to rename that
   mv ~/.rstudio-desktop  ~/backup-rstudio-desktop
and (zip and) send along with your e-mail to the experts for diagnosis.

> On Thu, 9 Feb 2017, Martin Maechler wrote:

>>> Ulrik Stervbo on Thu, 9 Feb 2017 14:37:57 + writes:
>>
>> > Hi Georg,
>> > maybe someone here knows, but I think you are more likely to get
>> > answers to RStudio related questions with RStudio support:
>> > https://support.rstudio.com/hc/en-us
>> >
>> > Best,
>> > Ulrik
>>
>> Indeed, thank you, Ulrik.
>>
>> In this special case, however, I'm quite sure many
>> readers of R-help would be interested in the answer; so
>> once you receive an answer, please post it (or a link to
>> a public URL with it) here on R-help, thank you in
>> advance.
>>
>> We would like to be able to *save*, or sometimes *set* /
>> *reset* such options "in a scripted manner", e.g. for
>> controlled exam sessions.
>>
>> Martin Maechler, ETH Zurich
>>
>> > On Thu, 9 Feb 2017 at 12:35 wrote:
>> >
>> >> Hi All,
>> >> I would like to make a backup of my RStudio IDE options I configure
>> >> using "Tools/Global Options" from the menu bar. Searching the
>> >> web did not reveal anything.
>> >>
>> >> Can you tell me where RStudio IDE does store its configuration?
>> >>
>> >> Kind regards
>> >> Georg


[R] xtable: Width of Columns

2017-03-02 Thread G . Maubach

Hi All,

I have the following code in an R Markdown document:

```{r, results = "asis", echo = FALSE}
library(xtable)
response <- as.data.frame(matrix(NA, 2, 2))
colnames(response) <- c("Anzahl", "Prozent")
rownames(response) <- c("gesamte Rücksendungen (brutto)  ",
                        "auswertbare Fragebögen (netto)  ")
response[[1, 1]] <- 1
response[[1, 2]] <- 2.0
response[[2, 1]] <- 3
response[[2, 2]] <- 4.0

response_table <- xtable(
  response, 
  caption = "Rücklauf und Rücklaufquote",
  label = "Responsequote",
  display = c("s","d","f"),
  digits = 1,
  align = c("l", "c", "c") #  auto = TRUE
  )

print.xtable(
  response_table,
  type = "html",
  caption.placement = "top",
  format.args = list(
big.mark = ".",
decimal.mark = ","),
  size = 500,
  width = 100)
```

and would like to control the width of the columns. But the column width is
always fitted to the content.

Is there a way to set the column width, e.g. 25 characters, for all
columns or for each column separately, to get more spacing between the text
and the borders of the table?
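
One option that may help for HTML output, given here as a sketch and not as a complete answer: `print.xtable()` accepts `html.table.attributes`, whose content is written into the opening `<table>` tag, so the overall table width and the cell padding can be set there. The attribute values below are arbitrary, and the row labels are English stand-ins for the ones in the question. Widths for individual columns would probably need CSS in the R Markdown document.

```r
library(xtable)

response <- data.frame(
  Anzahl  = c(1, 3),
  Prozent = c(2.0, 4.0),
  row.names = c("total returns (gross)", "usable questionnaires (net)")
)

html <- print(
  xtable(response, align = c("l", "c", "c")),
  type = "html",
  # These attributes end up in the opening <table> tag
  html.table.attributes = 'border="1" cellpadding="8" style="width: 60%;"',
  # Return the HTML as a string instead of printing it right away
  print.results = FALSE
)

cat(html)
```

Inside a chunk with `results = "asis"` the `cat(html)` call emits the table into the rendered document.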

Kind regards

Georg


[R] Antwort: Re: RStudio: Place for Storing Options

2017-02-23 Thread G . Maubach

Hi Martin,

the command

%localappdata%\RStudio-Desktop

gives on my machine

"The command is written wrong or could not be found.".

I found "RStudio-Desktop" under

C:\Users\\AppData\Local\RStudio-Desktop

There, references to created notebooks and presentations are stored in the
folder "RStudio-Desktop". The RStudio config is not documented yet.
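
As a side note, `%localappdata%` is an environment variable and not a command, which is presumably why entering it at the command prompt produces the error above. From within R the same location can be resolved as sketched below; the helper name `rstudio_config_dir()` is made up for this example.

```r
# Build the configuration path from the LOCALAPPDATA environment variable
rstudio_config_dir <- function(local_appdata = Sys.getenv("LOCALAPPDATA")) {
  file.path(local_appdata, "RStudio-Desktop")
}

cfg <- rstudio_config_dir()
if (dir.exists(cfg)) {
  print(list.files(cfg))        # top-level entries of the folder
} else {
  message("Not found: ", cfg)   # e.g. on non-Windows systems
}
```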

Kind regards

Georg





[R] Antwort: Re: Antwort: Re: packrat: Failed to download current version of foreign(0.8-67)

2017-02-21 Thread G . Maubach

Packrat does a beautiful job, creating local project repositories of all
used libraries. However, if only one library is missing, the complete
repository is not stored. Having all but one library in the repository is
far better than having none.

I suggest changing the behaviour of packrat so that it stores all libraries
it can get in the directory "packrat" and does not delete it if one library
is missing. This would help a lot.

Is this possible?

Kind regards

Georg




From:    Uwe Ligges
To:      g.maub...@weinwolf.de,
Cc:      r-help@r-project.org
Date:    21.02.2017 09:50
Subject: Re: Antwort: Re: [R] packrat: Failed to download current version of foreign(0.8-67)



Yes, then we cannot help and you have to ask your company how to get the 
files, of course.
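
Once a package file has been obtained some other way (the first installation mentioned below came from an external drive), it can be installed from the local file. The following is a sketch; the helpers `local_pkg_type()` and `install_local()` and the example path are hypothetical, while `install.packages(..., repos = NULL)` is the standard way to install from a file.

```r
# Pick the install type from the extension of a local package file
local_pkg_type <- function(path) {
  if (grepl("\\.tar\\.gz$", path)) {
    "source"
  } else if (grepl("\\.zip$", path)) {
    "win.binary"
  } else {
    stop("unexpected package file: ", path)
  }
}

install_local <- function(path) {
  stopifnot(file.exists(path))
  # repos = NULL tells install.packages() to treat `path` as a local file
  install.packages(path, repos = NULL, type = local_pkg_type(path))
}

# Hypothetical location on an external drive:
# install_local("E:/packages/foreign_0.8-67.zip")
```

Note that a source package containing compiled code needs the usual build tools on Windows, so the binary `.zip` is likely the easier route there.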

Best,
Uwe Ligges



On 21.02.2017 08:16, g.maub...@weinwolf.de wrote:
> Hi Mr. Ligges,
>
> doing as you said R responds with
>
> install.packages("foreign")
> trying URL
> 'https://cran.uni-muenster.de/bin/windows/contrib/3.3/foreign_0.8-67.zip'
> Warning in install.packages :
>   cannot open URL
> 'https://cran.uni-muenster.de/bin/windows/contrib/3.3/foreign_0.8-67.zip':
> HTTP status was '403 Forbidden (Content blocked by Trustwave Secure Web
> Gateway)'
> Error in download.file(url, destfile, method, mode = "wb", ...) :
>   cannot open URL
> 'https://cran.uni-muenster.de/bin/windows/contrib/3.3/foreign_0.8-67.zip'
> Warning in install.packages :
>   download of package ‘foreign’ failed
>
> Running
>
> install.packages("foreign", type = "source")
> trying URL
> 'https://cran.uni-muenster.de/src/contrib/foreign_0.8-67.tar.gz'
> Warning in install.packages :
>   cannot open URL
> 'https://cran.uni-muenster.de/src/contrib/foreign_0.8-67.tar.gz': HTTP
> status was '403 Forbidden (Content blocked by Trustwave Secure Web
> Gateway)'
> Error in download.file(url, destfile, method, mode = "wb", ...) :
>   cannot open URL
> 'https://cran.uni-muenster.de/src/contrib/foreign_0.8-67.tar.gz'
> Warning in install.packages :
>   download of package ‘foreign’ failed
>
> The firewall in my company blocks all binary files. Foreign is downloaded
> in "wb" mode. Thus I have no chance to get it. The first fresh
> installation was done from an external drive. As packrat is also
> downloading the binaries instead of the source my download will always
> fail.
>
> My sessionInfo() is
> sessionInfo()
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> locale:
> [1] LC_COLLATE=German_Germany.1252
> [2] LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods
> [7] base
>
> loaded via a namespace (and not attached):
> [1] tools_3.3.2
>
> Do you have a suggestion?
>
> Kind regards
>
> Georg
>
>
>
>
>
> From:    Uwe Ligges
> To:      g.maub...@weinwolf.de, r-help@r-project.org,
> Date:    20.02.2017 21:29
> Subject: Re: [R] packrat: Failed to download current version of
> foreign(0.8-67)
>
>
>
> foreign is a recommended package that is already part of your R
> installation, and there should not be a problem installing a recent
> version of it.
>
> What is the error message if you run
> install.packages("foreign") from a new R session?
>
> Best,
> Uwe Ligges
>
>
>
> On 20.02.2017 17:33, g.maub...@weinwolf.de wrote:
>> Hi All,
>>
>> I tried to use packrat on
>>
>> R version 3.3.2 (2016-10-31)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 7 x64 (build 7601) Service Pack 1
>>
>> locale:
>> [1] LC_COLLATE=German_Germany.1252
>> [2] LC_CTYPE=German_Germany.1252
>> [3] LC_MONETARY=German_Germany.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=German_Germany.1252
>>
>> attached base packages:
>> [1] stats graphics  grDevices utils datasets  methods
>> [7] base
>>
>> other attached packages:
>> [1] packrat_0.4.8-1
>>
>> loaded via a namespace (and not attached):
>> [1] tools_3.3.2
>>
>> Due to internal firewall restrictions the package "foreign" could not be
>> downloaded as source. I assume that the package also contains some binary
>> parts which will be blocked by the firewall.
>>
>> When running packrat a directory "packrat" and a file called .Rprofile
>> were created in the project directory. A lot of library sources were
>> downloaded, but not for "foreign".
>>
>> After finishing the process the directory "packrat" and the file .Rprofile
>> were deleted from the project directory.
>>
>> Why is that? Just one source library missing and the whole directory is
>> gone? Having all libraries for my project without just one is better than
>> none!
>>
>> How can I use packrat with the missing library "foreign"?
>>
>> Kind regards
>>
>> Georg
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more,

[R] Reply: Re: packrat: Failed to download current version of foreign(0.8-67)

2017-02-20 Thread G . Maubach

Hi Mr. Ligges,

when I do as you suggested, R responds with

install.packages("foreign")
trying URL 
'https://cran.uni-muenster.de/bin/windows/contrib/3.3/foreign_0.8-67.zip'
Warning in install.packages :
  cannot open URL 
'https://cran.uni-muenster.de/bin/windows/contrib/3.3/foreign_0.8-67.zip': 
HTTP status was '403 Forbidden (Content blocked by Trustwave Secure Web 
Gateway)'
Error in download.file(url, destfile, method, mode = "wb", ...) : 
  cannot open URL 
'https://cran.uni-muenster.de/bin/windows/contrib/3.3/foreign_0.8-67.zip'
Warning in install.packages :
  download of package ‘foreign’ failed

Running

install.packages("foreign", type = "source")
trying URL 
'https://cran.uni-muenster.de/src/contrib/foreign_0.8-67.tar.gz'
Warning in install.packages :
  cannot open URL 
'https://cran.uni-muenster.de/src/contrib/foreign_0.8-67.tar.gz': HTTP 
status was '403 Forbidden (Content blocked by Trustwave Secure Web 
Gateway)'
Error in download.file(url, destfile, method, mode = "wb", ...) : 
  cannot open URL 
'https://cran.uni-muenster.de/src/contrib/foreign_0.8-67.tar.gz'
Warning in install.packages :
  download of package ‘foreign’ failed

The firewall in my company blocks all binary files. Foreign is downloaded 
in "wb" mode, so I have no chance of getting it. The first fresh 
installation was done from an external drive. As packrat also 
downloads the binaries instead of the source, my download will always 
fail.
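
A possible workaround for the situation described above, sketched under assumptions: if the package file can be brought onto the machine some other way (for example on the external drive mentioned above), install.packages() can install it from disk with repos = NULL, so nothing has to pass the firewall. The helper name and the example path are hypothetical. A source tarball of a package with compiled code additionally needs the build tools (Rtools on Windows), and the "win.binary" type only applies on Windows.

```r
# Minimal sketch: install a package from a file that is already on disk.
# The function name and the example path below are hypothetical.
install_from_local_file <- function(pkg_file) {
  if (!file.exists(pkg_file)) {
    stop("Package file not found: ", pkg_file)
  }
  # A .tar.gz is treated as a source package, anything else as a Windows binary
  pkg_type <- if (grepl("\\.tar\\.gz$", pkg_file)) "source" else "win.binary"
  # repos = NULL tells install.packages() that 'pkgs' is a local file path
  utils::install.packages(pkg_file, repos = NULL, type = pkg_type)
}

# Usage (the path is an assumption):
# install_from_local_file("E:/packages/foreign_0.8-67.zip")
```

Whether packrat then picks up the locally installed package depends on its configuration, which is not covered here.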

My sessionInfo() is
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252 
[2] LC_CTYPE=German_Germany.1252 
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C 
[5] LC_TIME=German_Germany.1252 

attached base packages:
[1] stats graphics  grDevices utils datasets  methods 
[7] base 

loaded via a namespace (and not attached):
[1] tools_3.3.2

Do you have a suggestion?

Kind regards

Georg





From:    Uwe Ligges
To:      g.maub...@weinwolf.de, r-help@r-project.org
Date:    20.02.2017 21:29
Subject: Re: [R] packrat: Failed to download current version of 
foreign(0.8-67)



foreign is a recommended package that is already part of your R 
installation, and there should not be a problem installing a recent 
version of it.

What is the error message if you run
install.packages("foreign") from a new R session?

Best,
Uwe Ligges



On 20.02.2017 17:33, g.maub...@weinwolf.de wrote:
> Hi All,
>
> I tried to use packrat on
>
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> locale:
> [1] LC_COLLATE=German_Germany.1252
> [2] LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods
> [7] base
>
> other attached packages:
> [1] packrat_0.4.8-1
>
> loaded via a namespace (and not attached):
> [1] tools_3.3.2
>
> Due to internal firewall restrictions the package "foreign" could not be
> downloaded as source. I assume that the package also contains some binary
> parts which will be blocked by the firewall.
>
> When running packrat a directory "packrat" and a file called .Rprofile
> were created in the project directory. A lot of library sources were
> downloaded, but not for "foreign".
>
> After finishing the process the directory "packrat" and the file .Rprofile
> were deleted from the project directory.
>
> Why is that? Just one source library missing and the whole directory is
> gone? Having all libraries for my project without just one is better than
> none!
>
> How can I use packrat with the missing library "foreign"?
>
> Kind regards
>
> Georg
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Reply: Re: RStudio: Place for Storing Options

2017-02-20 Thread G . Maubach

Hi Martin,
Hi Ulrik,

I am still working on the answer. I got a message from the RStudio team, but I 
am still working on the clarification of the answer and a possible 
solution.

Kind regards

Georg

From:    Martin Maechler
To:      ,
Cc:      Ulrik Stervbo , R-help mailing list
Date:    09.02.2017 16:05
Subject: Re: [R] RStudio: Place for Storing Options

> Ulrik Stervbo 
> on Thu, 9 Feb 2017 14:37:57 + writes:

> Hi Georg,
> maybe someone here knows, but I think you are more likely to get answers to
> RStudio-related questions from RStudio support:
> https://support.rstudio.com/hc/en-us

> Best,
> Ulrik

Indeed, thank you, Ulrik.

In this special case, however, I'm quite sure many readers of
R-help would be interested in the answer; so once you receive an
answer, please post it (or a link to a public URL with it) here
on R-help, thank you in advance.

We would like to be able to *save*, or sometimes *set* / *reset*
such options  "in a scripted manner", e.g. for
controlled exam sessions.

Martin Maechler,
ETH Zurich

> On Thu, 9 Feb 2017 at 12:35  wrote:

>> Hi All,
>> I would like to make a backup of the RStudio IDE options that I configure
>> using "Tools/Global Options" from the menu bar. Searching the
>> web did not reveal anything.

>> Can you tell me where RStudio IDE does store its configuration?

>> Kind regards
>> Georg


[R] packrat: Failed to download current version of foreign(0.8-67)

2017-02-20 Thread G . Maubach

Hi All,

I tried to use packrat on

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252 
[2] LC_CTYPE=German_Germany.1252 
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C 
[5] LC_TIME=German_Germany.1252 

attached base packages:
[1] stats graphics  grDevices utils datasets  methods 
[7] base 

other attached packages:
[1] packrat_0.4.8-1

loaded via a namespace (and not attached):
[1] tools_3.3.2

Due to internal firewall restrictions the package "foreign" could not be 
downloaded as source. I assume that the package also contains some binary 
parts which will be blocked by the firewall.

When running packrat a directory "packrat" and a file called .Rprofile 
were created in the project directory. A lot of library sources were 
downloaded, but not for "foreign".

After finishing the process the directory "packrat" and the file .Rprofile 
were deleted from the project directory.

Why is that? Just one source library missing and the whole directory is 
gone? Having all libraries for my project without just one is better than 
none!

How can I use packrat with the missing library "foreign"?

Kind regards

Georg


[R] RStudio: Place for Storing Options

2017-02-09 Thread G . Maubach

Hi All,

I would like to make a backup of the RStudio IDE options that I configure using 
"Tools/Global Options" from the menu bar. Searching the web did not reveal 
anything.

Can you tell me where the RStudio IDE stores its configuration?

Kind regards

Georg



[R] Authentication and Web Site Scraping

2017-01-21 Thread G . Maubach

Hi All,

I would like to learn how to scrape a web site that is password protected. I 
am practising on my Delicious account. I will obey all applicable rules and 
legislation.

The Delicious export API was shut down, and I assume that the web site will be shut 
down in the foreseeable future. In my Coursera course I learned that it is 
possible to scrape web sites and extract the information they contain. I would like to 
use this possibility to download the bookmark pages and extract the bookmarks 
with their accompanying tags as an alternative to the non-existent export API.

I started with

-- cut --
url_base <- "https://del.icio.us/gmaubach?="

data_created <- as.character(Sys.Date())
filename_base <-
  paste0(
    data_created,
    "_Delicious_Page_")

page_start <- 1
page_end <- 670

# Download each bookmark page into its own file
for (page in page_start:page_end)
{
  download.file(
    url = paste0(
      url_base,
      as.character(page)),
    destfile = paste0(
      filename_base,
      as.character(page)))
}
-- cut --

This way approx. 1000 bookmarks are not loaded because only the public bookmarks 
are shown. I know that it is possible to authenticate using something like

-- cut --
page <- GET("https://del.icio.us",
            authenticate("user", "password"))
-- cut --

To avoid authenticating over and over again, it is possible to use handles 
like

-- cut --
delicious <- handle("https://del.icio.us")
-- cut --

I do not know how to put it all together. What would be a sequence of 
statements for getting all stored bookmarks on pages 1..670 using 
authentication?
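
One way the pieces above might be combined, as a rough sketch only: it uses the httr functions already named in the post (GET, authenticate, handle) plus content() and stop_for_status(). It assumes the site accepts HTTP basic authentication, which a site with a form-based login would not; the function name, the "page" query parameter and the output file names are hypothetical.

```r
# Minimal sketch with httr; all names not in the post are assumptions.
fetch_bookmark_pages <- function(user, password,
                                 page_start = 1, page_end = 670,
                                 filename_base = paste0(Sys.Date(),
                                                        "_Delicious_Page_")) {
  # One handle for the whole run, so connection and cookies are reused
  delicious <- httr::handle("https://del.icio.us")

  for (page in page_start:page_end) {
    # "page" as query parameter name is a guess, not taken from the site
    page_url <- paste0("https://del.icio.us/", user, "?page=", page)

    response <- httr::GET(
      url    = page_url,
      config = httr::authenticate(user, password),
      handle = delicious
    )
    httr::stop_for_status(response)  # stop on 4xx/5xx answers

    # Store the raw HTML; parsing the bookmarks is a separate step
    html <- httr::content(response, as = "text", encoding = "UTF-8")
    writeLines(html, paste0(filename_base, page, ".html"))
  }
  invisible(NULL)
}

# Usage (credentials are placeholders):
# fetch_bookmark_pages("user", "password", page_start = 1, page_end = 3)
```

If the login is form based, the same handle could instead be used to POST the login form once before the loop, so that the session cookie is sent with every later request.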

Kind regards

Georg


[R] Assessing the name of an object within an argument

2017-01-10 Thread G . Maubach

Hi All,

I have a function like

my_func <- function(dataset)
{
  # some operation
}

Now I would like not only to operate on the dataset (how this is done is 
obvious) but I would like to get the name of the dataset handed over as an 
argument.

Example:

my_func <- function(dataset = iris)
{
  print(dataset)  # here I do not want to print the dataset but the name
                  # of the object - iris in this case - instead
  # quote() does not do the trick because it prints "dataset" instead of
  # "iris"
  # as.name() gives an error saying that the object cannot be coerced to a
  # symbol
}

Is there a way to do this?
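
A common idiom for this, shown as a minimal sketch: substitute() returns the unevaluated expression that was supplied for an argument, and deparse() turns it into a character string. The variable name dataset_name is only illustrative.

```r
my_func <- function(dataset = iris)
{
  # substitute() gives the expression passed as 'dataset' without
  # evaluating it; deparse() converts that expression to a string
  dataset_name <- deparse(substitute(dataset))
  print(dataset_name)
  invisible(dataset_name)
}

my_func(mtcars)  # the name of the object that was passed
my_func()        # the default argument is reported the same way
```

Note that this only works in the function that receives the object directly; when the argument is handed on to a further function, substitute() there sees the inner argument name again.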

Kind regards

Georg


[R] SOLVED: Re: Source into a specified environment

2017-01-09 Thread G . Maubach

Hi Jim,

many thanks for your answer.

That's exactly what I need.

Many thanks again.

Kind regards

Georg




From:    jim holtman
To:      g.maub...@weinwolf.de,
Cc:      R mailing list
Date:    10.01.2017 03:59
Subject: Re: [R] Source into a specified environment



?sys.source

Here is an example of the way I use it:

# read my functions into an environment
.my.env.jph <- new.env()
sys.source('~/C_Drive/perf/bin/perfmon.r', envir=.my.env.jph)
attach(.my.env.jph)
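
A self-contained sketch of the same idea, using a temporary file in place of the real module files (the module content and the object names are made up for illustration). It also shows source() with its local argument set to an environment, which should give the same result. One assumption worth stating: the target environment here is created with plain new.env(), because an environment whose parent is emptyenv() cannot find even `<-` or `function` while the file is evaluated.

```r
# Write a tiny stand-in module to a temporary file (hypothetical content)
module_file <- tempfile(fileext = ".R")
writeLines("t_count_na <- function(x) sum(is.na(x))", module_file)

# Variant 1: sys.source() evaluates the file in the given environment
toolbox <- new.env()
sys.source(module_file, envir = toolbox)

# Variant 2: source() accepts an environment via 'local'
toolbox2 <- new.env()
source(module_file, local = toolbox2)

# The function now lives in the environments, not in the workspace
ls(toolbox)
toolbox$t_count_na(c(1, NA, 3))
```

For several modules, the sys.source() call can simply be placed inside the for loop over the file names.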


Jim Holtman
Data Munger Guru
 
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Jan 9, 2017 at 11:21 AM,  wrote:
Hi All,

I wish everyone a happy new year.

I have the following code:

-- cut --

modules <- c("t_calculate_RFM_model.R", "t_count_na.R",
             "t_export_table_2_xls.R",
             "t_find_duplicates_in_variable.R",
             "t_find_originals_and_duplicates.R",
             "t_frequencies.R", "t_inspect_dataset.R",
             "t_merge_variables.R",
             "t_openxlsx_shortcuts.r", "t_rename_variables.R",
             "t_select_chunks.R")

toolbox <- new.env(parent = emptyenv())

for (file in modules)
{
  source(file = file.path(
           c_path_full$modules,  # path to modules
           file),
         echo = TRUE)
}

-- cut --

I would like to know how I can source the modules into the newly created
environment called "toolbox"?

I had a look at the help file ?source, but as far as I can see this function
can only read into the current environment or the global environment (= default).

I tried also the following

-- cut --

for (file in modules)
{
  do.call(
    what = "source",
    args = list(
      file = file.path(c_path_full$modules,
                       file),
      echo = TRUE
    ),
    envir = toolbox
  )
}

-- cut --

But this did not work, i. e. it did not load the modules into the
environment "toolbox" but into the .GlobalEnv.

I also had a look at "assign", but assign() asks for the name of an object
in quotes. This way I could not figure out how to use it in a loop or
function to name the elements in "toolbox" after the module names:

assign("t_add_sheet", t_add_sheet, envir = toolbox)  # works
assign(quote(t_add_sheet), t_add_sheet, envir = toolbox)  # does NOT work
assign(as.name(t_add_sheet), t_add_sheet, envir = toolbox)  # does NOT work


Is there a way to load the modules directly into the "toolbox"
environment?

Kind regards

Georg


[R] Source into a specified environment

2017-01-09 Thread G . Maubach

Hi All,

I wish everyone a happy new year.

I have the following code:

-- cut --

modules <- c("t_calculate_RFM_model.R", "t_count_na.R",
             "t_export_table_2_xls.R",
             "t_find_duplicates_in_variable.R",
             "t_find_originals_and_duplicates.R",
             "t_frequencies.R", "t_inspect_dataset.R",
             "t_merge_variables.R",
             "t_openxlsx_shortcuts.r", "t_rename_variables.R",
             "t_select_chunks.R")

toolbox <- new.env(parent = emptyenv())

for (file in modules)
{
  source(file = file.path(
           c_path_full$modules,  # path to modules
           file),
         echo = TRUE)
}

-- cut --

I would like to know how I can source the modules into the newly created 
environment called "toolbox"?

I had a look at the help file ?source, but as far as I can see this function 
can only read into the current environment or the global environment (= default).

I tried also the following

-- cut --

for (file in modules)
{
  do.call(
    what = "source",
    args = list(
      file = file.path(c_path_full$modules,
                       file),
      echo = TRUE
    ),
    envir = toolbox
  )
}

-- cut --

But this did not work, i. e. it did not load the modules into the 
environment "toolbox" but into the .GlobalEnv.

I also had a look at "assign", but assign() asks for the name of an object 
in quotes. This way I could not figure out how to use it in a loop or 
function to name the elements in "toolbox" after the module names:

assign("t_add_sheet", t_add_sheet, envir = toolbox)  # works
assign(quote(t_add_sheet), t_add_sheet, envir = toolbox)  # does NOT work
assign(as.name(t_add_sheet), t_add_sheet, envir = toolbox)  # does NOT work


Is there a way to load the modules directly into the "toolbox" 
environment?

Kind regards

Georg


[R] openxlsx: No Formatting of Numbers

2016-12-05 Thread G . Maubach

Hi All,
Dear Readers,

I am using openxlsx to export data to Microsoft Excel 2013, 32-Bit, German 
Version:

--- snip ---

library("openxlsx")

dataset <- structure(
  list(
    a = c(1126039.81, 45636.44, 14847.41),
    b = c(1194447.5,
          88310.53, 18699.68),
    c = c(1560307.73, 34203.73, 24755.99),
    d = c(1068790.67,
          67581.86, 12378.55)
  ),
  .Names = c("a", "b", "c", "d"),
  row.names = c(NA,
                3L),
  class = "data.frame"
)

xlsx_workbook <- openxlsx::createWorkbook()
openxlsx::addWorksheet(
  wb = xlsx_workbook,
  sheetName = "Numbers")

openxlsx::writeData(
  wb = xlsx_workbook,
  sheet = "Numbers",
  x = dataset,
  rowNames = TRUE,
  colNames = TRUE,
  startRow = 2,
  startCol = 2,
  borders = c("surrounding")
)

myStyle <- openxlsx::createStyle(numFmt = "###.###.##0")

openxlsx::addStyle(wb = xlsx_workbook,
   sheet = "Numbers",
   style = myStyle,
   rows = 1:1,
   cols = 10:10,
   gridExpand = TRUE,
   stack = TRUE)

openxlsx::saveWorkbook(
  wb = xlsx_workbook,
  file = "C:/temp/openxlsx_example.xlsx",
  overwrite = TRUE
)

--- snip ---

The problem with this is that it does not apply the number format to the 
cells on the Excel sheet. Also, sometimes the border around the data on the 
Excel sheet is deleted. I have not yet found out what causes this 
behaviour.

My sessionInfo() output is:

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252 
[2] LC_CTYPE=German_Germany.1252 
[3] LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C 
[5] LC_TIME=German_Germany.1252 

attached base packages:
[1] tools stats graphics  grDevices utils 
[6] datasets  methods   base 

other attached packages:
[1] tidyr_0.5.1stringr_1.1.0  reshape2_1.4.1
[4] openxlsx_3.0.0 dplyr_0.5.0 

loaded via a namespace (and not attached):
[1] lazyeval_0.2.0 plyr_1.8.4 magrittr_1.5 
[4] R6_2.2.0   assertthat_0.1 DBI_0.4-1 
[7] tibble_1.1 Rcpp_0.12.5stringi_1.1.1 

I do not want to round the numbers in R, because my clients would like to 
use them as they are in further calculations.

How can I export a data frame to Excel, print a border around the complete 
table/dataset (not the single cells) and format the numbers like 
123.456.789 (thousand delimiter dot ".", all numbers without decimals)?
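
A possible approach, sketched under assumptions rather than confirmed by the list: in the code above the style is applied to rows = 1:1, cols = 10:10, i.e. a single cell outside the written data, so the style has to target the cells that writeData() actually filled. As far as I know, number format codes are stored in the file in Excel's locale-independent notation, where "," is the thousands separator; a German Excel should then display it as ".". The shortened data frame and the temporary output file are only for illustration.

```r
library(openxlsx)

dataset <- data.frame(
  a = c(1126039.81, 45636.44, 14847.41),
  b = c(1194447.5, 88310.53, 18699.68)
)

wb <- createWorkbook()
addWorksheet(wb, sheetName = "Numbers")

start_row <- 2
start_col <- 2
writeData(wb, sheet = "Numbers", x = dataset,
          rowNames = TRUE, colNames = TRUE,
          startRow = start_row, startCol = start_col,
          borders = "surrounding")

# The header takes one row and the row names one column,
# so the numeric cells start one row down and one column to the right
data_rows <- (start_row + 1):(start_row + nrow(dataset))
data_cols <- (start_col + 1):(start_col + ncol(dataset))

# Display without decimals; the stored values keep their full precision
number_style <- createStyle(numFmt = "#,##0")

# gridExpand = TRUE styles every row/column combination,
# stack = TRUE adds to the existing cell styles (the border) instead of
# replacing them
addStyle(wb, sheet = "Numbers", style = number_style,
         rows = data_rows, cols = data_cols,
         gridExpand = TRUE, stack = TRUE)

out_file <- file.path(tempdir(), "openxlsx_example.xlsx")
saveWorkbook(wb, file = out_file, overwrite = TRUE)
```

Because only the display format changes, the clients can still use the unrounded values in further calculations.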

Kind regards

Georg


[R] Reply: Re: for loop is looping only once [SOLVED]

2016-11-17 Thread G . Maubach

Hi Ulrik,

oh no! What a mistake I made. I simply did not see the 
error.

Many thanks for helping me.

Kind regards

Georg




From:    Ulrik Stervbo
To:      g.maub...@weinwolf.de, r-help@r-project.org
Date:    17.11.2016 12:24
Subject: Re: [R] for loop is looping only once



Hi Georg,

Your for loop iterates over just one value. To get it to work as you 
intend, use for(item in 1:length(kpis)){}
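
To illustrate the point with a small sketch (the variable visited is only there to record what the loop saw): length(kpis) is a single number, so the loop body runs once with item equal to that number, whereas a sequence of indices runs once per element. seq_along() is a common alternative to 1:length(x) because it also behaves for empty vectors.

```r
kpis <- c("kpi1", "kpi2")

length(kpis)     # one value: the loop would run once, with item = 2
seq_along(kpis)  # one index per element

visited <- character(0)
for (item in seq_along(kpis)) {
  visited <- c(visited, kpis[[item]])
}
visited

# With an empty vector, 1:length(x) counts down from 1 to 0,
# while seq_along(x) yields no iterations at all
empty <- character(0)
1:length(empty)
seq_along(empty)
```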

HTH
Ulrik

On Thu, 17 Nov 2016 at 12:18  wrote:
Hi All,

I need to execute a loop on variables to compute several KPIs.
Unfortunately the for loop is executed only once for the last KPI given.
The code below illustrates my current solution but is not completely
necessary to spot the problem. I just give an idea what I am doing
overall. Looks much but isn't if copied and run in RStudio. The problem
occurs in function f_create_kpi_table() in lines 150 to 157:

  for (item in length(kpis))  # This loop runs only once!
  {
print(kpis[[item]])
ds_kpi <- f_compute_kpi(
  years= years,
  kpi  = kpis[[item]],
  kpi_base = kpi_bases[[item]])
print(ds_kpi)

Here is the complete example code with example data:

-- cut --
dataset <-
  structure(
list(
  to_2012 = c(
85,
822,
891,
700,
386,
127,
938,
381,
871,
254,
793,
0,
934,
217,
163,
755,
607,
794,
477
  ),
  to_2013 = c(
289,
0,
963,
243,
608,
47,
0,
941,
998,
775,
326,
0,
0,
470,
248,
439,
212,
0,
0
  ),
  to_2014 = c(0, 0, 71, 0, 0, 434, 0, 282, 0,
  0, 405, 0, 0, 642, 0, 0, 0, 47, 299),
  to_2015 = c(
705,
134,
659,
0,
609,
807,
783,
0,
0,
304,
141,
500,
0,
0,
764,
790,
851,
0,
802
  ),
  kpi1_2013 = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,
0, 0, 0, 1, 1),
  kpi1_2014 = c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0,
1, 1, 0, 1, 1, 1, 0, 0),
  kpi1_2015 = c(0, 0, 0, 1, 0, 0, 0, 1,
1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0),
  kpi1_2016 = c(0, 1, 0, 1, 0,
1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1),
  kpi2_2013 = c(1, 0,
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0),
  kpi2_2014 = c(0,
0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1),
  kpi2_2015 = c(1,
1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1),
  kpi2_2016 = c(1,
0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0)
),
.Names = c(
  "to_2012",
  "to_2013",
  "to_2014",
  "to_2015",
  "kpi1_2013",
  "kpi1_2014",
  "kpi1_2015",
  "kpi1_2016",
  "kpi2_2013",
  "kpi2_2014",
  "kpi2_2015",
  "kpi2_2016"
),
row.names = c(NA, 19L),
class = "data.frame"
  )

f_compute_kpi <- function(
  years,
  kpi,
  kpi_base)
{
  print(years)
  print(kpi)
  print(kpi_base)

  ds_result <- data.frame()

  for (year in years) {
current_year  <- year
previous_year <- year - 1
result <- sum(dataset[dataset[[paste0(kpi,
  "_",
  current_year)]] == 1 ,
  paste0(kpi_base,
 "_", previous_year)],
  na.rm = TRUE)
ds_result <- rbind(ds_result, result)
  }

  ds_result   <- t(ds_result)
  rownames(ds_result) <- kpi
  colnames(ds_result) <- years

  invisible(ds_result)
}

f_create_kpi_table <- function(
  years,
  kpis,
  kpi_bases)
{
  print(length(kpis))

#-- Problematic loop --
  for (item in length(kpis))  # This loop runs only once!
  {
print(kpis[[item]])
ds_kpi <- f_compute_kpi(
  years= years,
  kpi  = kpis[[item]],
  kpi_base = kpi_bases[[item]])
print(ds_kpi)
  }
  # This for loop is executed only once for kpi2 instead of
  # as many times as given kpis in length(kpis), i. e.
  # kpi1 AND kpi2.
  # Why?
  # What do I do wrong?
}
-- cut --

What do I need to change to get the loop work correctly and loop over two
elements instead of one when calling the function

f_create_kpi_table(years = 2013:2016, kpis = c("kpi1", "kpi2"), kpi_bases
= c("to", "to"))

Kind regards

Georg


[R] for loop is looping only once

2016-11-17 Thread G . Maubach

Hi All,

I need to execute a loop on variables to compute several KPIs. 
Unfortunately the for loop is executed only once, for the last KPI given. 
The code below illustrates my current solution, although not all of it is 
necessary to spot the problem; it just gives an idea of what I am doing 
overall. It looks like a lot, but it is not once copied and run in RStudio. The problem 
occurs in function f_create_kpi_table() in lines 150 to 157:

  for (item in length(kpis))  # This loop runs only once!
  {
    print(kpis[[item]])
    ds_kpi <- f_compute_kpi(
      years    = years,
      kpi      = kpis[[item]],
      kpi_base = kpi_bases[[item]])
    print(ds_kpi)

Here is the complete example code with example data:

-- cut --
dataset <-
  structure(
list(
  to_2012 = c(
85,
822,
891,
700,
386,
127,
938,
381,
871,
254,
793,
0,
934,
217,
163,
755,
607,
794,
477
  ),
  to_2013 = c(
289,
0,
963,
243,
608,
47,
0,
941,
998,
775,
326,
0,
0,
470,
248,
439,
212,
0,
0
  ),
  to_2014 = c(0, 0, 71, 0, 0, 434, 0, 282, 0,
  0, 405, 0, 0, 642, 0, 0, 0, 47, 299),
  to_2015 = c(
705,
134,
659,
0,
609,
807,
783,
0,
0,
304,
141,
500,
0,
0,
764,
790,
851,
0,
802
  ),
  kpi1_2013 = c(0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,
0, 0, 0, 1, 1),
  kpi1_2014 = c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0,
1, 1, 0, 1, 1, 1, 0, 0),
  kpi1_2015 = c(0, 0, 0, 1, 0, 0, 0, 1,
1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0),
  kpi1_2016 = c(0, 1, 0, 1, 0,
1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1),
  kpi2_2013 = c(1, 0,
1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0),
  kpi2_2014 = c(0,
0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1),
  kpi2_2015 = c(1,
1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1),
  kpi2_2016 = c(1,
0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0)
),
.Names = c(
  "to_2012",
  "to_2013",
  "to_2014",
  "to_2015",
  "kpi1_2013",
  "kpi1_2014",
  "kpi1_2015",
  "kpi1_2016",
  "kpi2_2013",
  "kpi2_2014",
  "kpi2_2015",
  "kpi2_2016"
),
row.names = c(NA, 19L),
class = "data.frame"
  )

f_compute_kpi <- function(
  years,
  kpi,
  kpi_base)
{
  print(years)
  print(kpi)
  print(kpi_base)

  ds_result <- data.frame()

  for (year in years) {
    current_year  <- year
    previous_year <- year - 1
    result <- sum(dataset[dataset[[paste0(kpi,
                                          "_",
                                          current_year)]] == 1,
                          paste0(kpi_base,
                                 "_", previous_year)],
                  na.rm = TRUE)
    ds_result <- rbind(ds_result, result)
  }

  ds_result           <- t(ds_result)
  rownames(ds_result) <- kpi
  colnames(ds_result) <- years

  invisible(ds_result)
}

f_create_kpi_table <- function(
  years,
  kpis,
  kpi_bases)
{
  print(length(kpis))

  #-- Problematic loop --
  for (item in length(kpis))  # This loop runs only once!
  {
    print(kpis[[item]])
    ds_kpi <- f_compute_kpi(
      years    = years,
      kpi      = kpis[[item]],
      kpi_base = kpi_bases[[item]])
    print(ds_kpi)
  }
  # This for loop is executed only once for kpi2 instead of
  # as many times as given kpis in length(kpis), i. e.
  # kpi1 AND kpi2.
  # Why?
  # What do I do wrong?
}
-- cut --

What do I need to change to get the loop work correctly and loop over two 
elements instead of one when calling the function

f_create_kpi_table(years = 2013:2016, kpis = c("kpi1", "kpi2"), kpi_bases 
= c("to", "to"))

Kind regards

Georg


[R] Different results when converting a matrix to a data.frame

2016-11-16 Thread G . Maubach

Hi All,

I am building an empty data frame to fill it with values later. I did the 
following:

-- cut --
> matrix(NA, 2, 2)
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> data.frame(matrix(NA, 2, 2))
  X1 X2
1 NA NA
2 NA NA
> as.data.frame(matrix(NA, 2, 2))
  V1 V2
1 NA NA
2 NA NA
-- cut --

Why does data.frame deliver different results than as.data.frame with 
regard to the variable names (V instead of X)?
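
Whatever the reason for the different defaults, one way to avoid depending on them is to name the columns of the matrix before converting it, as in this small sketch (the column names "first" and "second" are placeholders). With column names present, both functions take the names from the matrix.

```r
# Give the matrix column names up front (names are illustrative)
m <- matrix(NA, nrow = 2, ncol = 2,
            dimnames = list(NULL, c("first", "second")))

df1 <- data.frame(m)
df2 <- as.data.frame(m)

names(df1)
names(df2)
identical(names(df1), names(df2))
```

Alternatively the names can be set afterwards with names(df) <- c(...), which works the same for both results.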

Kind regards

Georg


[R] openxlsx Error: length of rows and cols must be

2016-11-15 Thread G . Maubach

Hi All,

when using 

-- cut --

number_style <- openxlsx::createStyle(
  numFmt = "COMMA"
)

openxlsx::addStyle(
  wb = xlsx_workbook,
  sheet = "Kundenliste",
  style = number_style,
  rows = 2:nrow(customer_list),
  cols = 4:5
  )
-- cut --

I get the error

Error in openxlsx::addStyle(wb = xlsx_workbook, sheet = "Kundenliste",  : 
  Length of rows and cols must be equal.

The customer_list can be of any arbitrary length due to subgroup 
definitions. I do not see why the arguments "rows" and "cols" should be of 
the same length. This would mean that number formatting can only be done 
for rectangular areas.

What do I need to change to format my numbers in the given area correctly?
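
A likely fix, given as a sketch with made-up data: by default addStyle() appears to pair rows and cols element by element, which is why their lengths must match. With gridExpand = TRUE the style is applied to every combination of the given rows and columns, so a block of any height can be formatted. The example data frame and the temporary output file are assumptions; the sketch also assumes the header sits in row 1, so the data occupy rows 2 to nrow + 1.

```r
library(openxlsx)

# Illustrative stand-in for the real customer list
customer_list <- data.frame(
  id      = 1:3,
  name    = c("A", "B", "C"),
  city    = c("X", "Y", "Z"),
  revenue = c(1234567.8, 2345.6, 34567.9),
  profit  = c(23456.7, 345.6, 4567.8)
)

xlsx_workbook <- createWorkbook()
addWorksheet(xlsx_workbook, sheetName = "Kundenliste")
writeData(xlsx_workbook, sheet = "Kundenliste", x = customer_list)

number_style <- createStyle(numFmt = "COMMA")

# Header in row 1, data in rows 2 .. nrow + 1; columns 4 and 5 hold numbers
addStyle(
  wb = xlsx_workbook,
  sheet = "Kundenliste",
  style = number_style,
  rows = 2:(nrow(customer_list) + 1),
  cols = 4:5,
  gridExpand = TRUE
)

out_file <- file.path(tempdir(), "kundenliste_example.xlsx")
saveWorkbook(xlsx_workbook, file = out_file, overwrite = TRUE)
```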

Kind regards

Georg


[R] Storing long string with white space in variable

2016-10-19 Thread G . Maubach

Hi All,

I would like to store a long string with white space in a variable:

-- cut --
  # Create README.md
  readme <- "---
title: "Your project title here"
author: "Author(s) name(s) here"
date: "Current date here"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE)
```
# Project Context

# Goals

# Approach

# Reference to main program
```{r}
source("main_program.R")
```

# Information on used system and configuration
```{r}
cat("Gathering system information ...\n")
sessionInfo()
```
"
cat(readme, file = "README.md")

-- cut --

I am looking for an equivalent to Python's """  """ long string feature.

I searched the web and found this:

http://stackoverflow.com/questions/6329962/split-code-over-multiple-lines-in-an-r-script
https://stat.ethz.ch/pipermail/r-help/2006-October/115358.html

But this is not the solution to the problem.

How can I store long strings with white space in a variable?
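
Two approaches that might serve here, shown as a rough sketch rather than the definitive answer: a single-quoted literal can span several lines and may contain double quotes, and a character vector with one element per line can be joined when writing. The shortened README content and the temporary output path are assumptions for the example. (Later R versions, from 4.0.0 on, also offer raw strings of the form r"(...)", which did not exist when this was posted.)

```r
# 1) A single-quoted literal may span lines and contain double quotes
header <- '---
title: "Your project title here"
output: html_document
---'

# 2) One element per line; strrep() builds the three-backtick chunk
#    delimiter so it does not have to be typed literally
fence <- strrep("`", 3)
body <- c(
  "",
  "# Information on used system and configuration",
  paste0(fence, "{r}"),
  'cat("Gathering system information ...\\n")',  # \\n keeps a literal \n
  "sessionInfo()",
  fence
)

readme <- paste(c(header, body), collapse = "\n")

# Written to a temporary directory here; use "README.md" in a real project
readme_file <- file.path(tempdir(), "README.md")
cat(readme, "\n", file = readme_file, sep = "")
```

The doubled backslash matters: inside an ordinary R string a single "\n" would be turned into a real line break before the text reaches the file.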

Kind regards

Georg

PS: This is a template for a project folder for each project. I would like 
to create it with an R script instead of distributing it as a template file. 
This way one needs only the R script to set up a project like this:

#---
# Module: t_setup_project_directory.R
# Author: Georg Maubach
# Date  : 2016-10-19
# Update: 2016-10-19
# Description   : Setup a directory structure for a new analytics
# project
# Source System : R 3.3.0 (64 Bit)
# Target System : R 3.3.0 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#1-2-3-4-5-6-7--

t_version = "2016-10-19"
t_module_name = "t_setup_project_directory.R"
t_status = "development"

cat(
paste0(
"\n",
t_module_name,
" (Version: ",
t_version,
", Status: ",
t_status,
")",
"\n",
"\n",
"Copyright (C) Georg Maubach 2016

This software comes with ABSOLUTELY NO WARRANTY.",
"\n",
"\n"
)
)

library(svDialogs)

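# If do_test is not defined globally, define it here locally by
# un-commenting it
t_do_test <- FALSE

# [ Function Definition ] ------------------------------------------------
t_setup_project_directory <- function() {
  #-------------------------------------------------------------------------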
  # Setup a directory structure for a new analytics
  #
  # Args:
  #   None.
  #
  # Operation:
  #   The user can create or select a directory for the projects files.
  #   The function then places all sub directories in this project
  #   folder.
  #   The function saves a RData file with objects containing the path
  #   to project directory and its sub folders.
  #
  # Returns:
  #   Nothing.
  #
  # Error handling:
  #   None.
  #
  # See also:
  #   ./.
  #-------------------------------------------------------------------------

  # Get and/or create project directory
  v_project_dir <- svDialogs::dlgDir()$res

  # Define names for sub directories
  data          <- "data"                  # data to be loaded into or
                                           # saved from R
  documentation <- "documentation"         # explanatory material for results
                                           # (e. g. knitr documents)
  fundamentals  <- "fundamentals"          # background knowledge
  input         <- "data/input"            # input data, possibly manually
                                           # revised for import
  meta          <- "data/meta"             # meta data (e. g. lookup tables)
  output        <- "data/output"
  raw           <- "data/raw"              # a copy of all input data never
                                           # touched for safety reasons and
                                           # not read by R
  program       <- "program"               # all scripts and runnable files
  modules       <- "program/modules"       # project specific packages, files
                                           # or functions in separate files as
                                           # well as all other sub routines to
                                           # be sourced or loaded
  results       <- "results"               # container for all resulting data
                                           # in an aggregated form
  graphics      <- "results/graphics"
  tables        <- "results/tables"
  presentations <- "results/presentations"
  temp          <- "temp"

  v_paths_relative <- list(
project   = v_project_dir,
documentation = documentation,
fundamentals  = fundamentals,
input = input,
meta  = meta,
output= output,
raw   = raw,
program   = program,
modules   = modules,
graphic   = graphics,
table = tables,
presentation  =

[R] Reshaping geographic data

2016-10-17 Thread G . Maubach

Hi All,

I need to reshape an ESRI shape file, available at http://arnulf.us/PLZ or 
http://www.metaspatial.net/download/plz.tar.gz

I found instructions for SQL Server (T-SQL):

https://blog.oraylis.de/2010/05/german-map-spatial-data-for-plz-postal-code-regions/

How can I do this using R?

Kind regards

Georg

-- cut --
Here's my code so far:

download.file(
url = "http://www.metaspatial.net/download/plz.tar.gz",
destfile = "C:/temp/plz.tar.gz")

untar(tarfile = "C:/temp/plz.tar.gz",
  exdir = "C:/temp",
  compressed = "gzip")


[R] Antwort: Re: Antwort: Re: Visibility of libraries called from within functions

2016-10-13 Thread G . Maubach

From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, r-help@r-project.org, 
Date:    13.10.2016 12:34
Subject: Re: Antwort: Re: [R] Visibility of libraries called from within functions



On 13/10/2016 6:21 AM, g.maub...@weinwolf.de wrote:
> Hi Duncan,
>
> many thanks for your reply.
>
> Your suggestion of using requireNamespace() together with explicit
> namespace calling using the "::" operator is what I was looking for:
>
> -- cut --
>
> f_test <- function() {
> requireNamespace("openxlsx")
> cat("Loaded packages AFTER loading library")
> print(search())
> xlsx::read.xlsx(file = "c:/temp/test.xlsx",
> sheetName = "test")
> }

Not sure if that's a typo in your message or a real error, but you 
require "openxlsx" and then use "xlsx".

It's a typo!


>
> cat("Loaded packages BEFORE function call ")
> search()
>
> f_test()
>
> cat("Loaded packages AFTER function call -")
> search()
>
> -- cut  --
>
> When reading ?requireNamespace I did not really get how R operates
> behind the scenes.
>
> Using "library" attaches the namespace to the search path. Using
> "requireNamespace" does not do that.
>
> But how does R find the namespace then? What kind of list or directory
> used R to to store the namespace and lookup the correct function or
> methods of this namespace?

R has an internal list of packages that are loaded.  Functions in them 
are only visible to user code if the package is *also* on the search 
list, or if the package name prefix is used with ::.

Can I have a look at this internal list like I can do with search() for 
packages or ls() for objects?
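
As far as I can tell, loadedNamespaces() is the counterpart of search() for this internal list. The sketch below uses the "tools" package only because it ships with R; it stands in for any package. It compares the loaded namespaces with the search path before and after a requireNamespace() call.

```r
# "tools" ships with R; it stands in for any package here
attached_before <- "package:tools" %in% search()

loaded_ok <- requireNamespace("tools", quietly = TRUE)

# Loaded namespaces: the internal list of loaded packages
"tools" %in% loadedNamespaces()

# Attached packages: the search path; requireNamespace() leaves it alone
attached_after <- "package:tools" %in% search()
identical(attached_before, attached_after)

# Objects of a loaded namespace can be listed without attaching it
head(ls(asNamespace("tools")))
```

So search() shows what is attached, loadedNamespaces() shows what is loaded, and ls(asNamespace(...)) plays the role of ls() for one namespace.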

If xlsx is loaded, xlsx::read.xlsx will just use it; if it is not 
loaded, the package will be loaded to make the call.  So you don't need 
the requireNamespace call if you can be sure that xlsx will be found. 
You would normally use its return value (FALSE if the package is not 
found) to test whether it will be safe to make the xlsx::read.xlsx call.

Got it!



Duncan Murdoch

>
> Kind regards
>
> Georg
>
>
>
>
> From:    Duncan Murdoch 
> To:      g.maub...@weinwolf.de, r-help@r-project.org,
> Date:    13.10.2016 10:43
> Subject: Re: [R] Visibility of libraries called from within functions
>
>
>
> On 13/10/2016 4:18 AM, g.maub...@weinwolf.de wrote:
>> Hi All,
>>
>> in my R programs I use different libraries to work with Excel sheets,
>> e. g. xlsx and excel.link.
>>
>> When running chunks of code repeatedly and not always in the order the
>> program should run for development purposes I ran into trouble. There
>> were conflicts between the methods within these functions causing R to
>> crash.
>>
>> I thought about defining functions for the different tasks and loading
>> the libraries locally within these functions. Doing this test
>>
>> -- cut --
>>
>> f_test <- function() {
>> library(xlsx)
>> cat("Loaded packages AFTER loading library")
>> print(search())
>> }
>>
>> cat("Loaded packages BEFORE function call ")
>> search()
>>
>> f_test()
>>
>> cat("Loaded packages AFTER function call -")
>> search()
>>
>> -- cut --
>>
>> showed that the library "xlsx" was loaded into the global environment
>> and stayed there although I had expected R to unload the library when
>> leaving the function. Thus conflicts can occur more often.
>>
>> I had a look into ?library and saw that there is no argument telling R
>> to hold the library in the calling environment.
>>
>> How can I load libraries locally to the calling functions?
>
> You can detach at the end of your function, but that's tricky to get
> right:  the package might have been on the search list before your
> function was called.  It's better not to touch the search list at all.
>
> The best solution is to use :: notation to get functions without putting
> them on the search list.  For example, use
>
> xlsx::write.xlsx(data, file)
>
> If you are not sure if your user has xlsx installed, you can use
> requireNamespace() to check.
>
> Duncan Murdoch
>
>
>
>
>


[R] Antwort: Re: Visibility of libraries called from within functions

2016-10-13 Thread G . Maubach

Hi Duncan,

many thanks for your reply.

Your suggestion of using requireNamespace() together with explicit 
namespace calling using the "::" operator is what I was looking for:

-- cut --

f_test <- function() {
requireNamespace("openxlsx")
cat("Loaded packages AFTER loading library")
print(search())
xlsx::read.xlsx(file = "c:/temp/test.xlsx",
sheetName = "test")
}

cat("Loaded packages BEFORE function call ")
search()

f_test()

cat("Loaded packages AFTER function call -")
search()

-- cut  --

When reading ?requireNamespace I did not really get how R operates behind 
the scenes.

Using "library" attaches the namespace to the search path. Using 
"requireNamespace" does not do that.

But how does R find the namespace then? What kind of list or directory 
does R use to store the namespace and look up the correct functions or 
methods of this namespace?

Kind regards

Georg

From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, r-help@r-project.org, 
Date:    13.10.2016 10:43
Subject: Re: [R] Visibility of libraries called from within functions

On 13/10/2016 4:18 AM, g.maub...@weinwolf.de wrote:
> Hi All,
>
> in my R programs I use different libraries to work with Excel sheets,
> e. g. xlsx and excel.link.
>
> When running chunks of code repeatedly and not always in the order the
> program should run for development purposes I ran into trouble. There
> were conflicts between the methods within these functions causing R to
> crash.
>
> I thought about defining functions for the different tasks and loading
> the libraries locally within these functions. Doing this test
>
> -- cut --
>
> f_test <- function() {
> library(xlsx)
> cat("Loaded packages AFTER loading library")
> print(search())
> }
>
> cat("Loaded packages BEFORE function call ")
> search()
>
> f_test()
>
> cat("Loaded packages AFTER function call -")
> search()
>
> -- cut --
>
> showed that the library "xlsx" was loaded into the global environment
> and stayed there although I had expected R to unload the library when
> leaving the function. Thus conflicts can occur more often.
>
> I had a look into ?library and saw that there is no argument telling R
> to hold the library in the calling environment.
>
> How can I load libraries locally to the calling functions?

You can detach at the end of your function, but that's tricky to get 
right:  the package might have been on the search list before your 
function was called.  It's better not to touch the search list at all.

The best solution is to use :: notation to get functions without putting 
them on the search list.  For example, use

xlsx::write.xlsx(data, file)

If you are not sure if your user has xlsx installed, you can use 
requireNamespace() to check.
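
The combination of requireNamespace() and the :: notation could look roughly like the sketch below. The helper name read_sheet_safely and its arguments are hypothetical, and openxlsx is used instead of xlsx purely as an example package; the second half only demonstrates that requireNamespace() returns FALSE for a missing package and leaves the search list alone.

```r
# Hypothetical helper; the function name and arguments are illustrative only
read_sheet_safely <- function(path, sheet = 1) {
  # requireNamespace() returns FALSE instead of failing if the package
  # is not installed
  if (!requireNamespace("openxlsx", quietly = TRUE)) {
    warning("Package 'openxlsx' is not installed; returning NULL.")
    return(NULL)
  }
  # "::" calls the function without attaching the package
  openxlsx::read.xlsx(xlsxFile = path, sheet = sheet)
}

search_before <- search()
missing_pkg <- requireNamespace("aPackageThatDoesNotExist", quietly = TRUE)
search_after <- search()

missing_pkg                             # FALSE for a package that is not installed
identical(search_before, search_after)  # the search list is untouched
```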

Duncan Murdoch


[R] Visibility of libraries called from within functions

2016-10-13 Thread G . Maubach

Hi All,

in my R programs I use different libraries to work with Excel sheets, 
e. g. xlsx and excel.link.

When running chunks of code repeatedly and not always in the order the 
program should run for development purposes I ran into trouble. There were 
conflicts between the methods within these functions causing R to crash.

I thought about defining functions for the different tasks and loading the 
libraries locally within these functions. Doing this test

-- cut --

f_test <- function() {
library(xlsx)
cat("Loaded packages AFTER loading library")
print(search())
}

cat("Loaded packages BEFORE function call ")
search()

f_test()

cat("Loaded packages AFTER function call -")
search()

-- cut --

showed that the library "xlsx" was loaded into the global environment and 
stayed there although I had expected R to unload the library when leaving 
the function. Thus conflicts can occur more often.

I had a look into ?library and saw that there is no argument telling R to 
hold the library in the calling environment.

How can I load libraries locally to the calling functions?

Kind regards

Georg


[R] Documenting a function using roxygen2

2016-10-11 Thread G . Maubach

Hi All,

I began to document my functions using roxygen2. This is an example of a 
function I would like to write for training and testing purposes:

t_simple_table <- function(variable,
   useNA = TRUE,
   print = FALSE) {
#' @title Create a simple table for one variable.
#'
#' @description t_simple_table() creates absolute and relative 
#' frequencies, cumulative sums and column sums for both as well as
#' overall statistics about valid N and missing values.
#' 
#' 
#' @param variable (vector, list, data.frame): variable the table is
#' created for.
#' @param useNA (logical): flag to include or exclude missing values
#' from the computation.
#' @param print (logical): flag to print/not print a table before
#' returning it as an object.
#' 
#' @operation
#' Coerces the given variable to a factor.
#' If useNA = TRUE NA is also transformed to a valid value,
#' if useNA = FALSE it is disregarded in all operations.
#' 
#' @return Returns a table with the following statistics:
#' 
#'              Frequencies   Percent   Cumulative
#'                                      Percent
#' Valid             .           .
#' Missing           .           .
#' Total             .          100
#' Categories
#'   Cat 1           .           .          .
#'   Cat 2           .           .          .
#'   Cat 3           .           .          .
#'   ...             .           .         100
#'   Total           .          100
#'
#' @errorhandling None
#' 
#' @version "0.1"
#' 
#' @created "2016-10-11"
#' @updated "2016-10-11"
#' 
#' @status development
#'
#' @see Manderscheid: Sozialwissenschaftliche Datenanalyse mit R, 
#' p. 79ff
#'
#' @author Georg
#'
#' @license GPL-2
 
# function body to be defined

}

Is this a correct header for a function?

How could I do better?
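
For comparison, here is a sketch of how such a header is more commonly laid out, as far as I know the conventions: roxygen2 reads the #' block placed directly above the function rather than inside its body, and tags like @operation, @errorhandling, @version, @created, @updated, @status, @see and @license are not among its standard tags. Details usually go under @details, literature under @references, and version and license into the package's DESCRIPTION file. The function body below is a placeholder of my own, since the original leaves it undefined.

```r
#' Create a simple table for one variable
#'
#' Computes absolute and relative frequencies and cumulative percentages
#' for the categories of one variable.
#'
#' @details The variable is tabulated as a factor. With `useNA = TRUE`
#'   missing values are counted as a category of their own; with
#'   `useNA = FALSE` they are dropped.
#'
#' @param variable A vector the table is created for.
#' @param useNA Logical; include missing values in the computation?
#' @param print Logical; print the table before returning it?
#'
#' @return A data frame with the columns `category`, `frequency`,
#'   `percent` and `cumulative_percent`.
#'
#' @references Manderscheid: Sozialwissenschaftliche Datenanalyse mit R,
#'   p. 79ff
#'
#' @author Georg
#'
#' @export
t_simple_table <- function(variable, useNA = TRUE, print = FALSE) {
  # Placeholder body (an assumption; the original leaves it undefined)
  counts <- table(variable, useNA = if (useNA) "ifany" else "no")
  result <- data.frame(
    category  = names(counts),
    frequency = as.vector(counts),
    percent   = as.vector(counts) / sum(counts) * 100
  )
  result$cumulative_percent <- cumsum(result$percent)
  if (print) print(result)
  invisible(result)
}

t_simple_table(c("a", "b", "a", NA), print = TRUE)
```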

Kind regards

Georg


[R] Antwort: RE: How to plot a bunch of dichotomous code variables in one plot using ggplot2

2016-10-05 Thread G . Maubach

Hi Bob,
Hi John,
Hi readers,

many thanks for your reply.

I did

barplot(colSums(dataset %>% select(FirstVar:LastVar)))

and it worked fine.

How would I do it with ggplot2?
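
A minimal ggplot2 sketch, assuming the example data from the original question: the column sums are first put into a small data frame with one row per variable, which is the shape ggplot2 expects, and then drawn with geom_col(). The names counts, variable and count are my own choices.

```r
var1 <- c(1, 0, 1, 0, 0, 1, 1, 1, 0, 1)
var2 <- c(0, 1, 1, 1, 1, 0, 0, 0, 0, 0)
var3 <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 1)
ds <- data.frame(var1, var2, var3)

# One row per variable with the number of 1s in it
counts <- data.frame(variable = names(ds), count = colSums(ds))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(counts, ggplot2::aes(x = variable, y = count)) +
    ggplot2::geom_col()
  # print(p) draws the plot; at the console typing p is enough
}
```

The same reshaping works for a selection such as dataset %>% select(FirstVar:LastVar) before taking the column sums.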

Kind regards

Georg




From:    "Fox, John" 
To:      "g.maub...@weinwolf.de" , 
Cc:      "r-help@r-project.org" 
Date:    05.10.2016 15:01
Subject: RE: [R] How to plot a bunch of dichotomous code variables in one plot using ggplot2



Dear Georg,

How about barplot(colSums(ds)) ?

Best,
 John

-
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox


> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: October 5, 2016 8:47 AM
> To: r-help@r-project.org
> Subject: [R] How to plot a bunch of dichotomous code variables in one
> plot using ggplot2
> 
> Hi All,
> 
> I have a bunch of dichotomous code variables which shall be plotted in
> one graph using one of their values, this is "1" in this case.
> 
> The dataset looks like this:
> 
> -- cut --
> var1 <- c(1,0,1,0,0,1,1,1,0,1)
> var2 <- c(0,1,1,1,1,0,0,0,0,0)
> var3 <- c(1,1,1,1,1,1,1,1,0,1)
> 
> ds <- data.frame(var1, var2, var3)
> -- cut --
> 
> I would like to have a bar plot like this
> 
> 
> 
>               *
>               *
>               *
>               *
> *             *
> *             *
> *      *      *
> *      *      *
> *      *      *
> *      *      *
> ------------------
> var1   var2   var3
> 
> Is this possible in R? If so, how can I achieve this?
> 
> Kind regards
> 
> Georg
> 


[R] How to plot a bunch of dichotomous code variables in one plot using ggplot2

2016-10-05 Thread G . Maubach

Hi All,

I have a bunch of dichotomous code variables which shall be plotted in one 
graph using one of their values, this is "1" in this case.

The dataset looks like this:

-- cut --
var1 <- c(1,0,1,0,0,1,1,1,0,1)
var2 <- c(0,1,1,1,1,0,0,0,0,0)
var3 <- c(1,1,1,1,1,1,1,1,0,1)

ds <- data.frame(var1, var2, var3)
-- cut --

I would like to have a bar plot like this



              *
              *
              *
              *
*             *
*             *
*      *      *
*      *      *
*      *      *
*      *      *
------------------
var1   var2   var3

Is this possible in R? If so, how can I achieve this?

Kind regards

Georg


[R] Putting a bunch of Excel files as data.frames into a list fails

2016-09-28 Thread G . Maubach

Hi All,

I need to read a bunch of Excel files and store them in R.

I decided to store the different Excel files in data.frames in a named 
list where the names are the file names of each file (and that is 
different from the sources as far as I can see):

-- cut --
# Sources:
# - http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r
# - http://stackoverflow.com/questions/9564489/opening-all-files-in-a-folder-and-applying-a-function
# - http://stackoverflow.com/questions/12945687/how-to-read-all-worksheets-in-an-excel-workbook-into-an-r-list-with-data-frame-e

v_file_path <- "H:/2016/Analysen/Neukunden/Input"
v_file_pattern <- "*.xlsx"

v_files <- list.files(path = v_file_path,
  pattern = v_file_pattern,
  ignore.case = TRUE)
print(v_files)

v_list_of_files <- list()

for (v_file in v_files) {
  v_list_of_files[v_file] <- openxlsx::read.xlsx(
    file.path(v_file_path,
              v_file))
}
-- cut --

This code does not work because it stores only the first variable of each 
Excel file in the named list.

What do I need to change to get it running?
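
The likely culprit is the single bracket in the assignment: v_list_of_files[v_file] expects a list of length one, so only the first column of the data frame is kept, whereas [[ ]] stores the whole data frame. The sketch below demonstrates this with a stand-in function read_stub() (hypothetical, used so the example runs without Excel files) and made-up file names. As a side note, list.files() takes a regular expression, so a pattern such as "\\.xlsx$" is probably closer to the intent than "*.xlsx".

```r
# Stand-in for openxlsx::read.xlsx() so the sketch runs without Excel files
read_stub <- function(path) data.frame(id = 1:2, value = c(10, 20))

v_files <- c("a.xlsx", "b.xlsx")   # hypothetical file names

# Single brackets: only the first column of each data frame survives
# (R also issues a warning, suppressed here)
wrong <- list()
for (v_file in v_files) {
  suppressWarnings(wrong[v_file] <- read_stub(v_file))
}

# Double brackets store the whole data frame under the file name
v_list_of_files <- list()
for (v_file in v_files) {
  v_list_of_files[[v_file]] <- read_stub(v_file)
}

# The same without an explicit loop
v_list_of_files_2 <- setNames(lapply(v_files, read_stub), v_files)

str(wrong[["a.xlsx"]])
str(v_list_of_files[["a.xlsx"]])
```

In the real script read_stub(v_file) would be replaced by openxlsx::read.xlsx(file.path(v_file_path, v_file)).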

Kind regards

Georg


[R] Iteration over variables

2016-09-06 Thread G . Maubach

Hi All,

I would like to write a program that iterates over a set of dynamically 
generated variables and produces some stats or prints parts of the data.

# --- data
v_turnover_2011 <- c(10, 20, 30, 40 , 50)
v_customer_2011 <- c(0, 1, NA, 0, 1)
v_turnover_2012 <- c(10, 20, 30, 40 , 50)
v_customer_2012 <- c(0, 1, NA, 0, 1)
d_dataset <- data.frame(v_turnover_2011, v_turnover_2012,
v_customer_2011, v_customer_2012)

# -- Aim is to iterate over dynamically generated variables and compute
# -- statistics or print parts of the data

# -- Does not produce any output
for (year in 2011:2012) {
  head(d_dataset[, c(paste0("v_turnover_", year),
 paste0("v_customer_", year))])
}

# -- Does not produce any output
aux_func <- function(year) {
  head(d_dataset[, c(paste0("v_turnover_", year),
 paste0("v_customer_", year))])
}

for (year in 2011:2012) {
  aux_func(year = year)
}


d_results <- data.frame()
for (year in 2011:2012) {
  d_results <- rbind(d_results,
 paste0("mean", year) = mean(d_dataset[, 
c(paste0("v_turnover_", year))]))
}

Is there a way to iterate over variables and compute statistics and print 
parts of the dataset?
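
One thing that may explain the missing output: inside a for loop or a function called from a loop, results are not auto-printed, so an explicit print() is needed. The sketch below, using the data from above, prints the parts of the dataset per year and collects one mean per year with sapply(); the names years, turnover_means and mean_turnover are my own.

```r
v_turnover_2011 <- c(10, 20, 30, 40, 50)
v_customer_2011 <- c(0, 1, NA, 0, 1)
v_turnover_2012 <- c(10, 20, 30, 40, 50)
v_customer_2012 <- c(0, 1, NA, 0, 1)
d_dataset <- data.frame(v_turnover_2011, v_turnover_2012,
                        v_customer_2011, v_customer_2012)

years <- 2011:2012

# Inside a for loop nothing is auto-printed, so print() is needed
for (year in years) {
  cols <- paste0(c("v_turnover_", "v_customer_"), year)
  print(head(d_dataset[, cols]))
}

# Collect one statistic per year in a named vector
turnover_means <- sapply(years, function(year) {
  mean(d_dataset[[paste0("v_turnover_", year)]], na.rm = TRUE)
})
names(turnover_means) <- paste0("mean", years)

# Or as a data frame with one row per year
d_results <- data.frame(year = years,
                        mean_turnover = unname(turnover_means))
print(d_results)
```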

Kind regards

Georg


[R] Installation of rJava fails

2016-08-17 Thread G . Maubach

Hi All,

I am trying to install RWeka, which depends on "rJava", on Debian GNU/Linux 8 
Jessie (uname -a: 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2+deb8u3 
(2016-07-02) x86_64).
I did

apt-get install openjdk-8-jre

which went OK.

Java is installed in:

/var/lib/dpkg/alternatives/java
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
/usr/lib/jvm/java-8-openjdk-amd64/bin/java
/etc/alternatives/java

When doing this

install.packages("rJava")

I get

* installing *source* package ‘rJava’ ...
** Paket ‘rJava’ erfolgreich entpackt und MD5 Summen überprüft

interpreter : '/usr/lib/jvm/default-java/jre/bin/java'
archiver: '/usr/lib/jvm/default-java/bin/jar'
compiler: '/usr/lib/jvm/default-java/bin/javac'
header prep.: '/usr/lib/jvm/default-java/bin/javah'
cpp flags   : '-I/usr/lib/jvm/default-java/include'
java libs   : '-L/usr/lib/jvm/default-java/jre/lib/amd64/server -ljvm'
checking whether Java run-time works... 
./configure: line 3736: /usr/lib/jvm/default-java/jre/bin/java: No such file or 
directory
no
configure: error: Java interpreter '/usr/lib/jvm/default-java/jre/bin/java' 
does not work
ERROR: configuration failed for package ‘rJava’
* removing ‘/usr/local/lib/R/site-library/rJava’
Warning in install.packages :
  installation of package ‘rJava’ had non-zero exit status

Do I need to use another Java version or installation? How do I tell 
install.packages() where my Java installation resides?
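
Judging from the log, configure looks under /usr/lib/jvm/default-java and also asks for javac and javah, which a JRE-only installation does not provide, so a full JDK may be needed. The small diagnostic below is only a sketch under that assumption: it checks which pieces exist under the two directories mentioned in this post. The helper name check_java_home is hypothetical. The usual next step, as far as I know, is to install a JDK package (for example openjdk-8-jdk), run "R CMD javareconf" as root so that R records the Java paths, and then retry install.packages("rJava").

```r
# Directories taken from the post: what configure looked for and what
# apt-get actually installed
java_home_candidates <- c(
  "/usr/lib/jvm/default-java",
  "/usr/lib/jvm/java-8-openjdk-amd64"
)

# Hypothetical helper: which parts of a Java installation are present?
check_java_home <- function(home) {
  c(
    java  = file.exists(file.path(home, "jre", "bin", "java")) ||
            file.exists(file.path(home, "bin", "java")),
    javac = file.exists(file.path(home, "bin", "javac")),
    jni_h = file.exists(file.path(home, "include", "jni.h"))
  )
}

# One row per directory, one column per component
status <- t(sapply(java_home_candidates, check_java_home))
print(status)

# What the current session knows about Java
Sys.getenv("JAVA_HOME")
Sys.which("java")
```

If the javac and jni_h columns are FALSE for every directory, only a runtime is installed and the build cannot succeed.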

Kind regards

Georg


[R] Antwort: Re: Accessing an object using a string

2016-08-16 Thread G . Maubach

Hi Greg 
and all others who replied to my question,

many thanks for all your answers and help. Currently I store all my 
objects in .GlobalEnv = Workspace. I am not yet familiar working with 
different environments nor did I see that this would be necessary for my 
analysis.

Could you explain why working with different environments would be 
helpful?

You suggested reading variables into lists rather than storing them in 
global variables. This sounds interesting. Could you provide an example of 
how to define and use this?
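
To make the question concrete, this is how I currently picture the list approach; it is a sketch based on my reading of the suggestion, not a confirmed recipe. The objects are loaded into a fresh environment instead of .GlobalEnv, so nothing in the workspace is overwritten, and are then turned into a named list that lapply()/sapply() can process. The names loaded_env and datasets and the temporary file path are my own choices.

```r
# Example data as in the original question
data1 <- data.frame(var1 = c(1, 2, 3), var2 = c(4, 5, 6))
data2 <- data.frame(var3 = c(7, 8, 9), var4 = c(10, 11, 12))

rdata_file <- file.path(tempdir(), "test.RData")
save(list = c("data1", "data2"), file = rdata_file)
rm(data1, data2)

# Load into a fresh environment instead of .GlobalEnv
loaded_env <- new.env()
object_names <- load(rdata_file, envir = loaded_env)

# Turn the loaded objects into a named list
datasets <- mget(object_names, envir = loaded_env)

# Process every dataset the same way
lapply(datasets, names)
sapply(datasets, nrow)
```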

Kind regards

Georg

From:    Greg Snow <538...@gmail.com>
To:      g.maub...@weinwolf.de, 
Cc:      r-help 
Date:    15.08.2016 20:33
Subject: Re: [R] Accessing an object using a string

The names function is a primitive, which means that if it does not
already do what you want, it is generally not going to be easy to
coerce it to do it.

However, the names of an object are generally stored as an attribute
of that object, which can be accessed using the attr or attributes
functions.  If you change your code to not use the names function and
instead use attr or attributes to access the names then it should work
for you.

You may also want to consider changing your workflow to have your data
objects read into a list rather than global variables, then process
using lapply/sapply (this would require a change in how your data is
saved from your example, but if you can change that then everything
after can be cleaner/simpler/easier/more fool proof/etc.)

On Mon, Aug 15, 2016 at 2:49 AM,   wrote:
> Hi All,
>
> I would like to access an object using a string.
>
> # Create example dataset
> var1 <- c(1, 2, 3)
> var2 <- c(4, 5, 6)
> data1 <- data.frame(var1, var2)
>
> var3 <- c(7, 8, 9)
> var4 <- c(10, 11, 12)
> data2 <- data.frame(var3, var4)
>
> save(file = "c:/temp/test.RData", list = c("data1", "data2"))
>
> # Define function
> t_load_dataset <- function(file_path,
>file_name) {
>   file_location <- file.path(file_path, file_name)
>
>   print(paste0('Loading ', file_location, " ..."))
>   cat("\n")
>
>   object_list <- load(file = file_location,
>   envir = .GlobalEnv)
>
>   print(paste(length(object_list), "dataset(s) loaded from",
> file_location))
>   cat("\n")
>
>   print("The following objects were loaded:")
>   print(object_list)
>   cat("\n")
>
>   for (i in object_list) {
> print(paste0("Object '", i, "' in '", file_name, "' contains:"))
> str(i)
> names(i)  # does not work
>   }
> }
>
> I have only the character vector object_list containing the names of the
> objects as strings. I would like to access the objects in object_list to
> be able to print the names of the variables within the object (usually
> a data frame).
>
> Is it possible to do this? How is it done?
>
> Kind regards
>
> Georg
>

-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


[R] Antwort: Accessing an object using a string (SOLVED)

2016-08-15 Thread G . Maubach

Hi All,

I found the function get() which returns an object.
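
In short, the difference is the following (a minimal illustration with a made-up data frame; the variable names are only for this example):

```r
d_data1 <- data.frame(var1 = c(1, 2, 3), var2 = c(4, 5, 6))
object_name <- "d_data1"

names(object_name)        # NULL: a plain character vector has no names
names(get(object_name))   # the column names of the data frame

# exists() guards against names that do not refer to any object
if (exists(object_name)) {
  dataset <- get(object_name)
  cat(nrow(dataset), "cases in", ncol(dataset), "variables\n")
}
```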

My whole function looks like this:

-- cut --

#---
# Module: t_load_dataset.R
# Author: Georg Maubach
# Date  : 2016-08-15
# Update: 2016-08-15
# Description   : Load dataset and print information on contents
# Source System : R 3.3.0 (64 Bit)
# Target System : R 3.3.0 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#1-2-3-4-5-6-7-8

t_module_name = "t_load_dataset"
t_version = "2016-08-15"
t_status = "released"

cat(
  paste0("\n",
         t_module_name, " (Version: ", t_version, ", Status: ", t_status,
         ")", "\n", "\n",
         "Copyright (C) Georg Maubach 2016

This software comes with ABSOLUTELY NO WARRANTY.", "\n", "\n"))

# If do_test is not defined globally, define it here locally by
# un-commenting it
# Switch t_do_test to TRUE to run the test
t_do_test <- FALSE

# [ Function Definition ] ------------------------------------------------
t_load_dataset <- function(file_path,
                           file_name) {
  # Loads an RData file with all objects in it and prints information on
  # its contents
  #
  # Args:
  #  file_path (string):
  #String with path name.
  #  file_name (string):
  #String with file name.
  #
  # Operation:
  #   Loads the RData file with all its objects, stores the objects in the
  #   global environment .GlobalEnv and prints information about the
  #   objects.
  #
  # Usage:
  #   The function is designed to work only on data frames.
  #
  # Returns:
  #   Nothing, but stores loaded objects directly into the global
  #   environment.
  #
  # Error handling:
  #   None.
  #-------------------------------------------------------------------------

  cat("--- [ t_load_dataset() ] ---\n\n")
 
  file_location <- file.path(file_path, file_name)
 
  cat(paste0('Loading ', file_location, " ...\n\n"))
 
  dataset_list <- load(file = file_location,
   envir = .GlobalEnv)
 
  cat(paste0(
    length(dataset_list),
    " dataset(s) loaded:\n"))
  cat(dataset_list)
  cat("\n\n")

  for (dataset in dataset_list) {
    cat(paste0("Dataset '", dataset, "' contains ",
               nrow(get(dataset, envir = .GlobalEnv)),
               " cases in ",
               ncol(get(dataset, envir = .GlobalEnv)),
               " variables:\n"))
    cat(names(get(dataset, envir = .GlobalEnv)))
    cat("\n\n")
  }

  cat("-- [ Done ] ---\n\n")
}

# [ Test Definition ] ----------------------------------------------------
t_test <- function(do_test = FALSE) {
  if (do_test == TRUE) {

    # Example dataset
    var1 <- c(1, 2, 3)
    var2 <- c(4, 5, 6)
    d_data1 <- data.frame(var1, var2)

    var3 <- c(7, 8, 9)
    var4 <- c(10, 11, 12)
    d_data2 <- data.frame(var3, var4)

    # Save datasets
    v_file_name <- "test_t_load_dataset.RData"

    save(file = file.path(getwd(),
                          v_file_name),
         list = c("d_data1", "d_data2"))

    # Call function
    t_load_dataset(file_path = getwd(), file_name = v_file_name)

    # Cleanup
    unlink(file.path(getwd(), v_file_name))
  }
}

# [ Test Run ] -----------------------------------------------------------
t_test(do_test = t_do_test)

# [ Clean up ] -----------------------------------------------------------
rm("t_module_name", "t_version", "t_status", "t_do_test", "t_test")

# EOF

-- cut --

I will later include it in the toolbox of R functions on SourceForge.net.

Kind regards

Georg




From:    g.maub...@weinwolf.de
To:      r-help@r-project.org, 
Date:    15.08.2016 10:51
Subject: [R] Accessing an object using a string
Sent by: "R-help" 



Hi All,

I would like to access an object using a string.

# Create example dataset
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)
data1 <- data.frame(var1, var2)

var3 <- c(7, 8, 9)
var4 <- c(10, 11, 12)
data2 <- data.frame(var3, var4)

save(file = "c:/temp/test.RData", list = c("data1", "data2"))

# Define function
t_load_dataset <- function(file_path,
   file_name) {
  file_location <- file.path(file_path, file_name)
 
  print(paste0('Loading ', file_location, " ..."))
  cat("\n")
 
  object_list <- load(file = file_location,
  envir = .GlobalEnv)
 
  print(paste(length(object_list), "dataset(s) loaded from", 
file_location))
  cat("\n")
 
  print("The following objects were loaded:")
  print(object_list)
  cat("\n")
 
  for (i in object_list) {
print(paste0("Object '", i, "' in '", file_name, "' contains:"))
str(i)
names(i)  # does not work
  }
}

I have only the

[R] Accessing an object using a string

2016-08-15 Thread G . Maubach

Hi All,

I would like to access an object using a string.

# Create example dataset
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)
data1 <- data.frame(var1, var2)

var3 <- c(7, 8, 9)
var4 <- c(10, 11, 12)
data2 <- data.frame(var3, var4)

save(file = "c:/temp/test.RData", list = c("data1", "data2"))

# Define function
t_load_dataset <- function(file_path,
   file_name) {
  file_location <- file.path(file_path, file_name)
 
  print(paste0('Loading ', file_location, " ..."))
  cat("\n")
 
  object_list <- load(file = file_location,
  envir = .GlobalEnv)
 
  print(paste(length(object_list), "dataset(s) loaded from", 
file_location))
  cat("\n")
 
  print("The following objects were loaded:")
  print(object_list)
  cat("\n")
 
  for (i in object_list) {
    print(paste0("Object '", i, "' in '", file_name, "' contains:"))
    str(i)
    names(i)  # does not work
  }
}

I have only the character vector object_list containing the names of the 
objects as strings. I would like to access the objects in object_list to 
be able to print the names of the variables within the object (usually a 
data frame).

Is it possible to do this? How is it done?

Kind regards

Georg


[R] Antwort: Re: Re: Spread data.frame on 2 variables (SOLVED)

2016-08-02 Thread G . Maubach

Hi Ulrik,

many thanks for your help.

The problem was that R regards a dataset with a combination like

caseID  custID  channel  unit
1       1000    10       10
2       1000    20       10
3       1000    20       30

as two different sets of cases (set 1 = case 1, set 2 = cases 2 and 3) 
because unit has a different value (30) in case 3, although all cases 
should be restructured based on custID alone.

To get a dataset like

caseID  custID  channel-10  channel-20  unit-10  unit-30
1       1000    1           1           1        1

instead of

caseID  custID  channel-10  channel-20  unit-10  unit-30
1       1000    1           1           1        NA
2       1000    NA          1           NA       1

I used the approach you suggested:

1. I created a subset of my data with the first variable to be 
restructured:

d_temp1 <- dataset[ , c("custID", "channel")]

2. I deleted all cases that were duplicates:

d_temp1 <- d_temp1[!duplicated(d_temp1[ , c("custID", "channel")]), ]

3. I introduced a dummy variable delivering the values for the new 
variables created by tidyr::spread():

d_temp1$dummy <- 1

4. Then I restructured the subset:

d_temp1 <- tidyr::spread(d_temp1, key = "channel", value = "dummy")

5. I repeated steps 1 to 4 with the other variable "unit" (instead of 
"channel"), creating a new dataset named d_temp2.

6. I deleted the variables used for restructuring in steps 1 to 5, 
"channel" and "unit", from the original dataset "dataset":

dataset$channel <- NULL
dataset$unit <- NULL

7. I checked if I still had duplicates:

duplicates <- duplicated(dataset[ , "Debitor"])

sum(duplicates)  # was 0 this time

8. I merged the datasets back together:

dataset_2 <- merge(x = dataset, y = d_temp1, by.x = "Debitor", by.y = 
"Debitor", all.x = TRUE, all.y = TRUE)  # leaving out all.y would be fine
dataset_2 <- merge(x = dataset_2, y = d_temp2, by.x = "Debitor", by.y = 
"Debitor", all.x = TRUE, all.y = TRUE)  # leaving out all.y would be fine

There might be a combination of commands and functions doing the same 
thing in one step, but I find that this is clear, comprehensible and 
reproducible even at a later date or by other readers willing to use base 
R for their work.
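
For readers who want to stay in base R entirely, the idea of the steps above (one indicator column per value, one row per customer) can be sketched with table() and merge(). This is an alternative illustration, not the code used above; the helper to_flags(), the second customer 2000 and the "_" separator in the column names are assumptions made for this example.

```r
# Small example in long form; customer 2000 is added for illustration only
dataset <- data.frame(
  caseID  = 1:4,
  custID  = c(1000, 1000, 1000, 2000),
  channel = c(10, 20, 20, 10),
  unit    = c(10, 10, 30, 30)
)

# Hypothetical helper: one row per id, one 1/NA column per value of 'var'
to_flags <- function(df, id, var) {
  tab <- unclass(table(df[[id]], df[[var]]))  # counts per id and value
  tab[tab == 0] <- NA                         # combination never occurs
  tab[!is.na(tab)] <- 1                       # combination occurs at least once
  out <- data.frame(rownames(tab), tab,
                    check.names = FALSE, stringsAsFactors = FALSE)
  names(out) <- c(id, paste(var, colnames(tab), sep = "_"))
  rownames(out) <- NULL
  out
}

channel_wide <- to_flags(dataset, "custID", "channel")
unit_wide    <- to_flags(dataset, "custID", "unit")

# One row per customer with all indicator columns
wide <- merge(channel_wide, unit_wide, by = "custID")
print(wide)
```

Note that the id column comes back as character here because it is taken from the row names of the table.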

Many thanks again for your help.

Kind regards

Georg





From:    Ulrik Stervbo
To:      g.maub...@weinwolf.de, R-help
Date:    28.07.2016 14:20
Subject: Re: Re: [R] Spread data.frame on 2 variables



Hi Georg,

it is difficult to figure out what happens between your expectation and 
the outcome if we cannot see a minimal dataset.

Based on your description I did this

library(tidyr)
library(dplyr)

test_df <- data_frame(channel = LETTERS[1:5], unit = letters[1:5], custID 
= c(1:5), dummy = 1)
test_df %>% spread(channel, dummy) %>% mutate(dummy = 1) %>% spread(unit, 
dummy) 

which seems to be working fine as I get wide data. If a combination is 
missing in the long form it will also be missing in the wide form. Maybe 
you are looking for something like this:

channel_wide <- test_df  %>% select(channel, custID) %>% spread(channel, 
custID) 
unit_wide <- test_df  %>% select(unit, custID) %>% spread(unit, custID) 
bind_cols(channel_wide, unit_wide)

Apologies for the HTML - it's gmail

Best wishes,
Ulrik

On Thu, 28 Jul 2016 at 13:54  wrote:
Hi Ulrik,

I have included a reproducible example. I ran the code and it did exactly
what I wanted to show you.

You are right: the solution shall merge cases in the end because the values
on the variables are either missing or the same.

Example 1: Values are the same
If you look at cases 6 and 7 and variable 70, the value is 1 in both cases.
In this context this is the same information, and cases 6 and 7 with custID
can be merged to 1 for variable 70.

Example 2: Values are missing and not missing
If you look at cases 8 and 9, the value for case 8 at variables 40, 50 and
2000 is missing, whereas the variables 40, 50 and 2000 all have 1 for case
9. Cases 8 and 9 could be merged together because the missing values are
overwritten, which is correct in this case.

The solution I am looking for is to transform the data from long into wide
form and keep all but missing value information.

Did I explain my problem in a comprehensible way? Are there any further
questions?

Kind regards

Georg





From:    Ulrik Stervbo
To:      g.maub...@weinwolf.de, r-help@r-project.org
Date:    28.07.2016 12:59
Subject: Re: [R] Spread data.frame on 2 variables



Hi Georg,

it's hard to tell without a reproducible example.

Should spread really merge elements? Does spread know anything about
CustID? Maybe you need to make a useful key of the CustIDs first and
spread on that?

Maybe I'm all off, because I'm really just

[R] Spread data.frame on 2 variables

2016-07-28 Thread G . Maubach

Hi All,

I need to spread a data.frame on 2 variables, e. g. "channel" and "unit".

If I do it in two steps, spread() keeps every case that does not look 
exactly like the one before as a separate row, although the rows contain 
the same values for a specific customer.

Here is what I have right now:

-- cut --

test1$dummy <- 1
test2 <- spread(data = test1, key = 'channel', value = "dummy")
test2
cat("First spread is OK!")

test2$dummy <- 1
test3 <- spread(data = test2, key = 'unit', value = 'dummy')

test1
# test2
test3
warning(paste0("Second spread is not OK cause spread does not merge cases\n",
               "with CustID 700 and 800 into one case,\n",
               "cause they have values on different variables,\n",
               "although the corresponding values of the cases with ",
               "custID 700 and 800 are missing."))

cat("What I would like to have is:\n")
target4 <- structure(list(custID = c(100, 200, 300, 500, 600, 700, 800, 
900),
  `10` = c(1, NA, NA, NA, NA, NA, NA, NA),
  `20` = c(1, NA, NA, NA, NA, NA, NA, NA), 
  `30` = c(NA, NA, NA, NA, NA, NA, 1, 1),
  `40` = c(NA, NA, NA, NA, 1, NA, 1, 1),
  `50` = c(NA, NA, 1, NA, NA, NA, 1, 1), 
  `60` = c(NA, NA, NA, NA, NA, 1, NA, NA),
  `70` = c(NA, NA, NA, NA, NA, 1, NA, NA), 
  `99` = c(NA, 1, NA, 1, NA, NA, NA, NA), 
  `1000` = c(1, NA, NA, NA, NA, NA, 1, 1), 
  `2000` = c(NA, NA, NA, NA, 1, 1, 1, NA),
  `3000` = c(NA, NA, 1, NA, NA, 1, NA, NA),
  `4000` = c(NA, NA, 1, NA, NA, NA, NA, NA),
  `6000` = c(NA, NA, NA, NA, 1, NA, NA, NA),
  `` = c(NA, 1, NA, 1, NA, NA, NA, NA)),
.Names = c("custID",
 "10",  "20",  "30",  "40",  "50",  "60",  "70",  "99", 
 "1000",  "2000",  "3000",  "4000",  "6000",  ""),
row.names = c(NA, 8L), class = "data.frame")

target4

cat("What would be a proper way to create target4 from test1?")

-- cut --

What would be the proper way to create target4 from test1?

Kind regards

Georg


[R] Error when installing packages

2016-07-26 Thread G . Maubach

Hi All,

I am trying to install packages on Debian GNU Linux 8 (Kernel 3.16.0-4-amd64).

My sessionInfo() is

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=de_DE.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=de_DE.UTF-8LC_COLLATE=de_DE.UTF-8
 [5] LC_MONETARY=de_DE.UTF-8LC_MESSAGES=de_DE.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8   LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_3.3.1

Installing the following packages

Warning in install.packages :
  packages ‘excel.link’, ‘installr’ are not available (for R version 3.3.1)
Warning in install.packages :
  dependencies ‘latticist’, ‘graph’, ‘RBGL’, ‘pkgDepTools’, ‘Rgraphviz’ are not 
available
also installing the dependencies ‘RCurl’, ‘RWekajars’

results in the following messages:

(1)
* installing *source* package ‘RCurl’ ...
checking for curl-config... no
Cannot find curl-config

(2)
* installing *source* package ‘RWekajars’ ...
./configure: 1: ./configure: /usr/lib/jvm/default-java/jre/bin/java: not found
./configure: 50: test: -ge: unexpected operator
./configure: 51: test: -eq: unexpected operator
Need at least Java version 1.6/6.0.
ERROR: configuration failed for package ‘RWekajars’

Annotation: I have openjdk-8-jre installed.

(3)
* installing *source* package ‘cairoDevice’ ...
ERROR: gtk+2. not found by pkg-config.
ERROR: configuration failed for package ‘cairoDevice’

(4)
* installing *source* package ‘rgdal’ ...
configure: CC: gcc -std=gnu99
configure: CXX: g++
configure: rgdal: 1.1-10
checking for /usr/bin/svnversion... no
configure: svn revision: 622
checking for gdal-config... no
no
configure: error: gdal-config not found or not executable.
ERROR: configuration failed for package ‘rgdal’

(5)
* installing *source* package ‘rgeos’ ...
configure: CC: gcc -std=gnu99
configure: CXX: g++
configure: rgeos: 0.3-19
checking for /usr/bin/svnversion... no
configure: svn revision: 524
checking for geos-config... no
no
configure: error: geos-config not found or not executable.
ERROR: configuration failed for package ‘rgeos’

... and much more.

Do all these error messages have something in common?

How could I fix the installation?

Kind regards

Georg
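
What the messages above have in common is that each configure script looks for an external program (curl-config, java, pkg-config, gdal-config, geos-config) and does not find it. A small sketch to check this from within R follows; the mapping of R packages to programs is read off the messages above, and the Debian package names in the comments are the usual ones but should be treated as assumptions and verified with apt-cache search.

```r
# External programs the failing configure scripts look for
tools_needed <- c(
  RCurl       = "curl-config",  # usually from libcurl4-openssl-dev
  RWekajars   = "java",         # usually from default-jdk, then R CMD javareconf
  cairoDevice = "pkg-config",   # plus the GTK+ 2 headers, usually libgtk2.0-dev
  rgdal       = "gdal-config",  # usually from libgdal-dev
  rgeos       = "geos-config"   # usually from libgeos-dev
)

# Sys.which() returns "" for every program that is not on the PATH
found <- Sys.which(tools_needed)

status <- data.frame(
  package   = names(tools_needed),
  tool      = unname(tools_needed),
  available = unname(nzchar(found)),
  stringsAsFactors = FALSE
)
print(status)
```

Every row with available == FALSE points to a system library (development package) that has to be installed with the system package manager before install.packages() can build the R package from source.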


[R] R Toolbox (Release 2 of 2016-07-21)

2016-07-21 Thread G . Maubach

Hi All,

I have uploaded a new release of the R Toolbox.

R Toolbox is a collection of simple but useful functions which I developed 
for myself to shorten the development process. Currently all functions use 
base R. No other packages are needed. One exception is "t_openxlsx" because 
this module deals explicitly with the openxlsx package.

It is simple to install the functions. Just copy them to an appropriate 
place on your hard disk and adjust the variable "t_toolbox_location" to 
the place you stored the toolbox in. Running "r_toolbox.R" from that 
location will load all modules.

In addition to new functions (see Release Comparison below) some functions 
were improved. They are now called with their package names, e. g. 
openxlsx::read.xlsx() instead of "read.xlsx()". This way confusion with 
functions having the same name but coming from other packages is avoided.
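
A minimal sketch of why the package prefix helps, using only packages that ship with R; the masking function sd() defined here is made up for the illustration and is not part of the toolbox.

```r
# A user-defined function that masks a function of the same name
sd <- function(x) "my own sd()"

# The unqualified call finds the masking function first
sd(c(1, 2, 3))

# The qualified call always uses the function from the stats package
stats::sd(c(1, 2, 3))
```

The same holds for openxlsx::read.xlsx(): the prefix fixes which package the function is taken from, whatever else is attached.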

Please be aware that I have included some untested functions in this 
release. All modules have a variable "t_status" now, stating the 
development status, e. g. "development", "testing", "release". 

Here is a Release Comparison:

-- cut --

release_comparison <-
  structure(
    list(
      Module = c("r_toolbox.R", "t_adjust_packages.R", "t_conventions.r",
                 "t_create_variable.R", "t_definitions.R",
                 "t_find_originals_and_duplicates.R",
                 "t_get_factor_levels.R", "t_merge_variables.R",
                 "t_n_miss.R", "t_n_valid.R", "t_openxlsx_shortcuts.r",
                 "t_rename_variables.R", "t_replace_na.R",
                 "t_report_memory.R", "t_select_vars_by_type.R"),
      Release1 = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE,
                   FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
      Release2 = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
                   TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)),
    .Names = c("Module", "Release1", "Release2"),
    row.names = c(NA, 15L),
    class = "data.frame")
edit(release_comparison)

-- cut --

Release 1 is of 2016-05-31, Release 2 of 2016-07-21.

You can download the toolbox from

https://sourceforge.net/projects/r-project-utilities/

Kind regards

Georg Maubach


[R] Choropleth: Turnover by ZipCode

2016-07-11 Thread G . Maubach

Hi All,
Dear Readers,

I need to create a choropleth graph with turnover by zipcode. This is what 
I have so far:

# Not run (Begin)
# Install packages if needed
# install.packages(pkgs = c("maptools", "rgdal", "RColorBrewer",
#                           "grDevices"))
# Not run (End)

# Load libraries
library(maptools); library(rgdal); library(RColorBrewer); 
library(grDevices)

# Configuration
# Adjust if needed!
file_path <- file.path("C:", "temp")

# Read data 
# Source: http://arnulf.us/PLZ
url <- "http://www.metaspatial.net/download/plz.tar.gz"
file_name_gzip <- basename(url)
file_name_extract <- "post_pl.shp"

download.file(url, file.path(file_path, file_name_gzip))

untar(tarfile = file.path(file_path, file_name_gzip),
  compressed = "gzip",
  exdir = file_path)

# Dataset
# I have the data for all zipcodes available in my region
ds_temp <-
  structure(
    list(
      ZipCode = c(1099, 10178, 13125, 21406, 32429, 41569),
      Sales = c(4, 2, 9, 5, 7, 3),
      Revenue = c(12, 9, 100, 80, 90, 25)
    ),
    .Names = c("ZipCode", "Sales", "Revenue"),
    row.names = c(NA, 6L),
    class = "data.frame"
  )
print(ds_temp)

# Prepare graphic
file_name_pdf <- file.path(file_path, "sales-and-revenue-by-zipcodes.pdf")
cairo_pdf(bg = "grey98", file_name_pdf, width = 16, height = 9)

y <- readShapeSpatial(file.path(file_path, file_name_extract),
  proj4string = CRS("+proj=longlat"))
x <- spTransform(y,CRS=CRS("+proj=merc"))

# How do I need to change this line?
# Needs to be replaced by turnover from ds_temp
color <- sample(1:7, length(x), replace=T) 

# Create graphic
plot(x,
     col = brewer.pal(7, "Oranges")[color],
     border = FALSE)  # How do I tell R to plot turnover from ds_temp?

# Title
mtext(
  "Turnover by Zipcodes",
  side = 3,
  line = -4,
  adj = 0,
  cex = 1.7
)

# Write to disc
dev.off()

# Cleanup
rm("ds_temp", "color", "file_name_extract",
   "file_name_gzip", "file_name_pdf", "file_path",
   "url", "x", "y")
unlink(file.path(file_path, "plz.tar.gz"))
unlink(file.path(file_path, "post_pl.dbf"))
unlink(file.path(file_path, "post_pl.shp"))
unlink(file.path(file_path, "post_pl.shx"))

# unlink(file.path(file_path, "sales-and-revenue-by-zipcodes.pdf"))

What do I need to do to color the amount of turnover or the frequencies of 
sales from the ds_temp dataset in the graph?

Kind regards

Georg Maubach
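
One possible way to turn the values in ds_temp into one colour class per polygon is match() followed by cut(). The sketch below runs without the shapefile; the vector polygon_zip is hypothetical and stands for the zip codes of the polygons in plotting order (in practice taken from the attribute table of the shapefile, whose column name is not known here), and heat.colors() is used only so that no extra package is needed.

```r
# Sales data as in the post (only the columns needed here)
ds_temp <- data.frame(
  ZipCode = c(1099, 10178, 13125, 21406, 32429, 41569),
  Revenue = c(12, 9, 100, 80, 90, 25)
)

# Hypothetical: zip code of each polygon, in the order the polygons are drawn
polygon_zip <- c(10178, 99999, 1099, 41569, 13125)

# Look up the revenue for every polygon; polygons without data become NA
revenue <- ds_temp$Revenue[match(polygon_zip, ds_temp$ZipCode)]

# Cut the revenue range into 7 equally wide classes
n_classes <- 7
breaks <- seq(min(ds_temp$Revenue), max(ds_temp$Revenue),
              length.out = n_classes + 1)
color <- cut(revenue, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# One fill colour per polygon; white where no revenue is known
class_colors <- rev(grDevices::heat.colors(n_classes))
fill <- class_colors[color]
fill[is.na(fill)] <- "white"

print(data.frame(polygon_zip, revenue, color, fill))
```

The vector fill (or brewer.pal(7, "Oranges")[color] as in the code above) would then be passed as col to plot().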


[R] Formatting ggplot2 graph

2016-07-06 Thread G . Maubach

Hi All,

my current code looks like this:

freq_ls <- structure(
  list(Var1 = c("zldkkd", "aakdkdk", "aaakdkd", "aaieiwo", "vöalsl",
                "ssddkdk", "glowowp", "laoiw", "ruklow", "rolsl",
                "delk", "inslvnz"),
       Anzahl = c(1772L, 761L, 536L, 317L, 197L, 160L, 30L, 20L, 10L,
                  6L, 6L, 1L),
       Prozent = c(46.4, 19.9, 14, 8.3, 5.2, 4.2, 0.8, 0.5, 0.3, 0.2,
                   0.2, 0)),
  .Names = c("Var1", "Anzahl", "Prozent"),
  class = c("tbl_df", "data.frame"),
  row.names = c(NA, -12L))
ggplot(freq_ls) +
  geom_bar(aes(x = Var1,
   y = Anzahl),
   stat = "identity",
   fill = "gray") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Title of the Plot")

I would like to add the absolute and relative frequencies on top of the 
bars. In addition I want the values printed in descending order according 
to the data.

I searched the web and found:

geom_text(stat='bin',aes(label=..count..),vjust=-1)

(Source: 
http://stackoverflow.com/questions/26553526/how-to-add-frequency-count-labels-to-the-bars-in-a-bar-graph-using-ggplot2
)

but this does not work in my case. Inserting the code

ggplot(freq_ls) +
  geom_bar(aes(x = Var1,
   y = Anzahl),
   stat = "identity",
   fill = "gray") +
  geom_text(stat='bin',aes(label=..count..),vjust=-1) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Title of the Plot")

results in

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: Removed 1 rows containing missing values (geom_text). 


I looked in the book by Wickham, ggplot2, but could not find an answer to 
the questions:

- How to show numbers if they are pre-calculated?
- How to sort the bars according to the sequence of values in descending 
order or, if pre-ordered, in the given order?

What do I have to change in my code to do it?

Kind regards

Georg
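
A sketch of one way to do both things with pre-calculated values: the bar order follows the factor levels of Var1, and geom_text() can take its label from a column instead of stat = 'bin'. Only the first three rows of freq_ls are used, deliberately shuffled; the column name "label" and the label format are assumptions. The plot part runs only if ggplot2 is installed.

```r
# First three rows of freq_ls, shuffled to show that the order is restored
freq_ls <- data.frame(
  Var1    = c("aakdkdk", "aaakdkd", "zldkkd"),
  Anzahl  = c(761L, 536L, 1772L),
  Prozent = c(19.9, 14, 46.4),
  stringsAsFactors = FALSE
)

# Bars are drawn in the order of the factor levels: sort them by Anzahl
freq_ls$Var1 <- factor(freq_ls$Var1,
                       levels = freq_ls$Var1[order(-freq_ls$Anzahl)])

# Pre-calculated label with absolute and relative frequency
freq_ls$label <- sprintf("%d (%.1f%%)", freq_ls$Anzahl, freq_ls$Prozent)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(freq_ls, aes(x = Var1, y = Anzahl)) +
    geom_bar(stat = "identity", fill = "gray") +
    geom_text(aes(label = label), vjust = -0.5) +   # label from the data
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    ggtitle("Title of the Plot")
  # print(p) draws the chart
}
```

To keep a given pre-ordered sequence instead, levels = unique(freq_ls$Var1) can be used in the factor() call.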


[R] WG: Fw: Re: dplyr : row total for all groups in dplyr summarise

2016-07-06 Thread G . Maubach

Hi All,

if I run the suggested code

mtcars %>%
  group_by (am, gear) %>%
  summarise (n = n()) %>%
  mutate(rel.freq = paste0(round(100 * n / sum(n), 0), "%")) %>%
  ungroup() %>%
  plyr::rbind.fill(data.frame(n = nrow(mtcars), rel.freq =
  "100%”))

I get

> mtcars %>%
+   group_by (am, gear) %>%
+   summarise (n = n()) %>%
+   mutate(rel.freq = paste0(round(100 * n / sum(n), 0), "%")) %>%
+   ungroup() %>%
+   plyr::rbind.fill(data.frame(n = nrow(mtcars), rel.freq =
+   "100%”))




+ 


R stops and waits for more input because something in the program syntax is missing.

What has to be changed to be able to run the code?

Kind regards

Georg Maubach
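
A likely cause, judging from the code as it appears in the mail, is the last quotation mark in "100%”: it is a typographic (curly) closing quote, which R does not accept as the end of a string. The sketch below shows the effect with base R only; the shortened call and the helper parses() are made up for this illustration.

```r
# Shortened call as pasted from the mail; the curly closing quote is
# written as the escape \u201d so that it is visible in the code
bad <- "data.frame(n = 32, rel.freq = \"100%\u201d)"

# Hypothetical helper: TRUE if the code can be parsed, FALSE otherwise
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# The string never ends, so the code is incomplete
parses(bad)

# Replacing the curly quote with a plain double quote repairs the call
fixed <- gsub("\u201d", "\"", bad, fixed = TRUE)
parses(fixed)
eval(parse(text = fixed))
```

In the pipeline above this means retyping the final quote so that the last argument reads rel.freq = "100%" with two plain double quotes.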


> Sent: Tuesday, 05 July 2016 at 18:30
> From: "David Winsemius"
> To: mai...@infomed.sld.cu
> Cc: r-help@r-project.org
> Subject: Re: [R] dplyr : row total for all groups in dplyr summarise
>
> 
> 
> mtcars %>%
>group_by (am, gear) %>%
>summarise (n=n()) %>%
>mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>ungroup() %>% plyr::rbind.fill(data.frame( 
n=nrow(mtcars),rel.freq="100%”))
> 
> 
> > On Jul 5, 2016, at 4:47 AM, mai...@infomed.sld.cu wrote:
> > 
> > Sorry, what I wanted to do was to add a total row at the end of the 
summary. The marginal totals by columns correspond to 100% and the sum of 
levels.
> > best reagard
> > Maicel Monzon
> > 
> > 
> > Ulrik Stervbo wrote:
> > 
> >> Yes. But in the sample code the data is summarised. In which case you 
get 4
> >> rows and not the correct 32.
> >> 
> >> On Tue, 5 Jul 2016, 07:48 David Winsemius,  
wrote:
> >> 
> >>> nrow(mtcars)
> >>> 
> >>> 
> >>> Sent from my iPhone
> >>> 
> >>> On Jul 4, 2016, at 9:03 PM, Ulrik Stervbo  
wrote:
> >>> 
> >>> That will give you the wrong result when used on summarised data
> >>> 
> >>> David Winsemius wrote on Tue, 5 July 2016 02:10:
> >>> 
>  I thought there was an nrow() function?
>  
>  Sent from my iPhone
>  
>  On Jul 4, 2016, at 9:59 AM, Ulrik Stervbo 
>  wrote:
>  
>  If you want the total number of rows in the original data.frame 
after
>  counting the rows in each group, you can ungroup and sum the row 
counts,
>  like:
>  
>  library("dplyr")
>  
>  
>  mtcars %>%
>    group_by (am, gear) %>%
>    summarise (n=n()) %>%
>    mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>    ungroup() %>%
>    mutate(row.tot = sum(n))
>  
>  HTH
>  Ulrik
>  
>  On Mon, 4 Jul 2016 at 18:23 David Winsemius 

>  wrote:
>  
> > 
> > > On Jul 4, 2016, at 6:56 AM, mai...@infomed.sld.cu wrote:
> > >
> > > Hello,
> > > How can I aggregate row total for all groups in dplyr summarise 
?
> > 
> > Row total ? of what? Aggregate ? how? What is the desired answer?
> > 
> > 
> > 
> > > library(dplyr)
> > > mtcars %>%
> > >  group_by (am, gear) %>%
> > >  summarise (n=n()) %>%
> > >  mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
> > >
> > > best regard
> > > Maicel Monzon
> > >
> > >
> > >
> > > 
> > >
> > >
> > >
> > >
> > > --
> > > This message reached you through the e-mail service that Infomed
> > > provides to support the missions of the National Health System.
> > > The person sending this message undertakes to use the service for
> > > those purposes and to comply with the established regulations.
> > >
> > > Infomed: http://www.sld.cu/
> > >
>  
>  
> >> 
> > 
> > 
> > 
> > 

[R] Antwort: Re: dplyr : row total for all groups in dplyr summarise

2016-07-05 Thread G . Maubach

Hi guys,

I checked out your example but I can't follow the results:

> mtcars %>%
+   group_by (am, gear) %>%
+   summarise (n=n()) %>%
+   mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
+   ungroup() %>%
+   mutate(row.tot = sum(n))
Source: local data frame [4 x 5]

     am  gear     n rel.freq row.tot
  (dbl) (dbl) (int)    (chr)   (int)
1     0     3    15      79%      32
2     0     4     4      21%      32
3     1     4     8      62%      32
4     1     5     5      38%      32

We have a total of 32 cases and 15 * 100 / 32 = 46.9 % instead of 79 %. 
The same holds for the other rows. How is 79 % calculated?

When searching the web I saw this example:

-- cut --

#-- not run --
url <- "http://www.lock5stat.com/datasets/HollywoodMovies2011.csv"
response <- GET(url)
Hollywoodmovies2011 <- content(x = GET(url), as = data.frame)
#-- end not run

Hollywoodmovies2011 %>% 
  group_by(genre) %>%
  summarize(count = n()) %>%
  mutate(rf = count / sum(count))

-- cut --

which gives

Source: local data frame [9 x 3]

      Genre count           %
     (fctr) (int)       (dbl)
1    Action    32 0.235294118
2 Adventure     1 0.007352941
3 Animation    12 0.088235294
4    Comedy    27 0.198529412
5     Drama    21 0.154411765
6   Fantasy     2 0.014705882
7    Horror    17 0.125000000
8   Romance    11 0.080882353
9  Thriller    13 0.095588235

Here the % values correspond to the count divided by the sum of count, 
e. g. sum = 136 and 32 / 136 = 0.2352941.

What is the difference when counting? What do the relative counts in the 
first example mean?

Kind regards

Georg
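
The 79 % can be reproduced when the counts are divided by the total within each level of am rather than by the overall total: 15 / (15 + 4) = 78.9 %. This is consistent with the data still being grouped by am after summarise() drops the last grouping level, gear. The sketch below recomputes both variants in base R so that it runs without dplyr; the column names within_am and overall are made up for this illustration.

```r
# Counts per combination of am and gear, empty combinations dropped
counts <- as.data.frame(table(am = mtcars$am, gear = mtcars$gear),
                        responseName = "n")
counts <- counts[counts$n > 0, ]

# Share within each level of am: this reproduces 79, 21, 62, 38
counts$within_am <- round(100 * counts$n /
                          ave(counts$n, counts$am, FUN = sum))

# Share of all 32 cars: this is what 15 * 100 / 32 refers to
counts$overall <- round(100 * counts$n / sum(counts$n))

print(counts)
```

In the dplyr pipeline, calling ungroup() before the mutate() that computes rel.freq would give the overall shares instead.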





From:    Ulrik Stervbo
To:      David Winsemius
Cc:      r-help@r-project.org, mai...@infomed.sld.cu
Date:    05.07.2016 06:06
Subject: Re: [R] dplyr : row total for all groups in dplyr summarise
Sent by: "R-help"



That will give you the wrong result when used on summarised data

David Winsemius wrote on Tue, 5 July 2016 02:10:

> I thought there was an nrow() function?
>
> Sent from my iPhone
>
> On Jul 4, 2016, at 9:59 AM, Ulrik Stervbo  
wrote:
>
> If you want the total number of rows in the original data.frame after
> counting the rows in each group, you can ungroup and sum the row counts,
> like:
>
> library("dplyr")
>
>
> mtcars %>%
>group_by (am, gear) %>%
>summarise (n=n()) %>%
>mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%")) %>%
>ungroup() %>%
>mutate(row.tot = sum(n))
>
> HTH
> Ulrik
>
> On Mon, 4 Jul 2016 at 18:23 David Winsemius 
> wrote:
>
>>
>> > On Jul 4, 2016, at 6:56 AM, mai...@infomed.sld.cu wrote:
>> >
>> > Hello,
>> > How can I aggregate row total for all groups in dplyr summarise ?
>>
>> Row total … of what? Aggregate … how? What is the desired answer?
>>
>>
>>
>> > library(dplyr)
>> > mtcars %>%
>> >  group_by (am, gear) %>%
>> >  summarise (n=n()) %>%
>> >  mutate(rel.freq = paste0(round(100 * n/sum(n), 0), "%"))
>> >
>> > best regard
>> > Maicel Monzon
>> >
>> >
>> >
>> > 
>> >
>> >
>> >
>> >
>
>


[R] Antwort: Re: Dump of new Methods (SOLVED)

2016-07-04 Thread G . Maubach

Hi Bert,

many thanks.

Found them.

Kind regards

Georg




From:    Bert Gunter
To:      g.maub...@weinwolf.de
Date:    04.07.2016 16:43
Subject: Re: [R] Dump of new Methods



?getwd

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Jul 4, 2016 at 1:34 AM,   wrote:
> Dear Readers,
> Hi All,
>
> to drive my R knowlegde a bit further I followed the advice of some of 
you
> by reading Chambers: Programming with data.
>
> I tried some examples from the book:
>
> -- cut --
>
> setClass("track", representation (x = "numeric",
> y = "numeric"))
>
> track <- function(x, y) {
>   # an object representing measurements 'y', tracked at positions 'x'
>   x <- as(x, "numeric")
>   y <- as(y, "numeric")
>   if(length(x) != length(y)) {
> stop("x, y should have equal length!")
>   }
>   new("track", x = x, y = y)
> }
>
> dumpMethod("track", "track")
>
> setMethod("show", "track",
>   function(object) {
> xy = rbind(object@x, object@y)
> dimanmes(xy) = list(c("x", "y"),
> 1:ncol(y))
> show(xy)
>   })
>
> setMethod("plot",
>   signature(x = "track", y = "missing"),
>   function(x, y, ...)
> plot(unclass(x), xlab = "Position", ylab = "Value", ...)
>   )
>
> dumpMethod("plot", "track")
>
> -- cut --
>
> Where do I find the dumped data? Is it in a single file or is every dump
> stored in a separate file? Where is it stored on my drive?
>
> Kind regards
>
> Georg
>


[R] Dump of new Methods

2016-07-04 Thread G . Maubach

Dear Readers,
Hi All,

to drive my R knowledge a bit further I followed the advice of some of you 
by reading Chambers: Programming with data.

I tried some examples from the book:

-- cut --

setClass("track", representation (x = "numeric",
y = "numeric"))

track <- function(x, y) {
  # an object representing measurements 'y', tracked at positions 'x'
  x <- as(x, "numeric")
  y <- as(y, "numeric")
  if(length(x) != length(y)) {
stop("x, y should have equal length!")
  }
  new("track", x = x, y = y)
}

dumpMethod("track", "track")

setMethod("show", "track",
  function(object) {
    xy = rbind(object@x, object@y)
    dimnames(xy) = list(c("x", "y"),
                        1:ncol(xy))
    show(xy)
  })

setMethod("plot",
  signature(x = "track", y = "missing"),
  function(x, y, ...)
plot(unclass(x), xlab = "Position", ylab = "Value", ...)
  )

dumpMethod("plot", "track")

-- cut --

Where do I find the dumped data? Is it in a single file or is every dump 
stored in a separate file? Where is it stored on my drive?

Kind regards

Georg
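
For experimenting with the example, here is a small self-contained variant of the track class with a show method that runs in current R. It is a sketch, not the code from the book: it assumes that dimnames() and ncol(xy) are what the show method is meant to use, and it uses as.numeric() and print() for simplicity. As a general rule, files written by R under a relative file name end up in the working directory reported by getwd().

```r
library(methods)

# Class with two numeric slots, as in the example above
setClass("track", representation(x = "numeric", y = "numeric"))

# Constructor: measurements 'y', tracked at positions 'x'
track <- function(x, y) {
  x <- as.numeric(x)
  y <- as.numeric(y)
  if (length(x) != length(y)) {
    stop("x, y should have equal length!")
  }
  new("track", x = x, y = y)
}

# show method: print the two slots as a labelled 2-row matrix
setMethod("show", "track",
  function(object) {
    xy <- rbind(object@x, object@y)
    dimnames(xy) <- list(c("x", "y"), as.character(seq_len(ncol(xy))))
    print(xy)
  })

tr <- track(1:3, c(2.5, 3.5, 4.5))
tr          # auto-printing calls the show method

getwd()     # directory in which relative file names are resolved
```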


Re: [R] Documenting data

2016-06-30 Thread G . Maubach

Hi Bert,
Hi Readers,

I did not know much about attributes in R and how to use them. If it is that 
flexible you are right and I have learnt something.

Kind regards

Georg
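
The attribute-based approach described in the quoted reply below could look roughly like the following S3 sketch. The class name "documented" is taken from that reply; the constructor documented(), the print method and the example text are assumptions made for this illustration, not an existing package.

```r
# Hypothetical constructor: attach a "doc" attribute and an extra class
documented <- function(x, doc) {
  attr(x, "doc") <- doc
  class(x) <- c("documented", class(x))
  x
}

# Print method: show the documentation first, then the data as usual
print.documented <- function(x, ...) {
  cat("Documentation:", attr(x, "doc"), "\n")
  y <- x
  attr(y, "doc") <- NULL
  class(y) <- setdiff(class(y), "documented")
  print(y, ...)
  invisible(x)
}

survey <- documented(
  data.frame(var1 = rep(1:5, 3)),
  doc = "var1: agreement item, 1 = Strongly agree ... 5 = Strongly disagree"
)

print(head(survey, 3))   # documentation line followed by the first rows
attr(survey, "doc")      # the documentation travels with the object
```

Because the object keeps its original class as well, functions written for data frames continue to work on it.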

> Sent: Thursday, 30 June 2016 at 20:06
> From: "Bert Gunter"
> To: g.maub...@gmx.de
> Cc: "Pito Salas", "R Help"
> Subject: Re: [R] Documenting data
>
> I believe Georg's pronouncements are wrong. See inline below.
> 
> -- Bert
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> "...
> 
> > Within R there are some limitations for storing the information about what 
> > a variable or a value within a variable means.
> 
> That is FALSE. There are no limitations. For example, just attach a
> "doc" attribute to your data that says whatever you wish to about
> them. e.g.
> 
> > somedata <- runif(10)
> > attr(somedata,"doc") <- "Anything you want to say about the data"
> 
> > attr(somedata,"doc")
> [1] "Anything you want to say about the data"
> 
> 
> You can go as crazy as you want to with this, e.g. creating a (S3 or
> S4 )class "documented" with appropriate methods for printing it from
> classes that inherit from data frames, lists, etc. See also the
> roxygen2 package for data documentation and R's ?promptData function
> for data documentation file in Rd format.
> 
> R is Turing complete -- so it can do anything any other programming
> language can do. You could program SAS in R if you wanted. The
> difference is that SAS has pre-programmed some capabilities that R
> leaves for users, including contributed packages -- like Sweave,
> knitr, etc.  You may or may not like this extra flexibility (and extra
> work, depending on whether someone else has already done the work for
> you), and efficiency may or may not be an issue; but to say that R has
> "limitations" is a gross misrepresentation, imho.
> 
> 
> 
> > Other software packages such as SAS or SPSS offer much broader
> > facilities for storing this information. In R you can work with
> > meaningful variable names and the data type/class factor, which can
> > store mappings between values and value descriptions.
> >
> > Example
> > -- cut --
> > var1 <- c(rep(1:5, 3))
> > ds_example <- data.frame(var1)
> >
> > var1_labels <- c("1 = Strongly Agree",
> > "2 = Agree",
> > "3 = Neither agree/nor disagree",
> > "4 = Disagree",
> > "5 = Strongly disagree")
> >
> > ds_example[["var1"]] <- factor(ds_example[["var1"]],
> >levels = c(1, 2, 3, 4, 5),
> >labels = var1_labels)
> >
> > summary(ds_example["var1"])
> > -- cut --
> >
> > In addition you find methods to work with variable labels and value labels 
> > in the packages Hmisc and memisc. They can also produce a thing called 
> > codebook which contains all variable names, variable labels, values, value 
> > labels and summaries of the distribution of values within the variables.
> >
> > 3. In addition to this you could structure your script in a modular way 
> > according to the analysis process, e. g.
> > importing, cleaning, preparation for analysis, analysis, reporting. Another 
> > structure may be more suitable in your case. These modules could have a 
> > number in the file name indicating in which sequence the scripts should be 
> > run.
> >
> > 4. I find it valuable to use a software repository like Github, Sourceforge 
> > or others to keep the revisions safe and secure in case you would like to 
> > go back to a version with code you deleted before and figure out that you 
> > need it now again. The R Studio IDE has an interface to git if you like to 
> > go with that. Good commit messages can help you track what has changed. 
> > Commits also help you to prepare precise steps when developing your scripts.
> >
> > 5. I have no experience with Sweave or knitr but you could also compile a 
> > simple documentation through copying comments to an Excel sheet using 
> > R-2-Excel libraries like excel.link or others.
> >
> > Example
> > install.packages("excel.link")
> > library(excel.link)
> > xlc["A1"] <- "Project Documentation"
> > xlc["A2"] <- "Step XY"
> > xlc["A3"] <- "Some explanation about step xy"
> >
> > This way you have the documentation in your code and in an external source.
> >
> > Which approach you chose depends on your experience with R and its 
> > libraries as well as the size of your project and the need for 
> > documentation.
> >
> > 6. It can be helpful to store interim results in a format that can be read 
> > by non-R-users, e. g. Excel.
> >
> > 7. Documenting code can be done using roxygen2.
> >
> > If there are different opinions to my suggestions please say so.
> >
> > Kind regards
> >
> > Georg
> >
> >
> >> Gesendet: Donnerstag, 30. Juni 2016 um 16:51

Re: [R] Documenting data

2016-06-30 Thread G . Maubach

Hi Pito,
Dear Readers,

As others have already mentioned, there are good practices for documenting code 
and data. I would like to summarize them and add a few not mentioned earlier:

1. You should always have two things: your raw data and your R script/s. The 
raw data is immutable whereas the R script/s produce the results.

2. You might want to distinguish between documenting your CODE and 
documenting your DATA. Documenting code is similar to what you already know 
from your programming experience. Documenting data is somewhat different because 
you store information about the meaning of your data directly in your data.

Example
You have a variable with codes ranging from 1 to 5. But what do they mean? 
Perhaps it could be

1 = Strongly agree
2 = Agree
3 = Neither agree/nor disagree
4 = Disagree
5 = Strongly Disagree

But it could also be the other way round:

1 = Strongly Disagree
2 = Disagree
3 = Neither agree/nor disagree
4 = Agree
5 = Strongly Agree

What the codes in your variable mean depends on the systems or processes you 
derived your data from.

Within R there are some limitations for storing the information about what a 
variable or a value within a variable means. Other software packages like SAS 
or SPSS offer much broader facilities for storing this information. In R you 
can work with meaningful variable names and the data type/class factor, which 
can store mappings between values and value descriptions.

Example
-- cut --
var1 <- c(rep(1:5, 3))
ds_example <- data.frame(var1)

var1_labels <- c("1 = Strongly Agree",
"2 = Agree",
"3 = Neither agree/nor disagree",
"4 = Disagree",
"5 = Strongly disagree")

ds_example[["var1"]] <- factor(ds_example[["var1"]],
   levels = c(1, 2, 3, 4, 5),
   labels = var1_labels)

summary(ds_example["var1"])
-- cut --

In addition, the packages Hmisc and memisc provide methods to work with 
variable labels and value labels. They can also produce a so-called codebook, 
which contains all variable names, variable labels, values, value labels and 
summaries of the distribution of values within the variables.
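
Where Hmisc or memisc are not available, a variable label can also be kept in a plain attribute. The following is only a minimal base-R sketch: the attribute name "label" and the helpers set_label() and codebook() are assumptions made up for illustration and are not part of any package.

```r
# Hypothetical helper: attach a free-text label to one column of a data frame.
set_label <- function(dataset, variable, label) {
  attr(dataset[[variable]], "label") <- label
  dataset
}

# Hypothetical helper: collect name, label and class of every column.
codebook <- function(dataset) {
  labels <- vapply(dataset, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
  data.frame(variable = names(dataset),
             label    = unname(labels),
             class    = vapply(dataset, function(x) class(x)[1], character(1)),
             row.names = NULL,
             stringsAsFactors = FALSE)
}

ds_example <- data.frame(var1 = rep(1:5, 3))
ds_example <- set_label(ds_example, "var1", "Agreement with statement 1")
print(codebook(ds_example))
```

Be aware that plain attributes can get lost when a column is subsetted or converted, so it is worth re-checking the labels after such steps.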

3. In addition to this you could structure your scripts in a modular way 
according to the analysis process, e.g. importing, cleaning, preparation for 
analysis, analysis, reporting. Another structure may be more suitable in your 
case. These modules could have a number in the file name indicating the 
sequence in which the scripts should be run.
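
A numbered set of scripts could be run in sequence roughly like this. It is only a sketch: the helper run_pipeline(), the naming scheme "01_import.R" and the demo files are assumptions for illustration.

```r
# Hypothetical helper: source all scripts named like "01_import.R" in the
# order given by their (zero-padded) number prefix.
run_pipeline <- function(script_dir, envir = globalenv()) {
  scripts <- list.files(script_dir, pattern = "^[0-9]+_.*\\.R$",
                        full.names = TRUE)
  scripts <- scripts[order(basename(scripts))]
  for (script in scripts) {
    message("Running ", basename(script))
    source(script, local = envir)
  }
  invisible(basename(scripts))
}

# Usage with two throw-away scripts in a temporary directory
demo_dir <- file.path(tempdir(), "pipeline_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("raw <- c(3, NA, 1)",        file.path(demo_dir, "01_import.R"))
writeLines("clean <- raw[!is.na(raw)]", file.path(demo_dir, "02_clean.R"))

pipeline_env <- new.env()
ran <- run_pipeline(demo_dir, envir = pipeline_env)
```

Zero-padded numbers (01, 02, ... 10) keep the alphabetical order identical to the intended run order.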

4. I find it valuable to use a software repository like GitHub, SourceForge or 
others to keep the revisions safe and secure in case you would like to go back 
to a version with code you deleted before and later find out that you need it 
again. The RStudio IDE has an interface to git if you like to go with that. 
Good commit messages can help you track what has changed. Commits also help you 
to work in precise steps when developing your scripts.

5. I have no experience with Sweave or knitr, but you could also compile 
simple documentation by copying comments to an Excel sheet using R-to-Excel 
libraries like excel.link or others.

Example
install.packages("excel.link")
library(excel.link)
xlc["A1"] <- "Project Documentation"
xlc["A2"] <- "Step XY"
xlc["A3"] <- "Some explanation about step xy"

This way you have the documentation in your code and in an external source.

Which approach you choose depends on your experience with R and its libraries as 
well as the size of your project and the need for documentation.

6. It can be helpful to store interim results in a format that can be read by 
non-R-users, e. g. Excel.
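
If no Excel bridge is at hand, a CSV file is one format that Excel and most other tools can open. A minimal sketch with base R only; the data frame and the file name are made up for illustration.

```r
# Made-up interim result
interim <- data.frame(id = 1:3, score = c(2.5, NA, 4.0))

# Write it to a temporary CSV file; NA becomes an empty cell
out_file <- file.path(tempdir(), "interim_results.csv")
write.csv(interim, out_file, row.names = FALSE, na = "")

# Read it back to check that nothing was lost
check <- read.csv(out_file)
```

On systems with a German locale Excel usually expects semicolons as separators and a decimal comma; write.csv2() writes that variant.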

7. Documenting code can be done using roxygen2.
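
As an illustration of item 7, roxygen2 comments are ordinary R comments starting with #' that sit directly above the function. The function replace_na_num() below is a made-up example, not code from this thread.

```r
#' Replace NA in a numeric vector
#'
#' @param x A numeric vector.
#' @param value Replacement for missing values.
#' @return `x` with every NA replaced by `value`.
#' @examples
#' replace_na_num(c(1, NA, 2), 0)
replace_na_num <- function(x, value) {
  x[is.na(x)] <- value
  x
}

result <- replace_na_num(c(1, NA, 2), 0)
```

Inside a package, roxygen2 turns such comments into Rd help files; in a plain script they simply remain readable comments.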

If you have a different opinion on any of my suggestions, please say so.

Kind regards

Georg


> Sent: Thursday, 30 June 2016 at 16:51
> From: "Pito Salas" 
> To: r-help@r-project.org
> Subject: [R] Documenting data
>
> I am studying statistics and using R in doing it. I come from software 
> development where we document everything we do.
> 
> As I “massage” my data, adding columns to a frame, computing on other data, 
> perhaps cleaning, I feel the need to document in detail what the meaning, or 
> background, or calculations, or whatever of the data is. After all it is now 
> derived from my raw data (which may have been well documented) but it is 
> “new.” 
> 
> Is this a real problem? Is there a “best practice” to address this?
> 
> Thanks!
> 
> Pito Salas
> Brandeis Computer Science
> Feldberg 131
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


[R] Writing a formula to Excel

2016-06-30 Thread G . Maubach

Hi All,

I am using excel.link to work seamlessly with Excel.

In addition to values, like numbers and strings, I would like to insert a 
fully operational formula into a cell.


xlc["G14"] <- print(paste("=G9*100/G6"), quote = FALSE)


The string is put into the cell, but the cell is not evaluated. Thus the 
string is shown instead of the result of the computation.

Only if I open that cell by pressing "F2" or by double-clicking it and then 
press RETURN does the evaluation of the expression start.


xlc["G14"] <- parse("=G9*100/G6") # does not run


How can I put a formula into Excel that is evaluated right away?

Kind regards

Georg


[R] Antwort: Re: Antwort: Re: Antwort: Re: Installing from source on Windows 7: tibble [RE OPENED]

2016-06-30 Thread G . Maubach

Hi Duncan,

I would not have changed the COMPILED_BY option unless I thought I had to.

In my "C:\R-Project\Rtools\mingw_32\bin" I have 

c++.exe
g++.exe
gcc.exe
i686-w64-mingw32-c++.exe
i686-w64-mingw32-g++.exe
i686-w64-mingw32-gcc-4.9.3.exe
i686-w64-mingw32-gcc.exe

In my "C:\R-Project\Rtools\mingw_64\bin" I have

c++.exe
cpp.exe
g++.exe
gcc.exe
x86_64-w64-mingw32-c++.exe
x86_64-w64-mingw32-g++.exe
x86_64-w64-mingw32-gcc-4.9.3.exe
x86_64-w64-mingw32-gcc.exe

Which one should I configure and use?

Kind regards

Georg




From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, 
Cc:      r-help@r-project.org
Date:    29.06.2016 17:34
Subject: Re: Antwort: Re: Antwort: Re: [R] Installing from source on Windows 7: tibble [SOLVED]



On 29/06/2016 10:48 AM, g.maub...@weinwolf.de wrote:
> Hi Duncan,
>
> indeed, I did not see the other part of your message.
>
> I did
>
> BINPREF ?= C:/R-Project/Rtools/mingw_32/bin/
> COMPILED_BY = g++ # instead of gcc-4.9.3

I wouldn't change the COMPILED_BY; some packages use it to configure 
themselves for gcc-4.9.3, as opposed to the previous version gcc-4.6.3.

>
> in "C:\R-Project\R-3.3.0\etc\i386\Makeconf"
>
> and
>
> BINPREF ?= C:/R-Project/Rtools/mingw_64/bin/
> COMPILED_BY = g++ # instead of gcc-4.9.3
>
> in "C:\R-Project\R-3.3.0\etc\x64\Makeconf"
>
> Now I could compile the package with no further errors.
>
> Messages are
>
> -- cut --
> * installing *source* package 'tibble' ...
> ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
> ** libs
>
> *** arch - i386
> C:/R-Project/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include"
> -DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c
> RcppExports.cpp -o RcppExports.o
> C:/R-Project/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include"
> -DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c
> matrixToDataFrame.cpp -o matrixToDataFrame.o
> C:/R-Project/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o
> tibble.dll tmp.def RcppExports.o matrixToDataFrame.o
> -Ld:/Compiler/gcc-4.9.3/local330/lib/i386
> -Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/i386 -lR
> installing to C:/R-Project/R-3.3.0/library/tibble/libs/i386
>
> *** arch - x64
> C:/R-Project/Rtools/mingw_64/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include"
> -DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c
> RcppExports.cpp -o RcppExports.o
> C:/R-Project/Rtools/mingw_64/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include"
> -DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c
> matrixToDataFrame.cpp -o matrixToDataFrame.o
> C:/R-Project/Rtools/mingw_64/bin/g++ -shared -s -static-libgcc -o
> tibble.dll tmp.def RcppExports.o matrixToDataFrame.o
> -Ld:/Compiler/gcc-4.9.3/local330/lib/x64
> -Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/x64 -lR
> installing to C:/R-Project/R-3.3.0/library/tibble/libs/x64
> ** R
> ** inst
> ** preparing package for lazy loading
> ** help
> *** installing help indices
> ** building package indices
> ** installing vignettes
> ** testing if installed package can be loaded
> *** arch - i386
> *** arch - x64
> * DONE (tibble)
> -- cut --
>
> So - complete success.
>
> Many thanks for your help.
>
> One last question: Why did Rtools.exe not create a directory named
> "gcc-4.9.3" in "C:\R-Project\Rtools", but instead put
> "C:\R-Project\Rtools\mingw_32" and "C:\R-Project\Rtools\mingw_64" directly
> in "C:\R-Project\Rtools\"? gcc-4.6.3 was installed that way.

The 4.6.3 compiler was compiled for "multilib" operation:  the same 
compiler took command line options to distinguish between 32 bit and 64 
bit compiles.  The newer version doesn't support that, so we need two 
separate installs.

Duncan Murdoch

> Kind regards
>
> Georg
>
>
>
>
>
> From:    Duncan Murdoch
> To:      g.maub...@weinwolf.de,
> Cc:      r-help@r-project.org
> Date:    29.06.2016 16:21
> Subject: Re: Antwort: Re: [R] Installing from source on Windows 7: tibble
>
>
>
> On 29/06/2016 10:17 AM, g.maub...@weinwolf.de wrote:
> > Hi Duncan,
> >
> > many thanks for your reply.
> >
> > I did insert the paths to the g++ compiler because I got the message about
> > the non-existent compiler.
> >
> > I took the directories for the compiler out again:
> >
> > C:\R-Project\Rtools\bin;C:\ProgramData\Oracle\Java\javapath;C:\Program
> > Files\Python 3.5\Scripts\;C:\Program Files\Python
> > 3.5\;C:\Python27\;C:\Python27\Scripts, etc. etc.
> >
> > Calling
> >
> > install.packages("tibble", type  = "source")
> >
> >
> > gives this message:
> >
> > -- cut --
> > * installing *source* package 'tibble' ...
> > ** Paket 'tibble' erfolgreich entpackt und MD5 Summen

[R] Antwort: Re: Antwort: Re: Installing from source on Windows 7: tibble [SOLVED]

2016-06-29 Thread G . Maubach

Hi Duncan,

indeed, I did not see the other part of your message.

I did

BINPREF ?= C:/R-Project/Rtools/mingw_32/bin/
COMPILED_BY = g++ # instead of gcc-4.9.3

in "C:\R-Project\R-3.3.0\etc\i386\Makeconf"

and

BINPREF ?= C:/R-Project/Rtools/mingw_64/bin/
COMPILED_BY = g++ # instead of gcc-4.9.3

in "C:\R-Project\R-3.3.0\etc\x64\Makeconf"

Now I could compile the package with no further errors.
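
To double-check which compiler directory R will pick up, the BINPREF entry can be read back from a Makeconf file. The following is a sketch only: read_binpref() is a hypothetical helper, and a throw-away file stands in for the real Makeconf so that the code runs anywhere. On an installation like the one in this thread the real files would be the Makeconf files under etc\i386 and etc\x64 of the R directory.

```r
# Hypothetical helper: return the BINPREF line(s) of a Makeconf file.
read_binpref <- function(makeconf) {
  lines <- readLines(makeconf, warn = FALSE)
  trimws(grep("^[[:space:]]*BINPREF[[:space:]]*[?]?=", lines, value = TRUE))
}

# Stand-in file with the kind of content discussed in this thread
demo <- tempfile("Makeconf")
writeLines(c("# demo Makeconf fragment",
             "BINPREF ?= C:/R-Project/Rtools/mingw_64/bin/",
             "COMPILED_BY = gcc-4.9.3"),
           demo)

binpref <- read_binpref(demo)
print(binpref)
```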

Messages are

-- cut --
* installing *source* package 'tibble' ...
** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
** libs

*** arch - i386
C:/R-Project/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" 
-DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
RcppExports.cpp -o RcppExports.o
C:/R-Project/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" 
-DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
matrixToDataFrame.cpp -o matrixToDataFrame.o
C:/R-Project/Rtools/mingw_32/bin/g++ -shared -s -static-libgcc -o 
tibble.dll tmp.def RcppExports.o matrixToDataFrame.o 
-Ld:/Compiler/gcc-4.9.3/local330/lib/i386 
-Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/i386 -lR
installing to C:/R-Project/R-3.3.0/library/tibble/libs/i386

*** arch - x64
C:/R-Project/Rtools/mingw_64/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" 
-DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
RcppExports.cpp -o RcppExports.o
C:/R-Project/Rtools/mingw_64/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" 
-DNDEBUG -I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
matrixToDataFrame.cpp -o matrixToDataFrame.o
C:/R-Project/Rtools/mingw_64/bin/g++ -shared -s -static-libgcc -o 
tibble.dll tmp.def RcppExports.o matrixToDataFrame.o 
-Ld:/Compiler/gcc-4.9.3/local330/lib/x64 
-Ld:/Compiler/gcc-4.9.3/local330/lib -LC:/R-PROJ~1/R-33~1.0/bin/x64 -lR
installing to C:/R-Project/R-3.3.0/library/tibble/libs/x64
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (tibble)
-- cut --

So - complete success.

Many thanks for your help.

One last question: Why did Rtools.exe not create a directory named 
"gcc-4.9.3" in "C:\R-Project\Rtools", but instead put 
"C:\R-Project\Rtools\mingw_32" and "C:\R-Project\Rtools\mingw_64" directly 
in "C:\R-Project\Rtools\"? gcc-4.6.3 was installed that way.

Kind regards

Georg





From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, 
Cc:      r-help@r-project.org
Date:    29.06.2016 16:21
Subject: Re: Antwort: Re: [R] Installing from source on Windows 7: tibble



On 29/06/2016 10:17 AM, g.maub...@weinwolf.de wrote:
> Hi Duncan,
>
> many thanks for your reply.
>
> I did insert the paths to the g++ compiler because I got the message about
> the non-existent compiler.
>
> I took the directories for the compiler out again:
>
> C:\R-Project\Rtools\bin;C:\ProgramData\Oracle\Java\javapath;C:\Program
> Files\Python 3.5\Scripts\;C:\Program Files\Python
> 3.5\;C:\Python27\;C:\Python27\Scripts, etc. etc.
>
> Calling
>
> install.packages("tibble", type  = "source")
>
>
> gives this message:
>
> -- cut --
> * installing *source* package 'tibble' ...
> ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
> ** libs
>
> *** arch - i386
> c:/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG
> -I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c
> RcppExports.cpp -o RcppExports.o
> c:/Rtools/mingw_32/bin/g++: not found
> make: *** [RcppExports.o] Error 127
> Warnung: Ausführung von Kommando 'make -f
> "C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f
> "C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk"
> SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)'
> SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab
> Status 2
> ERROR: compilation failed for package 'tibble'
> * removing 'C:/R-Project/R-3.3.0/library/tibble'
> * restoring previous 'C:/R-Project/R-3.3.0/library/tibble'
> Warning in install.packages :
>running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l
> "C:\R-Project\R-3.3.0\library"
> C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\RtmpGqOlOW/downloaded_packages/tibble_1.0.tar.gz'
> had status 1
> Warning in install.packages :
>installation of package ‘tibble’ had non-zero exit status
> -- cut --
>
> What else could I do?

You seem to have missed the second part of my advice, describing what to 
do with the two Makeconf files.

Duncan Murdoch

>
> Kind regards
>
> Georg
>
>
>
>
>
> Von:Duncan Murdoch 
> An:

[R] Antwort: Re: Installing from source on Windows 7: tibble

2016-06-29 Thread G . Maubach

Hi Duncan,

many thanks for your reply.

I did insert the paths to the g++ compiler because I got the message about 
the non-existent compiler.

I took the directories for the compiler out again:

C:\R-Project\Rtools\bin;C:\ProgramData\Oracle\Java\javapath;C:\Program 
Files\Python 3.5\Scripts\;C:\Program Files\Python 
3.5\;C:\Python27\;C:\Python27\Scripts, etc. etc.

Calling

install.packages("tibble", type  = "source")


gives this message:

-- cut --
* installing *source* package 'tibble' ...
** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
** libs

*** arch - i386
c:/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG 
-I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
RcppExports.cpp -o RcppExports.o
c:/Rtools/mingw_32/bin/g++: not found
make: *** [RcppExports.o] Error 127
Warnung: Ausführung von Kommando 'make -f 
"C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f 
"C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk" 
SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' 
SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab 
Status 2
ERROR: compilation failed for package 'tibble'
* removing 'C:/R-Project/R-3.3.0/library/tibble'
* restoring previous 'C:/R-Project/R-3.3.0/library/tibble'
Warning in install.packages :
  running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l 
"C:\R-Project\R-3.3.0\library" 
C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\RtmpGqOlOW/downloaded_packages/tibble_1.0.tar.gz'
had status 1
Warning in install.packages :
  installation of package ‘tibble’ had non-zero exit status
-- cut --

What else could I do?

Kind regards

Georg





From:    Duncan Murdoch 
To:      g.maub...@weinwolf.de, r-help@r-project.org, 
Date:    29.06.2016 13:07
Subject: Re: [R] Installing from source on Windows 7: tibble



On 29/06/2016 5:49 AM, g.maub...@weinwolf.de wrote:
> Hi All,
>
> I would like to install R packages from source on Windows 7 64-Bit.
> Currently my settings are:
>
> -- cut --
>> sessionInfo()
> R version 3.3.0 (2016-05-03)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> locale:
> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] tools_3.3.0
> -- cut --
>
> The environment variable PATH on Windows 7 is set to:
>
> C:\R-Project\Rtools\mingw_32\bin;C:\R-Project\Rtools\mingw_64\bin;C:\R-Project\Rtools\bin;C:\R-Project\Rtools\gcc-4.6.3\bin;C:\Program
> Files\Python 3.5\Scripts\;C:\Program Files\Python
> 3.5\;C:\Python27\;C:\Python27\Scripts; etc. etc.

Take the mingw_32, mingw_64 and gcc-4.6.3 directories off your path. 
They aren't needed; the first two could conceivably be harmful.
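
From within R, such entries can be filtered out of a PATH string as sketched below. The helper drop_path_entries() is hypothetical and the PATH value is shortened from the one quoted in this thread.

```r
# Hypothetical helper: drop all PATH entries that match a pattern.
drop_path_entries <- function(path, pattern, sep = .Platform$path.sep) {
  entries <- strsplit(path, sep, fixed = TRUE)[[1]]
  keep <- !grepl(pattern, entries, ignore.case = TRUE)
  paste(entries[keep], collapse = sep)
}

old_path <- paste("C:\\R-Project\\Rtools\\mingw_32\\bin",
                  "C:\\R-Project\\Rtools\\mingw_64\\bin",
                  "C:\\R-Project\\Rtools\\bin",
                  "C:\\R-Project\\Rtools\\gcc-4.6.3\\bin",
                  "C:\\Python27\\",
                  sep = ";")

new_path <- drop_path_entries(old_path, "mingw_(32|64)|gcc-4\\.6\\.3",
                              sep = ";")
print(new_path)

# Sys.setenv(PATH = new_path) would apply the result to the running R
# session only; a permanent change has to be made in the Windows settings.
```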

>
> RTools is installed in C:\R-Project\RTools
>
> The call of
>
> C:\R-Project\Rtools\mingw_64\bin\g++.exe --version
>
> results in
>
> g++ (x86_64-posix-seh, Built by MinGW-W64 project) 4.9.3
>
> If I do
>
>
>> install.packages("tibble", type = "source")
>
> I get
>
> -- cut --
> trying URL 'https://cran.uni-muenster.de/src/contrib/tibble_1.0.tar.gz'
> Content type 'application/x-gzip' length 38038 bytes (37 KB)
> downloaded 37 KB
>
> * installing *source* package 'tibble' ...
> ** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
> ** libs
>
> *** arch - i386
> c:/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG
> -I"C:/R-Project/R-3.3.0/library/Rcpp/include"
> -I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c
> RcppExports.cpp -o RcppExports.o
> c:/Rtools/mingw_32/bin/g++: not found
> make: *** [RcppExports.o] Error 127
> Warnung: Ausführung von Kommando 'make -f
> "C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f
> "C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk"
> SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)'
> SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab
> Status 2
> ERROR: compilation failed for package 'tibble'
> * removing 'C:/R-Project/R-3.3.0/library/tibble'
> * restoring previous 'C:/R-Project/R-3.3.0/library/tibble'
> Warning in install.packages :
>   running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l
> "C:\R-Project\R-3.3.0\library"
> C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\Rtmp23SQxM/downloaded_packages/tibble_1.0.tar.gz'
> had status 1
> Warning in install.packages :
>   installation of package ‘tibble’ had non-zero exit status
> -- cut --
>
> There is no make.conf in "C:\R-Project\Rtools\mingw_64\etc". I found "
> Makeconf" in "C:\R-Project\R-3.3.0\etc\x64". Do I need it? How do I need
> to configure the settings in this file?

Yes, since you haven't installed Rtools in the default location, you 
should edit two Makeconf files.  In

[R] Installing from source on Windows 7: tibble

2016-06-29 Thread G . Maubach

Hi All,

I would like to install R packages from source on Windows 7 64-Bit. 
Currently my settings are:

-- cut --
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252 
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C 
[5] LC_TIME=German_Germany.1252 

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_3.3.0
-- cut --

The environment variable PATH on Windows 7 is set to:

C:\R-Project\Rtools\mingw_32\bin;C:\R-Project\Rtools\mingw_64\bin;C:\R-Project\Rtools\bin;C:\R-Project\Rtools\gcc-4.6.3\bin;C:\Program 
Files\Python 3.5\Scripts\;C:\Program Files\Python 
3.5\;C:\Python27\;C:\Python27\Scripts; etc. etc.

RTools is installed in C:\R-Project\RTools

The call of

C:\R-Project\Rtools\mingw_64\bin\g++.exe --version

results in

g++ (x86_64-posix-seh, Built by MinGW-W64 project) 4.9.3

If I do


> install.packages("tibble", type = "source")

I get

-- cut --
trying URL 'https://cran.uni-muenster.de/src/contrib/tibble_1.0.tar.gz'
Content type 'application/x-gzip' length 38038 bytes (37 KB)
downloaded 37 KB

* installing *source* package 'tibble' ...
** Paket 'tibble' erfolgreich entpackt und MD5 Summen überprüft
** libs

*** arch - i386
c:/Rtools/mingw_32/bin/g++  -I"C:/R-PROJ~1/R-33~1.0/include" -DNDEBUG 
-I"C:/R-Project/R-3.3.0/library/Rcpp/include" 
-I"d:/Compiler/gcc-4.9.3/local330/include" -O2 -Wall  -mtune=core2 -c 
RcppExports.cpp -o RcppExports.o
c:/Rtools/mingw_32/bin/g++: not found
make: *** [RcppExports.o] Error 127
Warnung: Ausführung von Kommando 'make -f 
"C:/R-PROJ~1/R-33~1.0/etc/i386/Makeconf" -f 
"C:/R-PROJ~1/R-33~1.0/share/make/winshlib.mk" 
SHLIB_LDFLAGS='$(SHLIB_CXXLDFLAGS)' SHLIB_LD='$(SHLIB_CXXLD)' 
SHLIB="tibble.dll" OBJECTS="RcppExports.o matrixToDataFrame.o"' ergab 
Status 2
ERROR: compilation failed for package 'tibble'
* removing 'C:/R-Project/R-3.3.0/library/tibble'
* restoring previous 'C:/R-Project/R-3.3.0/library/tibble'
Warning in install.packages :
  running command '"C:/R-PROJ~1/R-33~1.0/bin/x64/R" CMD INSTALL -l 
"C:\R-Project\R-3.3.0\library" 
C:\Users\MAUBAC~1.WEI\AppData\Local\Temp\Rtmp23SQxM/downloaded_packages/tibble_1.0.tar.gz'
had status 1
Warning in install.packages :
  installation of package ‘tibble’ had non-zero exit status
-- cut --

There is no make.conf in "C:\R-Project\Rtools\mingw_64\etc". I found 
"Makeconf" in "C:\R-Project\R-3.3.0\etc\x64". Do I need it? How do I need 
to configure the settings in this file?

I searched Google but did not understand what to do or how to configure 
the R environment variables correctly.

What do I need to do to install packages from source?

Kind regards

Georg



[R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread G . Maubach

Hi All,

Petr, Bert, David, Ivan, Duncan and Rui helped me to develop a function 
able to replace NA's in variables IF NEEDED:

#---
# Module: t_replace_na.R
# Author: Georg Maubach
# Date  : 2016-06-27
# Update: 2016-06-27
# Description   : Replace NA with another value
# Source System : R 3.3.0 (64 Bit)
# Target System : R 3.3.0 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#1-2-3-4-5-6-7-8

t_version = "2016-06-27"
t_module_name = "t_replace_na.R"

cat(
  paste0("\n",
         t_module_name, " (Version: ", t_version, ")", "\n", "\n",
         "This software comes with ABSOLUTELY NO WARRANTY.",
         "\n", "\n"))

# If t_do_test is not defined globally, define it here locally
t_do_test <- FALSE

# [ Function Definition ]
t_replace_na <- function(dataset, variables, value) {
  # Replace NA with another given value
  #
  # Args:
  #   dataset (data frame, data table):
  #     Object with dimnames, e.g. data frame, data table.
  #   variables (character vector):
  #     List of variable names.
  #   value:
  #     Replacement for NA.
  #
  # Operation:
  #   NA is replaced by the value given with the parameter "value".
  #
  #   A factor is converted explicitly with as.character(), the missing value
  #   replacement is done and then the character vector is converted back with
  #   as.factor(). Thus the replacement value becomes a level of the new
  #   factor variable.
  #
  # Caution:
  #   Please check your data in case you replace NA within factors due to
  #   explicit type conversion. Tests were done only for the below given
  #   dataset.
  #
  # Returns:
  #   The dataset with NA replaced in the given variables.
  #
  # Error handling:
  #   None.
  #
  # Credits:
  #   https://www.mail-archive.com/r-help@r-project.org/msg236537.html

  for (variable in variables) {
    if (inherits(dataset[, variable], "factor") == TRUE) {
      dataset[, variable] <- as.character(dataset[, variable])
      print(class(dataset[, variable]))
      dataset[, variable][is.na(dataset[, variable])] <- value
      dataset[, variable] <- as.factor(dataset[, variable])
      print(class(dataset[, variable]))
    } else {
      dataset[, variable][is.na(dataset[, variable])] <- value
    }
  }
  return(dataset)
}

# [ Test Definition ]
t_test <- function(do_test = FALSE) {
  if (do_test == TRUE) {
    cat("\n", "\n", "Test function t_replace_na()", "\n", "\n")

    # Example dataset
    ds_example <- data.frame(a = c(1, NA, 2), b = rep(NA, 3),
                             c = c("A", "b", NA))

    cat("\n", "\n", "Example dataset before function call", "\n", "\n")
    cat("Variables and their classes:\n")
    print(sapply(ds_example, class))
    cat("Dataset:\n")
    print(ds_example)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "a", value = -1)
    cat("\n", "\n", "Dataset after function call", "\n", "\n")
    print(ds_result)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "b", value = -2)
    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_result)

    cat("\n", "\n", "Function call", "\n", "\n")
    ds_result <- t_replace_na(ds_example, "c", value = -3)
    cat("\n", "\n", "Example dataset after function call", "\n", "\n")
    print(ds_result)
  }
}

# [ Test Run ]--
t_test(do_test = t_do_test)

# [ Clean up ]--
rm("t_module_name", "t_version", "t_do_test", "t_test")

# EOF .

Please note: R has capabilities to handle NA correctly. There is often no 
need to recode NA. Also NA might or might not have meaning. You have to 
decide with regard to the meaning of the original data and the business 
problem.
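
A few base-R calls illustrate what handling NA without recoding can look like. This is just a sketch; the vectors are made up.

```r
x <- c(1, NA, 2)

m_all <- mean(x)                # NA propagates into the result
m_obs <- mean(x, na.rm = TRUE)  # NA is ignored, mean of 1 and 2
n_missing <- sum(is.na(x))      # number of missing values

f <- factor(c("A", "b", NA))
counts <- table(f, useNA = "ifany")  # NA is counted as its own category
print(counts)
```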

Kind regards

Georg




From:    PIKAL Petr 
To:      "g.maub...@weinwolf.de" , 
Cc:      "r-help@r-project.org" 
Date:    27.06.2016 11:03
Subject: RE: [R] Antwort: Fw: Re:  Subscripting problem with is.na()



Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Monday, June 27, 2016 10:45 AM
> To: David L Carlson ; Bert Gunter
> 
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na()
>
> Hi David,
> Hi Bert,
>
> many thanks for the valuable discussion on NA in R (please see extract
> below). I follow your arguments leaving NA as they are for most of the
> time. In special occasions however I want to

[R] Antwort: RE: Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread G . Maubach

Hi Petr,

many thanks for your reply and the examples.

My subscripting problems drive me nuts.

My understanding was that dataset[variable] is semantically identical to 
dataset[, variable] because dataset[variable] takes all cases when no 
other subscripts are given.

Where can I look up the rules for when to use the comma and when not?
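
The difference between the subscripting forms can be checked directly with class(). A small sketch with a made-up data frame; stringsAsFactors = TRUE is set explicitly so that column "c" is a factor regardless of the R version. The rules themselves are described on the help pages ?Extract and ?"[.data.frame".

```r
ds_test <- data.frame(a = c(1, NA, 2), c = c("A", "b", NA),
                      stringsAsFactors = TRUE)

class(ds_test["c"])                  # "data.frame": no comma, list-style subsetting
class(ds_test[, "c"])                # "factor": with comma, one column is dropped to a vector
class(ds_test[["c"]])                # "factor": double brackets extract the column itself
class(ds_test[, "c", drop = FALSE])  # "data.frame": drop = FALSE keeps the frame
```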

Kind regards

Georg





From:    PIKAL Petr 
To:      "g.maub...@weinwolf.de" , 
Cc:      "r-help@r-project.org" 
Date:    27.06.2016 11:03
Subject: RE: [R] Antwort: Fw: Re:  Subscripting problem with is.na()



Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Monday, June 27, 2016 10:45 AM
> To: David L Carlson ; Bert Gunter
> 
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Fw: Re: Subscripting problem with is.na()
>
> Hi David,
> Hi Bert,
>
> many thanks for the valuable discussion on NA in R (please see extract
> below). I follow your arguments leaving NA as they are for most of the
> time. In special occasions however I want to replace the NA with another
> value. To preserve the newly acquired knowledge for me I wrote this
> function:
>
> -- cut --
> t_replace_na <- function(dataset, variable, value) {
>  if(inherits(dataset[[variable]], "factor") == TRUE) {
>dataset[variable] <- as.character(dataset[variable])
>print(class(dataset[variable]))
>dataset[, variable][is.na(dataset[, variable])] <- value
>dataset[variable] <- as.factor(dataset[variable])
>print(class(dataset[variable]))
>  } else {
>dataset[, variable][is.na(dataset[, variable])] <- value
>  }
>  return(dataset)
> }
>



> class(ds_test[, "c"])
> test_class(ds_test, "c")
> warning("'c' should be factor NOT data.frame.
> In addition data.frame != factor")
> -- cut --
>
> Why do I get different results for the same function if it is inside or
> outside my own function definition?

Because you are still missing how to subscript data frames.

test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[,variable]), TRUE))
  } else {
    return(c(class(dataset[,variable]), FALSE))
  }
}

> test_class(ds_test, "a")
[1] "numeric" "FALSE"
> test_class(ds_test, "c")
[1] "factor" "TRUE"
>

If you arrange the commas properly in your function, you get the desired result:

p_replace_na <- function(dataset, variable, value) {
 if(inherits(dataset[,variable], "factor") == TRUE) {
   dataset[,variable] <- as.character(dataset[,variable])
   print(class(dataset[,variable]))
   dataset[, variable][is.na(dataset[, variable])] <- value
   dataset[, variable] <- as.factor(dataset[, variable])
   print(class(dataset[, variable]))
 } else {
   dataset[, variable][is.na(dataset[, variable])] <- value
 }
 return(dataset)
}

> p_replace_na(ds_test, "c", value = -3)
[1] "character"
[1] "factor"
   a  b  c
1  1 NA  A
2 NA NA  b
3  2 NA -3

> t_replace_na(ds_test, "c", value = -3)
[1] "data.frame"
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
>

Cheers
Petr



>
> Kind regards
>
> Georg
>
> 
>
> > Sent: Thursday, 23 June 2016 at 21:14
> > From: "David L Carlson" 
> > To: "Bert Gunter" 
> > Cc: "R Help" 
> > Subject: Re: [R] Subscripting problem with is.na()
> >
> > Good point. I did not think about factors. Also your example raises
> > another issue since column c is logical, but gets silently converted to
> > numeric. This would seem to get the job done assuming the conversion is
> > intended for numeric columns only:
> >
> > > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> > > sapply(test, class)
> >         a         b         c
> > "numeric"  "factor" "logical"
> > > num <- sapply(test, is.numeric)
> > > test[, num][is.na(test[, num])] <- 0
> > > test
> >   a    b  c
> > 1 1    A NA
> > 2 0    b NA
> > 3 2 <NA> NA
> >
> > David C
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



[R] Antwort: Fw: Re: Subscripting problem with is.na()

2016-06-27 Thread G . Maubach

Hi David,
Hi Bert,

many thanks for the valuable discussion on NA in R (please see extract 
below). I follow your arguments leaving NA as they are for most of the 
time. In special occasions however I want to replace the NA with another 
value. To preserve the newly acquired knowledge for me I wrote this 
function:

-- cut --
t_replace_na <- function(dataset, variable, value) {
 if(inherits(dataset[[variable]], "factor") == TRUE) {
   dataset[variable] <- as.character(dataset[variable])
   print(class(dataset[variable]))
   dataset[, variable][is.na(dataset[, variable])] <- value
   dataset[variable] <- as.factor(dataset[variable])
   print(class(dataset[variable]))
 } else {
   dataset[, variable][is.na(dataset[, variable])] <- value
 }
 return(dataset)
}

ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA))
print(sapply(ds_test, class))

t_replace_na(ds_test, "a", value = -1)
t_replace_na(ds_test, "b", value = -2)
t_replace_na(ds_test, "c", value = -3)
-- cut --

Unfortunately, the if statement does not work because the class is reported 
incorrectly inside the function. To find out what is going on, I did 
this:

-- cut --
test_class <- function(dataset, variable) {
  if(inherits(dataset[, variable], "factor") == TRUE) {
    return(c(class(dataset[variable]), TRUE))
  } else {
    return(c(class(dataset[variable]), FALSE))
  }
}

ds_test <- data.frame(a=c(1,NA,2), b = rep(NA,3), c = c("A","b",NA))
print(sapply(ds_test, class))

# -- Test a --
class(ds_test[, "a"])
if(inherits(ds_test[, "a"], "factor")) {
  print(c(class(ds_test[, "a"]), "TRUE"))
} else {
  print(c(class(ds_test[, "a"]), "FALSE"))
}
test_class(ds_test, "a")
warning("'a' should be numeric NOT data.frame!")

# -- Test b --
if(inherits(ds_test[, "b"], "factor")) {
  print(c(class(ds_test[, "b"]), "TRUE"))
} else {
  print(c(class(ds_test[, "b"]), "FALSE"))
}
class(ds_test[, "b"])
test_class(ds_test, "b")
warning("'b' should be logical NOT data.frame!")

# -- Test c --
if(inherits(ds_test[, "c"], "factor")) {
  print(c(class(ds_test[, "c"]), "TRUE"))
} else {
  print(c(class(ds_test[, "c"]), "FALSE"))
}
class(ds_test[, "c"])
test_class(ds_test, "c")
warning("'c' should be factor NOT data.frame.
In addition data.frame != factor")
-- cut --

Why do I get different results for the same function if it is inside or 
outside my own function definition?

Kind regards

Georg



> Sent: Thursday, 23 June 2016 at 21:14
> From: "David L Carlson" 
> To: "Bert Gunter" 
> Cc: "R Help" 
> Subject: Re: [R] Subscripting problem with is.na()
>
> Good point. I did not think about factors. Also your example raises 
> another issue since column c is logical, but gets silently converted to 
> numeric. This would seem to get the job done assuming the conversion is 
> intended for numeric columns only:
> 
> > test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> > sapply(test, class)
>         a         b         c 
> "numeric"  "factor" "logical" 
> > num <- sapply(test, is.numeric)
> > test[, num][is.na(test[, num])] <- 0
> > test
>   a    b  c
> 1 1    A NA
> 2 0    b NA
> 3 2 <NA> NA
> 
> David C


Re: [R] Subscripting problem with is.na()

2016-06-24 Thread G . Maubach

Hi Bert,

many thanks for all your help and your comments. I learn a lot this way.

My question was about is.na() at first sight, but the actual task looks like 
this:

I have two variables in my customer data that signal if the customer account was 
closed by master data management or by sales. Say these variables are 
closed_mdm and closed_sls. They contain NA if the customer account is still 
open or a closing code from "01" to "08" if the customer account was closed and 
why.

For my analysis I need a variable that combines the two variables closed_mdm 
and closed_sls to set a filter easily on those who are closed, no matter what 
the reason was or who closed the account.

As I always encounter problems when dealing with ifelse statements and NA I 
decided to merge these two variables to one variable containing 0 = not closed 
and 1 = closed. In my context this seems to be - at least to me - a reasonable 
approach.

Replacement of missing values and merging the variables is the easiest way for 
me.

-- cut --

cust_id <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20)
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA, NA, NA, "04", 
NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA, "03", NA, NA, NA, 
"05", NA, NA, NA, NA, NA)

# 1st try
ds_temp1 <- data.frame(cust_id, closed_mdm, closed_sls)
ds_temp1

ds_temp1$closed <- closed_mdm | closed_sls  # WRONG

# 2nd try
closed_mdm_fac1 <- as.factor(closed_mdm)
closed_sls_fac1 <- as.factor(closed_sls)

ds_temp2 <- data.frame(cust_id, closed_mdm_fac1, closed_sls_fac1)
ds_temp2

ds_temp2$closed <- ds_temp2$closed_mdm_fac1 | ds_temp2$closed_sls_fac1  # WRONG

# 3rd try
closed_mdm_num1 <- as.numeric(closed_mdm)  # OK
closed_sls_num1 <- as.numeric(closed_sls)  # OK

ds_temp3 <- data.frame(cust_id, closed_mdm_num1, closed_sls_num1)
ds_temp3

ds_temp3$closed <- ds_temp3$closed_mdm_num1 | ds_temp3$closed_sls_num1  # WRONG

# 4th try
ds_temp4 <- ds_temp3
ds_temp4

# Does not run due to not allowed NA in subscripts
ds_temp4[is.na(ds_temp4$closed_mdm_num1), ds_temp4$closed_mdm_num1] <- 0
ds_temp4[is.na(ds_temp4$closed_sls_num1), ds_temp4$closed_sls_num1] <- 0

# 5th try
ds_temp4$closed_mdm_num1 <- ifelse(is.na(ds_temp4$closed_mdm_num1), 1, 0)
ds_temp4$closed_sls_num1 <- ifelse(is.na(ds_temp4$closed_sls_num1), 1, 0)
ds_temp4

ds_temp4$closed <- ifelse(ds_temp4$closed_mdm_num1 == 1 | 
ds_temp4$closed_sls_num1 == 1, 1, 0)
ds_temp4

-- cut --

Is there a better way to do it?

Kind regards

Georg
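
One possible shorter route, offered only as a sketch under the coding stated 
above (0 = not closed, 1 = closed): an account counts as closed when at 
least one of the two codes is not NA, so is.na() can be combined directly 
without recoding the NAs first. The names ds and closed_customers are 
assumptions made for this illustration. Note that the 5th try above assigns 
1 where is.na() is TRUE, i.e. to accounts that are still open, which appears 
to be the reverse of the intended coding.

```r
cust_id <- 1:20
closed_mdm <- c("01", NA, NA, NA, "08", "07", NA, NA, "05", NA,
                NA, NA, "04", NA, NA, NA, NA, NA, NA, NA)
closed_sls <- c(NA, "08", NA, NA, "08", "07", NA, NA, NA, NA,
                "03", NA, NA, NA, "05", NA, NA, NA, NA, NA)

ds <- data.frame(cust_id, closed_mdm, closed_sls, stringsAsFactors = FALSE)

# TRUE when either closing code is present; as.integer() gives 1/0.
ds$closed <- as.integer(!is.na(ds$closed_mdm) | !is.na(ds$closed_sls))

# Filter on the combined flag.
closed_customers <- ds[ds$closed == 1, ]
closed_customers
```

The original NA values stay untouched in closed_mdm and closed_sls, which is 
in line with the advice quoted below to leave NAs as they are.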


> Sent: Thursday, 23 June 2016 at 23:55
> From: "Bert Gunter" 
> To: "David L Carlson" 
> Cc: "R Help" 
> Subject: Re: [R] Subscripting problem with is.na()
>
> ... actually, FWIW, I would say that this little discussion mostly
> demonstrates why the OP's request is probably not a good idea in the
> first place. Usually, NA's should be left as NA's to be dealt with
> properly by R and packages. In biological measurements, for example,
> NA's often mean "below the ability to reliably measure." Biologists
> with whom I've worked over many years often want to convert these to 0
> or omit the cases, both of which lead to biased estimates and/or
> underestimates of variability and excess claims of "statistical
> significance" (for those who belong to this religious persuasion). One
> should never say never, but I suspect that there are relatively few
> circumstances where the conversion the OP requested is actually wise.
> 
> Feel free to ignore/reject such extraneous comments of course.
> 
> Cheers,
> Bert
> 
> 
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Thu, Jun 23, 2016 at 12:14 PM, David L Carlson  wrote:
> > Good point. I did not think about factors. Also your example raises another 
> > issue since column c is logical, but gets silently converted to numeric. 
> > This would seem to get the job done assuming the conversion is intended for 
> > numeric columns only:
> >
> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> >> sapply(test, class)
> >         a         b         c
> > "numeric"  "factor" "logical"
> >> num <- sapply(test, is.numeric)
> >> test[, num][is.na(test[, num])] <- 0
> >> test
> >   a    b  c
> > 1 1    A NA
> > 2 0    b NA
> > 3 2 <NA> NA
> >
> > David C
> >
> > -Original Message-
> > From: Bert Gunter [mailto:bgunter.4...@gmail.com]
> > Sent: Thursday, June 23, 2016 1:48 PM
> > To: David L Carlson
> > Cc: Ivan Calandra; R Help
> > Subject: Re: [R] Subscripting problem with is.na()
> >
> > Not in general, David:
> >
> > e.g.
> >
> >> test <- data.frame(a=c(1,NA,2), b = c("A","b",NA), c= rep(NA,3))
> >
> >> is.na(test)
> >  a bc
> > [1,] FALSE FALSE TRUE
> > [2,]  TRUE FALSE TRUE
> > [3,] FALSE

[R] r_toolbox: Update

2016-06-23 Thread G . Maubach

Hi folks,

I have updated the functions of the r_toolbox.R set of utilities:

https://sourceforge.net/projects/r-project-utilities/files/?source=navbar

Some functions were renamed to mirror similar functions in SAS or SPSS, 
e.g. t_n_miss, t_n_valid. In addition, I added functions for 
reporting memory usage, selecting variables by type and getting an 
overview of the levels of factors.

I hope you find these functions useful.

Please get back to me if you have suggestions or encounter any 
difficulties.

Kind regards

Georg


[R] Subscripting problem with is.na()

2016-06-23 Thread G . Maubach

Hi All,

I would like to recode my NAs to 0. Using a single vector everything is 
fine.

But if I use a data.frame things go wrong:

-- cut --

var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <-
  data.frame(var1, var2)

test <- var1
test[is.na(test)] <- 0
test  # NA recoded OK

# First try
ds_test[is.na(ds_test$var1)] <- 0  # duplicate subscripts WRONG

# Second try
ds_test[is.na("var1")] <- 0 
ds_test$var1  # not recoded WRONG

# Third try: to me the most intuitive approach
is.na(ds_test["var1"]) <- 0  # attempt to select less than one element in integerOneIndex WRONG

# Fourth try
ds_test[is.na(var1)] <- 0  # duplicate subscripts for columns WRONG

-- cut --
 
How can I do it correctly?

Where could I have found something about it?

Kind regards

Georg
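
For reference, a minimal sketch of two forms that should work on the example 
data above; the copies ds_one and ds_all are names assumed for this 
illustration. In the attempts above, the logical vector with ten elements is 
applied without a comma and is therefore treated as a column selector for a 
data frame that has only two columns. The relevant help pages are ?Extract, 
?"[.data.frame" and ?is.na.

```r
var1 <- c(1:3, NA, 5:7, NA, 9:10)
var2 <- c(1:3, NA, 5:7, NA, 9:10)
ds_test <- data.frame(var1, var2)

# Recode a single column: select the column first, then index its elements.
ds_one <- ds_test
ds_one$var1[is.na(ds_one$var1)] <- 0

# Recode every NA in the data frame: is.na() on a data frame returns a
# logical matrix, which can be used as the index.
ds_all <- ds_test
ds_all[is.na(ds_all)] <- 0

ds_one
ds_all
```

Whether replacing NA by 0 is a good idea for the analysis at hand is a 
separate question.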


[R] (Off-Topic] Introducing a new R Blog

2016-06-20 Thread G . Maubach

Hi All,

today I would like to announce a new R blog. It contains a few entries 
about the findings during my course of studies and my daily work:

https://github.com/gmaubach/R-Know-How/wiki/R-Blog

I hope you'll find my hints useful.

In addition, you could have a look at a small collection of R functions I 
found useful when working with my data:

https://github.com/gmaubach/R-Project-Utilities

Kind regards

Georg Maubach


[R] Fw: Aw: Re: Building a binary vector out of dichotomous variables

2016-06-17 Thread G . Maubach

> Hi Tom,
> 
> thanks for your reply.
> 
> Yes, that's exactly what I am looking for. I did not know about the automatic 
> type conversion in R.
> 
> #-- cut --
> ds_example <-
>   structure(
> list(
>   year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
>   year2014 = c(0,
>0, 1, 1, 0, 0, 1, 1),
>   year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
> ),
> .Names = c("year2013",
>"year2014", "year2015"),
> row.names = c(NA, 8L),
> class = "data.frame"
>   )
> 
> #-- Proposal: works!
> as.numeric(with(ds_example,paste(1,year2013,year2014,year2015,sep='')))
> 
> # I store my know-how about R in functions for later use.
> 
> #-- Putting it in a function - does not work!
> t_make_binary_vector <- function(dataset,
>  input_variables,
>  output_variable = "binary_vector") {
>   dataset[output_variable] <- "1"
>   print(dataset[output_variable])
>   
>   for (variable in input_variables) {
> print(variable)
> dataset[output_variable] <- paste(dataset[output_variable],
>   dataset[variable], 
>   sep='')
>   }
>   
>   # print(dataset[output_variable])
> 
>   dataset[output_variable] <- as.integer(dataset[output_variable])
>   
>   return(dataset)
> }
> 
> t_make_binary_vector(dataset = ds_example,
>  input_variables = c("year2013", "year2014", "year2015"),
>  output_variable = "binary_vector")
> 
> 
> #-- Doesn't work either.
> t_make_binary_vector <- function(dataset,
>  input_variables,
>  output_variable = "binary_vector") {
>   dataset[output_variable] <- as.integer(paste(1, dataset[ , 
> input_variables], sep = ''))
> 
>   return(dataset)
> }
> 
> t_make_binary_vector(dataset = ds_example,
>  input_variables = c("year2013", "year2014", "year2015"),
>  output_variable = "binary_vector")
> 
> #-- cut --
> 
> Why is R taking the parameter value itself to paste it together instead of 
> referencing the variable within the dataset?
> 
> What did I get wrong about R? How can I fix it?
> 
> Kind regards
> 
> Georg
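
A possible explanation and fix, as a sketch only: dataset[variable] with 
single brackets and no comma returns a one-column data frame, and paste() 
converts such a list with as.character(), which deparses the whole column 
into a single string instead of working element by element. Extracting the 
column with [[ ]] gives a plain vector. The function below keeps the name 
and arguments from the mail above, but its body is an assumption about what 
was intended.

```r
ds_example <- data.frame(
  year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
  year2014 = c(0, 0, 1, 1, 0, 0, 1, 1),
  year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
)

t_make_binary_vector <- function(dataset,
                                 input_variables,
                                 output_variable = "binary_vector") {
  digits <- "1"                        # leading marker digit
  for (variable in input_variables) {
    # [[ ]] extracts the column as a vector, so paste0() works per row.
    digits <- paste0(digits, dataset[[variable]])
  }
  dataset[[output_variable]] <- as.integer(digits)
  dataset
}

result <- t_make_binary_vector(dataset = ds_example,
                               input_variables = c("year2013", "year2014", "year2015"),
                               output_variable = "binary_vector")
result
```

With more than nine input variables the pasted number no longer fits into an 
integer; keeping the result as character would avoid that.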
> 
> 
> > Sent: Thursday, 16 June 2016 at 16:13
> > From: "Tom Wright" 
> > To: g.maub...@weinwolf.de
> > Cc: "R. Help" 
> > Subject: Re: [R] Building a binary vector out of dichotomous variables
> >
> > Does this do what you want?
> > 
> > as.numeric(with(ds_example,paste(1,year2013,year2014,year2015,sep='')))
> > 
> > On Thu, Jun 16, 2016 at 8:57 AM,   wrote:
> > > Hi All,
> > >
> > > I need to build a binary vector made of a set of dichotomous variables.
> > >
> > > What I have so far is:
> > >
> > > -- cut --
> > >
> > > ds_example <-
> > >   structure(
> > > list(
> > >   year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
> > >   year2014 = c(0,
> > >0, 1, 1, 0, 0, 1, 1),
> > >   year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
> > > ),
> > > .Names = c("year2013",
> > >"year2014", "year2015"),
> > > row.names = c(NA, 8L),
> > > class = "data.frame"
> > >   )
> > >
> > > attach(ds_example)
> > > base <- 1000
> > > binary_vector <- base + year2013 * 100 + year2014 * 10 + year2015
> > > detach(ds_example)
> > >
> > > binary_vector
> > >
> > > ds_example <- cbind(ds_example, binary_vector)
> > >
> > > varlist <- c("year2013", "year2014", "year2015")
> > >
> > > base <- 10^length(varlist)
> > >
> > > binary_vector <- NULL
> > >
> > > for (i in 1:3) {
> > >   binary_vector <-
> > >base +
> > >ds_example [[varlist[i]]] * base / (10 ^ i)
> > > }
> > >
> > > ds_example <- cbind(ds_example, binary_vector)
> > >
> > > message("Wrong result!")
> > > ds_example
> > >
> > > -- cut --
> > >
> > > How do I get vectors like  1000 1001 1011  1100 1101 1110 1010 for
> > > each case?
> > >
> > > Is there a better approach than mine?
> > >
> > > Kind regards
> > >
> > > Georg
> > >
> > 
> >


[R] Building a binary vector out of dichotomous variables

2016-06-16 Thread G . Maubach

Hi All,

I need to build a binary vector made of a set of dichotomous variables.

What I have so far is:

-- cut --

ds_example <-
  structure(
list(
  year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
  year2014 = c(0,
   0, 1, 1, 0, 0, 1, 1),
  year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
),
.Names = c("year2013",
   "year2014", "year2015"),
row.names = c(NA, 8L),
class = "data.frame"
  )

attach(ds_example)
base <- 1000
binary_vector <- base + year2013 * 100 + year2014 * 10 + year2015
detach(ds_example)

binary_vector

ds_example <- cbind(ds_example, binary_vector)

varlist <- c("year2013", "year2014", "year2015")

base <- 10^length(varlist)

binary_vector <- NULL

for (i in 1:3) {
  binary_vector <- 
   base + 
   ds_example [[varlist[i]]] * base / (10 ^ i)
}

ds_example <- cbind(ds_example, binary_vector)

message("Wrong result!")
ds_example

-- cut --

How do I get vectors like  1000 1001 1011  1100 1101 1110 1010 for 
each case?

Is there a better approach than mine?

Kind regards

Georg
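
A sketch of one arithmetic alternative, under the assumption that every 
input column holds only 0 and 1: the loop above assigns a fresh value to 
binary_vector on each pass, so only the last variable survives. Accumulating 
the terms, or multiplying the columns by their decimal weights in one step, 
avoids that. The names weights and loop_vector are introduced here for 
illustration.

```r
ds_example <- data.frame(
  year2013 = c(0, 0, 0, 1, 1, 1, 1, 0),
  year2014 = c(0, 0, 1, 1, 0, 0, 1, 1),
  year2015 = c(0, 1, 1, 1, 0, 1, 0, 0)
)

varlist <- c("year2013", "year2014", "year2015")
base <- 10^length(varlist)

# Vectorised: one decimal weight per column (100, 10, 1 for three columns).
weights <- 10^(rev(seq_along(varlist)) - 1)
binary_vector <- base + as.vector(as.matrix(ds_example[varlist]) %*% weights)

# Loop version: start from base and add each term to the running total.
loop_vector <- rep(base, nrow(ds_example))
for (i in seq_along(varlist)) {
  loop_vector <- loop_vector + ds_example[[varlist[i]]] * base / 10^i
}

binary_vector
```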


[R] Installation of package "rio" broken

2016-06-14 Thread G . Maubach

Hi all,

today I wanted to install the package "rio". As it depends on the package 
"feather", which is only available as source, I chose to install "rio" 
from source. The installation fails with the following messages:

-- cut --
* installing *source* package 'feather' ...
** Paket 'feather' erfolgreich entpackt und MD5 Summen überprüft
** libs

*** arch - i386
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c RcppExports.cpp -o RcppExports.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather-read.cpp -o feather-read.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather-types.cpp -o feather-types.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather-write.cpp -o feather-write.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather/buffer.cc -o feather/buffer.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather/feather-c.cc -o feather/feather-c.o
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather/io.cc -o feather/io.o
feather/io.cc:18:0: warning: "NOMINMAX" redefined [enabled by default]
c:\program 
files\rtools\gcc-4.6.3\bin\../lib/gcc/i686-w64-mingw32/4.6.3/../../../../include/c++/4.6.3/i686-w64-mingw32/bits/os_defines.h:46:0:
 note: this is the location of the previous definition
g++ -m32 -std=c++0x -I"C:/PROGRA~1/R/R-32~1.2/include" -DNDEBUG -I.   
-I"C:/Users/admin/Documents/R/win-library/3.2/Rcpp/include" 
-I"d:/RCompile/r-compiling/local/local320/include" -O2 -Wall  -mtune=core2 
-c feather/metadata.cc -o feather/metadata.o
feather/metadata.cc:29:7: error: expected nested-name-specifier before 
'FBString'
feather/metadata.cc:29:7: error: 'FBString' has not been declared
feather/metadata.cc:29:16: error: expected ';' before '=' token
feather/metadata.cc:29:16: error: expected unqualified-id before '=' token
feather/metadata.cc:32:7: error: expected nested-name-specifier before 
'ColumnVector'
feather/metadata.cc:32:7: error: 'ColumnVector' has not been declared
feather/metadata.cc:32:20: error: expected ';' before '=' token
feather/metadata.cc:32:20: error: expected unqualified-id before '=' token
feather/metadata.cc:178:3: error: 'ColumnVector' does not name a type
feather/metadata.cc: In member function 'feather::Status 
feather::metadata::TableBuilder::Impl::Finish()':
feather/metadata.cc:146:5: error: 'FBString' was not declared in this scope
feather/metadata.cc:146:14: error: expected ';' before 'desc'
feather/metadata.cc:148:7: error: 'desc' was not declared in this scope
feather/metadata.cc:154:9: error: 'desc' was not declared in this scope
feather/metadata.cc:156:27: error: 'columns_' was not declared in this scope
feather/metadata.cc:157:34: error: unable to deduce 'auto' from ''
feather/metadata.cc: In member function 'void 
feather::metadata::TableBuilder::Impl::add_column(const 
flatbuffers::Offset&)':
feather/metadata.cc:173:5: error: 'columns_' was not declared in this scope
feather/metadata.cc: In constructor 
'feather::metadata::TableBuilder::TableBuilder()':
feather/metadata.cc:190:5: error: type 'feather::metadata::TableBuilder' is not 
a direct base of 'feather::metadata::TableBuilder'
make: *** [feather/metadata.o] Error 1
Warnung: Ausführung von Kommando 'make -f "Makevars" -f 
"C:/PROGRA~1/R/R-32~1.2/etc/i386/Makeconf" -f 
"C:/PROGRA~1/R/R-32~1.2/share/make/winshlib.mk" CXX='$(CXX1X) $(CXX1XSTD)' 
CXXFLAGS='$(CXX1XFLAGS)' CXXPICFLAGS='$(CXX1XPICFLAGS)' 
SHLIB_LDFLAGS='$(SHLIB_CXX1XLDFLAGS)' SHLIB_LD='$(SHLIB_CXX1XLD)' 
SHLIB="feather.dll" OBJECTS="RcppExports.o feather-read.o feather-types.o 
feather-write.o"' ergab Status 2
ERROR: compilation failed for package 'feather'
* removing 'C:/Users/admin/Documents/R/win-library/3.2/feather'
Warning in install.packages :
  running command '"C:/PROGRA~1/R/R-32~1.2/bin/x64/R" CMD INSTALL -l 
"C:\Users\admin\Documents\R\win-library\3.2"

[R] Warning message in openxlsx

2016-06-14 Thread G . Maubach

Hi All,

I get the warning message

Warning message:
In styles$font : partial match of 'font' to 'fonts'

when executing


> xls_workbook <- t_create_workbook()
> xls_sheetname <- "Kunden"
> xls_ds_to_save <- ds_merge1
> xls_filename <- paste0(data_created, 
"_Merge1_BW-SAP-Kunden_cleaned.xlsx")
> t_add_sheet(workbook = xls_workbook,
+ sheetname = xls_sheetname,
+ dataset = xls_ds_to_save)
> t_write_xlsx(workbook = xls_workbook,
+  path = path_output,
+  filename = xls_filename,
+  overwrite = TRUE)

where t_create_workbook() is

return(createWorkbook())

and t_add_sheet() is

  addWorksheet(workbook,
               sheetName = sheetname)
  writeDataTable(workbook,
                 sheet = sheetname,
                 x = dataset)
  ### writeDataTable writes data to a sheet and adds
  ### autofilter to the first line
  if (freeze_row <= 1 | freeze_col <= 1) {
    NULL # do nothing
  }
  else {
    freezePane(workbook,
               sheet = sheetname,
               firstActiveRow = freeze_row,
               firstActiveCol = freeze_col)
  }

  setColWidths(workbook,
               sheet = sheetname,
               cols = 1:ncol(dataset),
               widths = "auto")

and t_write_xlsx is

saveWorkbook(workbook, 
file = file.path(path, filename),
overwrite = overwrite)

I am wondering what "partial match of 'font' to 'fonts'" means, because I do 
not call it in my function calls. I use these calls a lot in my programs 
but never got this message before.

What does this message mean? How can I avoid this message?

Kind regards

Georg Maubach

PS: You can find more information about the used functions by going to 
https://sourceforge.net/projects/r-project-utilities/files/?source=navbar 
.
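
A hedged illustration of how a message of this kind can arise: the $ 
operator matches list element names partially, and R warns about it only 
when the option warnPartialMatchDollar is TRUE (it is FALSE by default). 
That some code accesses styles$font on a list whose element is called fonts 
is inferred from the warning text alone; the list below is made up for the 
demonstration.

```r
# A made-up list with an element called "fonts".
styles <- list(fonts = c("Calibri", "Arial"))

old_opts <- options(warnPartialMatchDollar = TRUE)

# $ finds "fonts" via partial matching and, with the option set, warns.
msg <- tryCatch(styles$font, warning = function(w) conditionMessage(w))
msg

# [[ ]] matches exactly by default, so the misspelled name yields NULL.
exact <- styles[["font"]]

options(old_opts)  # restore the previous setting
```

If the warning appeared only recently, it may be worth checking whether this 
option was switched on in the session or in a startup file.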


[R] Antwort: RE: Antwort: Re: Merging variables

2016-06-08 Thread G . Maubach

Hi Petr,

thanks for your reply.

I prepared a little example for you:

-- cut --

ds_temp_1 <-
  structure(list(
CustId = c(1001, 1002, 1003, 1004, 1005, 1006),
CustName = c("Miller", "Smith", "Doe", "White", "Black",
 "Nobody"),
sales = c(100, 500, 300, 50, 700, 10)
  ),
  .Names = c("CustId",
 "CustName", "sales"), row.names = c(NA, 6L), class = 
"data.frame")

ds_temp_2 <-
  structure(
list(
  CustId = c(1001, 1002, 1003),
  CustName = c("Miller",
   "Smith", "Doe"),
  CustGroup = c(1, 2, 3)
),
.Names = c("CustId",
   "CustName", "CustGroup"),
row.names = c(NA, 3L),
class = "data.frame"
  )

ds_merge <- merge(ds_temp_1, ds_temp_2,
  by.x = "CustId", all.x = TRUE,
  by.y = "CustId", all.y = FALSE)

ds_merge

-- cut --

which gives

ds_merge
  CustId CustName.x sales CustName.y CustGroup
1   1001     Miller   100     Miller         1
2   1002      Smith   500      Smith         2
3   1003        Doe   300        Doe         3
4   1004      White    50       <NA>        NA
5   1005      Black   700       <NA>        NA
6   1006     Nobody    10       <NA>        NA

where CustName is split into CustName.x and CustName.y.

What I would like to have is:

ds_merge
  CustId CustName sales CustGroup
1   1001   Miller   100         1
2   1002    Smith   500         2
3   1003      Doe   300         3
4   1004    White    50        NA
5   1005    Black   700        NA
6   1006   Nobody    10        NA

That is, CustName in a single variable, because the values within that 
variable are identical. I guess R generates CustName.x and CustName.y 
because of the NA for some cases in ds_temp_2.

Is there a simple way of merging two datasets and having R return a single 
variable if the values are identical or missing in either one of the 
datasets?

Kind regards

Georg
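
One way this might be done, sketched on the example data above: merge() does 
not compare the values of columns that share a name; it appends the suffixes 
.x and .y to every common column that is not listed in by. Listing CustName 
as a second key column therefore yields a single CustName column. This 
assumes the names really are identical in both datasets for matching 
customers; rows whose names differ would no longer match.

```r
ds_temp_1 <- data.frame(
  CustId   = c(1001, 1002, 1003, 1004, 1005, 1006),
  CustName = c("Miller", "Smith", "Doe", "White", "Black", "Nobody"),
  sales    = c(100, 500, 300, 50, 700, 10),
  stringsAsFactors = FALSE
)

ds_temp_2 <- data.frame(
  CustId    = c(1001, 1002, 1003),
  CustName  = c("Miller", "Smith", "Doe"),
  CustGroup = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Left join on both key columns; unmatched customers get NA in CustGroup.
ds_merge <- merge(ds_temp_1, ds_temp_2,
                  by = c("CustId", "CustName"),
                  all.x = TRUE)
ds_merge
```

Alternatively, the duplicate column could be dropped from the second dataset 
before merging on CustId alone.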





From:    PIKAL Petr 
To:      "g.maub...@weinwolf.de" , 
Cc:      "r-help@r-project.org" 
Date:    07.06.2016 13:11
Subject: RE: [R] Antwort: Re:  Merging variables



Hi

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Tuesday, June 7, 2016 8:19 AM
> To: Michael Dewey 
> Cc: r-help@r-project.org
> Subject: [R] Antwort: Re: Merging variables
>
> Hi Michael,
>
> yes, I was astonished by this behaviour too. I have worked with SPSS a
> lot - and that works differently.

If you want to join two data frames by their common column names, you can use

merge(dat1, dat2)

without specifying by. From the help page:

By default the data frames are merged on the columns with names they both 
have, but separate specifications of the columns can be given by by.x and 
by.y. The rows in the two data frames that match on the specified columns 
are extracted, and joined together.

>
> I would like to share some of my data. Can you tell me how I can dump a
> dataset in a way that I can post it here as text?

copy result of dput directly to your mail

dput(dat)
structure(list(hz = c(0, 25, 50), vykon = c(0, 11.6, 22.6)), .Names = 
c("hz",
"vykon"), row.names = c(NA, -3L), class = "data.frame")

We can use

dat <- structure(list(hz = c(0, 25, 50), vykon = c(0, 11.6, 22.6)), .Names 
= c("hz",
"vykon"), row.names = c(NA, -3L), class = "data.frame")

to reconstruct the object.

Regards
Petr

>
> Kind regards
>
> Georg
>
>
>
>
> From:    Michael Dewey 
> To:      g.maub...@weinwolf.de, r-help@r-project.org,
> Date:    06.06.2016 15:45
> Subject: Re: [R] Merging variables
>
>
>
> Dear Georg
>
> I find it a bit surprising that you end up with customer.x and
> customer.y. Can you share with us a toy example of two data.frames which
> exhibit this behaviour?
>
> On 06/06/2016 13:29, g.maub...@weinwolf.de wrote:
> > Hi All,
> >
> > I merged two datasets:
> >
> > ds_merge1 <- merge(x = ds_bw_customer_4_match, y =
> > ds_zww_customer_4_match,
> >   by.x = "customer", by.y = "customer",
> >   all.x = TRUE, all.y = FALSE)
> >
> > R created a new dataset with the variables customer.x and customer.y.
> > I would like to merge these two variable back together. I wrote a
> > little function (code can be run) for it:
> >
> > -- cut --
> >
> > customer.x <- c("Miller", "Smith", NA,"Bird", NA)
> > customer.y <- c("Miller",  NA, "Doe", "Fish", NA)
> > ds_test <- data.frame(customer.x, customer.y, stringsAsFactors =
> > FALSE)
> >
> > t_merge_variables <-
> >   function(dataset,
> >var1,
> >var2,
> >merged_var) {
> >
> > # Initialize
> > dataset[[merged_var]] = rep(NA, nrow(dataset))
> > dataset[["mismatch"]] = rep(NA, nrow(dataset))
> >
> > for (i in 1:nrow(dataset)) {
> >
> >   # Check 1: var1 missing, var2 missing
> >   if (is.na(dataset[[i,

[R] Antwort: RE: Merging variables

2016-06-07 Thread G . Maubach

Hi Petr,

I would like to describe the data situation in brief:

I have a business warehouse dataset (referred to as BW data) containing 
sales and an ERP customer master data dataset with additional information 
(referred to as ERP data). Customer IDs and customer names are identical 
in both, because the business warehouse data is derived from the ERP 
data. Due to selection criteria, the BW data contains slightly more 
customers than the ERP data, so customer names and all other information 
are missing in the ERP data for some cases of the BW data. If I merge the 
two by the customer ID variable, the customer names are duplicated as 
customer.x and customer.y.

As both fields contain the same content, I would have expected R to merge 
them into one variable, e.g. customer. But this is not the case.

Can I adjust the merge statement given below - which looks almost the same 
in my script - so that R automatically merges the variables if they are 
identical?

This is my code using left join:

-- cut --

ds_merge1 <- merge(x = ds_bw_customer_4_match, y = 
ds_erp_customer_4_match,
  by.x = "CustID", by.y = "CustID",
  all.x = TRUE, all.y = FALSE)

-- cut --
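
For illustration, two common ways to end up with a single customer column are sketched below. The data frames bw and erp and their values are invented toy data; only the column names CustID and customer come from the post. merge() adds the .x/.y suffixes to every column that exists in both inputs but is not named in "by", so either drop the redundant column from y or include it among the keys.

```r
# Toy data (assumption): BW holds one customer more than ERP
bw  <- data.frame(CustID = c(1, 2, 3),
                  customer = c("Miller", "Smith", "Doe"),
                  sales = c(100, 250, 75),
                  stringsAsFactors = FALSE)
erp <- data.frame(CustID = c(1, 2),
                  customer = c("Miller", "Smith"),
                  region = c("North", "South"),
                  stringsAsFactors = FALSE)

# Option 1: drop the redundant name column from y before the left join
merged_drop <- merge(x = bw, y = erp[, setdiff(names(erp), "customer")],
                     by = "CustID", all.x = TRUE)

# Option 2: use both shared columns as keys
merged_keys <- merge(x = bw, y = erp,
                     by = c("CustID", "customer"), all.x = TRUE)

print(merged_drop)
print(merged_keys)
```

Option 1 keeps the customer name from the BW data only, which fits the case where the ERP data lacks some customers. Option 2 assumes the names really are identical wherever both are present.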

Kind regards

Georg




From:    PIKAL Petr
To:      Michael Dewey, "g.maub...@weinwolf.de", "r-help@r-project.org"
Date:    06.06.2016 17:04
Subject: RE: [R] Merging variables



Hi Michael

it is simple:

set.seed(111)
# two toy data frames sharing the key "let"; both also carry a "customer" column
let <- sample(letters[1:10], 6, replace = TRUE)
dat1 <- data.frame(let = let, customer = sample(1:10, 6, replace = TRUE))
let <- sample(letters[1:10], 6, replace = TRUE)
dat2 <- data.frame(let = let, customer = sample(1:10, 6, replace = TRUE))
# merging by "let" only yields customer.x and customer.y
merge(dat1, dat2, by.x = "let", by.y = "let", all = TRUE)

Of course you could add the customer variable to the by parameter, but 
sometimes it is necessary to leave it out, for example when you have two 
sets of analytical results, each with a variable operator, but you want to 
merge those sets by, e.g., date/hour of analysis.

Regards
Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael
> Dewey
> Sent: Monday, June 6, 2016 3:46 PM
> To: g.maub...@weinwolf.de; r-help@r-project.org
> Subject: Re: [R] Merging variables
>
> Dear Georg
>
> I find it a bit surprising that you end up with customer.x and customer.y.
> Can you share with us a toy example of two data.frames which exhibit this
> behaviour?
>
> On 06/06/2016 13:29, g.maub...@weinwolf.de wrote:
> > Hi All,
> >
> > I merged two datasets:
> >
> > ds_merge1 <- merge(x = ds_bw_customer_4_match, y =
> > ds_zww_customer_4_match,
> >   by.x = "customer", by.y = "customer",
> >   all.x = TRUE, all.y = FALSE)
> >
> > R created a new dataset with the variables customer.x and customer.y.
> > I would like to merge these two variable back together. I wrote a
> > little function (code can be run) for it:
> >
> > -- cut --
> >
> > customer.x <- c("Miller", "Smith", NA,"Bird", NA)
> > customer.y <- c("Miller",  NA, "Doe", "Fish", NA)
> > ds_test <- data.frame(customer.x, customer.y, stringsAsFactors =
> > FALSE)
> >
> > t_merge_variables <-
> >   function(dataset,
> >var1,
> >var2,
> >merged_var) {
> >
> > # Initialize
> > dataset[[merged_var]] = rep(NA, nrow(dataset))
> > dataset[["mismatch"]] = rep(NA, nrow(dataset))
> >
> > for (i in 1:nrow(dataset)) {
> >
> >   # Check 1: var1 missing, var2 missing
> >   if (is.na(dataset[[i, var1]]) &
> >   is.na(dataset[[i, var2]])) {
> > dataset[["mismatch"]] <- 1  # var1 & var2 are missing
> >
> >   # Check 2: var1 filled, var2 missing
> >   } else if (!is.na(dataset[[i, var1]]) &
> >  is.na(dataset[[i, var2]])) {
> > dataset[[i, merged_var]] <- dataset[[i, var1]]
> > dataset[["mismatch"]] <- 0
> >
> >   # Check 3: var1 missing, var2 filled
> >   } else if (is.na(dataset[[i, var1]]) &
> >  !is.na(dataset[i, var2])) {
> > dataset[[i, merged_var]] <- dataset[[i, var2]]
> > dataset[["mismatch"]] <-  0
> >
> >   # Check 4: var1 == var2
> >   } else if (dataset[[i, var1]] == dataset[[i, var2]]) {
> >   dataset[[i, merged_var]] <- dataset[[i, var1]]
> >   dataset[["mismatch"]] <- 0
> >
> >   # Leftover: var1 != var2
> >   } else {
> > dataset[[i, merged_var]] <- NA
> > dataset[["mismatch"]] <- 2  # var1 != var2
> >   }  # end if
> > }  # end for
> > return(dataset)
> > }
> >
> > ds_var_merge1 <- t_merge_variables(dataset = ds_test,
> >   var1 = "customer.x",
> >   var2 = "customer.y",
> >   merged_var = "customer")
> >
> > ds_var_merge1
> >
> > -- cut --
> >
> > It is executed without error but delivers the wrong values in the
> > variable "mismatch". This variable is always 1 although it should be
> > NA, 1 or 2 respectively.
> >
>

[R] Antwort: Re: Merging variables

2016-06-07 Thread G . Maubach

Hi Michael,

yes, I was astonished by this behaviour as well. I have worked with SPSS 
a lot, and that works differently.

I would like to share some of my data. Can you tell me how I can dump a 
dataset in a way that I can post it here as text?

Kind regards

Georg
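
One common way to post a dataset as text is dput(), which prints R code that recreates the object. A minimal sketch, using a shortened version of the ds_test example from this thread; the names txt and restored are only for illustration:

```r
ds_test <- data.frame(customer.x = c("Miller", "Smith", NA),
                      customer.y = c("Miller", NA, "Doe"),
                      stringsAsFactors = FALSE)

# Prints a structure(...) call that can be pasted into an e-mail
dput(ds_test)

# Round trip: the printed text can be evaluated again to rebuild the data
txt <- paste(deparse(ds_test), collapse = "\n")
restored <- eval(parse(text = txt))
print(all.equal(restored, ds_test))
```

For large data, dput(head(ds, 20)) keeps the posted text short.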




From:    Michael Dewey
To:      g.maub...@weinwolf.de, r-help@r-project.org
Date:    06.06.2016 15:45
Subject: Re: [R] Merging variables



Dear Georg

I find it a bit surprising that you end up with customer.x and 
customer.y. Can you share with us a toy example of two data.frames which 
exhibit this behaviour?

On 06/06/2016 13:29, g.maub...@weinwolf.de wrote:
> Hi All,
>
> I merged two datasets:
>
> ds_merge1 <- merge(x = ds_bw_customer_4_match, y =
> ds_zww_customer_4_match,
>   by.x = "customer", by.y = "customer",
>   all.x = TRUE, all.y = FALSE)
>
> R created a new dataset with the variables customer.x and customer.y. I
> would like to merge these two variable back together. I wrote a little
> function (code can be run) for it:
>
> -- cut --
>
> customer.x <- c("Miller", "Smith", NA,"Bird", NA)
> customer.y <- c("Miller",  NA, "Doe", "Fish", NA)
> ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)
>
> t_merge_variables <-
>   function(dataset,
>var1,
>var2,
>merged_var) {
>
> # Initialize
> dataset[[merged_var]] = rep(NA, nrow(dataset))
> dataset[["mismatch"]] = rep(NA, nrow(dataset))
>
> for (i in 1:nrow(dataset)) {
>
>   # Check 1: var1 missing, var2 missing
>   if (is.na(dataset[[i, var1]]) &
>   is.na(dataset[[i, var2]])) {
> dataset[["mismatch"]] <- 1  # var1 & var2 are missing
>
>   # Check 2: var1 filled, var2 missing
>   } else if (!is.na(dataset[[i, var1]]) &
>  is.na(dataset[[i, var2]])) {
> dataset[[i, merged_var]] <- dataset[[i, var1]]
> dataset[["mismatch"]] <- 0
>
>   # Check 3: var1 missing, var2 filled
>   } else if (is.na(dataset[[i, var1]]) &
>  !is.na(dataset[i, var2])) {
> dataset[[i, merged_var]] <- dataset[[i, var2]]
> dataset[["mismatch"]] <-  0
>
>   # Check 4: var1 == var2
>   } else if (dataset[[i, var1]] == dataset[[i, var2]]) {
>   dataset[[i, merged_var]] <- dataset[[i, var1]]
>   dataset[["mismatch"]] <- 0
>
>   # Leftover: var1 != var2
>   } else {
> dataset[[i, merged_var]] <- NA
> dataset[["mismatch"]] <- 2  # var1 != var2
>   }  # end if
> }  # end for
> return(dataset)
> }
>
> ds_var_merge1 <- t_merge_variables(dataset = ds_test,
>   var1 = "customer.x",
>   var2 = "customer.y",
>   merged_var = "customer")
>
> ds_var_merge1
>
> -- cut --
>
> It is executed without error but delivers the wrong values in the variable
> "mismatch". This variable is always 1 although it should be NA, 1 or 2
> respectively.
>
> Can you tell me why the variable is not correctly set?
>
> Kind regards
>
> Georg
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Michael
http://www.dewey.myzen.co.uk/home.html

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Antwort: RE: Merging variables

2016-06-06 Thread G . Maubach

Hi David,
Hi Petr,

many thanks for your help. With your hints I got the idea how I could do 
it and I came up with this solution:

-- cut --

#-------------------------------------------------------------------------------
# Module        : t_merge_variables.R
# Author        : Georg Maubach
# Date          : 2016-06-06
# Update        : 2016-06-06
# Description   : Merge two variables
# Source System : R 3.2.5 (64 Bit)
# Target System : R 3.2.5 (64 Bit)
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#-------------------------------------------------------------------------------

t_module_name <- "t_merge_variables.R"
t_version <- "2016-06-06"

cat(
  paste0("\n",
         t_module_name, " (Version: ", t_version, ")", "\n", "\n",
         "This software comes with ABSOLUTELY NO WARRANTY.",
         "\n", "\n"))

# If t_do_test is not defined globally, define it here locally.
# Switch t_do_test to TRUE to run the test.
t_do_test <- FALSE

# [ Function Definition ]-------------------------------------------------------
t_merge_variables <-
  function(dataset,
           var1,
           var2,
           merged_var) {
    # Merges two variables with identical, different or missing values
    #
    # Args:
    #   dataset (data frame, data table):
    #     Object with dimnames, e.g. data frame, data table.
    #   var1 (character):
    #     Variable 1 to be merged.
    #   var2 (character):
    #     Variable 2 to be merged.
    #   merged_var (class based on input variable, coercion done if possible):
    #     Variable with the merged variables var1 and var2.
    #
    # Operation:
    #   var1 and var2 are merged as follows:
    #   if var1 == var2: merged_var <- var1
    #   if var1 != var2: merged_var <- 1 (1 = indicating mismatch)
    #   if var1 is filled & var2 is missing: merged_var <- var1
    #   if var1 is missing & var2 is filled: merged_var <- var2
    #   if var1 is missing & var2 is missing: merged_var <- 0
    #     (0 = indicating both missing)
    #   For character input the codes are stored as "1" and "0".
    #
    # Returns:
    #   Original dataset with the variable given in "merged_var" added.
    #
    # Error handling:
    #   None.
    #
    # Credits:
    #   https://www.mail-archive.com/r-help@r-project.org/msg236012.html

    # Initialize
    dataset[merged_var] <- rep(NA, nrow(dataset))

    dataset[merged_var] <-
      # Check 1: var1 missing, var2 missing
      ifelse(is.na(dataset[, var1]) & is.na(dataset[, var2]),
        # then
        0,
        # Check 2: var1 filled, var2 missing
        ifelse(!is.na(dataset[, var1]) & is.na(dataset[, var2]),
          # then
          dataset[, var1],
          # Check 3: var1 missing, var2 filled
          ifelse(is.na(dataset[, var1]) & !is.na(dataset[, var2]),
            # then
            dataset[, var2],
            # Check 4: var1 == var2
            ifelse(dataset[, var1] == dataset[, var2],
              # then: use var1
              dataset[, var1],
              # Leftover: var1 != var2
              1))))

    return(dataset)
  }

# [ Test Definition ]-----------------------------------------------------------
t_test <- function(do_test = FALSE) {
  if (isTRUE(do_test)) {
    cat("\n", "\n", "Test function t_merge_variables()", "\n", "\n")

    # Example dataset
    customer.x <- c("Miller", "Smith", NA, "Bird", NA)
    customer.y <- c("Miller", NA, "Doe", "Fish", NA)
    ds_test <-
      data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

    # Call function
    ds_merge <- t_merge_variables(
      dataset = ds_test,
      var1 = "customer.x",
      var2 = "customer.y",
      merged_var = "customer"
    )

    # Dataset after function call
    print(ds_merge)
  }
}

# [ Test Run ]------------------------------------------------------------------
t_test(do_test = t_do_test)

# [ Clean up ]------------------------------------------------------------------
rm("t_do_test", "t_module_name", "t_version", "t_test")

# EOF

-- cut --

It delivers the customer name if only one is given or if both match. If 
they don't match it delivers 1. If both are missing it delivers 0.

This solution is sufficient for my applications.

Many thanks again for your help and giving me the ideas to solve my data 
transformation task.

Kind regards

Georg





From:    PIKAL Petr
To:      "g.maub...@weinwolf.de", "r-help@r-project.org"
Date:    06.06.2016 15:04
Subject: RE: [R] Merging variables



Hi

Not sure if this is the most effective or general solution but

Here you get 2 if the value is the same in both columns, 1 if it is only in 
one column and the other is NA, and 0 if there is a mismatch of values.
temp <- (ds_test[,2]

[R] Merging variables

2016-06-06 Thread G . Maubach

Hi All,

I merged two datasets:

ds_merge1 <- merge(x = ds_bw_customer_4_match, y = 
ds_zww_customer_4_match,
  by.x = "customer", by.y = "customer",
  all.x = TRUE, all.y = FALSE)

R created a new dataset with the variables customer.x and customer.y. I 
would like to merge these two variable back together. I wrote a little 
function (code can be run) for it:

-- cut --

customer.x <- c("Miller", "Smith", NA,"Bird", NA)
customer.y <- c("Miller",  NA, "Doe", "Fish", NA)
ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

t_merge_variables <-
  function(dataset,
           var1,
           var2,
           merged_var) {

    # Initialize
    dataset[[merged_var]] = rep(NA, nrow(dataset))
    dataset[["mismatch"]] = rep(NA, nrow(dataset))

    for (i in 1:nrow(dataset)) {

      # Check 1: var1 missing, var2 missing
      if (is.na(dataset[[i, var1]]) &
          is.na(dataset[[i, var2]])) {
        dataset[["mismatch"]] <- 1  # var1 & var2 are missing

      # Check 2: var1 filled, var2 missing
      } else if (!is.na(dataset[[i, var1]]) &
                 is.na(dataset[[i, var2]])) {
        dataset[[i, merged_var]] <- dataset[[i, var1]]
        dataset[["mismatch"]] <- 0

      # Check 3: var1 missing, var2 filled
      } else if (is.na(dataset[[i, var1]]) &
                 !is.na(dataset[i, var2])) {
        dataset[[i, merged_var]] <- dataset[[i, var2]]
        dataset[["mismatch"]] <- 0

      # Check 4: var1 == var2
      } else if (dataset[[i, var1]] == dataset[[i, var2]]) {
        dataset[[i, merged_var]] <- dataset[[i, var1]]
        dataset[["mismatch"]] <- 0

      # Leftover: var1 != var2
      } else {
        dataset[[i, merged_var]] <- NA
        dataset[["mismatch"]] <- 2  # var1 != var2
      }  # end if
    }  # end for
    return(dataset)
  }

ds_var_merge1 <- t_merge_variables(dataset = ds_test,
  var1 = "customer.x",
  var2 = "customer.y",
  merged_var = "customer")

ds_var_merge1

-- cut --

It is executed without error but delivers the wrong values in the variable 
"mismatch". This variable is always 1 although it should be NA, 1 or 2 
respectively.

Can you tell me why the variable is not correctly set?
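
A likely cause, judging from the code above: every branch assigns to dataset[["mismatch"]] without the row index i, so each pass through the loop overwrites the whole column. The last row of ds_test has both values missing, so the final pass sets every entry to 1. A minimal sketch of a row-wise fix follows; the function name merge_with_flag is made up for illustration, and && replaces & because each test is on a single value.

```r
customer.x <- c("Miller", "Smith", NA, "Bird", NA)
customer.y <- c("Miller", NA, "Doe", "Fish", NA)
ds_test <- data.frame(customer.x, customer.y, stringsAsFactors = FALSE)

# Hypothetical helper: same checks as in the post, but indexed by row
merge_with_flag <- function(dataset, var1, var2, merged_var) {
  dataset[[merged_var]] <- rep(NA_character_, nrow(dataset))
  dataset[["mismatch"]] <- rep(NA_real_, nrow(dataset))

  for (i in seq_len(nrow(dataset))) {
    v1 <- dataset[i, var1]
    v2 <- dataset[i, var2]

    if (is.na(v1) && is.na(v2)) {
      dataset[i, "mismatch"] <- 1        # both missing
    } else if (is.na(v2)) {
      dataset[i, merged_var] <- v1       # only var1 filled
      dataset[i, "mismatch"] <- 0
    } else if (is.na(v1)) {
      dataset[i, merged_var] <- v2       # only var2 filled
      dataset[i, "mismatch"] <- 0
    } else if (v1 == v2) {
      dataset[i, merged_var] <- v1       # both filled and equal
      dataset[i, "mismatch"] <- 0
    } else {
      dataset[i, "mismatch"] <- 2        # both filled but different
    }
  }
  dataset
}

res <- merge_with_flag(ds_test, "customer.x", "customer.y", "customer")
print(res)
```

With the toy data the mismatch column now varies by row instead of repeating the value set in the last iteration.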

Kind regards

Georg


[R] Antwort: Re: Variable labels and value labels

2016-06-01 Thread G . Maubach

Hi Jim,

many thanks for the hint.

When looking at the documentation I did not understand how to control which 
value gets which label. Is it possible to define it?

Kind regards

Georg




From:    Jim Lemon
To:      g.maub...@weinwolf.de, r-help mailing list
Date:    01.06.2016 03:59
Subject: Re: [R] Variable labels and value labels



Hi Georg,
You may find the "add.value.labels" function in the prettyR package 
useful.

Jim

On Tue, May 31, 2016 at 10:00 PM,   wrote:
> Hi All,
>
> I am using R for social sciences. In this field I am used to using short
> variable names like "q1" for question 1, "q2" for question 2 and so on, and
> to labelling the variables like q1 : "Please tell us your age" or q2 : "Could
> you tell us your household income?" or something similar indicating which
> question is stored in the variable.
>
> Similarly, I am used to labelling values like 1 : "Less than 18 years",
> 2 : "18 to 30 years", 3 : "31 to 60 years" and 4 : "61 years and more".
>
> I know that the packages Hmisc and memisc have functionality for this,
> but these labelling functions are limited to the packages they were defined
> for. Using the question texts as variable names is possible but very
> inconvenient.
>
> Is there another way of labelling variables and values in R?
>
> Kind regards
>
> Georg Maubach
>

[R] Antwort: RE: Variable labels and value labels

2016-06-01 Thread G . Maubach

Hi Petr,

I am looking for a general procedure that I can use with any package of R.

In my experience so far, it will probably happen that I need a procedure 
from a package other than Hmisc or memisc, and my solution should work even 
then, so that I do not need to find another way to do it.

Kind regards

Georg



From:    PIKAL Petr
To:      "g.maub...@weinwolf.de", "r-help@r-project.org"
Date:    31.05.2016 14:56
Subject: RE: [R] Variable labels and value labels



Hi

see in line

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@weinwolf.de
> Sent: Tuesday, May 31, 2016 2:01 PM
> To: r-help@r-project.org
> Subject: [R] Variable labels and value labels
>
> Hi All,
>
> I am using R for social sciences. In this field I am used to using short
> variable names like "q1" for question 1, "q2" for question 2 and so on, and
> to labelling the variables like q1 : "Please tell us your age" or q2 : "Could
> you tell us your household income?" or something similar indicating which
> question is stored in the variable.
>
> Similarly, I am used to labelling values like 1 : "Less than 18 years",
> 2 : "18 to 30 years", 3 : "31 to 60 years" and 4 : "61 years and more".

It seems to me that this is a job for factors:

nnn <- sample(1:4, 20, replace=TRUE)
q1 <- factor(nnn, labels=c("Less than 18 years", "18 to 30 years",
                           "31 to 60 years", "61 years and more"))

You can store such variables in data.frame with names "q1" to "qwhatever" 
and possibly "Subject"

And you can store annotation of questions in another data frame with 2 
columns e.g. "Question" and "Description"

Basically it is an approach similar to database and in R you can merge 
those two data.frames by ?merge.
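
To control which value gets which label, the levels and labels arguments of factor() can be given together; they are matched by position. The sketch below also keeps the question wording in a separate lookup data frame, along the lines described above. The column names Question and Description follow the text; the data and the names survey and q_labels are invented for illustration.

```r
set.seed(1)
nnn <- sample(1:4, 20, replace = TRUE)

# levels and labels are matched by position: 1 -> first label, 2 -> second, ...
q1 <- factor(nnn,
             levels = c(1, 2, 3, 4),
             labels = c("Less than 18 years", "18 to 30 years",
                        "31 to 60 years", "61 years and more"))

survey <- data.frame(Subject = seq_along(q1), q1 = q1)

# The question wording lives in its own lookup table
q_labels <- data.frame(Question = c("q1", "q2"),
                       Description = c("Please tell us your age",
                                       "Could you tell us your household income?"),
                       stringsAsFactors = FALSE)

# Look up the wording for a variable name
q1_text <- q_labels$Description[q_labels$Question == "q1"]
print(q1_text)

# The underlying codes stay available next to the labels
print(table(code = as.integer(q1), label = q1))
```

Because this uses only base R, the labelled data can be passed to functions from any package that accepts factors.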
>
> I know that the packages Hmisc and memisc have functionality for this, but
> these labelling functions are limited to the packages they were defined for.

That seems strange to me. What prevents you from using the functions from Hmisc?

Regards
Petr

> Using the question texts as variable names is possible but very inconvenient.
>
> Is there another way of labelling variables and values in R?
>
> Kind regards
>
> Georg Maubach
>

[R] Antwort: Re: Unable to update R software to 3.3.0

2016-06-01 Thread G . Maubach

Hi all,

I did it today on Debian GNU Linux 8 Jessie this way:

vim /etc/apt/sources.list
deb http://cran.uni-muenster.de/bin/linux/debian jessie-cran3
ESC;:wq

apt-get update
apt-get install r-base r-base-dev

This worked for me.

When installing R packages from within R I found that R needed the 
following:

apt-get install libssl-dev libcurl4-openssl-dev libhunspell-dev 
libxml2-dev 

You might wish to install these as well.

HTH.

Kind regards

Georg

From:    Marc Schwartz
To:      Sunish Kumar Bilandi
Cc:      R-help
Date:    01.06.2016 17:18
Subject: Re: [R] Unable to update R software to 3.3.0
Sent by: "R-help"

> On Jun 1, 2016, at 1:33 AM, Sunish Kumar Bilandi wrote:
>
> Hi Team,
>
> I am using RedHat 5 and installed R using YUM (R version 3.2.3). Now I
> want to update R to version 3.3.0, but I am unable to do that. Is there
> any alternative way to do this?
> 
> Hope to hear from your side.
> 
> Regards,
> 
> 
> Sunish Bilandi
> Business Analyst, CIDA-01
> Evalueserve

Hi,

First, RHEL and related distributions (e.g. Fedora), have a dedicated 
R-SIG list:

  https://stat.ethz.ch/mailman/listinfo/r-sig-fedora

Future queries in this domain should be submitted there, as many of the RH 
package maintainers (e.g. Tom Callaway, aka Spot) read that list.

For R 3.3.0, it would appear that it is about a day away from being 
available for release:

  https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-6fc2c863b0

So for now, it would be available via the EPEL testing repos.

Otherwise, you can wait until it is available via release in the next day 
or so, or download the RPMS directly here:

  http://koji.fedoraproject.org/koji/buildinfo?buildID=762521

Regards,

Marc Schwartz


[R] Installing miniCRAN on Debian

2016-06-01 Thread G . Maubach

Hi All,

I am installing miniCRAN on Debian GNU Linux 8 Jessie (Linux analytics7 
4.5.0-0.bpo.2-amd64 #1 SMP Debian 4.5.4-1~bpo8+1 (2016-05-13) x86_64 GNU/Linux) 
and R 3.3.0 

-- cut --
> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8
 [4] LC_COLLATE=de_DE.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

loaded via a namespace (and not attached):
[1] tools_3.3.0
-- cut --

After running

sudo apt-get install libssl-dev libcurl4-openssl-dev libxml2-dev libhunspell-dev

and calling

install.packages(pkgs = "miniCRAN", repos = "http://cran.csiro.au",
                 dependencies = TRUE)

I get the message

- ANTICONF ERROR ---
Configuration failed because hunspell was not found. Try installing:
 * deb: libhunspell-dev (Debian, Ubuntu, etc)
 * rpm: hunspell-devel (Fedora, CentOS, RHEL)
 * brew: hunspell (Mac OSX)
If hunspell is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a hunspell.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'

Running

find / -name hunspell.pc

gives

/usr/lib/x86_64-linux-gnu/pkgconfig/hunspell.pc

and running

find / -name pkg-config

gives

/usr/share/bash-completion/completions/pkg-config

How do I need to configure R correctly to get miniCRAN running?
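
The find output above lists only a bash-completion file for pkg-config, which suggests that the pkg-config program itself may not be installed, even though hunspell.pc is present. A small check from within R, as a sketch under that assumption (on Debian the program is provided by the package named pkg-config):

```r
# Look for the pkg-config executable on the PATH that R sees
pkg_config <- Sys.which("pkg-config")

if (!nzchar(pkg_config)) {
  message("pkg-config not found on PATH; ",
          "on Debian try: apt-get install pkg-config")
} else {
  # Exit status 0 means pkg-config can locate hunspell.pc
  status <- system2(pkg_config, c("--exists", "hunspell"))
  message("pkg-config found at ", pkg_config,
          "; hunspell lookup exit status: ", status)
}
```

If the program is missing, installing it and repeating install.packages() is the first thing to try before setting INCLUDE_DIR and LIB_DIR by hand.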

Kind regards

Georg


[R] Utility Functions

2016-05-31 Thread G . Maubach

Hi All,

I was new to R and this list a couple of months ago. When processing my 
data I got tremendous support from the R-Help mailing list.

The solutions I have worked out with your help might be also helpful for 
others. I have put the solutions in a couple of small functions with 
documentation and tests. You can find the software on Sourceforge.net at

https://sourceforge.net/projects/r-project-utilities/files/?source=navbar

You should download at least "r_toolbox.R" and store it in a directory 
like "r_toolbox" in your favourite project folder. Put all the other files 
into the "r_toolbox" folder as well. You have to adjust the variable 
"t_toolbox_path" to your favourite project directory including the 
"r_toolbox" folder, e.g. "C:\My-Projects\r_toolbox\" on Windows or 
"/home/username/my-projects/r_toolbox" on Unix-like systems.
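
When setting the path in R code, note that a backslash inside an R string has to be doubled or replaced by a forward slash. A minimal sketch, assuming a folder below the home directory; the location is hypothetical and how "r_toolbox.R" uses t_toolbox_path is not shown here:

```r
# Hypothetical location; replace with your own project folder.
# file.path() builds the path with forward slashes, which also work on Windows.
t_toolbox_path <- file.path(path.expand("~"), "my-projects", "r_toolbox")

if (dir.exists(t_toolbox_path)) {
  # List the toolbox files found in the folder
  toolbox_files <- list.files(t_toolbox_path, pattern = "\\.R$",
                              full.names = TRUE)
  print(basename(toolbox_files))
} else {
  message("Folder not found: ", t_toolbox_path)
}
```

A literal Windows path would be written as "C:/My-Projects/r_toolbox" or "C:\\My-Projects\\r_toolbox".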

You can use them for your projects. Although I developed them with great 
care, these functions come with absolutely no warranty. You use them at 
your own risk. As the functions are small and easy to follow, you will 
find out quickly by reading the source code that the functions are safe to 
use.

If you have any recommendations or improvement proposals please get back 
to me.

Kind regards

Georg Maubach


[R] Variable labels and value labels

2016-05-31 Thread G . Maubach

Hi All,

I am using R for social sciences. In this field I am used to using short 
variable names like "q1" for question 1, "q2" for question 2 and so on, and 
to labelling the variables like q1 : "Please tell us your age" or q2 : "Could 
you tell us your household income?" or something similar indicating which 
question is stored in the variable.

Similarly, I am used to labelling values like 1 : "Less than 18 years", 
2 : "18 to 30 years", 3 : "31 to 60 years" and 4 : "61 years and more".

I know that the packages Hmisc and memisc have functionality for this, 
but these labelling functions are limited to the packages they were defined 
for. Using the question texts as variable names is possible but very 
inconvenient.

Is there another way of labelling variables and values in R?

Kind regards

Georg Maubach


[R] Difference subsetting (dataset$variable vs. dataset["variable"]

2016-05-31 Thread G . Maubach

Hi All,

I thought dataset$variable is the same as dataset["variable"]. I tried the 
following:

> str(ZWW_Kunden$Branche)
 chr [1:49673] "231" "151" "151" "231" "231" "111" "231" "111" "231" "231" "151" "111" ...
> str(ZWW_Kunden["Branche"])
'data.frame':49673 obs. of  1 variable:
 $ Branche: chr  "231" "151" "151" "231" ...

and get different results: "chr [1:49673]" vs. "data.frame". The first one 
is a simple vector, the second one is a data.frame.

This has consequences when subsetting a dataset and filter cases:

> ZWW_Kunden["Branche"] %in% c("315", "316", "317")
[1] FALSE

> head(ZWW_Kunden$Branche %in% c("315", "316", "317")) # head() only to shorten output
[1] FALSE FALSE FALSE FALSE FALSE FALSE

I have thought dataset$variable is the same as dataset["variable"] but 
actually it's not.

Can you explain what the difference is?
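
The difference can be shown on a small example. The data frame ds below is invented toy data; only the column name Branche is taken from the post. The operators $ and [[ ]] extract the column itself, while single brackets return a data frame that still contains the column. %in% treats that one-column data frame as a list with a single element, which is why only one value comes back.

```r
ds <- data.frame(Branche = c("231", "151", "315"), stringsAsFactors = FALSE)

v1 <- ds$Branche        # character vector
v2 <- ds[["Branche"]]   # the same character vector
d1 <- ds["Branche"]     # data frame with one column

print(identical(v1, v2))
print(is.data.frame(d1))

# Element-wise test on the vector: one result per row
in_vec <- v1 %in% c("315", "316", "317")
print(in_vec)

# The data frame counts as a list of length 1: a single result
in_df <- d1 %in% c("315", "316", "317")
print(in_df)

# Filtering cases therefore uses the vector form
hits <- ds[ds$Branche %in% c("315", "316", "317"), , drop = FALSE]
print(hits)
```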

Kind regards

Georg


[R] Antwort: Re: Creating a data frame from scratch (SOLVED)

2016-05-25 Thread G . Maubach

Hi Dan,
Hi All,

many thanks for your help.

Please find enclosed my little function for your use:

-- cut --

#-------------------------------------------------------------------------------
# Module        : t_count_na.R
# Author        : Georg Maubach
# Date          : 2016-05-24
# Update        : 2016-05-25
# Description   : Count NA's
# Source System : R 3.2.2 (64 Bit)
# Target System : R 3.2.2 (64 Bit)
# License       : CC-BY-SA-NC
#-------------------------------------------------------------------------------

# Switch t_do_test to TRUE to run the test
t_do_test <- FALSE

t_count_na <- function(dataset,
                       variables = "all") {
  # Counts the number of NA within given set of variables
  #
  # Args:
  #   dataset  : Object with dimnames, e.g. data frame, data table.
  #   variables: Character vector with variable names.
  #
  # Operation:
  #   Adds the variable "na_count" to the given dataset containing the
  #   count of NA's within the given variables
  #
  # Returns:
  #   Original dataset with variable "na_count" added.
  #
  # Error handling:
  #   None.
  #
  # Credits:
  #   http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame
  #   http://r.789695.n4.nabble.com/Creating-variables-on-the-fly-td4720034.html

  version <- "2016-05-25"

  if (identical(variables, "all")) {
    variable_list <- names(dataset)
  } else {
    variable_list <- variables
  }
  # drop = FALSE keeps a data frame even if only one variable is selected
  dataset[["na_count"]] <- apply(dataset[, variable_list, drop = FALSE],
                                 1,
                                 function(x) sum(is.na(x)))

  return(dataset)
}

#-------------------------------------------------------------------------------

t_test <- function(do_test = FALSE) {
  # Run the test only when requested
  if (!isTRUE(do_test)) {
    return(invisible(NULL))
  }

  cat("\n", "\n", "Test function t_count_na()", "\n", "\n")

  # Example dataset
  gene <-
    c("ENSG0208234", "ENSG0199674", "ENSG0221622", "ENSG0207604",
      "ENSG0207431", "ENSG0221312", "ENSG00134940305", "ENSG00394039490",
      "ENSG09943004048")
  hsap <- c(0, 0, 0, 0, 0, 0, 1, 1, 1)
  mmul <- c(NA, 2, 3, NA, 2, 1, NA, 2, NA)
  mmus <- c(NA, 2, NA, NA, NA, 2, NA, 3, 1)
  rnor <- c(NA, 2, NA, 1, NA, 3, NA, NA, 2)
  cfam <- c(NA, 2, NA, 2, 1, 2, 2, NA, NA)
  ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam)
  ds_example$gene <- as.character(ds_example$gene)

  cat("\n", "\n", "Example dataset before function call", "\n", "\n")
  print(ds_example)

  cat("\n", "\n", "Function call", "\n", "\n")
  ds_example <- t_count_na(dataset = ds_example,
                           variables = c("mmul", "mmus"))

  cat("\n", "\n", "Example dataset after function call", "\n", "\n")
  print(ds_example)
}

t_test(do_test = t_do_test)

# EOF .

-- cut --

Kind regards

Georg Maubach




From:    "Nordlund, Dan (DSHS/RDA)"
To:      "r-help@r-project.org"
Date:    24.05.2016 21:41
Subject: Re: [R] Creating a data frame from scratch
Sent by: "R-help"




I would probably write the function something like this:


t_count_na <- function(dataset,
   variables = "all") {
  if (identical(variables, "all")) {
variable_list <- names(dataset)
  }  else {
variable_list <- variables
  } 
  apply(dataset[,variable_list], 1, function(x) sum(is.na(x)))
}
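
A vectorised variant of the same idea, as a sketch: rowSums() over is.na() avoids the row-wise apply(), and drop = FALSE keeps the selection a data frame when only one variable is named. The function name count_na_rows is invented for this illustration; the example values follow the thread.

```r
# Hypothetical helper: count NA per row over the selected variables
count_na_rows <- function(dataset, variables = names(dataset)) {
  rowSums(is.na(dataset[, variables, drop = FALSE]))
}

ds_example <- data.frame(
  hsap = c(0, 0, 0, 0, 0, 0, 1, 1, 1),
  mmul = c(NA, 2, 3, NA, 2, 1, NA, 2, NA),
  mmus = c(NA, 2, NA, NA, NA, 2, NA, 3, 1)
)

# NA count over two variables, and over a single variable
na_count <- count_na_rows(ds_example, c("mmul", "mmus"))
single   <- count_na_rows(ds_example, "mmul")

print(unname(na_count))
print(unname(single))
```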


Hope this is helpful,

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services


> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of
> g.maub...@gmx.de
> Sent: Tuesday, May 24, 2016 11:55 AM
> To: r-help@r-project.org
> Subject: [R] Creating a data frame from scratch
> 
> Hi All,
> 
> I need to create a data frame from scratch and fill variables created on the fly
> with values. What I have so far:
> 
> -- snip --
> 
> # Example dataset
> gene <-
>   c("ENSG0208234","ENSG0199674","ENSG0221622","ENSG0207604",
>     "ENSG0207431","ENSG0221312","ENSG00134940305","ENSG00394039490",
>     "ENSG09943004048")
> hsap <- c(0,0,0, 0, 0, 0, 1,1, 1)
> mmul <- c(NA,2 ,3, NA, 2, 1 , NA,2, NA)
> mmus <- c(NA,2 ,NA, NA, NA, 2 , NA,3, 1)
> rnor <- c(NA,2 ,NA, 1 , NA, 3 , NA,NA, 2)
> cfam <- c(NA,2,NA, 2, 1, 2, 2,NA, NA)
> 
> ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam)
> ds_example$gene <- as.character(ds_example$gene)
> 
> t_count_na <- function(dataset,
>                        variables = "all")
>   # credit: http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame
>   {
>   ds_na <- data.frame()
>   # if variables = "all" create character vector of variable names
>   if (variables == "all") {
>     variable_list <- dimnames(dataset)[[ 2 ]]
>   }
>   # if a character vector with variable names is given
>   # to run the function on a defined set of selected variables
>   else {
>     variable_list <- variables
>   }
> 
>   for (var in variable_list) {
>     new_name <-

[R] Creating a data frame from scratch

2016-05-24 Thread G . Maubach

Hi All,

I need to create a data frame from scratch and fill variables created on the 
fly with values. What I have so far:

-- snip --

# Example dataset
gene <- 
c("ENSG0208234","ENSG0199674","ENSG0221622","ENSG0207604", 
  "ENSG0207431","ENSG0221312","ENSG00134940305","ENSG00394039490",
  "ENSG09943004048")
hsap <- c(0,0,0, 0, 0, 0, 1,1, 1)
mmul <- c(NA,2 ,3, NA, 2, 1 , NA,2, NA)
mmus <- c(NA,2 ,NA, NA, NA, 2 , NA,3, 1)
rnor <- c(NA,2 ,NA, 1 , NA, 3 , NA,NA, 2)
cfam <- c(NA,2,NA, 2, 1, 2, 2,NA, NA)

ds_example <- data.frame(gene, hsap, mmul, mmus, rnor, cfam)
ds_example$gene <- as.character(ds_example$gene)

t_count_na <- function(dataset,
                       variables = "all")
  # credit: http://stackoverflow.com/questions/4862178/remove-rows-with-nas-in-data-frame
  {
  ds_na <- data.frame()
  # if variables = "all" create character vector of variable names
  if (variables == "all") {
    variable_list <- dimnames(dataset)[[ 2 ]]
  }
  # if a character vector with variable names is given
  # to run the function on a defined set of selected variables
  else {
    variable_list <- variables
  }

  for (var in variable_list) {
    new_name <- paste0("na_", var)
    ds_na[[ new_name ]] <- as.data.frame(is.na(dataset[[ var ]]))
  }

  ds_na[[ "na_count" ]] <- rowSums(ds_na)
  return(ds_na)
}

test <- t_count_na(dataset = ds_example, variables = c("mmul", "mmus"))

-- snip --

gives:

Error in `[[<-.data.frame`(`*tmp*`, new_name, value = list(`is.na(dataset[[var]])` = c(TRUE,  : 
  replacement has 9 rows, data has 0
In addition: Warning message:
In if (variables == "all") { :
  the condition has length > 1 and only the first element will be used
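
The two messages above can be reproduced in isolation. The sketch below is only an illustration with made-up values (the names `elementwise`, `single_flag` and `outcome` are assumptions, not part of the posted code). It shows that `==` returns one result per element while `identical()` returns a single value, and that a data frame created with `data.frame()` has zero rows and therefore rejects a nine-element column. Note that from R 4.2.0 on, a condition of length greater than one in `if()` is an error rather than a warning.

```r
# Illustrative values mirroring the call in the thread.
variables <- c("mmul", "mmus")

# `==` compares element-wise, so the result has one entry per variable name.
elementwise <- variables == "all"

# identical() always returns a single TRUE or FALSE, which is what if() needs.
single_flag <- identical(variables, "all")

# A data frame created with data.frame() has zero rows, so assigning a
# nine-element column with [[<- is rejected.
ds_na <- data.frame()
outcome <- tryCatch({
  ds_na[["na_mmul"]] <- rep(TRUE, 9)
  "assigned"
}, error = function(e) "error")

print(elementwise)
print(single_flag)
print(outcome)
```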

My goal is to create a dataset from scratch on the fly which has the same 
number of variables as the dataset ds_example plus a single variable storing 
the number of NAs in a row for the given variables. This is the basis for a 
decision on which cases to keep and which to drop.

I do not want to alter the base dataset like ds_example in the first place nor 
do I want to make a copy of the existing dataset due to memory allocation. The 
function shall also work with big data, e. g. datasets with more than 1 GB 
memory consumption.

I also do not want the newly created variables to be stored in the original 
data frame. They shall be separate.

A former similar solution worked:
http://r.789695.n4.nabble.com/Creating-variables-on-the-fly-td4720034.html

Why doesn't this one?

How do I create the variables within the data frame if the data frame is empty?
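
One possible approach, sketched here under assumptions, is to avoid the empty data frame altogether and build the result in one step from the logical matrix that `is.na()` returns for the selected columns. The function name `count_na_separate` and the small example data are hypothetical. The sketch does not modify the input and returns only the new `na_*` columns plus `na_count`, although it does allocate a logical matrix the size of the selected columns, so it is not free of memory cost.

```r
# Hypothetical helper: returns a separate data frame of NA flags and counts.
count_na_separate <- function(dataset, variables = "all") {
  if (identical(variables, "all")) {
    variables <- names(dataset)
  }
  # is.na() on a data frame returns a logical matrix; drop = FALSE keeps
  # the subset a data frame even when one variable is selected.
  na_flags <- is.na(dataset[, variables, drop = FALSE])
  colnames(na_flags) <- paste0("na_", variables)

  # Convert the matrix to a data frame; this fixes the number of rows,
  # so further columns of the same length can be added afterwards.
  ds_na <- as.data.frame(na_flags)
  ds_na[["na_count"]] <- rowSums(na_flags)
  ds_na
}

# Hypothetical example data, smaller than ds_example in the post.
ds_small <- data.frame(gene = c("g1", "g2", "g3"),
                       mmul = c(NA, 2, 3),
                       mmus = c(NA, 2, NA),
                       stringsAsFactors = FALSE)

result <- count_na_separate(ds_small, variables = c("mmul", "mmus"))
print(result)
```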

Kind regards

Georg Maubach

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] WG: Filtering String Variables (SOLVED)

2016-05-23 Thread G . Maubach

Hi All,

the solution for my question is as follows

## Filter duplicates and corresponding non-duplicates
### To filter duplicates and their corresponding non-duplicates use the
### following code snippet:
Debitor <- c("968691", "968691", "968691",
 "A04046", "A04046",
 "L0006", "L0006", "L0006",
 "L0023", "L0023",
 "L0056", "L0056",
 "L0094", "L0094", "L0094",
 "L0124", "L0124",
 "L0143", 
 "L0170",
 "13459",
 "473908",
 "394704",
 "4711",
 "4712",
 "4713")
Debitor <- as.character(Debitor)
var1 <- c(11, 12, 13,
  14, 14,
  12, 13, 14,
  10, 11,
  12, 12,
  12, 12, 12,
  15, 17,
  11,
  14,
  12,
  17,
  13,
  15,
  16,
  11)
ds_example <- data.frame(Debitor, var1)
ds_example$case_id <- 1:nrow(ds_example)
ds_example <- ds_example[, sort(colnames(ds_example))]
ds_example

# This task is to generate a data frame that contains the duplicates AND the
# corresponding non-duplicates to the duplicates.
# For example, finding the duplicates will deliver case 2 and 3 but the list
# should also contain case 1 because case 1 is the corresponding case to the
# duplicate cases 2 and 3.
# For the whole example dataset that would be:
needed <- c(1, 1, 1,
1, 1,
1, 1, 1,
1, 1,
1, 1,
1, 1, 1,
1, 1,
0, 0, 0, 0, 0, 0, 0, 0)
needed <- as.logical(needed)
ds_example <- data.frame(ds_example, needed)
ds_example

# To find the duplicates and the corresponding non-duplicates
duplicates <- duplicated(ds_example$Debitor)

list_of_duplicated_debitors <- as.character(ds_example[duplicates, "Debitor"])

filter_variable <- unique(list_of_duplicated_debitors)

### Wrong code. Do not run.
### ds_duplicates <- ds_example["Debitor" == filter_variable]
###   Result: dataset with 0 columns
### duplicates_and_correponding_non_duplicates <- ds_example["Debitor"] %in% filter_variable
###   Result: FALSE

duplicates_and_correponding_non_duplicates <- ds_example$Debitor %in% filter_variable  # Result: OK
duplicates_and_correponding_non_duplicates <- ds_example[, "Debitor"] %in% filter_variable  # Result: OK

### Create the dataset with duplicates and corresponding non-duplicates
ds_example <- ds_example[duplicates_and_correponding_non_duplicates, ]
ds_example

It was a simple mistake when subscripting.
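
To illustrate the subscripting mistake: `"Debitor" == filter_variable` compares the literal string "Debitor" with each ID, so it never touches the column. The sketch below uses a shortened, made-up version of the example data, and the names `ds`, `keep` and `keep_alt` are assumptions. It also shows a common alternative that marks every member of a duplicated group directly with `duplicated()` and its `fromLast` argument.

```r
# Shortened, hypothetical version of the example data.
ds <- data.frame(Debitor = c("968691", "968691", "L0143", "A04046", "A04046"),
                 var1 = c(11, 12, 11, 14, 14),
                 stringsAsFactors = FALSE)

# IDs that occur more than once.
filter_variable <- unique(ds$Debitor[duplicated(ds$Debitor)])

# The mistake: this compares the string "Debitor" with the IDs,
# not the column values, so every comparison is FALSE.
literal_compare <- "Debitor" == filter_variable

# The fix from the post: test the column values against the IDs.
keep <- ds$Debitor %in% filter_variable

# Alternative: flag duplicates scanning forwards and backwards, which
# marks the first occurrence of each duplicated ID as well.
keep_alt <- duplicated(ds$Debitor) | duplicated(ds$Debitor, fromLast = TRUE)

print(ds[keep, ])
```

Both `keep` and `keep_alt` select the same rows here; the second form avoids building the intermediate vector of IDs.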

Kind regards

Georg Maubach


----- Forwarded by Georg Maubach/WWBO/WW/HAW on 23.05.2016 15:54 -----

From:     Georg Maubach/WWBO/WW/HAW
To:       r-help@r-project.org
Date:     23.05.2016 15:28
Subject:  Filtering String Variables


# Hi All,
# 
# I have the following data frame (example):

Debitor <- c("968691", "968691", "968691",
 "A04046", "A04046",
 "L0006", "L0006", "L0006",
 "L0023", "L0023",
 "L0056", "L0056",
 "L0094", "L0094", "L0094",
 "L0124", "L0124",
 "L0143", 
 "L0170",
 "13459",
 "473908",
 "394704",
 "4711",
 "4712",
 "4713")
Debitor <- as.character(Debitor)
var1 <- c(11, 12, 13,
  14, 14,
  12, 13, 14,
  10, 11,
  12, 12,
  12, 12, 12,
  15, 17,
  11,
  14,
  12,
  17,
  13,
  15,
  16,
  11)
ds_example <- data.frame(Debitor, var1)
ds_example$case_id <- 1:nrow(ds_example)
ds_example <- ds_example[, sort(colnames(ds_example))]
ds_example

# I would like to generate a data frame that contains the duplicates AND the
# corresponding non-duplicates to the duplicates.
# For example, finding the duplicates will deliver case 2 and 3 but the list
# should also contain case 1 because case 1 is the corresponding case to the
# duplicate cases 2 and 3.
# For the whole example dataset that would be:
needed <- c(1, 1, 1,
1, 1,
1, 1, 1,
1, 1,
1, 1,
1, 1, 1,
1, 1,
0, 0, 0, 0, 0, 0, 0, 0)
needed <- as.logical(needed)
ds_example <- data.frame(ds_example, needed)
ds_example

# To find the duplicates and the corresponding non-duplicates
duplicates <- duplicated(ds_example$Debitor)

list_of_duplicated_debitors <- as.character(ds_example[duplicates, "Debitor"])

filter_variable <- unique(list_of_duplicated_debitors)

ds_duplicates <- ds_example["Debitor" == filter_variable]  # Result: dataset with 0 columns

ds_duplicates <- ds_example["Debitor"] %in% filter_variable  # Result: FALSE

# How can I create a dataset like this

ds_example <- ds_example[needed, ]
ds_example

# using the Debitor IDs?

Kind regards

Georg Maubach
