[R] R WORDLE tool concepts vary

2022-02-24 Thread Avi Gross via R-help
This forum often discusses different ways of looking at a problem that lead 
to very different R formulations. Many may be RIGHT in a sense, but some 
clearly have tradeoffs, work only in limited circumstances, or come from a 
limited way of thinking. Note that this message got a bit long, so feel free 
to skip it.

With this in mind, I saw an article mentioned on R-bloggers about a helper 
function that identifies your possible next moves in the popular new game 
WORDLE. The article explains it well if you are interested:

https://www.r-bloggers.com/2022/02/wordle-solve-wordle-with-r/

I am not sure who the author is, and clearly you can solve the problem in many 
ways using many languages. I looked at the code to learn from it. It is fairly 
compact and depends on another package, hunspell, to check whether candidate 
words actually exist.

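  library(hunspell)

  # Letters known NOT to be in the answer; kept in a global variable
  black <- ""

  # Letters still allowed in one position: everything except the letters
  # ruled out there (local) or anywhere (global)
  exclude <- function(local, global = black) {
    LETTERS[!LETTERS %in% c(local, global)]
  }

  # one..five: letters allowed in each position; include: letters known to
  # appear somewhere in the answer. The \(x) lambda needs R >= 4.1.0.
  wordle <- function(one, two, three, four, five, include = "") {
    words <- apply(expand.grid(one, two, three, four, five), 1,
                   \(x) paste(x, collapse = ""))
    words <- words[grepl(paste0("(?=.*", include, ")", collapse = ""),
                         words, perl = TRUE)]
    words[hunspell_check(words)]
  }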

Without explaining everything again: each move in this game is a guess of a 
five-letter word. The feedback tells you which letters in your guess are NOT 
in the word of the day, which, if any, are correct and in the right position, 
and which are correct but in the wrong position. Based on the feedback, you 
try another five-letter word that may tell you more. You get six tries. I do 
not generally need help: I always get it in no more than 4 or 5 tries, several 
times in 3 and once, weirdly, in 2. But it can be helpful to get a list of the 
words that remain potentially valid, hence this function.

In English, the code above accepts inputs that boil down to which of the 26 
uppercase letters are still allowed in position ONE, and another set for each 
of positions two through five. It also uses two further pieces of information: 
the letters known not to be in the word at all (oddly kept in the global 
variable black) and the letters known to be there at least once (the include 
argument).

The first line can be translated into English:

  words <- apply(expand.grid(one, two, three, four, five), 1,
                 \(x) paste(x, collapse = ""))

Basically it builds a potentially huge data.frame containing every combination 
of the allowed letters. Early in the game that may mean around 20 possible 
letters in each of five positions, and 20^5 is 3.2 million rows. The five 
columns are then stitched together, row by row, into millions of candidate 
five-letter strings. UGH!
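To make the arithmetic concrete, here is a small R sketch (k and the letter subset are illustrative). The row count of expand.grid() is the product of the candidate-set sizes, so it grows as k^5 when every position allows k letters. The last lines show do.call(paste0, ...), a vectorized alternative to the per-row apply() that yields the same strings; that substitution is mine, not the blog's code.

```r
# k candidate letters in each of the five positions (illustrative values)
k <- 4
grid <- expand.grid(rep(list(LETTERS[1:k]), 5))

nrow(grid)   # k^5 = 1024 rows
20^5         # with 20 candidates per position: 3.2 million rows

# Row-wise stitching as in the blog code ...
words_apply <- unname(apply(grid, 1, paste, collapse = ""))
# ... and a vectorized equivalent that pastes whole columns at once
words_vec <- do.call(paste0, grid)

identical(words_apply, words_vec)   # TRUE
head(words_vec, 3)                  # "AAAAA" "BAAAA" "CAAAA"
```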

The next line uses a regular expression to require that every letter KNOWN at 
that point to be in the answer is present; any other candidate is filtered out.

  words <- words[grepl(paste0("(?=.*", include, ")", collapse = ""),
                       words, perl = TRUE)]
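A small R illustration of what that pattern looks like; the include letters and test words are made up. paste0() is vectorized over include, so it yields one lookahead per required letter, and collapse = "" joins them. Each (?=.*X) checks that X occurs somewhere ahead without consuming characters, so all of them must succeed together. Note that the default include = "" produces "(?=.*)", which matches everything, so nothing is filtered.

```r
include <- c("A", "E")   # letters known to be in the answer (illustrative)

# One lookahead per required letter, glued into a single pattern
pattern <- paste0("(?=.*", include, ")", collapse = "")
pattern   # "(?=.*A)(?=.*E)"

# Lookaheads are a PCRE feature, hence perl = TRUE
grepl(pattern, c("CRANE", "SLATE", "CRONY"), perl = TRUE)   # TRUE TRUE FALSE

# The default include = "" gives "(?=.*)", which every string matches
grepl(paste0("(?=.*", "", ")", collapse = ""), "CRONY", perl = TRUE)   # TRUE
```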

Finally, the hunspell package is asked to check the many remaining candidates 
and return only those it recognizes as real words.

  words[hunspell_check(words)]

Although I appreciated the terseness of the code, I was appalled at how long it 
sometimes ran. It seems too brute force! Why build every possible five-letter 
string first, rather than select only the real five-letter words that match a 
pattern?

My solution, in English, was to build a regular expression dynamically that 
looked like:

  "^[cond1][cond2][cond3][cond4][cond5]$"

With appropriate actual conditions, the anchors ^ and $ ensure the above 
matches only five-letter words. If you said that the first character was the 
letter "S", the condition was written as [S]; if you had a vector of possible 
letters like c("A", "E", "I", "O", "U"), the condition would be [AEIOU]; and 
my version of the exclude function above added a caret, as in [^AEIOU], to 
mean the negation.
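A minimal R sketch of building such a pattern dynamically. The helper names allow(), deny() and make_pattern() are hypothetical, not Avi's code, and the excluded letters and test words are invented. One wrinkle worth handling: an empty exclusion set would produce "[^]", which is not a complete bracket expression, so deny() falls back to "." (any character).

```r
# Hypothetical helpers: a vector of letters becomes a bracket expression
allow <- function(chars) paste0("[", paste(chars, collapse = ""), "]")
deny  <- function(chars) {
  if (length(chars) == 0) return(".")   # nothing excluded: any character
  paste0("[^", paste(chars, collapse = ""), "]")
}

# One condition per position; ^ and $ anchor the match to exactly five
make_pattern <- function(cond1, cond2, cond3, cond4, cond5) {
  paste0("^", cond1, cond2, cond3, cond4, cond5, "$")
}

gray <- c("A", "L", "R")   # letters known not to be in the answer
pat <- make_pattern(allow("S"), deny(gray), deny(gray), deny(gray), allow("E"))
pat   # "^[S][^ALR][^ALR][^ALR][E]$"

grepl(pat, c("SHINE", "SLATE", "SPINES"))   # TRUE FALSE FALSE
```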

On top of that, why search the entire dictionary rather than one with just 
five-letter words? On my machine, it searches a file called en_US.dic with 
49,271 lines/words, some of which carry additional info, like:

  zucchini/MS
  zwieback/M

So the first thing I did was run the above code as if the game had just begun 
and save the results in a character vector called dict. Its length is down 
nicely to 6494 five-letter words. The technique is now not to generate and 
check the entire set of possible strings, potentially repeatedly, but to test 
each word in this short list against a single pattern and keep the matches. 
This runs much quicker, even when the word list has to be read back from a 
saved file. Here is my first attempt; I am wondering if this approach has 
positives or negatives I have not considered.


  # WORDLE checker

  ## Find available WORDLE words given stated conditions
  ## assuming "dict" has b
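Since the code above is cut off, here is a minimal R sketch of the approach as described; it is not Avi's actual code. A precomputed vector of five-letter words (the tiny dict below is invented for illustration) is cached once with saveRDS(), reloaded with readRDS(), and then filtered with one positional pattern plus a check for each letter known to be present. wordle_filter() is a hypothetical name.

```r
# Illustrative stand-in for the 6494-word list of five-letter words
dict <- c("CRANE", "SLATE", "SHINE", "STONE", "SPINE", "SNIDE")

# Hypothetical helper: keep words matching the positional pattern that
# also contain every letter known to be somewhere in the answer
wordle_filter <- function(dict, pattern, include = character(0)) {
  hits <- dict[grepl(pattern, dict)]
  for (ch in include) hits <- hits[grepl(ch, hits, fixed = TRUE)]
  hits
}

# The five-letter list only has to be built once; cache it on disk
cache <- tempfile(fileext = ".rds")
saveRDS(dict, cache)
dict <- readRDS(cache)

# S first, not A second, E last, and an N somewhere
hits <- wordle_filter(dict, "^S[^A]..E$", include = "N")
hits   # "SHINE" "STONE" "SPINE" "SNIDE"
```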

Re: [R] installation problem for a new package

2022-02-24 Thread Jeff Newmiller
Do use "reply-all"... others may be able to respond more quickly or more 
accurately than I. I have re-introduced the mailing list to this reply.

For example, if you read [1] it says there is a system requirement that the 
completely separate gmp software be installed using your system software 
installation tools. I happen to be unfamiliar with the details of MacOS system 
administration, so I cannot offer useful guidance other than that you need to 
solve this and that there exists a mailing list dedicated to such 
MacOS-specific questions about R [2]. You can of course simply Google for 
similar threads or blogs about gmp on MacOS to find clues as well.

I will also add that there seem to be a variety of conditions that can cause 
the automatic dependency management features of install.packages() to fail, and 
in such cases it is normal to focus on installing the problematic dependent 
packages one at a time. This is particularly true in this case where there are 
requirements for the dependency beyond what R can deal with, so ignoring advice 
from those with more experience than yourself about focusing on the dependency 
separately is probably less productive than simply following that advice.
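As a rough R sketch of installing the dependencies one at a time: missing_pkgs() and install_one_by_one() are hypothetical helpers, not from any package. The idea is simply to stop at the first dependency that fails, which here would be gmp, because its configure script cannot find the system GMP header gmp.h; that has to be fixed outside R before anything downstream can install.

```r
# Hypothetical helper: which of these packages cannot currently be loaded?
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Hypothetical helper: install in dependency order and stop at the first
# failure. install.packages() only warns on failure, so re-check afterwards.
install_one_by_one <- function(pkgs) {
  for (pkg in pkgs) {
    if (length(missing_pkgs(pkg)) > 0) {
      install.packages(pkg)
      if (length(missing_pkgs(pkg)) > 0) {
        stop("installation of '", pkg, "' failed; fix it before continuing")
      }
    }
  }
  invisible(TRUE)
}

# Usage (not run here): install_one_by_one(c("gmp", "CVXR"))
```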

[1] https://cran.r-project.org/web/packages/gmp/index.html
[2] https://stat.ethz.ch/mailman/listinfo/r-sig-mac

On February 24, 2022 12:11:14 PM PST, Tariq Khasiri  
wrote:
>Will correct my formatting as you kindly suggested.
>
>I believe gmp and CVXR are not getting installed, but with dependencies = TRUE
>they are supposed to be installed, as the package author posted on his website.
>Therefore, I'm quite puzzled: do I need to install the other two
>packages separately or not?
>
>install_github("asheshrambachan/HonestDiD", dependencies = TRUE)
>Downloading GitHub repo asheshrambachan/HonestDiD@HEAD
>Installing 1 packages: gmp
>
>  There is a binary version available but the source version is later:
> binary source needs_compilation
>gmp 0.6-2.1  0.6-4  TRUE
>
>Do you want to install from sources the package which needs compilation?
>(Yes/no/cancel) yes
>installing the source package ‘gmp’
>
>trying URL 'https://cran.rstudio.com/src/contrib/gmp_0.6-4.tar.gz'
>Content type 'application/x-gzip' length 163941 bytes (160 KB)
>==
>downloaded 160 KB
>
>* installing *source* package ‘gmp’ ...
>** package ‘gmp’ successfully unpacked and MD5 sums checked
>** using staged installation
>checking for gcc... clang -mmacosx-version-min=10.13
>checking whether the C compiler works... yes
>checking for C compiler default output file name... a.out
>checking for suffix of executables...
>checking whether we are cross compiling... no
>checking for suffix of object files... o
>checking whether we are using the GNU C compiler... yes
>checking whether clang -mmacosx-version-min=10.13 accepts -g... yes
>checking for clang -mmacosx-version-min=10.13 option to accept ISO C89...
>none needed
>checking how to run the C preprocessor... clang -mmacosx-version-min=10.13
>-E
>checking whether we are using the GNU C++ compiler... yes
>checking whether clang++ -mmacosx-version-min=10.13 -std=gnu++14 accepts
>-g... yes
>checking for grep that handles long lines and -e... /usr/bin/grep
>checking for egrep... /usr/bin/grep -E
>checking for ANSI C header files... rm: conftest.dSYM: is a directory
>rm: conftest.dSYM: is a directory
>yes
>checking for sys/types.h... yes
>checking for sys/stat.h... yes
>checking for stdlib.h... yes
>checking for string.h... yes
>checking for memory.h... yes
>checking for strings.h... yes
>checking for inttypes.h... yes
>checking for stdint.h... yes
>checking for unistd.h... yes
>checking gmp.h usability... no
>checking gmp.h presence... no
>checking for gmp.h... no
>configure: error: Header file gmp.h not found; maybe use
>--with-gmp-include=INCLUDE_PATH
>ERROR: configuration failed for package ‘gmp’
>* removing
>‘/Library/Frameworks/R.framework/Versions/4.1/Resources/library/gmp’
>
>The downloaded source packages are in
>‘/private/var/folders/4m/tvx9lnqs0rx6wgnysxz36vm0gp/T/Rtmpk3pocK/downloaded_packages’
>✓  checking for file
>‘/private/var/folders/4m/tvx9lnqs0rx6wgnysxz36vm0gp/T/Rtmpk3pocK/remotes86e84daddc5e/asheshrambachan-HonestDiD-419f305/DESCRIPTION’
>...
>─  preparing ‘HonestDiD’:
>✓  checking DESCRIPTION meta-information ...
>─  checking for LF line-endings in source and make files and shell scripts
>─  checking for empty or unneeded directories
>─  building ‘HonestDiD_0.2.0.tar.gz’
>
>* installing *source* package ‘HonestDiD’ ...
>** using staged installation
>** R
>** data
>*** moving datasets to lazyload DB
>** inst
>** byte-compile and prepare package for lazy loading
>Error: package or namespace load failed for ‘CVXR’ in loadNamespace(j <-
>i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
> there is no package called ‘gmp’
>Error: package ‘CVXR’ could not be loaded
>Execution halted
>ERROR: lazy loading failed for package ‘H

Re: [R] installation problem for a new package

2022-02-24 Thread Jeff Newmiller
These messages seem relevant:

>Error: package ‘CVXR’ could not be loaded
>Warning messages:
>1: In i.p(...) : installation of package ‘gmp’ had non-zero exit status

You need to make sure these packages install successfully before the package 
you are interested in will install.

Please don't send formatted email to this list ... as the Posting Guide 
indicates, the list is for plain text and others may receive a garbled version 
of what you sent if your email program is not configured properly.

On February 24, 2022 11:43:05 AM PST, Tariq Khasiri  
wrote:
>Hello everyone,
>
>I'm trying to install the package HonestDiD, following the commands of the
>developer of that package:
>
># Install some packages
>library(devtools)
>install_github("bcallaway11/BMisc", dependencies = TRUE)
>install_github("bcallaway11/did", dependencies = TRUE)
>install_github("asheshrambachan/HonestDiD", dependencies = TRUE)
>
>
>But, the error message is saying:
>
>* installing *source* package ‘HonestDiD’ ...
>** using staged installation
>** R
>** data
>*** moving datasets to lazyload DB
>** inst
>** byte-compile and prepare package for lazy loading
>Error: package or namespace load failed for ‘CVXR’ in loadNamespace(j
><- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
> there is no package called ‘gmp’
>Error: package ‘CVXR’ could not be loaded
>Execution halted
>ERROR: lazy loading failed for package ‘HonestDiD’
>* removing 
>‘/Library/Frameworks/R.framework/Versions/4.1/Resources/library/HonestDiD’
>Warning messages:
>1: In i.p(...) : installation of package ‘gmp’ had non-zero exit status
>2: In i.p(...) :
>  installation of package
>‘/var/folders/4m/tvx9lnqs0rx6wgnysxz36vm0gp/T//Rtmpk3pocK/file86e83034c58c/HonestDiD_0.2.0.tar.gz’
>had non-zero exit status
>
>
>Can anyone explain why I'm having this issue? Thanks in advance!
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.


Re: [R] Pixel Image Reshaping using R

2022-02-24 Thread Ivan Krylov
On Thu, 24 Feb 2022 13:31:09 -0500
Paul Bernal  wrote:

> Basically, what I am being asked to do is to take the
> train.csv dataset, and store it in a data structure so that the data
> can be reshaped into a matrix of size 28 x 28, then I just need to
> print some indices of that (e.g. index 1, 2, 4, 7, etc.)

Is this homework? It's considered a good idea to contact your
instructor with homework questions.

> dataframe_train <- array(read.csv(file_path_2, header=TRUE,
> stringsAsFactors = FALSE), 28, 28)

Almost there. Two problems left:

1. You did not remove the first column. R will do what is asked of it
and mix the labels with the pixel values. You will probably need to
subset the data frame that read.csv() returns to remove it before
reshaping the rest of it into a three-way array.

2. The dimensions of the array should be specified in a single
3-element vector: c(N, 28, 28). I don't think there's a convenient way
to do that in a single expression. Once you have your pixels in an N by
784 matrix, use c(nrow(your.matrix), 28, 28) to construct the
dimension vector.
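Putting the two fixes together, a minimal R sketch, assuming a data frame laid out like train.csv (label first, then 784 pixel columns); the fake data frame stands in for read.csv(file_path_2), and to_image_array() is a hypothetical name. One detail the pixel numbering implies: R fills arrays column by column, while pixel x = i * 28 + j is numbered row by row, so a plain array() call leaves each image transposed. aperm() swaps the last two dimensions to fix that.

```r
# Hypothetical helper: N x 785 data frame (label + pixels) -> N x 28 x 28 array
to_image_array <- function(df) {
  pixels <- as.matrix(df[, -1])   # drop the label column: N x 784
  stopifnot(ncol(pixels) == 784)
  arr <- array(pixels, c(nrow(pixels), 28, 28))
  # array() fills column-major, so arr[n, j + 1, i + 1] holds pixel i * 28 + j;
  # swapping the last two dimensions puts it at row i + 1, column j + 1.
  aperm(arr, c(1, 3, 2))
}

# Fake stand-in for read.csv(file_path_2): 3 images whose pixel values
# equal their pixel numbers, so positions are easy to check
n <- 3
fake <- data.frame(label = c(5, 0, 4), matrix(rep(0:783, each = n), nrow = n))
names(fake)[-1] <- paste0("pixel", 0:783)

imgs <- to_image_array(fake)
dim(imgs)        # 3 28 28
imgs[1, 2, 4]    # 31: second row, fourth column, as in the description
```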

> then printing out several indices by doing (I know this could be done
> with a for loop better but I tried this):

> print(dataframe_train[1])

That chooses the N'th pixel value from the whole array. The syntax to
extract whole slices of an array is different:
https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Array-indexing
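For the requested indices, a short R sketch of extracting whole 28 x 28 slices (imgs[k, , ] rather than imgs[k]) and plotting them. It assumes an N x 28 x 28 array called imgs whose slices are already in row i / column j order; get_images() and plot_digit() are hypothetical helpers. image() puts the first matrix index on the x axis and draws upward from the bottom, so the matrix is transposed and its rows reversed to show the digit upright.

```r
# Hypothetical helper: a list of 28 x 28 matrices, one per requested image
get_images <- function(imgs, idx) lapply(idx, function(k) imgs[k, , ])

# Plot one digit upright; higher pixel values are drawn darker
plot_digit <- function(m, main = "") {
  image(t(m)[, nrow(m):1], col = gray.colors(256, start = 1, end = 0),
        axes = FALSE, main = main)
}

# Usage, assuming imgs was built from train.csv (not run here):
# idx <- c(1, 2, 4, 7, 8, 9, 11, 12, 17, 22)
# op <- par(mfrow = c(2, 5), mar = c(1, 1, 2, 1))
# for (k in idx) plot_digit(imgs[k, , ], main = as.character(k))
# par(op)
```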

-- 
Best regards,
Ivan


Re: [R] Pixel Image Reshaping using R

2022-02-24 Thread Paul Bernal
Thank you, Ivan. Basically, what I am being asked to do is to take the
train.csv dataset, and store it in a data structure so that the data can be
reshaped into a matrix of size 28 x 28, then I just need to print some
indices of that (e.g. index 1, 2, 4, 7, etc.)

I basically tried the following:
dataframe_train <- array(read.csv(file_path_2, header=TRUE,
stringsAsFactors = FALSE), 28, 28)

then printing out several indices by doing (I know this could be done with
a for loop better but I tried this):

print(dataframe_train[1])
print(dataframe_train[2])
print(dataframe_train[4])
print(dataframe_train[7])
print(dataframe_train[8])
print(dataframe_train[9])
print(dataframe_train[11])
print(dataframe_train[12])
print(dataframe_train[17])
print(dataframe_train[22])

I am just unsure if I am achieving what is asked.

Best regards,
Paul


On Thu, Feb 24, 2022 at 13:25, Ivan Krylov
wrote:

> On Thu, 24 Feb 2022 13:12:16 -0500
> Paul Bernal  wrote:
>
> > dataframe_train <- as.matrix((read.csv(file_path_2, header=TRUE,
> > stringsAsFactors = FALSE)))
>
> Have you removed the first column containing the labels?
>
> > dim(dataframe_train) <- c(28,28)
>
> This assumes that dataframe_train is a single 784-element vector.
> Presumably, it's a whole matrix containing many such vectors as rows.
>
> > Would this do the work to reshape original dataset into a 28 x 28
> > matrix?
>
> Probably not. Use image() to plot a matrix and check. Wouldn't you want
> the original dataset to consist of many such matrices, that is, a
> three-way array?
>
> > When I print the original dataframe I get the message:
> > [ reached getOption("max.print") -- omitted 41999 rows ] this only
> > means that R will not print the whole data, but is not trimming
> > anything right?
>
> That's right. Use str() to examine objects. Most of them (except long
> and/or deeply nested lists) should produce shorter output that's easier
> to understand.
>
> --
> Best regards,
> Ivan
>


Re: [R] Pixel Image Reshaping using R

2022-02-24 Thread Paul Bernal
Dear Ivan, this is what I did:

dataframe_train <- as.matrix((read.csv(file_path_2, header=TRUE,
stringsAsFactors = FALSE)))
dim(dataframe_train) <- c(28,28)
The file I read was the one I attached in the first email. Would this do
the work to reshape the original dataset into a 28 x 28 matrix? When I print
the original dataframe I get the message: [ reached getOption("max.print")
-- omitted 41999 rows ]. This only means that R will not print the whole
data, but is not trimming anything, right?

Best regards,
Paul


On Thu, Feb 24, 2022 at 12:00, Ivan Krylov
wrote:

> On Thu, 24 Feb 2022 11:00:08 -0500
> Paul Bernal  wrote:
>
> > Each pixel column in the training set has a name like pixel x, where
> > x is an integer between 0 and 783, inclusive. To locate this pixel on
> > the image, suppose that we have decomposed x as x = i ∗ 28 + j, where
> > i and j are integers between 0 and 27, inclusive.
>
> > I have been looking for information about how to process this with R,
> > but have not found anything yet.
>
> Given a 784-element vector x, you can reshape it into a 28 by 28 matrix:
>
> dim(x) <- c(28, 28)
>
> Or create a new matrix: matrix(x, 28, 28)
>
> Working with more dimensions is also possible. A matrix X with dim(X)
> == c(n, 784) can be transformed into a three-way array in place or
> copied into one:
>
> dim(X) <- c(dim(X)[1], 28, 28)
> array(X, c(dim(X)[1], 28, 28))
>
> (Replace 28,28 with 784 for an inverse transformation. In modern
> versions of R, two-way arrays are more or less the same as matrices,
> but old versions may disagree with that in some corner cases.)
>
> For more information, see ?dim, ?matrix, ?array.
>
> --
> Best regards,
> Ivan
>


Re: [R] Pixel Image Reshaping using R

2022-02-24 Thread Sarah Goslee
Hi Paul,

I may be missing something, but you can transform a vector to a matrix
of any desired size by using matrix().
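For a single image, a small R sketch of that suggestion, using fake pixel values 0 to 783 in place of one row of train.csv with its label removed. matrix() fills column by column by default; byrow = TRUE fills row by row, which matches the numbering x = i * 28 + j in the original question, so pixel x lands in row i + 1, column j + 1.

```r
# Fake stand-in for one image: pixel values equal to their pixel numbers
x <- 0:783

# byrow = TRUE fills row by row, matching the row-major pixel numbering
m <- matrix(x, nrow = 28, ncol = 28, byrow = TRUE)

m[2, 4]    # 31: second row, fourth column, as in the pixel 31 example
m[1, 28]   # 27: last pixel of the first row
```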

For more nuanced processing of images, you might look into one of the
many image processing packages in R, or even the raster package (or
the newer terra).

Sarah

On Thu, Feb 24, 2022 at 11:00 AM Paul Bernal  wrote:
>
> Dear friends,
>
> I apologize if the description is a bit long, but I think that I need to
> be as specific as possible so that you guys can help.
>
> I will share with you a file (train.csv), which contains gray-scale images
> of hand-drawn digits, from 0 through 9.
>
> Each image is 28 pixels in height and 28 pixels in width, for a total of
> 784 pixels. Each pixel has a single pixel-value associated with
> it, indicating the lightness or darkness of that pixel, with higher numbers
> meaning darker. This pixel-value is an integer between 0 and 255,
> inclusive. The training data set (train.csv) has 785 columns. The first
> column, called "label", is the digit that was drawn by the user. The rest
> of the columns contain the pixel-values of the associated image. Each pixel
> column in the training set has a name like pixel x, where x is an integer
> between 0 and 783, inclusive. To locate this pixel on the image, suppose
> that we have decomposed x as x = i ∗ 28 + j, where i and j are integers
> between 0 and 27, inclusive. Then pixel x is located on row i and column j
> of a 28 x 28 matrix (indexing by zero). For example, pixel 31 indicates the
> pixel that is in the fourth column from the left, and the second row from
> the top, as in the ascii-diagram below.
>
> This data is set up in a csv file which will require reshaping the
> data into 28 × 28 matrices representing images. There are 42000 images in
> the train.csv file. For this problem it is only necessary to process
> approximately 100 images, 10 each of the numbers from 0 through 9. The goal
> is to learn how to generate features from images using transforms and first
> order statistics.
>
> So I need to develop an algorithm to store the data in a data structure
> such that the data is reshaped into a matrix of size 28 x 28 and then I
> have to plot the developed matrix for indices 1, 2, 4, 7, 8, 9, 11, 12, 17
> and 22.
>
> I have been looking for information about how to process this with R, but
> have not found anything yet.
>
> The dataset is attached in this e-mail for your reference.
>
> Any help and/or guidance will be greatly appreciated.
>
> Best regards,
> Paul
>  train.csv
> 
>



-- 
Sarah Goslee (she/her)
http://www.sarahgoslee.com


Re: [R] Pixel Image Reshaping using R

2022-02-24 Thread Ivan Krylov
On Thu, 24 Feb 2022 11:00:08 -0500
Paul Bernal  wrote:

> Each pixel column in the training set has a name like pixel x, where
> x is an integer between 0 and 783, inclusive. To locate this pixel on
> the image, suppose that we have decomposed x as x = i ∗ 28 + j, where
> i and j are integers between 0 and 27, inclusive. 

> I have been looking for information about how to process this with R,
> but have not found anything yet.

Given a 784-element vector x, you can reshape it into a 28 by 28 matrix:

dim(x) <- c(28, 28)

Or create a new matrix: matrix(x, 28, 28)

Working with more dimensions is also possible. A matrix X with dim(X)
== c(n, 784) can be transformed into a three-way array in place or
copied into one:

dim(X) <- c(dim(X)[1], 28, 28)
array(X, c(dim(X)[1], 28, 28))

(Replace 28,28 with 784 for an inverse transformation. In modern
versions of R, two-way arrays are more or less the same as matrices,
but old versions may disagree with that in some corner cases.)

For more information, see ?dim, ?matrix, ?array.
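A quick R check of the in-place reshape and its inverse, on a made-up 2 x 784 matrix standing in for the pixel data: dim<- changes only the dimension attribute, so going to N x 28 x 28 and back recovers the original matrix exactly.

```r
# Made-up stand-in for an N x 784 pixel matrix (N = 2)
X <- matrix(seq_len(2 * 784), nrow = 2)

A <- X
dim(A) <- c(dim(A)[1], 28, 28)   # reshape in place to N x 28 x 28
dim(A)                           # 2 28 28

B <- A
dim(B) <- c(dim(B)[1], 784)      # the inverse: back to N x 784
identical(B, X)                  # TRUE: only the dim attribute changed
```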

-- 
Best regards,
Ivan
