Re: [R] R interpreting numeric field as a boolean field

2024-01-30 Thread Paul Bernal
Hi Bert,

After doing sapply(your_dataframe, "class"), it seems like R is recognizing
that the field is of type numeric after all. The problem, I suspect, is how
the import_list() function from the rio package is reading the data.

Best regards,
Paul

On Tue, 30 Jan 2024 at 14:59, Paul Bernal wrote:

> Hi Bert,
>
> Below the information you asked me for:
>
> nrow(mydataset)
> [1] 2986276
>
> 
>
> sapply(mydataset, "class")
> $`Transit Date`
> [1] "POSIXct" "POSIXt"
>
> $`Market Segment`
> [1] "character"
>
> $`Número de Tránsitos`
> [1] "numeric"
>
> $`Tar No`
> [1] "character"
>
> $`Beam Range (Operations)`
> [1] "character"
>
> $`Operational Vessel Ranges Group`
> [1] "character"
>
> $`Rcnst PCUMS`
> [1] "numeric"
>
> $`Toll Amount`
> [1] "numeric"
>
> $Beam
> [1] "numeric"
>
> $Length
> [1] "numeric"
>
> $`Trn Draft (FT)`
> [1] "numeric"
>
> $`Other Income Amt`
> [1] "numeric"
>
> $`Total Other Income Amount`
> [1] "logical"
>
> $`Booking Charges`
> [1] "numeric"
>
> $`Booking Cancellation`
> [1] "logical"
>
> $`Booking Auction`
> [1] "logical"
>
> $`_file`
> [1] "integer"
>
> Hope this helps you understand what I am dealing with.
>
> Cheers,
> Paul
>
> On Tue, 30 Jan 2024 at 14:19, Bert Gunter wrote:
>
>> Incidentally, "didn't work" is not very useful information. Please tell
>> us exactly what error message or apparently aberrant result you received.
>> Also, what do you get from:
>>
>> sapply(your_dataframe, "class")
>> nrow(your_dataframe)
>>
>> (as I suspect what you think it is, isn't).
>>
>> Cheers,
>> Bert
>>
>> On Tue, Jan 30, 2024 at 11:01 AM Bert Gunter 
>> wrote:
>>
>>> "I cannot change the data type from
>>> boolean to numeric. I tried doing dataset$my_field =
>>> as.numeric(dataset$my_field), I also tried to do dataset <-
>>> dataset[complete.cases(dataset), ], didn't work either. "
>>>
>>> Sorry, but all I can say is: huh?
>>>
>>> > dt <- data.frame(a = c(NA,NA, FALSE, TRUE), b = 1:4)
>>> > dt
>>>       a b
>>> 1    NA 1
>>> 2    NA 2
>>> 3 FALSE 3
>>> 4  TRUE 4
>>> > sapply(dt, class)
>>>         a         b
>>> "logical" "integer"
>>> > dt$a <- as.numeric(dt$a)
>>> > dt
>>>    a b
>>> 1 NA 1
>>> 2 NA 2
>>> 3  0 3
>>> 4  1 4
>>> > sapply(dt, class)
>>>         a         b
>>> "numeric" "integer"
>>>
>>> So either I'm missing something or you are. Happy to be corrected and
>>> chastised if the former.
>>>
>>> Cheers,
>>> Bert
>>>

Re: [R] R interpreting numeric field as a boolean field

2024-01-30 Thread Paul Bernal
Hi Bert,

Below the information you asked me for:

nrow(mydataset)
[1] 2986276



sapply(mydataset, "class")
$`Transit Date`
[1] "POSIXct" "POSIXt"

$`Market Segment`
[1] "character"

$`Número de Tránsitos`
[1] "numeric"

$`Tar No`
[1] "character"

$`Beam Range (Operations)`
[1] "character"

$`Operational Vessel Ranges Group`
[1] "character"

$`Rcnst PCUMS`
[1] "numeric"

$`Toll Amount`
[1] "numeric"

$Beam
[1] "numeric"

$Length
[1] "numeric"

$`Trn Draft (FT)`
[1] "numeric"

$`Other Income Amt`
[1] "numeric"

$`Total Other Income Amount`
[1] "logical"

$`Booking Charges`
[1] "numeric"

$`Booking Cancellation`
[1] "logical"

$`Booking Auction`
[1] "logical"

$`_file`
[1] "integer"

Hope this helps you understand what I am dealing with.

Cheers,
Paul

On Tue, 30 Jan 2024 at 14:19, Bert Gunter wrote:

> Incidentally, "didn't work" is not very useful information. Please tell us
> exactly what error message or apparently aberrant result you received.
> Also, what do you get from:
>
> sapply(your_dataframe, "class")
> nrow(your_dataframe)
>
> (as I suspect what you think it is, isn't).
>
> Cheers,
> Bert
>
> On Tue, Jan 30, 2024 at 11:01 AM Bert Gunter 
> wrote:
>
>> "I cannot change the data type from
>> boolean to numeric. I tried doing dataset$my_field =
>> as.numeric(dataset$my_field), I also tried to do dataset <-
>> dataset[complete.cases(dataset), ], didn't work either. "
>>
>> Sorry, but all I can say is: huh?
>>
>> > dt <- data.frame(a = c(NA,NA, FALSE, TRUE), b = 1:4)
>> > dt
>>       a b
>> 1    NA 1
>> 2    NA 2
>> 3 FALSE 3
>> 4  TRUE 4
>> > sapply(dt, class)
>>         a         b
>> "logical" "integer"
>> > dt$a <- as.numeric(dt$a)
>> > dt
>>    a b
>> 1 NA 1
>> 2 NA 2
>> 3  0 3
>> 4  1 4
>> > sapply(dt, class)
>>         a         b
>> "numeric" "integer"
>>
>> So either I'm missing something or you are. Happy to be corrected and
>> chastised if the former.
>>
>> Cheers,
>> Bert
>>
>>
>> On Tue, Jan 30, 2024 at 10:41 AM Paul Bernal 
>> wrote:
>>
>>> Dear friend Duncan,
>>>
>>> Thank you so much for your kind reply. Yes, that is exactly what is
>>> happening, there are a lot of NA values at the start, so R assumes that
>>> the
>>> field is of type boolean. The challenge that I am facing is that I want
>>> to
>>> read into R an Excel file that has many sheets (46 in this case) but I
>>> wanted to combine all 46 sheets into a single dataframe (since the
>>> columns
>>> are exactly the same for all 46 sheets). The rio package does this
>>> nicely,
>>> the problem is that, once I have the full dataframe (which amounts to
>>> roughly 2.98 million rows total), I cannot change the data type from
>>> boolean to numeric. I tried doing dataset$my_field =
>>> as.numeric(dataset$my_field), I also tried to do dataset <-
>>> dataset[complete.cases(dataset), ], didn't work either.
>>>
>>> The only thing that worked for me was to take a single sheet and through
>>> the read_excel function use the guess_max parameter and set it to a
>>> sufficiently large number (a number >= to the total amount of the full
>>> merged dataset). I want to automate the merging of the N number of Excel
>>> sheets so that I don't have to be manually doing it. Unless there is a
>>> way
>>> to accomplish something similar to what rio's package function
>>> import_list
>>> does, that is able to keep the field's numeric data type nature.
>>>
>>> Cheers,
>>> Paul
>>>
>>> On Tue, 30 Jan 2024 at 12:23, Duncan Murdoch
>>> (<murdoch.dun...@gmail.com>) wrote:
>>>
>>> > On 30/01/2024 11:10 a.m., Paul Bernal wrote:
>>> > > Dear friends,
>>> > >
>>> > > Hope you are doing well. I am currently using R version 4.3.2, and I
>>> > have a
>>> > > .xlsx file that has 46 sheets on it. I basically combined  all 46
>>> sheets
>>> > > and read them as a single dataframe in R using package rio.
>>> > >
>>> > > I read a solution using package readxl, as suggested in a
>>> StackOverflow
>>> > > discussion as follows:
>>> > > df <- read_excel(path = filepath, sheet = sheet_name, guess_max =
>>> > 10).
>>> > > Now, when you have so many sheets (46 in my case) in an Excel file,
>>> the
>>> > rio
>>> > > methodology is more practical.
>>> > >
>>> > > This is what I did:
>>> > > path =
>>> > >
>>> >
>>> "C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI
>>> > > (4).xlsx"
>>> > > figidat = import_list(path, rbind = TRUE) #here figidat refers to my
>>> > dataset
>>> > >
>>> > > Now, it successfully imports and merges all records, however, some
>>> fields
>>> > > (despite being numeric), R interprets as a boolean field.
>>> > >
>>> > > Here is the structure of the field that is causing me problems (I
>>> > apologize
>>> > > for the length):
>>> > > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 

Re: [R] R interpreting numeric field as a boolean field

2024-01-30 Thread CALUM POLWART
And your other option - recode what gets imported. It may well be you will
actually want the blanks to be NAs for instance rather than blank. I'm
assuming the True and False are >$0 and $0 from your description. (Or maybe
vice versa). So I'd have made my column name something like
"OverZeroDollars" and then your data makes sense... Depends what data
processing comes next.
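
To see why the imported TRUE/FALSE values cannot simply be converted back into amounts, here is a minimal base-R sketch. The `amounts` vector is made up for illustration, and the sketch assumes, as the paragraph above does, that $0 became FALSE and any amount > $0 became TRUE during import.

```r
# Hypothetical amounts as they might appear in the Excel column:
# a blank cell, a $0 record, and two records with amounts > $0.
amounts <- c(NA, 0, 125.5, 3000)

# What a logical column can hold for those cells: NA, FALSE, TRUE, TRUE.
as_logical <- as.logical(amounts)
print(as_logical)

# Converting back only yields NA, 0 or 1, so the original amounts are gone.
recovered <- as.numeric(as_logical)
print(recovered)

# FALSE here: the values did not survive the round trip.
round_trip_ok <- isTRUE(all.equal(recovered, amounts))
print(round_trip_ok)
```

If this is what happened during import, as.numeric() on the merged data frame runs without error but can only return 0 and 1, so the column would have to be re-read from the workbook with the correct type.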

On Tue, 30 Jan 2024, 19:41 Duncan Murdoch,  wrote:

> If you are using the read_excel() function from the readxl package, then
> there's an argument named col_types that lets you specify the types to use.
>
> You could specify col_types = "numeric" to read all columns as numeric
> columns.  If some columns are different types, you should specify a
> vector of type names, with one entry per column.  Allowable names are
> "skip", "guess", "logical", "numeric", "date", "text" or "list".  You'll
> have to read the docs to find out what some of those do.
>
> Duncan Murdoch
>
> On 30/01/2024 1:40 p.m., Paul Bernal wrote:
> > Dear friend Duncan,
> >
> > Thank you so much for your kind reply. Yes, that is exactly what is
> > happening, there are a lot of NA values at the start, so R assumes that
> > the field is of type boolean. The challenge that I am facing is that I
> > want to read into R an Excel file that has many sheets (46 in this case)
> > but I wanted to combine all 46 sheets into a single dataframe (since the
> > columns are exactly the same for all 46 sheets). The rio package does
> > this nicely, the problem is that, once I have the full dataframe (which
> > amounts to roughly 2.98 million rows total), I cannot change the data
> > type from boolean to numeric. I tried doing dataset$my_field =
> > as.numeric(dataset$my_field), I also tried to do dataset <-
> > dataset[complete.cases(dataset), ], didn't work either.
> >
> > The only thing that worked for me was to take a single sheet and through
> > the read_excel function use the guess_max parameter and set it to a
> > sufficiently large number (a number >= to the total amount of the full
> > merged dataset). I want to automate the merging of the N number of Excel
> > sheets so that I don't have to be manually doing it. Unless there is a
> > way to accomplish something similar to what rio's package function
> > import_list does, that is able to keep the field's numeric data type
> nature.
> >
> > Cheers,
> > Paul
> >
> > On Tue, 30 Jan 2024 at 12:23, Duncan Murdoch
> > (<murdoch.dun...@gmail.com>) wrote:
> >
> > On 30/01/2024 11:10 a.m., Paul Bernal wrote:
> >  > Dear friends,
> >  >
> >  > Hope you are doing well. I am currently using R version 4.3.2,
> > and I have a
> >  > .xlsx file that has 46 sheets on it. I basically combined  all 46
> > sheets
> >  > and read them as a single dataframe in R using package rio.
> >  >
> >  > I read a solution using package readxl, as suggested in a
> > StackOverflow
> >  > discussion as follows:
> >  > df <- read_excel(path = filepath, sheet = sheet_name, guess_max =
> > 10).
> >  > Now, when you have so many sheets (46 in my case) in an Excel
> > file, the rio
> >  > methodology is more practical.
> >  >
> >  > This is what I did:
> >  > path =
> >  >
> >
>  
> "C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI
> >  > (4).xlsx"
> >  > figidat = import_list(path, rbind = TRUE) #here figidat refers to
> > my dataset
> >  >
> >  > Now, it successfully imports and merges all records, however,
> > some fields
> >  > (despite being numeric), R interprets as a boolean field.
> >  >
> >  > Here is the structure of the field that is causing me problems (I
> > apologize
> >  > for the length):
> >  > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
> >  > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> >  > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> >  > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> >  > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> >  > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> >  > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> >  > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> >  > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > ...
> >  > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA,
> >  > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
> >  > FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl", "data.frame"
> >  > ), row.names = c(NA, -7033L))
> >  >
> >  > As you can see, when I do the dput, it gives me a bunch of TRUE
> > and FALSE
> >  > values, when in reality I have records with value $0, records
> >  > with amounts > $0 and also a bunch of blank records.
> >  >
> 

Re: [R] R interpreting numeric field as a boolean field

2024-01-30 Thread Duncan Murdoch
If you are using the read_excel() function from the readxl package, then 
there's an argument named col_types that lets you specify the types to use.


You could specify col_types = "numeric" to read all columns as numeric 
columns.  If some columns are different types, you should specify a 
vector of type names, with one entry per column.  Allowable names are
"skip", "guess", "logical", "numeric", "date", "text" or "list".  You'll 
have to read the docs to find out what some of those do.
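
As a rough sketch of the col_types suggestion above, the vector below spells out one type per column, following the sapply() output posted elsewhere in this thread. It assumes that every sheet has exactly these 16 columns in this order (the `_file` column is added later by rio and is not in the workbook). The names `figi_col_types` and `read_sheet_typed` are hypothetical helpers, not part of readxl.

```r
# One entry per column, in sheet order (assumed from the sapply() output).
figi_col_types <- c(
  "date",     # Transit Date
  "text",     # Market Segment
  "numeric",  # Número de Tránsitos
  "text",     # Tar No
  "text",     # Beam Range (Operations)
  "text",     # Operational Vessel Ranges Group
  "numeric",  # Rcnst PCUMS
  "numeric",  # Toll Amount
  "numeric",  # Beam
  "numeric",  # Length
  "numeric",  # Trn Draft (FT)
  "numeric",  # Other Income Amt
  "numeric",  # Total Other Income Amount (was guessed as logical)
  "numeric",  # Booking Charges
  "numeric",  # Booking Cancellation (was guessed as logical)
  "numeric"   # Booking Auction (was guessed as logical)
)

# Hypothetical helper: read one sheet with the declared types instead of
# letting read_excel() guess them from the first rows.
read_sheet_typed <- function(path, sheet_name, col_types = figi_col_types) {
  readxl::read_excel(path, sheet = sheet_name, col_types = col_types)
}
```

Declaring the types removes the dependence on how many blank cells sit at the top of each sheet, at the cost of having to update the vector if the workbook layout changes.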


Duncan Murdoch

On 30/01/2024 1:40 p.m., Paul Bernal wrote:

Dear friend Duncan,

Thank you so much for your kind reply. Yes, that is exactly what is 
happening, there are a lot of NA values at the start, so R assumes that 
the field is of type boolean. The challenge that I am facing is that I 
want to read into R an Excel file that has many sheets (46 in this case) 
but I wanted to combine all 46 sheets into a single dataframe (since the 
columns are exactly the same for all 46 sheets). The rio package does 
this nicely, the problem is that, once I have the full dataframe (which 
amounts to roughly 2.98 million rows total), I cannot change the data 
type from boolean to numeric. I tried doing dataset$my_field = 
as.numeric(dataset$my_field), I also tried to do dataset <- 
dataset[complete.cases(dataset), ], didn't work either.


The only thing that worked for me was to take a single sheet and through
the read_excel function use the guess_max parameter and set it to a 
sufficiently large number (a number >= to the total amount of the full 
merged dataset). I want to automate the merging of the N number of Excel 
sheets so that I don't have to be manually doing it. Unless there is a 
way to accomplish something similar to what rio's package function 
import_list does, that is able to keep the field's numeric data type nature.


Cheers,
Paul

On Tue, 30 Jan 2024 at 12:23, Duncan Murdoch
(<murdoch.dun...@gmail.com>) wrote:


On 30/01/2024 11:10 a.m., Paul Bernal wrote:
 > Dear friends,
 >
 > Hope you are doing well. I am currently using R version 4.3.2,
and I have a
 > .xlsx file that has 46 sheets on it. I basically combined  all 46
sheets
 > and read them as a single dataframe in R using package rio.
 >
 > I read a solution using package readxl, as suggested in a
StackOverflow
 > discussion as follows:
 > df <- read_excel(path = filepath, sheet = sheet_name, guess_max =
10).
 > Now, when you have so many sheets (46 in my case) in an Excel
file, the rio
 > methodology is more practical.
 >
 > This is what I did:
 > path =
 >

"C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI
 > (4).xlsx"
 > figidat = import_list(path, rbind = TRUE) #here figidat refers to
my dataset
 >
 > Now, it successfully imports and merges all records, however,
some fields
 > (despite being numeric), R interprets as a boolean field.
 >
 > Here is the structure of the field that is causing me problems (I
apologize
 > for the length):
 > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
 > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
 > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
...
 > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA,
 > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
 > FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl", "data.frame"
 > ), row.names = c(NA, -7033L))
 >
 > As you can see, when I do the dput, it gives me a bunch of TRUE
and FALSE
 > values, when in reality I have records with value $0, records
 > with amounts > $0 and also a bunch of blank records.
 >
 > Any help will be greatly appreciated.

I don't know how read_excel() determines column types, but some
functions look only at the first n rows to guess the type.  It appears
you have a lot of NA values at the start.  That is a logical value, so
that might be what is going wrong.

In read.table() and related functions, you can specify the types of
column explicitly.  It sounds as though that's what you should do if
read_excel() offers that as a possibility.

Duncan Murdoch



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 

Re: [R] R interpreting numeric field as a boolean field

2024-01-30 Thread Bert Gunter
Incidentally, "didn't work" is not very useful information. Please tell us
exactly what error message or apparently aberrant result you received.
Also, what do you get from:

sapply(your_dataframe, "class")
nrow(your_dataframe)

(as I suspect what you think it is, isn't).

Cheers,
Bert

On Tue, Jan 30, 2024 at 11:01 AM Bert Gunter  wrote:

> "I cannot change the data type from
> boolean to numeric. I tried doing dataset$my_field =
> as.numeric(dataset$my_field), I also tried to do dataset <-
> dataset[complete.cases(dataset), ], didn't work either. "
>
> Sorry, but all I can say is: huh?
>
> > dt <- data.frame(a = c(NA,NA, FALSE, TRUE), b = 1:4)
> > dt
>       a b
> 1    NA 1
> 2    NA 2
> 3 FALSE 3
> 4  TRUE 4
> > sapply(dt, class)
>         a         b
> "logical" "integer"
> > dt$a <- as.numeric(dt$a)
> > dt
>    a b
> 1 NA 1
> 2 NA 2
> 3  0 3
> 4  1 4
> > sapply(dt, class)
>         a         b
> "numeric" "integer"
>
> So either I'm missing something or you are. Happy to be corrected and
> chastised if the former.
>
> Cheers,
> Bert
>
>
> On Tue, Jan 30, 2024 at 10:41 AM Paul Bernal 
> wrote:
>
>> Dear friend Duncan,
>>
>> Thank you so much for your kind reply. Yes, that is exactly what is
>> happening, there are a lot of NA values at the start, so R assumes that
>> the
>> field is of type boolean. The challenge that I am facing is that I want to
>> read into R an Excel file that has many sheets (46 in this case) but I
>> wanted to combine all 46 sheets into a single dataframe (since the columns
>> are exactly the same for all 46 sheets). The rio package does this nicely,
>> the problem is that, once I have the full dataframe (which amounts to
>> roughly 2.98 million rows total), I cannot change the data type from
>> boolean to numeric. I tried doing dataset$my_field =
>> as.numeric(dataset$my_field), I also tried to do dataset <-
>> dataset[complete.cases(dataset), ], didn't work either.
>>
>> The only thing that worked for me was to take a single sheet and through
>> the read_excel function use the guess_max parameter and set it to a
>> sufficiently large number (a number >= to the total amount of the full
>> merged dataset). I want to automate the merging of the N number of Excel
>> sheets so that I don't have to be manually doing it. Unless there is a way
>> to accomplish something similar to what rio's package function import_list
>> does, that is able to keep the field's numeric data type nature.
>>
>> Cheers,
>> Paul
>>
>> On Tue, 30 Jan 2024 at 12:23, Duncan Murdoch
>> (<murdoch.dun...@gmail.com>) wrote:
>>
>> > On 30/01/2024 11:10 a.m., Paul Bernal wrote:
>> > > Dear friends,
>> > >
>> > > Hope you are doing well. I am currently using R version 4.3.2, and I
>> > have a
>> > > .xlsx file that has 46 sheets on it. I basically combined  all 46
>> sheets
>> > > and read them as a single dataframe in R using package rio.
>> > >
>> > > I read a solution using package readxl, as suggested in a
>> StackOverflow
>> > > discussion as follows:
>> > > df <- read_excel(path = filepath, sheet = sheet_name, guess_max =
>> > 10).
>> > > Now, when you have so many sheets (46 in my case) in an Excel file,
>> the
>> > rio
>> > > methodology is more practical.
>> > >
>> > > This is what I did:
>> > > path =
>> > >
>> >
>> "C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI
>> > > (4).xlsx"
>> > > figidat = import_list(path, rbind = TRUE) #here figidat refers to my
>> > dataset
>> > >
>> > > Now, it successfully imports and merges all records, however, some
>> fields
>> > > (despite being numeric), R interprets as a boolean field.
>> > >
>> > > Here is the structure of the field that is causing me problems (I
>> > apologize
>> > > for the length):
>> > > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
>> > ...
>> > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA,
>> > > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
>> > > FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl", "data.frame"
>> > > ), row.names = c(NA, -7033L))
>> > >
>> > > As you can see, when I do the dput, it gives me a bunch of TRUE and
>> FALSE
>> > > values, when in reality I have records with value $0, records with
>> > > amounts > $0 and also a bunch of blank records.
>> > >
>> > > Any help will be greatly appreciated.
>> >
>> > I don't know how read_excel() determines column types, but some
>> 

Re: [R] R interpreting numeric field as a boolean field

2024-01-30 Thread Paul Bernal
Dear friend Duncan,

Thank you so much for your kind reply. Yes, that is exactly what is
happening, there are a lot of NA values at the start, so R assumes that the
field is of type boolean. The challenge that I am facing is that I want to
read into R an Excel file that has many sheets (46 in this case) but I
wanted to combine all 46 sheets into a single dataframe (since the columns
are exactly the same for all 46 sheets). The rio package does this nicely,
the problem is that, once I have the full dataframe (which amounts to
roughly 2.98 million rows total), I cannot change the data type from
boolean to numeric. I tried doing dataset$my_field =
as.numeric(dataset$my_field), I also tried to do dataset <-
dataset[complete.cases(dataset), ], didn't work either.

The only thing that worked for me was to take a single sheet and, through
the read_excel function, set the guess_max parameter to a sufficiently
large number (a number >= the total number of rows in the full merged
dataset). I want to automate the merging of the N Excel sheets so that I
don't have to do it manually, unless there is a way to accomplish
something similar to what rio's import_list function does while keeping
the field's numeric data type.
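
One possible way to automate this with readxl alone is sketched below. `read_all_sheets` is a hypothetical helper, not an existing function, and the file name in the usage part is a placeholder. The default guess_max of 1048576 is the maximum number of rows an .xlsx sheet can hold, so every row of a sheet is inspected when guessing types. The sketch also assumes all sheets share the same columns, as described above.

```r
# Hypothetical helper: read every sheet of a workbook with a large guess_max
# and stack the results into one data frame.
read_all_sheets <- function(path, guess_max = 1048576) {
  sheet_names <- readxl::excel_sheets(path)
  sheets <- lapply(sheet_names, function(sheet_name) {
    one_sheet <- readxl::read_excel(path, sheet = sheet_name,
                                    guess_max = guess_max)
    # Record which sheet each row came from.
    one_sheet$sheet <- rep(sheet_name, nrow(one_sheet))
    one_sheet
  })
  # Assumes identical column names in every sheet.
  do.call(rbind, sheets)
}

# Usage, guarded so the sketch is harmless when the file is absent.
path <- "my_workbook.xlsx"  # placeholder path
if (requireNamespace("readxl", quietly = TRUE) && file.exists(path)) {
  figidat <- read_all_sheets(path)
  print(sapply(figidat, class))
}
```

Guessing from every row is slower than the default, and a column that is blank in an entire sheet would still be read as logical for that sheet, so explicit column types remain the more robust option.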

Cheers,
Paul

On Tue, 30 Jan 2024 at 12:23, Duncan Murdoch wrote:

> On 30/01/2024 11:10 a.m., Paul Bernal wrote:
> > Dear friends,
> >
> > Hope you are doing well. I am currently using R version 4.3.2, and I
> have a
> > .xlsx file that has 46 sheets on it. I basically combined  all 46 sheets
> > and read them as a single dataframe in R using package rio.
> >
> > I read a solution using package readxl, as suggested in a StackOverflow
> > discussion as follows:
> > df <- read_excel(path = filepath, sheet = sheet_name, guess_max =
> 10).
> > Now, when you have so many sheets (46 in my case) in an Excel file, the
> rio
> > methodology is more practical.
> >
> > This is what I did:
> > path =
> >
> "C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI
> > (4).xlsx"
> > figidat = import_list(path, rbind = TRUE) #here figidat refers to my
> dataset
> >
> > Now, it successfully imports and merges all records, however, some fields
> > (despite being numeric), R interprets as a boolean field.
> >
> > Here is the structure of the field that is causing me problems (I
> apologize
> > for the length):
> > structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> > NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
> ...
> > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA,
> > FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
> > FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl", "data.frame"
> > ), row.names = c(NA, -7033L))
> >
> > As you can see, when I do the dput, it gives me a bunch of TRUE and FALSE
> > values, when in reality I have records with value $0, records with
> > amounts > $0 and also a bunch of blank records.
> >
> > Any help will be greatly appreciated.
>
> I don't know how read_excel() determines column types, but some
> functions look only at the first n rows to guess the type.  It appears
> you have a lot of NA values at the start.  That is a logical value, so
> that might be what is going wrong.
>
> In read.table() and related functions, you can specify the types of
> column explicitly.  It sounds as though that's what you should do if
> read_excel() offers that as a possibility.
>
> Duncan Murdoch
>



Re: [R] R interpreting numeric field as a boolean field

2024-01-30 Thread Duncan Murdoch

On 30/01/2024 11:10 a.m., Paul Bernal wrote:

Dear friends,

Hope you are doing well. I am currently using R version 4.3.2, and I have a
.xlsx file that has 46 sheets on it. I basically combined  all 46 sheets
and read them as a single dataframe in R using package rio.

I read a solution using package readxl, as suggested in a StackOverflow
discussion as follows:
df <- read_excel(path = filepath, sheet = sheet_name, guess_max = 10).
Now, when you have so many sheets (46 in my case) in an Excel file, the rio
methodology is more practical.

This is what I did:
path =
"C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI
(4).xlsx"
figidat = import_list(path, rbind = TRUE) #here figidat refers to my dataset

Now, it successfully imports and merges all records, however, some fields
(despite being numeric), R interprets as a boolean field.

Here is the structure of the field that is causing me problems (I apologize
for the length):
structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,

...

FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, NA, NA,
FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7033L))

As you can see, when I do the dput, it gives me a bunch of TRUE and FALSE
values, when in reality I have records with value $0, records with
amounts > $0 and also a bunch of blank records.


Any help will be greatly appreciated.


I don't know how read_excel() determines column types, but some 
functions look only at the first n rows to guess the type.  It appears 
you have a lot of NA values at the start.  That is a logical value, so 
that might be what is going wrong.
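
The effect described above can be reproduced in base R. The sketch below uses type.convert() purely as a stand-in for a type guesser; it is not how read_excel() is implemented, and the `charges` vector and `guess_type` helper are made up for illustration.

```r
# Hypothetical column: five blank cells followed by two amounts.
charges <- c(rep(NA, 5), 0, 125.5)

# A bare NA is a logical value in R.
print(class(NA))

# Illustrative guesser: convert to text, then let R pick a type.
guess_type <- function(values) {
  class(type.convert(as.character(values), as.is = TRUE))
}

# Looking only at the first five rows sees nothing but NA.
guess_from_head <- guess_type(head(charges, 5))
# Looking at every row finds the numbers.
guess_from_all <- guess_type(charges)

print(guess_from_head)
print(guess_from_all)
```

The guess from the leading rows comes out as logical, while the guess from the full column comes out as numeric, which matches the symptom reported in this thread.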


In read.table() and related functions, you can specify the types of 
column explicitly.  It sounds as though that's what you should do if 
read_excel() offers that as a possibility.
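
For the read.table() family mentioned above, the explicit types go in the colClasses argument. A small self-contained sketch follows; the CSV text and column names are invented for the example.

```r
# Hypothetical CSV text: the charge column is blank in every data row.
csv_text <- "id,charge\n1,\n2,\n3,\n"

# Letting R guess: an all-blank column is read as logical NA.
guessed <- read.csv(text = csv_text)

# Declaring the types up front keeps the column numeric.
typed <- read.csv(text = csv_text, colClasses = c("integer", "numeric"))

print(sapply(guessed, class))
print(sapply(typed, class))
```

The blank cells are NA in both results; only the declared version gives them a numeric type that later amounts can share.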


Duncan Murdoch



[R] R interpreting numeric field as a boolean field

2024-01-30 Thread Paul Bernal
Dear friends,

Hope you are doing well. I am currently using R version 4.3.2, and I have a
.xlsx file that has 46 sheets on it. I basically combined  all 46 sheets
and read them as a single dataframe in R using package rio.

I read a solution using package readxl, as suggested in a StackOverflow
discussion as follows:
df <- read_excel(path = filepath, sheet = sheet_name, guess_max = 10).
Now, when you have so many sheets (46 in my case) in an Excel file, the rio
methodology is more practical.

This is what I did:
library(rio)
path = "C:/Users/myuser/Documents/DataScienceF/Forecast_and_Econometric_Analysis_FIGI (4).xlsx"
figidat = import_list(path, rbind = TRUE) # here figidat refers to my dataset

Now, it successfully imports and merges all records; however, R interprets
some fields as boolean even though they are numeric.
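
One possible workaround, sketched under assumptions: read each sheet with readxl directly and a larger guess_max, then stack the results. The helper read_all_sheets() and the added "sheet" column are hypothetical names, not part of rio or readxl, and the sketch assumes every sheet has the same column names.

```r
# Hypothetical helper: read every sheet of a workbook with a larger
# guess_max, then stack the sheets into one data frame.
read_all_sheets <- function(path, guess_max = 100000) {
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("package 'readxl' is required")
  }
  sheets <- readxl::excel_sheets(path)
  pieces <- lapply(sheets, function(s) {
    d <- as.data.frame(
      readxl::read_excel(path, sheet = s, guess_max = guess_max)
    )
    d[["sheet"]] <- s  # record which sheet each row came from
    d
  })
  # Assumes identical column names across sheets.
  do.call(rbind, pieces)
}

# Usage (not run here, since it needs the workbook):
# figidat <- read_all_sheets(path)
# sapply(figidat, class)
```

A column that is blank in an entire sheet would still be guessed as logical for that sheet, so checking the classes after stacking is worthwhile.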

Here is the structure of the field that is causing me problems (I apologize
for the length):
structure(list(StoreCharges = c(NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,