[R] Filtering data with dplyr or grep and losing data?

2018-12-19 Thread Satish Vadlamani
Hello Experts:

I have a log file with lines of up to about 1,200 characters. What I want
to do is read it and then extract certain portions of each line into new
columns, keeping the rows that contain the text "[DF_API: input string]".
When I read the file and then filter on the rows I am interested in, it
seems like I am losing data. I tried this with the dplyr filter and with
standard grep, with the same result.

I am not sure why this is the case and would appreciate your help.

Code is given below

library(dplyr)
library(stringr)   # str_detect() comes from stringr, not dplyr

setwd("C:/Users/satis/Documents/VF/df_issue_dec01")

# read each log line into a one-column data frame
sec1 <- read.delim(file = "secondary1_aa_small.log")
head(sec1)
names(sec1) <- c("V1")

# keep only the rows that mention the DF_API input string
sec1_test <- filter(sec1, str_detect(V1, "DF_API: input string"))
head(sec1_test)

# the same filter with base R grep(); drop = FALSE keeps the result a data frame
sec1_test2 <- sec1[grep("DF_API: input string", sec1$V1, perl = TRUE), , drop = FALSE]
head(sec1_test2)

write.csv(sec1_test, file = "test_out.txt", row.names = FALSE, quote = FALSE)
write.csv(sec1_test2, file = "test2_out.txt", row.names = FALSE, quote = FALSE)
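
One likely cause, though not confirmed from the posted data: read.delim()
applies quote handling by default, so an unmatched " or ' inside a log line
can silently swallow everything up to the next quote character. A minimal
sketch that avoids this by reading raw lines (same file name as above):

# read the log as plain character lines -- no quote, comment, or
# delimiter handling, so nothing is silently merged or dropped
raw <- readLines("secondary1_aa_small.log")
length(raw)   # compare against nrow(sec1) to see whether lines were lost

# fixed = TRUE treats the pattern as a literal string, not a regex
hits <- grep("DF_API: input string", raw, value = TRUE, fixed = TRUE)
writeLines(hits, "test_out.txt")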

Data (and code) are given at the link below. Sorry, I should have used dput.

https://spaces.hightail.com/space/arJlYkgIev


Satish Vadlamani



Re: [R] How to group by and get distinct rows of grouped rows based on certain criteria

2016-07-15 Thread Satish Vadlamani
Thank you Bill and Sarah for your help. I was able to do the same with
dplyr using the code below, but I could not post it earlier because my
original message had not yet appeared on the list.

>>
# keep just the three fields of interest
file1 <- select(file1, ATP.Group, Business.Event, Category)

# distinct ATP.Group/Business.Event combinations that DO have an EQ row
file1_1 <- file1 %>% group_by(ATP.Group, Business.Event) %>%
  filter(Category == "EQ") %>% distinct(ATP.Group, Business.Event)
file1_1 <- as.data.frame(file1_1)
file1_1

# all distinct ATP.Group/Business.Event combinations
file1_2 <- file1 %>% group_by(ATP.Group, Business.Event) %>%
  distinct(ATP.Group, Business.Event)
file1_2 <- as.data.frame(file1_2)
file1_2

# combinations that never have an EQ row
setdiff(select(file1_2, ATP.Group, Business.Event),
        select(file1_1, ATP.Group, Business.Event))
>>
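
For what it's worth, a single-pass dplyr alternative is also possible (a
sketch assuming the same file1 with columns ATP.Group, Business.Event, and
Category; the has_eq flag is a name I made up):

library(dplyr)

# one row per combination, flagged by whether it ever has Category "EQ"
eq_flag <- file1 %>%
  group_by(ATP.Group, Business.Event) %>%
  summarise(has_eq = any(Category == "EQ"))
as.data.frame(eq_flag)   # has_eq == FALSE marks the combinations without EQ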

On Thu, Jul 14, 2016 at 1:53 PM, William Dunlap <wdun...@tibco.com> wrote:

> > txt <- "|ATP Group|Business Event|Category|
> |02       |A             |AC      |
> |02       |A             |AD      |
> |02       |A             |EQ      |
> |ZM       |A             |AU      |
> |ZM       |A             |AV      |
> |ZM       |A             |AW      |
> |02       |B             |AC      |
> |02       |B             |AY      |
> |02       |B             |EQ      |
> "
> > d <- read.table(sep="|", text=txt, header=TRUE, strip.white=TRUE,
> check.names=FALSE)[,2:4]
> > str(d)
> 'data.frame':   9 obs. of  3 variables:
>  $ ATP Group     : Factor w/ 2 levels "02","ZM": 1 1 1 2 2 2 1 1 1
>  $ Business Event: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 2 2 2
>  $ Category      : Factor w/ 7 levels "AC","AD","AU",..: 1 2 7 3 4 5 1 6 7
> > unique(d[d[,"Category"]!="EQ", c("ATP Group", "Business Event")])
>   ATP Group Business Event
> 1        02              A
> 4        ZM              A
> 7        02              B
> > unique(d[d[,"Category"]=="EQ", c("ATP Group", "Business Event")])
>   ATP Group Business Event
> 3        02              A
> 9        02              B
>
> Some folks prefer to use subset() instead of "[".  The previous expression
> is equivalent to:
>
> > unique( subset(d, Category=="EQ", c("ATP Group", "Business Event")))
>   ATP Group Business Event
> 3        02              A
> 9        02              B
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 14, 2016 at 12:43 PM, Satish Vadlamani <
> satish.vadlam...@gmail.com> wrote:
>
>> Hello All:
>> I would like to get your help on the following problem.
>>
>> I have the following data, where the first row is the header; spaces are
>> not important. I want to find the distinct combinations of ATP Group and
>> Business Event (these are the field names in the data below) that have
>> Category EQ (Category is the third field), and those that do not. In the
>> example below, the combinations 02/A and 02/B have EQ and the combination
>> ZM/A does not.
>>
>> If I have a larger file, how do I get to this answer?
>>
>> What did I try (with dplyr)?
>>
>> # I know that the below is not correct and does not give the desired results
>> file1_1 <- file1 %>% group_by(ATP.Group, Business.Event) %>%
>>   filter(Category != "EQ") %>% distinct(ATP.Group, Business.Event)
>> # for some reason, I have to convert to data.frame to print the data correctly
>> file1_1 <- as.data.frame(file1_1)
>> file1_1
>>
>>
>> *Data shown below*
>> |ATP Group|Business Event|Category|
>> |02       |A             |AC      |
>> |02       |A             |AD      |
>> |02       |A             |EQ      |
>> |ZM       |A             |AU      |
>> |ZM       |A             |AV      |
>> |ZM       |A             |AW      |
>> |02       |B             |AC      |
>> |02       |B             |AY      |
>> |02       |B             |EQ      |
>>
>> --
>>
>> Satish Vadlamani
>>


-- 

Satish Vadlamani



[R] How to group by and get distinct rows of grouped rows based on certain criteria

2016-07-14 Thread Satish Vadlamani
Hello All:
I would like to get your help on the following problem.

I have the following data, where the first row is the header; spaces are
not important. I want to find the distinct combinations of ATP Group and
Business Event (these are the field names in the data below) that have
Category EQ (Category is the third field), and those that do not. In the
example below, the combinations 02/A and 02/B have EQ and the combination
ZM/A does not.

If I have a larger file, how do I get to this answer?

What did I try (with dplyr)?

# I know that the below is not correct and does not give the desired results
file1_1 <- file1 %>% group_by(ATP.Group, Business.Event) %>%
  filter(Category != "EQ") %>% distinct(ATP.Group, Business.Event)
# for some reason, I have to convert to data.frame to print the data correctly
file1_1 <- as.data.frame(file1_1)
file1_1


*Data shown below*
|ATP Group|Business Event|Category|
|02       |A             |AC      |
|02       |A             |AD      |
|02       |A             |EQ      |
|ZM       |A             |AU      |
|ZM       |A             |AV      |
|ZM       |A             |AW      |
|02       |B             |AC      |
|02       |B             |AY      |
|02       |B             |EQ      |

-- 

Satish Vadlamani



[R] question - how to subscribe to this list

2016-06-17 Thread Satish Vadlamani
Hello All:
I posted one question in the past and another today, and I hope to get the
same excellent help that I got last time.

My question is this: is there any way to subscribe to the forum so that I
can see the questions and answers posted to r-help?

Thanks,

-- 

Satish Vadlamani



[R] what is the best way to process the following data?

2016-06-17 Thread Satish Vadlamani
Hello,
I have multiple text files with the format shown below (see the two files
pasted at the end of this message). Each file is a log of the steps the
system has processed, and for each step it shows the start time. For
example, in the data below, the Filters step started at
|06/16/2016|03:44:16.

How do I read this data so that Step 001 is one data frame, Step 002 is
another, and so on? After that, I will compare the Step 001 times with and
without parallel processing.

The files pasted below, "no_parallel_process_SLS_4.txt" and
"parallel_process_SLS_4.txt", should make clear what I am trying to do: I
want to compare the time taken for each step with and without parallel
processing.

If there are better ways of performing this task than what I am thinking,
could you let me know? Thanks in advance.

Satish Vadlamani

>> parallel_process_file.txt

|06/16/2016|03:44:16|Step 001
|06/16/2016|03:44:16|Initialization
|06/16/2016|03:44:16|Filters
|06/16/2016|03:45:03|Split Items
|06/16/2016|03:46:20|Sort
|06/16/2016|03:46:43|Check
|06/16/2016|04:01:13|Save
|06/16/2016|04:04:35|Update preparation
|06/16/2016|04:04:36|Update comparison
|06/16/2016|04:04:38|Update
|06/16/2016|04:04:38|Update
|06/16/2016|04:06:01|Close
|06/16/2016|04:06:33|BOP processing for 7,960 items has finished
|06/16/2016|04:06:34|Step 002
|06/16/2016|04:06:35|Initialization
|06/16/2016|04:06:35|Filters
|06/16/2016|04:07:14|Split Items
|06/16/2016|04:08:57|Sort
|06/16/2016|04:09:06|Check
|06/16/2016|04:26:36|Save
|06/16/2016|04:39:29|Update preparation
|06/16/2016|04:39:31|Update comparison
|06/16/2016|04:39:43|Update
|06/16/2016|04:39:45|Update
|06/16/2016|04:44:28|Close
|06/16/2016|04:45:26|BOP processing for 8,420 items has finished
|06/16/2016|04:45:27|Step 003
|06/16/2016|04:45:27|Initialization
|06/16/2016|04:45:27|Filters
|06/16/2016|04:48:50|Split Items
|06/16/2016|04:55:15|Sort
|06/16/2016|04:55:40|Check
|06/16/2016|05:13:35|Save
|06/16/2016|05:17:34|Update preparation
|06/16/2016|05:17:34|Update comparison
|06/16/2016|05:17:36|Update
|06/16/2016|05:17:36|Update
|06/16/2016|05:19:29|Close
|06/16/2016|05:19:49|BOP processing for 8,876 items has finished
|06/16/2016|05:19:50|Step 004
|06/16/2016|05:19:50|Initialization
|06/16/2016|05:19:50|Filters
|06/16/2016|05:20:43|Split Items
|06/16/2016|05:22:14|Sort
|06/16/2016|05:22:29|Check
|06/16/2016|05:37:27|Save
|06/16/2016|05:38:43|Update preparation
|06/16/2016|05:38:44|Update comparison
|06/16/2016|05:38:45|Update
|06/16/2016|05:38:45|Update
|06/16/2016|05:39:09|Close
|06/16/2016|05:39:19|BOP processing for 5,391 items has finished
|06/16/2016|05:39:20|Step 005
|06/16/2016|05:39:20|Initialization
|06/16/2016|05:39:20|Filters
|06/16/2016|05:39:57|Split Items
|06/16/2016|05:40:21|Sort
|06/16/2016|05:40:24|Check
|06/16/2016|05:46:01|Save
|06/16/2016|05:46:54|Update preparation
|06/16/2016|05:46:54|Update comparison
|06/16/2016|05:46:54|Update
|06/16/2016|05:46:55|Update
|06/16/2016|05:47:24|Close
|06/16/2016|05:47:31|BOP processing for 3,016 items has finished
|06/16/2016|05:47:32|Step 006
|06/16/2016|05:47:32|Initialization
|06/16/2016|05:47:32|Filters
|06/16/2016|05:47:32|Update preparation
|06/16/2016|05:47:32|Update comparison
|06/16/2016|05:47:32|Update
|06/16/2016|05:47:32|Close
|06/16/2016|05:47:33|BOP processing for 0 items has finished
|06/16/2016|05:47:33|Step 007
|06/16/2016|05:47:33|Initialization
|06/16/2016|05:47:33|Filters
|06/16/2016|05:47:34|Split Items
|06/16/2016|05:47:34|Sort
|06/16/2016|05:47:34|Check
|06/16/2016|05:47:37|Save
|06/16/2016|05:47:37|Update preparation
|06/16/2016|05:47:37|Update comparison
|06/16/2016|05:47:37|Update
|06/16/2016|05:47:37|Update
|06/16/2016|05:47:37|Close
|06/16/2016|05:47:37|BOP processing for 9 items has finished
|06/16/2016|05:47:37|Step 008
|06/16/2016|05:47:37|Initialization
|06/16/2016|05:47:37|Filters
|06/16/2016|05:47:38|Update preparation
|06/16/2016|05:47:38|Update comparison
|06/16/2016|05:47:38|Update
|06/16/2016|05:47:38|Close
|06/16/2016|05:47:38|BOP processing for 0 items has finished




>> no_parallel_process_file.txt

|06/15/2016|22:52:46|Step 001
|06/15/2016|22:52:46|Initialization
|06/15/2016|22:52:46|Filters
|06/15/2016|22:54:21|Split Items
|06/15/2016|22:55:10|Sort
|06/15/2016|22:55:15|Check
|06/15/2016|23:04:43|Save
|06/15/2016|23:06:38|Update preparation
|06/15/2016|23:06:38|Update comparison
|06/15/2016|23:06:39|Update
|06/15/2016|23:06:39|Update
|06/15/2016|23:12:04|Close
|06/15/2016|23:13:16|BOP processing for 7,942 items has finished
|06/15/2016|23:13:17|Step 002
|06/15/2016|23:13:17|Initialization
|06/15/2016|23:13:17|Filters
|06/15/2016|23:16:27|Split Items
|06/15/2016|23:20:18|Sort
|06/15/2016|23:20:34|Check
|06/16/2016|00:08:08|Save
|06/16/2016|00:26:19|Update preparation
|06/16/2016|00:26:20|Update comparison
|06/16/2016|00:26:30|Update
|06/16/2016|00:26:31|Update
|06/16/2016|00:42:31|Close
|06/16/2016|0
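
A minimal sketch of one way to do the split, assuming each log line has the
full |date|time|event shape shown above (parse_log is a made-up helper
name; adjust the file names to yours):

# read one log and split it into a list of per-step data frames
parse_log <- function(path) {
  raw <- readLines(path)
  raw <- raw[nzchar(raw)]                 # drop blank lines

  # fields are |date|time|event; the leading "|" yields an empty first field
  d <- read.table(text = raw, sep = "|", strip.white = TRUE,
                  col.names = c("skip", "date", "time", "event"))[, -1]

  # every "Step NNN" line opens a new block; cumsum() numbers the blocks
  split(d, cumsum(grepl("^Step", d$event)))
}

steps_par   <- parse_log("parallel_process_file.txt")
steps_nopar <- parse_log("no_parallel_process_file.txt")
steps_par[["1"]]   # the Step 001 data frame from the parallel run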

Re: [R] How to form groups for this specific problem?

2016-03-28 Thread Satish Vadlamani
Jean:
Wow, thank you so much for this. I will read up on igraph and then see
whether it will work for the larger dataset.

Thanks for the wonderful snippet of code. Basically, the requirement is
this: a TLA (Top Level Assembly, e.g. TLA1) and its components should
belong to the same group, and if one of those components also belongs to a
different TLA (say TLA2), then TLA2 and all of its components should fall
into the same group as TLA1.

Are these types of questions appropriate for this group?

Thanks,
Satish


On Mar 28, 2016 9:10 AM, "Adams, Jean" <jvad...@usgs.gov> wrote:

> Satish,
>
> If you rearrange your data into a network of nodes and edges, you can use
> the igraph package to identify disconnected (mutually exclusive) groups.
>
> # example data
> df <- data.frame(
>   Component = c("C1", "C2", "C1", "C3", "C4", "C5"),
>   TLA = c("TLA1", "TLA1", "TLA2", "TLA2", "TLA3", "TLA3")
> )
>
> # characterize data as a network of nodes and edges
> nodes <- levels(unlist(df))
> edges <- apply(df, 2, match, nodes)
>
> # use the igraph package to identify disconnected groups
> library(igraph)
> g <- graph(edges)
> ngroup <- clusters(g)$membership
> df$Group <- ngroup[match(df$Component, nodes)]
> df
>
>   Component  TLA Group
> 1        C1 TLA1     1
> 2        C2 TLA1     1
> 3        C1 TLA2     1
> 4        C3 TLA2     1
> 5        C4 TLA3     2
> 6        C5 TLA3     2
>
> Jean
>
> On Sun, Mar 27, 2016 at 7:56 PM, Satish Vadlamani <
> satish.vadlam...@gmail.com> wrote:
>
>> Hello All:
>> I would like to get some help with the following problem and understand how
>> it can be done efficiently in R. The first row below is the header.
>>
>> *Component, TLA*
>> C1, TLA1
>> C2, TLA1
>> C1, TLA2
>> C3, TLA2
>> C4, TLA3
>> C5, TLA3
>>
>> Notice that C1 is a component of TLA1 and TLA2.
>>
>> I would like to form groups of mutually exclusive subsets and create a new
>> column called Group for each subset. For the above data, the subsets and
>> the new Group column values will look like this:
>>
>> *Component, TLA, Group*
>> C1, TLA1, 1
>> C2, TLA1, 1
>> C1, TLA2, 1
>> C3, TLA2, 1
>> C4, TLA3, 2
>> C5, TLA3, 2
>>
>> I would appreciate any help on this. I could loop through the observations
>> with some logic, but I have not tried that yet.
>>
>> --
>>
>> Satish Vadlamani
>>



[R] How to form groups for this specific problem?

2016-03-27 Thread Satish Vadlamani
Hello All:
I would like to get some help with the following problem and understand how
it can be done efficiently in R. The first row below is the header.

*Component, TLA*
C1, TLA1
C2, TLA1
C1, TLA2
C3, TLA2
C4, TLA3
C5, TLA3

Notice that C1 is a component of TLA1 and TLA2.

I would like to form groups of mutually exclusive subsets and create a new
column called Group for each subset. For the above data, the subsets and
the new Group column values will look like this:

*Component, TLA, Group*
C1, TLA1, 1
C2, TLA1, 1
C1, TLA2, 1
C3, TLA2, 1
C4, TLA3, 2
C5, TLA3, 2

I would appreciate any help on this. I could loop through the observations
with some logic, but I have not tried that yet.

-- 

Satish Vadlamani



Re: [R] Reading large files

2010-02-05 Thread Satish Vadlamani

Folks:
Can anyone throw some light on this? Thanks.
Satish


-
Satish Vadlamani


Re: [R] Reading large files

2010-02-05 Thread Satish Vadlamani

Matthew:
If it helps, here is the explanation. I have an end state in mind, given
below under the "End State" header. In order to get there, I need to start
somewhere, right? I started with an 850 MB file and could not load it in
what I think is a reasonable time (I waited for an hour).

There are references to 64-bit. How will that help? It is a 4 GB RAM
machine, and there is no paging activity when loading the 850 MB file.

I have seen other threads on the same types of questions, but I did not see
any clear-cut answers or errors that I could have been making in the
process. If I am missing something, please let me know. Thanks.
Satish


End State
Satish wrote: at one time I will need to load, say, 15 GB into R.
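
The classic advice for this situation, sketched below with assumed column
types and a made-up file name: declare colClasses so read.table() does not
have to guess the type of every column, and stream a file bigger than RAM
through a connection in chunks:

# pre-declaring column types speeds read.table() up considerably
# (three columns assumed here: one character, two numeric)
x <- read.table("big_file.txt", sep = "|",
                colClasses = c("character", "numeric", "numeric"),
                comment.char = "")

# a file larger than RAM can be processed piece by piece via a connection
con <- file("big_file.txt", open = "r")
repeat {
  chunk <- readLines(con, n = 100000)   # next 100,000 lines
  if (length(chunk) == 0) break
  # ... filter/summarise the chunk here, keeping only the results ...
}
close(con)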


-
Satish Vadlamani