Re: How to count the number of occurrences of a certain string in file

2017-11-10 Thread Mark Payne
Tina,

My apologies - I was conflating ExecuteScript with ExecuteStreamCommand.
In ExecuteScript, the contents of the FlowFile are not read from StdIn; rather,
you'd want to use the session to read the contents. So in Groovy we'd do
something like:

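import org.apache.nifi.processors.script.ExecuteScript

def flowFile = session.get()
if (flowFile == null) {
    return   // nothing queued for this run
}

// 'in' is a reserved word in Groovy, so the stream gets a different name
def inputStream = session.read(flowFile)
def totalDisconnections = 0
def totalConnections = 0

try {
    // Here, 'inputStream' is the InputStream that contains the data
} finally {
    inputStream.close()
}

// putAttribute takes the FlowFile, the attribute name, and a String value
flowFile = session.putAttribute(flowFile, "TOTAL_DISCONNECTIONS", String.valueOf(totalDisconnections))
flowFile = session.putAttribute(flowFile, "TOTAL_CONNECTIONS", String.valueOf(totalConnections))
session.transfer(flowFile, ExecuteScript.REL_SUCCESS)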

Sorry, I am not familiar enough with Python to provide any sort of meaningful
suggestions on what to do there.

Thanks
-Mark
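
For the Python side, the counting step that the comment in the Groovy script marks can be sketched in plain Python. This is only a sketch under assumptions: `count_matching_lines` and `PHRASES` are hypothetical names, and an `io.BytesIO` object stands in for the stream that `session.read(flowFile)` would return. How a Java InputStream is turned into lines under Jython is not shown here.

```python
import io

# Attribute name -> phrase to look for (the two phrases from this thread).
PHRASES = {
    "TOTAL_DISCONNECTIONS": "Lost connection to server",
    "TOTAL_CONNECTIONS": "Established connection to server",
}


def count_matching_lines(stream, phrases=PHRASES, encoding="utf-8"):
    """Count, per attribute name, the lines of a binary stream that contain each phrase."""
    counts = {name: 0 for name in phrases}
    try:
        # Iterating the stream yields one line at a time, so a large
        # log file is never held in memory all at once.
        for raw_line in stream:
            line = raw_line.decode(encoding, errors="replace")
            for name, phrase in phrases.items():
                if phrase in line:
                    counts[name] += 1
    finally:
        # Mirrors the try/finally in the Groovy script: always close the stream.
        stream.close()
    # FlowFile attribute values are strings, so convert the counts.
    return {name: str(total) for name, total in counts.items()}


# Stand-in for the FlowFile content (assumption: a UTF-8 text log).
sample = io.BytesIO(
    b"Established connection to server\n"
    b"Lost connection to server\n"
    b"Established connection to server\n"
)
attributes = count_matching_lines(sample)
```

The returned dictionary has the shape of the `attrMap` used elsewhere in this thread, with the counts already converted to strings.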





Re: How to count the number of occurrences of a certain string in file

2017-11-10 Thread tzhu
Hi Mark,

This is the script I'm using currently:

import sys
TOTAL_DISCONNECTIONS = 0
TOTAL_CONNECTIONS = 0
flowFile = sys.stdin
if (flowFile != None):
    for line in flowFile:
        if "Lost connection to server" in line:
            TOTAL_DISCONNECTIONS += 1
        if "Established connection to server" in line:
            TOTAL_CONNECTIONS += 1
attrMap = {"TOTAL_DISCONNECTIONS":TOTAL_DISCONNECTIONS,
           "TOTAL_CONNECTIONS":TOTAL_CONNECTIONS}
flowFile = session.putAllAttributes(flowFile, attrMap)



Does it make sense to you? The error message says "filereader" object is not
iterable. So what is in flowFile now? How should I access the content in
flowFile?

Thanks,

Tina



--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: How to count the number of occurrences of a certain string in file

2017-11-10 Thread Mark Payne
Tina,

In NiFi, a FlowFile is made up of two parts: Content (the payload, or bytes)
and Attributes (metadata about the content). When you are using the Expression
Language, you are operating on FlowFile Attributes, not the content. So if you
want to count the number of occurrences of some string in the content, the
Expression Language will not help you.

This is why I was suggesting using the ExecuteScript processor. You can have a
script that reads the content from StdIn (this will provide you the content of
the FlowFile) and counts the number of occurrences of each word/phrase of
interest. Once you have counted the number of occurrences, you can add those to
the FlowFile as attributes. Or, alternatively, you could write out the data to
the contents of the FlowFile, from within your script.

Thanks
-Mark
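
The Content/Attributes split described above can be illustrated with a toy model in Python. This is a sketch only: `ToyFlowFile` and both helper functions are hypothetical and are not part of the NiFi API. It shows that a check that sees only attributes finds nothing, while a check on the payload finds the occurrences.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFlowFile:
    """Hypothetical stand-in for a FlowFile; not the NiFi API."""
    content: bytes                                   # the payload
    attributes: dict = field(default_factory=dict)   # metadata, string values


def attributes_containing(flow_file, text):
    """Attribute-only view: how many attribute values contain the text."""
    return sum(1 for value in flow_file.attributes.values() if text in value)


def occurrences_in_content(flow_file, text, encoding="utf-8"):
    """Content view: non-overlapping occurrences of the text in the payload."""
    return flow_file.content.decode(encoding).count(text)


log = (
    b"Established connection to server\n"
    b"Lost connection to server\n"
    b"Established connection to server\n"
)
flow_file = ToyFlowFile(content=log, attributes={"filename": "server.log"})

in_attributes = attributes_containing(flow_file, "Established connection to server")
in_content = occurrences_in_content(flow_file, "Established connection to server")

# Once counted, the result can be stored as an attribute (as a string).
flow_file.attributes["TOTAL_CONNECTIONS"] = str(in_content)
```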





Re: How to count the number of occurrences of a certain string in file

2017-11-09 Thread tzhu
Hi Mark,

According to the language guide, the count can be done by:
${allMatchingAttributes("abc","xyz"):contains("world"):count()}

I'm wondering if I can include this in one of the processors, as a by-product
of UpdateAttribute or some other processor.

Thanks,

Tina





Re: How to count the number of occurrences of a certain string in file

2017-11-08 Thread tzhu
Hi Mark,

I am confused about the whole process. I have the following questions:

1. From what I read, I can use TailFile to read the log file. However, it
would only read the file once (as the input file does not change). Is there
a way that I can read the file every time it gets started?

2. As you suggested, I am writing my own Python script to handle the
count. Most of the examples online are in Jython. (My NiFi version is 1.3.0.
I suppose it's similar to Python, correct?)
I found
https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
to be a useful guide, but I don't understand which approach to choose. What are
the input and output for the script? I want to read the file line by line and
count the string occurrences. I'm currently using
key, value = flowFile.getAttributes().iteritems()
to get the file content, but it shows "too many values to unpack". (The
original file is 41.14MB.)
For the output, the common way seems to be using a callback. Is it necessary
in my case? Or can I just add these attributes to the output file and
extract the attributes later?

3. To write the columns into the SQL table, the common way seems to be to use
"ReplaceText" and "PutSQL". I also noticed there's a processor called
PutDatabaseRecord that might combine the functions of ExecuteScript and
PutSQL. Since my Python script doesn't work so far, I can't really
test the result. But is this an easier approach?

Any help is appreciated...

Thanks,
Tina
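
The "too many values to unpack" error in question 2 can be reproduced outside NiFi. The sketch below uses a plain dictionary with made-up entries as a stand-in for the attribute map, and Python 3's `items()` in place of the Python 2/Jython `iteritems()`. Unpacking the whole collection into two names fails unless it holds exactly two entries, while a `for` loop unpacks one pair at a time. Either way, this only walks the attributes, not the file content.

```python
# Hypothetical attribute map; the keys and values are illustrative only.
attributes = {"filename": "server.log", "path": "./", "uuid": "hypothetical-id"}

# Assigning the whole collection of pairs to two names tries to unpack
# three (key, value) pairs into two variables and raises ValueError.
message = None
try:
    key, value = attributes.items()
except ValueError as error:
    message = str(error)

# Iterating unpacks one (key, value) pair per loop pass instead.
pairs = []
for key, value in attributes.items():
    pairs.append((key, value))
```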





Re: How to count the number of occurrences of a certain string in file

2017-11-07 Thread Mark Payne
Tina,

I don't believe there are any processors right now that will count the number
of occurrences of some string in a FlowFile. I would recommend either using an
ExecuteScript processor and scripting it out in Groovy/Jython, or, if you're
comfortable enough with Java and want to get your hands a bit dirtier, you
could actually update the ScanContent processor to optionally count the number
of occurrences of each term in the dictionary, rather than simply routing to
'matched' or 'unmatched'. Then we could have ScanContent provide attributes
such as
 = .

Thanks
-Mark

> On Nov 6, 2017, at 3:58 PM, tzhu wrote:
> 
> Hi,
> 
> I have a text log file. I want to count the number of occurrences of strings
> in the file and store them in a SQL table. For example, I want to count the
> number of times the string "Established connection to server" is present in the
> file as TOTAL_CONNECTIONS and the number of times "Lost connection to
> server" appears in the same file as TOTAL_DISCONNECTION.
> 
> I don't know how I should count the strings in this case. I found this blog
> as a template: https://www.batchiq.com/database-injest-with-nifi.html
> Maybe I can modify things from this?
> 
> Any suggestions would be appreciated.
> 
> Thanks,
> 
> Tina