Re: Duplicate Attribute Values in Extract Text Processor Output

2020-07-11 Thread muhyid72
Hi Mark,

Thank you for your advice. I applied the method you described.
It works and gives better performance.



--
Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/


Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-26 Thread muhyid72
Hi Mark,

Thanks for your answer.

I don't have much experience with NiFi yet, and I think I did not fully
understand your explanation.

I want to prepend some extra words to the beginning of each line.

For example, a line in my IIS log file looks like this:
2020-03-13 13:59:19 XXX-YYY  GET /Maintenance/Status.svc
X-ARR-LOG-ID=267ed22c-f1b 200 0 0 1005 1086 46

The line I want to send should look like this:
Jun 26 23:29:09 SERVER1 IISHttp 2020-03-13 13:59:19 XXX-YYY  GET
/Maintenance/Status.svc X-ARR-LOG-ID=267ed22c-f1b 200 0 0 1005 1086 46

When I looked at the Replace Text and Update Record processors, I could not
work out how to do that.

I have attached my current flow to this message:

SyslogTransferFlow2.jpg

  





Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-26 Thread Mark Payne
You’ll want to connect FetchAzureBlob -> ReplaceText -> PutTCP.

ReplaceText would use the Evaluation Mode of Line-by-Line to update the text. 
Or, alternatively, you could use UpdateRecord.

Thanks
-Mark
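
As a rough illustration of what a Line-by-Line ReplaceText step would do to the file content, here is a minimal Python sketch that prepends a syslog-style prefix to every line. The function name `prefix_lines` and its parameters are hypothetical and only emulate the effect; in NiFi itself, one plausible (untested here) configuration is ReplaceText with Evaluation Mode set to Line-by-Line, a prepend-style Replacement Strategy, and a Replacement Value built from the expressions quoted in this thread, such as ${now():format('MMM d HH:mm:ss')} ${hostname(true)} IISHttp followed by a space.

```python
import re
from datetime import datetime


def prefix_lines(text: str, hostname: str, when: datetime, tag: str = "IISHttp") -> str:
    """Prepend '<timestamp> <hostname> <tag> ' to every non-empty line of text."""
    # Roughly mirrors the 'MMM d HH:mm:ss' pattern from the thread
    # (month abbreviation, unpadded day, 24-hour time).
    stamp = f"{when.strftime('%b')} {when.day} {when.strftime('%H:%M:%S')}"
    prefix = f"{stamp} {hostname} {tag} "
    # With re.MULTILINE, ^ matches at the start of each line; the lookahead (?=.)
    # skips empty lines. A function is used as the replacement so that the prefix
    # is inserted literally, without backslash processing.
    return re.sub(r"^(?=.)", lambda _match: prefix, text, flags=re.MULTILINE)


sample = (
    "2020-03-13 13:59:19 XXX-YYY GET /Maintenance/Status.svc 200 0 0 1005 1086 46\n"
    "2020-03-13 13:59:20 XXX-YYY GET /Maintenance/Status.svc 200 0 0 1005 1086 46\n"
)
result = prefix_lines(sample, "SERVER1", datetime(2020, 6, 26, 23, 29, 9))
print(result, end="")
```

The whole file stays in one piece; only its lines change, which is the point of avoiding a split step.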





Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-26 Thread muhyid72
Hi Mark,

Thank you so much for the valuable advice.

I tried PutTCP and it seems to work. I would like to summarize your
explanation and ask a question.

If I understand correctly:

1. Get the IIS log files from Azure Blob Storage, same as before:
1.1. List Azure Blob Storage Processor
1.2. Route on Attribute Processor (I have a date filter RegEx on it)
1.3. Fetch Azure Blob Storage Processor

2. I will not use the Split Text Processor, as you explained.
3. I will not use the Extract Text Processor, as you explained.
4. I will not use the Put Syslog Processor, as you explained.

5. The Fetch Azure Blob Storage Processor will connect directly to the Put TCP
Processor.
6. Put TCP Processor:
6.1. Hostname: Syslog Server
6.2. Port: Syslog Server Port (TCP)
6.3. Outgoing Message Delimiter: \n (to split the IIS log file into lines, so
that each line is sent to syslog as a separate message)
6.4. SSL Context Service --> StandardRestrictedSSLContextService
(configured for mutual authentication)
6.5. The rest of the properties will stay at their defaults.

I need your help from this point on, because I have not used the PutTCP
Processor before.

7. I need to add some prefixes for the Syslog Server to each line produced by
the \n delimiter. How can I do this?
7.1. Each line should begin with these prefixes:
7.1.1. Message Timestamp: ${now():format('MMM d HH:mm:ss')}
7.1.2. Message Hostname: ${hostname(true)}
7.2. After these two prefixes, the message body should include the word
IISHttp (Message Body: IISHttp ${msg}).

Thanks in advance for your help.





Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-26 Thread Mark Payne
If performance is the problem, then you definitely want to get rid of any 
SplitText / Split* processors.
These processors are great when they are necessary but they should be avoided 
if at all possible, because splitting the data apart results in huge overhead 
for NiFi and will harm performance [1] (plus it just makes the flow a lot more 
complex).

Better options would be to use ReplaceText in order to convert the existing IIS 
Log message into a Syslog-formatted message or to use UpdateRecord with a CSV 
Reader and a Syslog Writer, adding in fields to the UpdateRecord processor like 
/hostname = localhost, /priority = 4, /message = CONCAT('IISHttp', .), and so 
on.

If you’re sending data over TCP, there is no need to split the data up at all. 
You can just send the entire text, newline delimited, over TCP using PutTCP 
instead of PutSyslog.
If you want to send over UDP, you may end up needing to use a SplitText just 
before PutUDP, but at least that would offer better performance because you 
only have a single processor operating on tiny FlowFiles.

Thanks
-Mark


[1] https://www.youtube.com/watch?v=RjWstt7nRVY
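
To make the "no need to split" point concrete, here is a small Python sketch of newline-delimited framing over a stream connection: the sender writes the whole text in one go, and the receiver recovers the individual messages by reading up to each newline. It uses `socket.socketpair()` as a local stand-in for a real TCP connection to a syslog server, and the helper names `send_whole_payload` and `read_messages` are hypothetical. It is not NiFi code and says nothing about how PutTCP is implemented.

```python
import socket


def send_whole_payload(sock: socket.socket, text: str) -> None:
    """Send the entire newline-delimited text in one call, without splitting it first."""
    if not text.endswith("\n"):
        text += "\n"  # make sure the last message is terminated too
    sock.sendall(text.encode("utf-8"))


def read_messages(sock: socket.socket) -> list:
    """Read until the peer closes the connection and return one message per line."""
    with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
        return [line.rstrip("\n") for line in stream]


# A connected socket pair stands in for the client/server TCP connection.
sender, receiver = socket.socketpair()
send_whole_payload(sender, "line one\nline two\nline three")
sender.close()  # signals end of stream to the reader
messages = read_messages(receiver)
receiver.close()
print(messages)
```

Because the stream carries its own delimiters, the receiving side sees three messages even though the sender handled a single piece of content.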






Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-26 Thread muhyid72
Hi Andy,
Thank you for your great support.
My aim is to transfer all IIS logs to syslog line by line, which is why I am
using Split Text to break each file into lines. I tried Route Text yesterday,
but I could not get it to transfer the lines to syslog one by one.
Extract Text puts each split line into an attribute, so that I can tell the
syslog processor "Message Body: IISHttp${msg}".
My actual problem is a bottleneck at Extract Text. I have to transfer the IIS
logs in near-real time for a cyber security process, but the processor does
not drain the queue fast enough. I tried increasing the number of threads,
changing the Run Duration, and increasing and reducing the queue size, but I
could not reach my target. The queue between Split Text and Extract Text is
always full, and the logs are about 12 hours behind. I am trying to find a way
to fix that.





Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-25 Thread Andy LoPresto
The resulting flowfile will always have at least two attributes because the 
whole match is extracted as an attribute and every capture group is extracted 
as an attribute, and the expression must contain at least one capture group. 

What is the objective you are trying to accomplish? If you want to route 
flowfiles based on their text contents, you can use RouteText. If you want to 
extract text content to attributes, use ExtractText. 

The use case you described above basically retrieves a log file from blob 
storage, splits each file to individual lines, extracts the content of each 
line (minus the final character) into an attribute, and then sends the values 
to Syslog. 

You may want to look at the record processors to improve the performance and 
simplicity of the flow substantially. 


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69




Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-25 Thread muhyid72
Hi Andy,

Thank you for your quick answer and interest.

I tried that, but there were still 2 attributes on the flow file. As far as I
understand, this is by design: I can't end up with just one attribute; there
are always at least 2. Am I right?

Can I use the Route Text Processor instead of Extract Text? (I gave my
Extract Text configuration above.) Do you have any comments?





Re: Duplicate Attribute Values in Extract Text Processor Output

2020-06-25 Thread Andy LoPresto
The regex you’re using contains a capture group, and so the entire string is 
captured as one attribute, and then the contained capture groups are also 
extracted as attributes. You can set the property “Include Capture Group 0” to 
false to remove one of them. The others are provided as expected. 

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
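
The duplication can be reproduced outside NiFi with any regex engine. The sketch below uses Python's `re` module (not the Java regex engine NiFi runs on) and a hypothetical helper, `extract_groups`, that returns one entry per group in the way the reply above describes one attribute per group. How ExtractText actually names its attributes is not modeled here. With a pattern such as (.*), the whole match (group 0) and the single capture group (group 1) contain the same text, so dropping group 0 leaves one fewer copy.

```python
import re


def extract_groups(pattern: str, text: str, include_group_zero: bool = True) -> dict:
    """Return {group index: matched text} for the first match of pattern in text."""
    match = re.search(pattern, text)
    if match is None:
        return {}
    first = 0 if include_group_zero else 1
    # match.re.groups is the number of capture groups in the pattern.
    return {i: match.group(i) for i in range(first, match.re.groups + 1)}


line = "2020-06-24 13:33:49  GET /Test/Service/test.css 200 0 0 852 7005 921"

with_zero = extract_groups(r"(.*)", line)
without_zero = extract_groups(r"(.*)", line, include_group_zero=False)

# Group 0 (entire match) and group 1 (the capture group) hold identical text.
print(with_zero[0] == with_zero[1])
print(sorted(without_zero))
```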

> On Jun 25, 2020, at 8:27 AM, muhyid72  wrote:
> 
> Dear all,
> I need some information about the flow file attributes produced by the
> Extract Text Processor. My flow is as follows:
> 
> 1. Get IIS log files from Azure Blob Storage.
> 2. Split each IIS log file line by line with the Split Text Processor.
> 2.1. Line Split Count: 1
> 2.2. Maximum Fragment Size: No value set
> 2.3. Header Line Count: 0
> 2.4. Header Line Marker Characters: No value set
> 2.5. Remove Trailing Newlines: True
> 3. Transfer the new flow files produced by the Split Text Processor to the
> Extract Text Processor.
> 3.1. All properties are default.
> 3.2. I added one RegEx property, because I want to carry each line through
> to Syslog as a flow file attribute.
> 3.2.1. Property Name: msg 
> 3.2.2. Value: (.*). 
> 4. Transfer all flow files coming from Extract Text to the Put Syslog
> Processor.
> 4.1. All properties are default or configured as required
> (such as the IP address of the Syslog server, port, etc.).
> 4.2. Message Body: IISHttp${msg}
> 
> When I check the flow file attributes in Data Provenance for the Extract
> Text Processor, I see 3 attributes with identical values:
> Msg: 2020-06-24 13:33:49  GET /Test/Service/test.css
>  200 0 0 852 7005 921
> Msg.1: 2020-06-24 13:33:49  GET /Test/Service/test.css
>  200 0 0 852 7005 921
> Msg.2: 2020-06-24 13:33:49  GET /Test/Service/test.css
>  200 0 0 852 7005 921
> 
> How can I remove the duplicate attributes from the Extract Text output? Or
> do I need to use another approach?
> Do you have any comments or suggestions?
> 
> My environment details are below:
> Apache NiFi 1.11.3
> Windows Server 2016
> Java JRE 1.8.0_241 (64 Bit)
> 
> 
> 