Re: Question with ExtractText Processor

2017-07-12 Thread Mark Payne

Atish,

I think there may be a limit on the number of extracted columns, but if you
exceeded that limit, then the Processor would be invalid. If you are trying to
use a regex that has 34 .* segments, then the performance is likely to be
awful. Any time you have a .* in a regex it's quite expensive. Doing that 34
times can be incredibly expensive.
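
One common way to reduce that cost is to stop each group from running past a delimiter. This is a general regex technique, not something specific to ExtractText. A negated character class such as ([^|]*) cannot consume a pipe, so the engine does not have to backtrack across delimiters to work out where each field ends. The sketch below builds such a pattern for the 34 fields mentioned in the thread. It uses Python's re module as a stand-in for java.util.regex. The helper name build_pattern and the sample line are hypothetical.

```python
import re

FIELD_COUNT = 34  # fields per line, as described in the thread


def build_pattern(field_count: int) -> re.Pattern:
    """Build an anchored regex with one capture group per pipe-delimited field.

    Each group is ([^|]*): it can never consume a '|', so the engine does not
    have to backtrack across delimiters the way it does with (.+) or (.*).
    """
    field = r"([^|]*)"
    return re.compile("^" + r"\|".join([field] * field_count) + "$")


pattern = build_pattern(FIELD_COUNT)

# Hypothetical sample line: f1|f2|...|f34
line = "|".join(f"f{i}" for i in range(1, FIELD_COUNT + 1))
match = pattern.match(line)
if match is not None:
    fields = match.groups()
    print(len(fields), fields[0], fields[-1])  # 34 f1 f34
```

Unlike (.+), ([^|]*) also accepts empty fields such as a||b. Use ([^|]+) if every field must be non-empty.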

Is it possible for you to upgrade to a newer version of NiFi? The newest
version (1.3) introduced a handful of Record-oriented Processors. These should
make flow design dramatically easier and should result in far, far better
performance.

So instead of using SplitContent -> ExtractText with regexes -> ReplaceText,
you could simply use ConvertRecord (with a CSV Reader and a JSON Writer), and
it will keep all of the records within a single FlowFile. No need to fuss with
regular expressions or replacing text.
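
The plain-Python illustration below (not NiFi code) shows the shape of that record-oriented transformation. Every pipe-delimited line in one input becomes one JSON object, and all of the objects are written out together as a single JSON array. This mirrors the idea of keeping all records in one FlowFile. The header row and the column names are assumptions made for illustration. In NiFi, the actual output depends on how the reader's schema and the writer are configured.

```python
import csv
import io
import json


def pipe_delimited_to_json(text: str) -> str:
    """Convert pipe-delimited text with a header row into a JSON array of objects.

    Illustration only: assumes the first line holds the column names and
    that every value can be kept as a string.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    records = [dict(row) for row in reader]
    return json.dumps(records)


# Hypothetical input; the real file in the thread has 34 columns.
sample = "id|name|city\n1|Alice|Pune\n2|Bob|Delhi\n"
print(pipe_delimited_to_json(sample))
# [{"id": "1", "name": "Alice", "city": "Pune"}, {"id": "2", "name": "Bob", "city": "Delhi"}]
```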

Thanks
-Mark

> On Jul 12, 2017, at 1:24 PM, Atish Ray  wrote:
> 
> Thanks!!! The regex is working for me with a smaller number of columns.
> Another problem I am facing is with the ExtractText processor. My
> pipe-delimited file has 34 fields. I need to convert all 34 fields into
> JSON. My file size is around 30MB. I am converting from CSV to JSON using
> SplitContent -> ExtractText -> ReplaceText. The queue is stuck before
> ExtractText. Is there any limit on the number of extracted columns?

Re: Question with ExtractText Processor

2017-07-12 Thread Lee Laim

Atish,

ExtractText has a setting called Max Buffer Size that defaults to 1 MB, but
I don't think this is what is causing your queue to build up before
ExtractText. While I don't know the exact details of your flow, I suggest you
try SplitText instead of SplitContent, as this is a more commonly used pattern.
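
Here is a rough model of what Max Buffer Size means. ExtractText's documentation says content beyond the buffer size is not fully evaluated. On a large, unsplit file, the regular expressions therefore only see the beginning of the content. The sketch below mimics this by truncating the input before searching. It is a simplified model, not NiFi's implementation, and the helper name is hypothetical. The buffer is shrunk to 64 bytes so the effect shows up on a small sample, whereas the default discussed here is 1 MB. Splitting the file into one line per FlowFile keeps each piece far below the limit.

```python
import re


def extract_with_buffer_limit(content: bytes, pattern: str, max_buffer: int) -> list:
    """Apply `pattern` to at most the first `max_buffer` bytes of `content`.

    A simplified model of "content beyond the buffer is not evaluated";
    this is not NiFi's actual implementation.
    """
    visible = content[:max_buffer].decode("utf-8", errors="ignore")
    return re.findall(pattern, visible, flags=re.MULTILINE)


# Hypothetical content: 1,000 short pipe-delimited lines.
content = "".join(f"{i}|name{i}\n" for i in range(1000)).encode("utf-8")

# A deliberately tiny 64-byte buffer so the truncation is visible.
rows = extract_with_buffer_limit(content, r"^(\d+)\|(\w+)$", max_buffer=64)
print(len(rows))  # 8: only lines 0-7 fit in the first 64 bytes
```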

An example template is available on the NiFi dev page:
https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates

CsvToJSON.xml
<https://cwiki.apache.org/confluence/download/attachments/57904847/CsvToJSON.xml?version=2&modificationDate=1486479474000&api=v2>

This flow shows how to convert a CSV entry to a JSON document using ExtractText
and ReplaceText.
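
The per-line ExtractText plus ReplaceText pattern can be pictured in plain Python. Each split line is matched against a regex. The capture groups become attributes named csv.1, csv.2, and so on, following the naming Atish describes in this thread. A ReplaceText-style step then writes those values into a JSON document. This is an illustration, not NiFi code, and the three-field layout and field names are hypothetical. Building the JSON with json.dumps rather than by string templating is a choice made here so that quotes in the data are escaped correctly.

```python
import json
import re

# Hypothetical 3-field layout; the thread's file has 34 fields.
LINE_PATTERN = re.compile(r"^([^|]*)\|([^|]*)\|([^|]*)$")
FIELD_NAMES = ["id", "name", "city"]  # hypothetical column names


def extract_attributes(line: str, prefix: str = "csv") -> dict:
    """Mimic ExtractText-style naming: capture group N becomes '<prefix>.N'."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return {}
    return {f"{prefix}.{i}": value for i, value in enumerate(match.groups(), start=1)}


def attributes_to_json(attrs: dict, prefix: str = "csv") -> str:
    """Mimic a ReplaceText step that writes the attributes into a JSON object."""
    record = {name: attrs[f"{prefix}.{i}"] for i, name in enumerate(FIELD_NAMES, start=1)}
    return json.dumps(record)


attrs = extract_attributes('1|Alice "Al"|Pune')
print(attrs)  # {'csv.1': '1', 'csv.2': 'Alice "Al"', 'csv.3': 'Pune'}
print(attributes_to_json(attrs))  # {"id": "1", "name": "Alice \"Al\"", "city": "Pune"}
```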

If you upgrade to 1.3, you can use the NiFi schema registry to convert
formats. Here is a great write-up:
http://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries

Thanks,
Lee

On Wed, Jul 12, 2017 at 11:24 AM, Atish Ray  wrote:

> Thanks!!! The regex is working for me with a smaller number of columns.
> Another problem I am facing is with the ExtractText processor. My
> pipe-delimited file has 34 fields. I need to convert all 34 fields into
> JSON. My file size is around 30MB. I am converting from CSV to JSON using
> SplitContent -> ExtractText -> ReplaceText. The queue is stuck before
> ExtractText. Is there any limit on the number of extracted columns?

Re: Question with ExtractText Processor

2017-07-12 Thread Atish Ray

Thanks!!! The regex is working for me with a smaller number of columns.
Another problem I am facing is with the ExtractText processor. My
pipe-delimited file has 34 fields. I need to convert all 34 fields into
JSON. My file size is around 30MB. I am converting from CSV to JSON using
SplitContent -> ExtractText -> ReplaceText. The queue is stuck before
ExtractText. Is there any limit on the number of extracted columns?




--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/Question-with-ExtractText-Processor-tp16405p16412.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Question with ExtractText Processor

2017-07-12 Thread Mark Payne

Atish,

I believe you will need to escape the pipe with a backslash:

(.+)\|(.+)\|(.+)\|(.+)\|(.+)
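
Here is why the escape matters. In a regular expression, an unescaped | is the alternation operator ("match this OR that"). That makes (.+)|(.+)|(.+)|(.+)|(.+) five alternatives, and the first one, (.+), already matches the whole line. Everything lands in group 1 and the other groups never participate. With \| the pipe is matched literally. The comma version works because a comma has no special meaning in a regex. The quick check below uses Python's re module as a stand-in for java.util.regex, which treats these patterns the same way. The sample line is made up.

```python
import re

line = "a|b|c|d|e"  # hypothetical 5-field, pipe-delimited line

unescaped = re.match(r"(.+)|(.+)|(.+)|(.+)|(.+)", line)
escaped = re.match(r"(.+)\|(.+)\|(.+)\|(.+)\|(.+)", line)

# Unescaped: '|' means "or", so the first alternative (.+) swallows the whole line.
print(unescaped.groups())  # ('a|b|c|d|e', None, None, None, None)

# Escaped: '\|' is a literal pipe, so each group captures one field.
print(escaped.groups())  # ('a', 'b', 'c', 'd', 'e')
```

The escaped pattern has five groups and four literal pipes. A line with only four fields will therefore not match at all, so the number of groups has to equal the number of fields.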

Thanks
-Mark


> On Jul 12, 2017, at 10:13 AM, Atish Ray  wrote:
> 
> Hi,
> I am working with the ExtractText processor. I have a file that contains
> pipe-delimited text. I have split the file into individual lines, and now I
> want to extract text from each split line. In the ExtractText processor I have
> defined an attribute named "csv" and given it the regex (.+)|(.+)|(.+)|(.+)|(.+).
> 
> My expectation is that I will get attribute values csv.1="first element of
> line", csv.2="second element of line", csv.3="third element of line" and
> csv.4="fourth element of line".
> 
> But it is not working for me. If I make the file comma-delimited and use the
> regex (.+),(.+),(.+),(.+),(.+), then I get the correct attribute values.
> 
> Can you please suggest what the regex should be for a pipe-delimited file?
> 
> 
> Regards
> Atish 

Question with ExtractText Processor

2017-07-12 Thread Atish Ray

Hi,
I am working with the ExtractText processor. I have a file that contains
pipe-delimited text. I have split the file into individual lines, and now I
want to extract text from each split line. In the ExtractText processor I have
defined an attribute named "csv" and given it the regex (.+)|(.+)|(.+)|(.+)|(.+).

My expectation is that I will get attribute values csv.1="first element of
line", csv.2="second element of line", csv.3="third element of line" and
csv.4="fourth element of line".

But it is not working for me. If I make the file comma-delimited and use the
regex (.+),(.+),(.+),(.+),(.+), then I get the correct attribute values.

Can you please suggest what the regex should be for a pipe-delimited file?


Regards
Atish 



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/Question-with-ExtractText-Processor-tp16405.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: Question with ExtractText Processor

2017-07-12 Thread Atish Ray

I am using NiFi 1.1.



--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/Question-with-ExtractText-Processor-tp16405p16406.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
