Re: Question with ExtractText Processor
Atish,

I think there may be a limit on the number of extracted columns, but if you exceeded that limit, the Processor would be invalid.

If you are trying to use a regex that has 34 .* segments, then the performance is likely to be awful. Any time you have a .* in a regex it's quite expensive, and doing that 34 times can be incredibly expensive.

Is it possible for you to upgrade to a newer version of NiFi? The newest version (1.3) introduced a handful of Record-oriented Processors. These should make flow design dramatically easier and should result in far better performance. So instead of using SplitContent -> ExtractText with regexes -> ReplaceText, you could simply use ConvertRecord (with a CSV Reader and a JSON Writer), and it will keep all of the records within a single FlowFile. No need to fuss with regular expressions or replacing text.

Thanks
-Mark

> On Jul 12, 2017, at 1:24 PM, Atish Ray wrote:
>
> Thanks!!! The regex is working for me with a smaller number of columns.
> I am facing another problem with the ExtractText processor. My
> pipe-delimited file has 34 fields. I need to extract all 34 fields and
> convert them into JSON. My file size is around 30 MB, so I am converting
> from CSV to JSON using SplitContent > ExtractText > ReplaceText. The queue
> is stuck before ExtractText. Is there any limit on the number of
> extracted columns?
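To illustrate the record-oriented idea outside NiFi, here is a minimal Python sketch that parses a pipe-delimited file once with a delimiter-aware reader and writes all records as a single JSON array, with no regular expressions involved. It is not NiFi code and does not reproduce ConvertRecord's behavior; the function name, the field names, and the sample data are assumptions made up for the example.

```python
import csv
import io
import json


def pipe_delimited_to_json(text, field_names):
    """Parse pipe-delimited text and return one JSON array of records.

    field_names is a hypothetical list of column names; the thread does not
    say whether the real file carries a header row, so none is assumed.
    """
    reader = csv.DictReader(io.StringIO(text), fieldnames=field_names, delimiter="|")
    # All rows stay together in one document, the way a single FlowFile
    # would hold every record instead of one FlowFile per line.
    return json.dumps([dict(row) for row in reader])


# Made-up sample with three fields; a 34-field file only needs a longer
# field_names list.
sample = "1|alice|NY\n2|bob|CA\n"
result = pipe_delimited_to_json(sample, ["id", "name", "state"])
print(result)
```

The parser splits on the delimiter directly, so the cost grows with the size of the input rather than with the amount of backtracking a long chain of .* groups can trigger.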
Re: Question with ExtractText Processor
Atish,

ExtractText has a setting called Max Buffer Size that defaults to 1 MB, but I don't think this is causing your queue to build up before ExtractText. While I don't know the exact details of your flow, I suggest you try SplitText instead of SplitContent, as this is a more commonly used pattern.

An example template is available on the NiFi dev page:
https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates

CsvToJSON.xml <https://cwiki.apache.org/confluence/download/attachments/57904847/CsvToJSON.xml?version=2&modificationDate=1486479474000&api=v2>
This flow shows how to convert a CSV entry to a JSON document using ExtractText and ReplaceText.

If you upgrade to 1.3, you can use the NiFi schema registry to convert formats. Here is a great write-up:
http://bryanbende.com/development/2017/06/20/apache-nifi-records-and-schema-registries

Thanks,
Lee

On Wed, Jul 12, 2017 at 11:24 AM, Atish Ray wrote:

> Thanks!!! The regex is working for me with a smaller number of columns.
> I am facing another problem with the ExtractText processor. My
> pipe-delimited file has 34 fields. I need to extract all 34 fields and
> convert them into JSON. My file size is around 30 MB, so I am converting
> from CSV to JSON using SplitContent > ExtractText > ReplaceText. The queue
> is stuck before ExtractText. Is there any limit on the number of
> extracted columns?
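For readers who want to see the shape of the split, extract, replace pattern without opening the template, the following Python sketch imitates it on a three-field line. It is only an analogy: the real template's regexes and replacement text are not reproduced here, and the function names, JSON keys, and sample data are hypothetical.

```python
import json
import re

# One capture group per field; [^|]+ stops at the next pipe, so the engine
# does not have to backtrack the way it does with (.+) or (.*).
LINE_PATTERN = re.compile(r"([^|]+)\|([^|]+)\|([^|]+)")


def line_to_json(line, keys):
    """Turn one pipe-delimited line into a JSON document."""
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        raise ValueError(f"line does not have the expected fields: {line!r}")
    # "Extract" step: name the groups csv.1, csv.2, ... as in the question.
    attributes = {f"csv.{i}": value for i, value in enumerate(match.groups(), start=1)}
    # "Replace" step: build the output document from those attributes.
    return json.dumps({key: attributes[f"csv.{i}"] for i, key in enumerate(keys, start=1)})


def convert(text, keys):
    """"Split" step: handle the input one line at a time, skipping blanks."""
    return [line_to_json(line, keys) for line in text.splitlines() if line]


documents = convert("1|alice|NY\n2|bob|CA\n", ["id", "name", "state"])
print(documents)
```

Each input line becomes its own JSON document here, which mirrors one FlowFile per line after the split.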
Re: Question with ExtractText Processor
Thanks!!! The regex is working for me with a smaller number of columns.

I am facing another problem with the ExtractText processor. My pipe-delimited file has 34 fields. I need to extract all 34 fields and convert them into JSON. My file size is around 30 MB, so I am converting from CSV to JSON using SplitContent > ExtractText > ReplaceText. The queue is stuck before ExtractText. Is there any limit on the number of extracted columns?

--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Question-with-ExtractText-Processor-tp16405p16412.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
Re: Question with ExtractText Processor
Atish,

I believe you will need to escape the pipe with a backslash:

(.+)\|(.+)\|(.+)\|(.+)\|(.+)

Thanks
-Mark

> On Jul 12, 2017, at 10:13 AM, Atish Ray wrote:
>
> Hi,
> I am working with the ExtractText processor. I have a file that contains
> pipe-delimited text. I have split the file on every line, and now I want
> to extract text from each split line. I have defined an attribute in the
> ExtractText processor named "csv" and given the regex as
> (.+)|(.+)|(.+)|(.+)|(.+).
>
> My expectation is that I will get attribute values csv.1="first element
> of line", csv.2="second element of line", csv.3="third element of line"
> and csv.4="fourth element of line".
>
> But it is not working for me. If I make the file "," delimited and use
> the regex (.+),(.+),(.+),(.+),(.+), then I get the correct attribute
> values.
>
> Can you please suggest what the regex would be for a pipe-delimited
> file?
>
> Regards
> Atish
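The reason the backslash matters is that an unescaped | is the regex alternation operator, so (.+)|(.+)|... reads as "any of five alternatives" rather than "five fields separated by pipes". The short Python sketch below shows the difference on a made-up line. NiFi evaluates Java regular expressions, not Python's, but alternation and escaping work the same way for these patterns. The as_attributes helper is hypothetical and only mimics the csv.1, csv.2, ... naming from the question.

```python
import re

line = "a|b|c|d|e"

# Unescaped: "|" separates five alternatives. The first alternative, (.+),
# matches the whole line, so the other groups never take part in the match.
unescaped = re.match(r"(.+)|(.+)|(.+)|(.+)|(.+)", line)

# Escaped: "\|" is a literal pipe, so each group captures one field.
escaped = re.match(r"(.+)\|(.+)\|(.+)\|(.+)\|(.+)", line)

# A stricter variant: [^|]+ cannot run past a pipe, which avoids the
# backtracking that greedy (.+) groups need on long lines.
negated = re.match(r"([^|]+)\|([^|]+)\|([^|]+)\|([^|]+)\|([^|]+)", line)


def as_attributes(match, name="csv"):
    """Hypothetical helper: label capture groups as name.1, name.2, ..."""
    return {f"{name}.{i}": value for i, value in enumerate(match.groups(), start=1)}


print(unescaped.groups())
print(as_attributes(escaped))
print(as_attributes(negated))
```

With exactly five fields the escaped (.+) pattern and the [^|]+ pattern capture the same values; the second form becomes more attractive as the number of fields grows.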
Question with ExtractText Processor
Hi,

I am working with the ExtractText processor. I have a file that contains pipe-delimited text. I have split the file on every line, and now I want to extract text from each split line. I have defined an attribute in the ExtractText processor named "csv" and given the regex as (.+)|(.+)|(.+)|(.+)|(.+).

My expectation is that I will get attribute values csv.1="first element of line", csv.2="second element of line", csv.3="third element of line" and csv.4="fourth element of line".

But it is not working for me. If I make the file "," delimited and use the regex (.+),(.+),(.+),(.+),(.+), then I get the correct attribute values.

Can you please suggest what the regex would be for a pipe-delimited file?

Regards
Atish

--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Question-with-ExtractText-Processor-tp16405.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
Re: Question with ExtractText Processor
I am using NiFi 1.1.

--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/Question-with-ExtractText-Processor-tp16405p16406.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.