NiFi-Neo4j Issues

2017-05-03 Thread dale.chang13
Hi All,

At the bottom you can find my question. Note, I am positive this is more a
network issue, but I cannot seem to figure out the solution. I tried posting
this to the Neo4j Google+ board and to a relevant NiFi-Neo4j GitHub issue (I
know this is not an official NiFi-supported processor, so you can handle this
thread how you wish); however, no luck. First, here is the structure of my
architecture:

Windows 8 host machine with Windows Hyper-V Manager. Hyper-V Manager runs 6
RHEL nodes hosting a Hortonworks Hadoop cluster. I have NiFi
1.0.0.2.0.1.0-12 (no SSL enabled) on HDF (Hortonworks DataFlow). On the
Windows host machine, I have Neo4j Enterprise (trial) edition 3.1.3.

Windows/Neo4j configuration:
- I found my Windows IP address (windows.ip.address)
- I started Neo4j EE and confirmed that the browser client works.
- I went into the neo4j.conf file and enabled the property
"dbms.bolt.connector.listen_address=0.0.0.0:7687", which allows Neo4j to
listen on all network interfaces
- I also went into the Neo4j browser, went down to the settings tab, and
configured the database URI to be bolt://(windows.ip.address):7687 (default
is bolt://localhost:7687)
- Additionally, I did a netstat -a to see that the TCP connection is active
and that a service is listening on 0.0.0.0:7687 and [::]:7687

Linux/NiFi configuration:
- Following the NiFi-Neo4j GitHub repo I linked above, I created a
Neo4jBoltSessionPool Controller Service and set the Bolt DB Connection
URL to a variety of values (0.0.0.0:7687, localhost:7687,
(windows.ip.address):7687, (foreign.ip.address):7687, etc.)
- Created a PutCypher processor with a simple LOAD CSV WITH HEADERS
FROM ${file_path} Cypher query

When I pass in a FlowFile (I have confirmed that the file path is saved as
the file_path FlowFile attribute) and run the PutCypher processor, I get an
error saying that the Neo4j driver was "unable to connect to
(ip.address):7687, ensure the database is running and that there is a
working network connection to it".

Is my configuration wrong, or could the Windows firewall be preventing
communication to the Bolt address/port?
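
For what it's worth, here is the kind of minimal reachability probe I can run
from one of the RHEL nodes to tell a firewall problem apart from a Neo4j
configuration problem (just a sketch; substitute the real windows.ip.address):

  import java.net.InetSocketAddress;
  import java.net.Socket;

  public class BoltProbe {
      public static void main(String[] args) throws Exception {
          // Connects to the Bolt listener configured above; this throws if a
          // firewall drops the packets or nothing is listening on the port.
          try (Socket s = new Socket()) {
              s.connect(new InetSocketAddress("windows.ip.address", 7687), 5000);
              System.out.println("Bolt port reachable");
          }
      }
  }

If this fails from a VM but succeeds on the Windows host itself, that would
point at the Windows firewall (or the Hyper-V virtual switch) rather than the
Neo4j configuration.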





ExecuteProcess Question

2016-09-27 Thread dale.chang13
So I have a bash script that I am able to run from the command line, and I
want to be able to let NiFi call it using the ExecuteProcess processor.

The script itself runs fine from the command line, and it looks like
ExecuteProcess is executing the script as well (I have a downstream
LogAttribute processor that verifies there were no problems executing the
script), but neither the Bulletin nor the logs tell me whether anything went
wrong or succeeded.

I found that the echo commands in the script should write to the NiFi
FlowFile Content, but I do not see anything show up.



The script simply runs a java -jar xx.jar command; the jar contains a Java
wrapper class with a main method that calls a Scala main object, which in
turn performs Apache Spark operations.
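
For illustration, here is my understanding of what ExecuteProcess captures,
as a hypothetical wrapper ("Redirect Error Stream" is the processor property,
which I have left at its default):

  public class Wrapper {
      public static void main(String[] args) {
          // ExecuteProcess writes the process's stdout into the outgoing
          // FlowFile's content...
          System.out.println("this should appear as FlowFile content");
          // ...while stderr (where logging frameworks often write) is only
          // captured if Redirect Error Stream is set to true.
          System.err.println("this should not, by default");
      }
  }

So one suspicion is that the jar's output goes to stderr or a log file rather
than stdout.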

Any ideas?





ListenHTTP Questions

2016-06-14 Thread dale.chang13
So I am having some woes sending HTTP POST requests to my NiFi server, using
curl to send a JSON object.
I have a ListenHTTP processor running that is configured as follows:
- BasePath = contentListener
- ListeningPort = 8011
- Authorized DN Pattern = .*
- Max Unconfirmed FlowFile Time = 60 secs

And I am trying to send a JSON object via curl:
curl -i -X POST -H 'Content-Type: application/json' -d '{"test":"Hello,
NiFi"}' http://[hostname for NiFi]:8011/contentListener

I feel like this request may need to be sent elsewhere, or that the internal
HTTP server ListenHTTP creates is not reachable, because I get "curl: (7)
couldn't connect to host". This usually arises when the server receiving the
HTTP request is down, a firewall is in the way (I have none at the moment),
or from other server-side errors.
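
To take curl out of the picture, I could issue the same POST from the NiFi
host itself, which should separate "listener not up" from "network in
between" (a sketch using plain HttpURLConnection):

  import java.io.OutputStream;
  import java.net.HttpURLConnection;
  import java.net.URL;

  public class PostProbe {
      public static void main(String[] args) throws Exception {
          URL url = new URL("http://localhost:8011/contentListener");
          HttpURLConnection conn = (HttpURLConnection) url.openConnection();
          conn.setRequestMethod("POST");
          conn.setRequestProperty("Content-Type", "application/json");
          conn.setDoOutput(true);
          try (OutputStream out = conn.getOutputStream()) {
              out.write("{\"test\":\"Hello, NiFi\"}".getBytes("UTF-8"));
          }
          // A 200 here, with curl: (7) remotely, would point at the network
          // rather than at ListenHTTP.
          System.out.println("HTTP " + conn.getResponseCode());
      }
  }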

I have been able to successfully use the Rest API to modify Processors,
Processor Groups, etc through curl. Is there something I am missing?





NiFi Version Bundled with HDF 1.1.1.0-12

2016-06-01 Thread dale.chang13
I cannot find what version of NiFi is bundled with Hortonworks Dataflow
1.1.1.0-12. Is there a way I can find out?





AttributesToJSON Multi-Valued Fields

2016-05-17 Thread dale.chang13
I don't know if this use-case is too specific to be a feature for a future
release, but I would like to see the AttributesToJSON processor support
multi-valued fields.

In my use-case, I am storing JSON documents into Solr, and there are two
ways to represent multi-valued fields: using an array, or repeating keys in
the JSON document. Either way, the result is a JSON array stored in that
field in Solr, regardless of whether you submit repeated keys or an array.

*Would we like to see this implemented, and how should we go about doing
so?* I was thinking that if a FlowFile attribute is a comma-separated list,
we could continue to use the Jackson ObjectMapper and convert the value to
an array.
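
To illustrate what I mean (a sketch; the "color" attribute and its
comma-separated value are made up):

  import com.fasterxml.jackson.databind.ObjectMapper;
  import com.fasterxml.jackson.databind.node.ArrayNode;
  import com.fasterxml.jackson.databind.node.ObjectNode;

  public class MultiValuedSketch {
      public static void main(String[] args) throws Exception {
          ObjectMapper mapper = new ObjectMapper();
          ObjectNode doc = mapper.createObjectNode();
          String value = "red,green,blue";  // hypothetical attribute value
          if (value.contains(",")) {
              // comma-separated attribute value -> JSON array
              ArrayNode array = doc.putArray("color");
              for (String v : value.split(",")) {
                  array.add(v.trim());
              }
          } else {
              doc.put("color", value);
          }
          // prints {"color":["red","green","blue"]}
          System.out.println(mapper.writeValueAsString(doc));
      }
  }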





Distribute FlowFiles among Nodes in a Cluster

2016-05-16 Thread dale.chang13
I was wondering if there was a way for the NCM to distribute FlowFiles to
different nodes.

Currently I see that all of the nodes in my cluster run the same dataflow. I
know I can restrict certain processors to the primary node, but it seems like
the NCM does not distribute the FlowFiles to different nodes once the primary
node has finished its operations.

Additionally, I have checked nifi.properties and I do not have any nifi.state
properties configured.





Re: Purpose of Disallowing Attribute Expression

2016-05-13 Thread dale.chang13
Michael Moser wrote
> NIFI-1077 [1] has discussed this a bit in the past, when
> ConvertCharacterSet was improved to support expression language.  A JIRA
> ticket is needed to spur action on these requests.
> 
> An interesting case to help this would be to improve the IdentifyMimeType
> processor to detect character encodings on text data.  Apache Tika can do
> it with an EncodingDetector [2], so why not take advantage since it's
> already part of IdentifyMimeType?  I think this would be cool so I wrote
> NIFI-1874 [3].
> 
> [1] - https://issues.apache.org/jira/browse/NIFI-1077
> [2] -
> https://tika.apache.org/1.12/api/org/apache/tika/detect/EncodingDetector.html
> [3] - https://issues.apache.org/jira/browse/NIFI-1874

Funnily enough, my company's backlog says that we need a character set
detection processor. However, I just got assigned a bunch of tasks for our
next sprint. I'd love to have either my colleague or me take up NIFI-1874,
but we'll have to wait until the sprint after.





Re: Purpose of Disallowing Attribute Expression

2016-05-12 Thread dale.chang13
Joe Witt wrote
> It is generally quite easy to enable for Property Descriptors which
> accept user supplied strings.  And this is one that does seem like a
> candidate.  Were you wanting it to look at a flowfile attribute to be
> the way of indicating the character set?
> 
> Thinking through this example the challenges that come to mind are:
> - What to do if the flow file doesn't have the charset indicated as an
> attribute?
> - What to do if the charset indicated by the flowfile attribute isn't
> supported?
> 
> There are various cases to consider is all and your idea is a good one
> to pursue in my view.  We had wanted to make it be an enumerated value
> at one point so users could only selected from known/valid charsets.
> But your idea is good too.

Yes, setting the character set or other properties via a FlowFile attribute
would be helpful. I have already tweaked ExtractText to support expression
language, provide UTF-8 as the default character set, and remove the
mandatory specification.
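
The tweak boils down to something like this (a sketch of the property
descriptor, not the exact patch):

  import org.apache.nifi.components.PropertyDescriptor;
  import org.apache.nifi.processor.util.StandardValidators;

  // Character Set with expression language enabled, a UTF-8 default,
  // and no longer required:
  public static final PropertyDescriptor CHARACTER_SET = new PropertyDescriptor.Builder()
          .name("Character Set")
          .description("The character set in which the content is encoded")
          .required(false)
          .expressionLanguageSupported(true)
          .addValidator(StandardValidators.CHARACTER_SET_VALIDATOR)
          .defaultValue("UTF-8")
          .build();

  // and at evaluation time, something like:
  // context.getProperty(CHARACTER_SET).evaluateAttributeExpressions(flowFile).getValue()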

I suppose the ExtractText processor could route to an "invalid character
set" relationship if there is a conflict. That would require a character set
detection service at the least though.

I only asked because our constraint was to use as much out-of-the-box
functionality and as few custom processors as possible, for maintenance's
sake.

Would it be possible to implement this change (more properties supporting
expression language) in future releases? I know it would warrant an in-depth
discussion on the goals that NiFi would like to achieve.





Purpose of Disallowing Attribute Expression

2016-05-12 Thread dale.chang13
What is the purpose of not allowing a Processor property to support
expression language? Not allowing a property such as "Character set" in the
ExtractText Processor is proving to be a hindrance. Would it affect NiFi
under the hood if it were otherwise?





Re: Exception while restarting the Nifi Cluster

2016-05-11 Thread dale.chang13
Rahul Dahiya wrote
> Hi Team,
> 
> 
> I am getting below exception while trying to restart the NiFi nodes :
> 
> 
> java.lang.Exception: Unable to load flow due to: java.io.IOException:
> org.apache.nifi.cluster.ConnectionException: Failed to connect node to
> cluster because local flow controller partially updated.  Administrator
> should disconnect node and review flow for corruption.
> at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:783)
> ~[nifi-jetty-0.6.1.jar:0.6.1]
> at org.apache.nifi.NiFi.<init>(NiFi.java:137) [nifi-runtime-0.6.1.jar:0.6.1]
> at org.apache.nifi.NiFi.main(NiFi.java:227)
> [nifi-runtime-0.6.1.jar:0.6.1]
> 
> 
> 
> Based on the following link:
> 
> https://mail-archives.apache.org/mod_mbox/nifi-dev/201508.mbox/%3CBAY172-W19DF8C4EDE001FC6B8CA0ECE6B0@...%3E
> 
> 
> it seems that the issue could be because of trailing white space or
> incorrect entries for the following properties:
> 
> 
> nifi.sensitive.props.key
> nifi.sensitive.props.algorithm
> nifi.sensitive.props.provider
> 
> 
> I've checked these properties on all the nodes and they are exactly
> the same, with no trailing white space.
> 
> 
> Can someone please help with what could be the root cause of this problem
> and how it can be resolved?
> 
> 
> Also I don't want to clean the nifi working directories (I know it will
> work fine on cleaning the directories). Thanks in advance for the help.
> 
> 
> Regards,
> 
> Rahul

Could you give us the rest of the stack trace? It should contain more
information that would help us further diagnose your problem.





Deallocation of FlowFiles

2016-05-05 Thread dale.chang13
Is there a way to free up resources (memory and disk space in repos like
content_repo, flowfile_repo) at the conclusion of a NiFi flow? I would like
to reclaim those resources quickly so I can reuse them for newer FlowFiles.
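
If it helps frame the question: I believe the content repository archive
settings in nifi.properties govern how long content claims linger after a
FlowFile completes (the values below are illustrative, not a recommendation):

  nifi.content.repository.archive.enabled=true
  nifi.content.repository.archive.max.retention.period=12 hours
  nifi.content.repository.archive.max.usage.percentage=50%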





Re: FetchFile Cannot Allocate Enough Memory

2016-05-04 Thread dale.chang13
Mark Payne wrote
> Dale,
> 
> I think an image of the flow would be useful. Or better yet, if you can, a
> template of the flow, so
> that we can see all of the configuration being used.
> 
> When you said you "get stuck at around 20 MB and then NiFi moves to a
> crawl" I'm not clear on
> what you are saying exactly. After you process 20 MB of the 189 MB CSV
> file? After you ingest
> 20 MB worth of files via the second FetchFile?
> 
> Also, which directory has 85,000 files? The first directory being polled
> via ListFile, or the directory
> that you are picking up from via the second FetchFile?
> 
> Thanks
> -Mark

Attached below you should find the template of the flow in question. I had to
remove or alter some information, but that does not impact the workflow.

We have to ingest a CSV (called a DAT file) with variable delimiters and
qualifiers (so it's not really a CSV). The CSV has headers and many lines.
Each line corresponds to one text document, and also contains metadata
about, and a URI to, that document.
There are several folders which contain the text files described by the big
CSV file. Two FetchFiles later in the flow will attempt to find and read the
text documents corresponding to those URIs.

Here's a description of the directory structure:
*/dat* contains the gigantic CSV 
*/directory1* contains thousands of text documents that are described by the
CSV
*/directory2* contains additional documents described by the CSV
*/directory3*... and so on

Here are the steps to my flow:

1) The first List-Fetch pair is in the first Process Group, named "Find and
Read DATFile". It reads the DAT file, the CSV that contains hundreds of
thousands of lines.

2) The Split DATFile Process Group chunks the CSV file into individual
FlowFiles.

3) In the Clean/Extract Metadata Process Group, we use regular expressions
via ExtractText to write the metadata to FlowFile attributes, then
AttributesToJSON to build JSON documents that are later stored in Solr. The
processors in this group also use regular expressions to clean and validate
the generated JSON document.

4) The Read Extracted Text Process Group contains the second FetchFile that
reads in files according to the URIs listed in the CSV. This is where the
read/write speed dips ("NiFi moves to a crawl") once 20-30 MB of text files
have been read through the second FetchFile.

5) The Store in Solr Process Group batches up JSON documents and stores them
to SolrCloud.

Document_Ingestion_Redacted.xml





Re: FetchFile Cannot Allocate Enough Memory

2016-05-04 Thread dale.chang13
Joe Witt wrote
> 
> Dale,
> 
> Where there is a fetch file there is usually a list file.  And while
> the symptom of memory issues is showing up in fetch file i am curious
> if the issue might actually be caused in ListFile.  How many files are
> in the directory being listed?
> 
> Mark,
> 
> Are we using a stream friendly API to list files and do we know if
> that API on all platforms really doing things in a stream friendly
> way?
> 
> Thanks
> Joe

So I will explain my flow first and then I will answer your question of how
I am using ListFile and FetchFile.

To begin my process, I am ingesting a CSV file that contains a list of
filenames. The first (and only) ListFile starts off the flow and passes its
listing to the first FetchFile, which retrieves the contents of that CSV.
Afterward, I use expression language (ExtractText) to extract all of the file
names and put them as attributes on individual FlowFiles. THEN I use a second
FetchFile (this is the processor that has trouble allocating memory) with
expression language to use each file name to retrieve a text document.

The CSV file (189 MB) contains metadata and path/filenames for over 200,000
documents, and I am having trouble reading from a directory of about 85,000
documents (second FetchFile, each document is usually a few KB). I get stuck
at around 20 MB and then NiFi moves to a crawl.

I can give you a picture of our actual flow if you need it.


Mark Payne wrote
> ListFile performs a listing using Java's File.listFiles(). This will
> provide a list of all files in the
> directory. I do not believe this to be related, though. Googling indicates
> that when this error
> occurs it is related to the ability to create a native process in order to
> interact with the file system.
> I don't think the issue is related to Java heap but rather available RAM
> on the box. How much RAM
> is actually available on the box? You mentioned IOPS - are you running in
> a virtual cloud environment?
> Using remote storage such as Amazon EBS?

I am running six Linux VMs on a Windows 8 machine. Three VMs (one NCM, two
nodes) run NiFi, and those VMs have 20 GB assigned to them. Looking through
Ambari and monitoring the memory on the nodes, I have a little more than 4 GB
of free RAM on the nodes. It looks like the free memory dipped severely
during my NiFi flow, but no swap memory was used.





Re: FetchFile Cannot Allocate Enough Memory

2016-04-29 Thread dale.chang13
Mark Payne wrote
> Dale,
> 
> I haven't seen this issue personally. I don't believe it has to do with
> content/flowfile
> repo space. Can you check the logs/nifi-app.log file and give us the exact
> error message
> from the logs, with the stack trace if it is provided?
> 
> Thanks
> -Mark

Sure, this is from one of the slave nodes. It hardly provides any
information. I suppose I could do a jstat or a df -h.

I've also created a MonitorMemory Reporting Task, but I cannot seem to
provide the correct names for the memory pools. The only one that works is
the G1 Old Gen memory pool.
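
To find the exact pool names my JVM exposes (which I assume are what
MonitorMemory matches against), I plan to run something like this with the
same JVM/GC settings as NiFi (a minimal sketch):

  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryPoolMXBean;

  public class ListPools {
      public static void main(String[] args) {
          // Prints the pool names the JVM exposes, e.g. "G1 Old Gen",
          // "G1 Eden Space", "G1 Survivor Space" under the G1 collector.
          for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
              System.out.println(pool.getName());
          }
      }
  }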

app.log wrote
> 2016-04-29 10:16:28,027 ERROR [Timer-Driven Process Thread-6]
> o.a.nifi.processors.standard.FetchFile
> FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188512.txt from
> file system for
> StandardFlowFileRecord[uuid=1a2d7918-377e-4256-8610-1b12493eb16e,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1461938937407-463,
> container=default, section=463], offset=47518,
> length=1268],offset=0,name=FW: FERC Daily News.msg,size=1268] due to
> java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188512.txt
> (Cannot allocate memory); routing to failure:
> java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188512.txt
> (Cannot allocate memory)
> 
> 2016-04-29 10:16:28,028 ERROR [Timer-Driven Process Thread-6]
> o.a.nifi.processors.standard.FetchFile
> FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188518.txt from
> file system for
> StandardFlowFileRecord[uuid=3b5fef42-2ded-47cc-aba2-6caf95f04977,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1461938937407-463,
> container=default, section=463], offset=48786,
> length=1640],offset=0,name=FW: FERC Docket No. EL01-47:  Removing
> Obstacles To Increased Eleu0020ctri c Generation And Natural Gas Supply In
> The Western United States.msg,size=1640] due to
> java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188518.txt
> (Cannot allocate memory); routing to failure:
> java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188518.txt
> (Cannot allocate memory)
> 
> 2016-04-29 10:16:28,029 ERROR [Timer-Driven Process Thread-8]
> o.a.nifi.processors.standard.FetchFile
> FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188526.txt from
> file system for
> StandardFlowFileRecord[uuid=71715448-2acd-4f5c-af57-9209461fe62e,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1461938937407-463,
> container=default, section=463], offset=50426,
> length=1272],offset=0,name=FW: workshop notice.msg,size=1272] due to
> java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188526.txt
> (Cannot allocate memory); routing to failure:
> java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188526.txt
> (Cannot allocate memory)
> 
> 2016-04-29 10:16:28,030 ERROR [Timer-Driven Process Thread-9]
> o.a.nifi.processors.standard.FetchFile
> FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188534.txt from
> file system for
> StandardFlowFileRecord[uuid=0bf8666b-a9bc-4412-905e-6e4a2b13253d,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1461938937407-463,
> container=default, section=463], offset=51698,
> length=1440],offset=0,name=FW: Final Report on Workshop Report to Discuss
> Alternative Gas Inu0020dice s.msg,size=1440] due to
> java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188534.txt
> (Cannot allocate memory); routing to failure:
> java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188534.txt
> (Cannot allocate memory)
> 
> 2016-04-29 10:16:28,034 ERROR [Timer-Driven Process Thread-3]
> o.a.nifi.processors.standard.FetchFile
> FetchFile[id=6c7482f2-5780-37c8-99a0-f2d87cbcbba9] Could not fetch file
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188558.txt from
> file system for
> StandardFlowFileRecord[uuid=805d4127-d86b-4b4b-a7a0-9f6f300dc13e,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1461938937407-463,
> container=default, section=463], offset=54433,
> length=1334],offset=0,name=FW: Proposed NARUC resolution on
> hedging.msg,size=1334] due to java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188558.txt
> (Cannot allocate memory); routing to failure:
> java.io.FileNotFoundException:
> /tmp/hddCobrasan/Export1/VOL01/TEXT/TEXT01/ENR-00188558.txt
> (Cannot allocate memory)
> 
> 2016-04-29 10:16:28,035 ERROR 

FetchFile Cannot Allocate Enough Memory

2016-04-29 Thread dale.chang13
I have been trying to run my data flow and I keep running into a problem
where FetchFile is unable to read files. I will detail my process below, and
I would like some confirmation of my suspicions.

First I am ingesting an initial file that is fairly large, which contains
the path/filename of a ton of text files within another directory. The goal
is to read in the content of that large file, then read in the contents of
the thousands of text files, and then store the text file content into Solr.

The problem I am having is that the second FetchFile, the one that reads in
the smaller text files, frequently reports an error: FileNotFoundException
xxx.txt (Cannot allocate memory); routing to failure. This FetchFile
successfully fetches a couple of files and then continuously reports the
above error for the rest of the files.

My suspicion comes down to two candidates: not enough heap space vs. not
enough content_repo/flowfile_repo space. Any ideas or questions?





Re: NiFi Rest API Start

2016-04-25 Thread dale.chang13
Matt Burgess wrote
> It's part of the "Update Processor" API, check out this thread from
> the NiFi dev list archive:
> 
> http://apache-nifi-developer-list.39713.n7.nabble.com/Reg-starting-and-Stopping-processor-tp7930p7949.html
> 
> Regards,
> Matt

Nice, so I was able to change the status and the properties of a processor,
thanks to your help. I have a follow-up question:
would you be able to start a whole process group through the REST API?

I have tried using the "Update Process Group" API with partial success. I
can't seem to provide the correct JSON attribute when it asks for the parent
group:


HTTP Request/Response wrote
> curl -i -X PUT -H 'Content-Type:application/json' -d '
> {"revision":
>   {"clientId":"test-api-1",
>"version":37},
>  "processGroup": {
>"id":"[PROCESS_GROUP_ID]",
>"running":"true",
>"parentGroupId":"[PARENT_GROUP_ID]"
>   }
> }'
> http://localhost:8181/nifi-api/controller/process-groups/[PROCESS_GROUP_ID]
> HTTP/1.1 400 Bad Request
> Date: Mon, 25 Apr 2016 20:22:05 GMT
> Server: Jetty(9.2.11.v20150529)
> Date: Mon, 25 Apr 2016 20:22:05 GMT
> Content-Type: text/plain
> Content-Length: 86
> 
> Unable to create the specified process group since the parent group was
> not specified.







NiFi Rest API Start

2016-04-25 Thread dale.chang13
Looking through the NiFi API, I am not finding a way to start a Processor,
Process Group, or an instance of NiFi. Is there a REST API command I can
invoke to start one of these? The API says there is a way to start and stop
processors.





Re: PutSolrContentStream Doc and DocID confusion

2016-04-20 Thread dale.chang13
Note: I omitted a bunch of fields from the JSON document when using the Solr
UI and curl.

I performed additional tests from the UI:

Storing the entire JSON document using the UI resulted in an HTTP 400 Bad
Request response. By comparing our JSON document to the Solr schema, I saw
that the JSON document fields with dates were probably not formatted
correctly (unsure at the time of this post). By removing the dates I was
able to successfully store the JSON document.
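
For reference, Solr date fields expect ISO-8601 UTC timestamps, so a
well-formed value would look like the following (the field name is made up):

  {"date_filed":"2016-04-20T14:55:15Z"}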





Re: PutSolrContentStream Doc and DocID confusion

2016-04-20 Thread dale.chang13
Hi Brian,

Yes, the JSON object I am storing is a valid JSON document. The Content
Payload is set to true and the value is:

{"docid":"a1602677-fc7c-43ea-adba-c1ed945ede3d_1831"}


Otherwise, I believe I would have gotten a JSON syntax error saying that the
JSON object was invalid.

---

I have a PutSolrContentStream that routes FlowFiles to a LogAttribute on
connection_failure or failure.

Here is what I see in the Bulletin:

14:55:15 EDT   ERROR 1ed45988-8ad6-3252-bd6d-7410b6dba8fd
localhost:8181
PutSolrContentStream[id=1ed45988-8ad6-3252-bd6d-7410b6dba8fd] Failed to send
StandardFlowFileRecord[uuid=6b929b50-57d9-4178-b276-43eea977569a,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1461178507835-17, container=default,
section=17], offset=173166, length=122],offset=0,name=block.msg,size=122] to
Solr due to org.apache.solr.client.solrj.SolrServerException:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/cobra_shard1_replica3:
[doc=doj_civ_fraud_ws1_a1602677-fc7c-43ea-adba-c1ed945ede3d_1649] missing
required field: docid; routing to failure:
org.apache.solr.client.solrj.SolrServerException:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/cobra_shard1_replica3:
[doc=document_a1602677-fc7c-43ea-adba-c1ed945ede3d_1649] missing required
field: docid

14:55:15 EDT WARNING 0897791f-cc5d-4276-b5ce-76e610ce1478
localhost:8181
LogAttribute[id=0897791f-cc5d-4276-b5ce-76e610ce1478] logging for flow file
StandardFlowFileRecord[uuid=8f1efd1e-7f71-44bf-8b88-bb890dce5905,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1461178507835-17, container=default,
section=17], offset=173898, length=122],offset=0,name=CANCELLED  Capital
Allocations w/ Naveen, D Port, M Walkeryour ofc.msg,size=122]
--
Standard FlowFile Attributes
Key: 'entryDate'
Value: 'Wed Apr 20 14:54:59 EDT 2016'
Key: 'lineageStartDate'
Value: 'Wed Apr 20 14:54:59 EDT 2016'
Key: 'fileSize'
Value: '122'
FlowFile Attribute Map Content
Key: 'docid'
Value: 'a1602677-fc7c-43ea-adba-c1ed945ede3d_1831'
--
{"docid":"a1602677-fc7c-43ea-adba-c1ed945ede3d_1831"}





PutSolrContentStream Doc and DocID confusion

2016-04-20 Thread dale.chang13
While using PutSolrContentStream to store a JSON object in SolrCloud, I've
been running into this issue of being unable to store a document. I've
uploaded a solr schema that says that the field *docid* is required and a
string. Attempting to store a document in solr, this is the error I get:

Failed to send StandardFlowFileRecord[...] to Solr due to
org.apache.solr.client.solrj.SolrServerException:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at localhost:8983/solr/cobra_shard1_replica3:
[*doc*="95663ced-6a3b-4356-877a-7c5707c046e7_779"] missing required field:
*docid*; routing to failure:

HOWEVER,

Using LogAttribute to print out the JSON object stored as the FlowFile's
content, and specifically docid, it has the key-value pair
[*docid*="95663ced-6a3b-4356-877a-7c5707c046e7_779"], which is the same
string that is printed out in the error from PutSolrContentStream.

My question: is there some confusion between the way NiFi uses *doc* and
the attribute *docid*? The error referred to the document via
[*doc*="95663ced-6a3b-4356-877a-7c5707c046e7_779"] after shard3.

Additionally, it looks like replica3 is the only shard that has this problem
in my SolrCloud instance.





Re: Variable FlowFile Attributes Defined by FlowFile Content

2016-04-04 Thread dale.chang13
Essentially all of the information contained in a FlowFile's contents would
be translated to attributes. I would like to pass in a generic delimited
file with two rows:
- the first row contains header names,
- the second row contains values for each header corresponding to a single
entry.

The use-case would take the header names as FlowFile attribute keys, and the
following line's values as the values for those keys.

I will probably use ExecuteProcess or ExecuteStreamCommand if no built-in
functionality exists.
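
If it comes down to a script or a custom processor, the core of it would
just be something like this (the delimiter and the surrounding session
handling are assumed):

  String[] headers = firstRow.split("\\|");  // first row: header names
  String[] values = secondRow.split("\\|");  // second row: values
  Map<String, String> attributes = new HashMap<>();
  for (int i = 0; i < headers.length && i < values.length; i++) {
      attributes.put(headers[i].trim(), values[i].trim());
  }
  flowFile = session.putAllAttributes(flowFile, attributes);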




Matt Burgess wrote
> This is an interesting idea, can you elaborate on what such a file would
> look like and how it would be used? Would it contain values to be used as
> attributes in ExtractText as well as the content from which to extract the
> values for these attributes?
> 
> In general, I don't believe a property name can be defined dynamically
> (Expression Language, e.g.) for processors like ExtractText, but NiFi
> 0.5.0+ has ExecuteScript and InvokeScriptedProcessor, both of which offer
> quite a bit of flexibility in these areas.
> 
> Regards,
> Matt







Variable FlowFile Attributes Defined by FlowFile Content

2016-03-31 Thread dale.chang13
I see that the ExtractText processor performs regular expressions on the
FlowFile's content and adds the results as user-defined attributes. However,
I was wondering if there is a way to avoid "hard-coding" these values. I was
hoping for something along the lines of the FlowFile attribute keys being
defined in the FlowFile content itself.





Splitting Incoming FlowFile, Output Multiple FlowFiles

2016-03-31 Thread dale.chang13
My specific use-case calls for ingesting a CSV table with many rows and then
storing individual rows into HBase and Solr. Additionally, I would like to
avoid developing custom processors, but it seems like the SplitText and
SplitContent processors do not return individual FlowFiles, each with their
own attributes.

However, I was wondering what the best plan of attack would be for taking an
incoming FlowFile and sending multiple FlowFiles through the ProcessSession.
Creating multiple instances of ProcessSession? session.transfer within a
loop?
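
In case it clarifies the question, here is the shape I had in mind, as a
sketch inside onTrigger (REL_SUCCESS and the line parsing are assumed):

  FlowFile original = session.get();
  if (original == null) {
      return;
  }
  List<FlowFile> splits = new ArrayList<>();
  for (final String line : lines) {  // rows parsed from the CSV content
      FlowFile split = session.create(original);  // child keeps lineage + attributes
      split = session.write(split, out -> out.write(line.getBytes(StandardCharsets.UTF_8)));
      split = session.putAttribute(split, "csv.row", line);
      splits.add(split);
  }
  session.transfer(splits, REL_SUCCESS);  // one transfer, one session
  session.remove(original);  // or route the original to its own relationship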





Re: Flow.xml.gz are Empty

2016-03-24 Thread dale.chang13
UPDATE:

Here's the complete stack trace for from the slave node's app.log

java.lang.Exception: Unable to load flow due to: java.io.IOException:
org.apache.nifi.cluster.ConnectionException: Failed to connect node to
cluster because local or cluster flow is malformed.
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:784)
~[nifi-jetty-1.1.1.0-12.jar:1.1.1.0-12]
at org.apache.nifi.NiFi.<init>(NiFi.java:137)
[nifi-runtime-1.1.1.0-12.jar:1.1.1.0-12]
at org.apache.nifi.NiFi.main(NiFi.java:227)
[nifi-runtime-1.1.1.0-12.jar:1.1.1.0-12]
Caused by: java.io.IOException: org.apache.nifi.cluster.ConnectionException:
Failed to connect node to cluster because local or cluster flow is
malformed.
at
org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:453)
~[nifi-framework-core-1.1.1.0-12.jar:1.1.1.0-12]
at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:775)
~[nifi-jetty-1.1.1.0-12.jar:1.1.1.0-12]
... 2 common frames omitted
Caused by: org.apache.nifi.cluster.ConnectionException: Failed to connect
node to cluster because local or cluster flow is malformed.
at
org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:734)
~[nifi-framework-core-1.1.1.0-12.jar:1.1.1.0-12]
at
org.apache.nifi.controller.StandardFlowService.load(StandardFlowService.java:433)
~[nifi-framework-core-1.1.1.0-12.jar:1.1.1.0-12]
... 3 common frames omitted
Caused by: org.apache.nifi.controller.FlowSerializationException:
org.xml.sax.SAXParseException; lineNumber: 3384; columnNumber: 23; Character
reference 

Re: Flow.xml.gz are Empty

2016-03-24 Thread dale.chang13
I went through the nifi.properties for my NCM and two nodes and made sure
that the "nifi.sensitive.props.key" values are all the same.

However, I do not know how to make sure my ports are clear between the
nodes. Could you help me or direct me to some resources?




>Make sure that your 'nifi.sensitive.props.key' is the same between all the 
>nodes. If it is not, you will get a partial load of the file, basically up 
>until the point something gets loaded that needs to be decrypted. 

>Make sure that your ports are clear between all the nodes. There are 3 ports 
>that you need to open for clustering. Sounds simple, but lots of times, 
>people get 2 out of 3 on one of the boxes. 





Flow.xml.gz are Empty

2016-03-24 Thread dale.chang13
Good morning guys,

I've been having trouble connecting two slave nodes to the NCM.
In the logs, on the line before "unable to load flow due to...
ConnectionException... local or cluster flow is malformed,"
I see [Fatal Error] :3384:23: Character reference