[jira] [Commented] (MINIFICPP-39) Create FocusArchive processor

2017-10-13 Thread marco polo (JIRA)

[ 
https://issues.apache.org/jira/browse/MINIFICPP-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204354#comment-16204354
 ] 

marco polo commented on MINIFICPP-39:
-

[~calebj] An alternative is to use log attributes to do validation through the 
log output. It's not a great way to validate, but we'll get to the 
current_flowfile_ eventually, when we have a chance, and we would always welcome 
contributions there. 

> Create FocusArchive processor
> -
>
> Key: MINIFICPP-39
> URL: https://issues.apache.org/jira/browse/MINIFICPP-39
> Project: NiFi MiNiFi C++
>  Issue Type: Task
>Reporter: Andrew Christianson
>Assignee: Andrew Christianson
>Priority: Minor
>
> Create a FocusArchive processor which implements a lens over an archive 
> (tar, etc.). A concise, though informal, definition of a lens is as follows:
> "Essentially, they represent the act of “peering into” or “focusing in on” 
> some particular piece/path of a complex data object such that you can more 
> precisely target particular operations without losing the context or 
> structure of the overall data you’re working with." 
> https://medium.com/@dtipson/functional-lenses-d1aba9e52254#.hdgsvbraq
> Why a FocusArchive in MiNiFi? Simply put, it will enable us to "focus in on" 
> an entry in the archive, perform processing *in-context* of that entry, then 
> re-focus on the overall archive. This allows for transformation or other 
> processing of an entry in the archive without losing the overall context of 
> the archive.
> Initial format support is tar, due to its simplicity and ubiquity.
> Attributes:
> - Path (the path in the archive to focus; "/" to re-focus the overall archive)
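To make the lens idea concrete, here is a minimal Python sketch. It assumes the archive is an uncompressed tar held in memory. `focus` is the lens "get", which reads one entry by path. `refocus` is the "set", which writes a transformed entry back and carries every other member over unchanged. The names `make_tar`, `focus`, and `refocus` are hypothetical illustrations and are not the MiNiFi C++ processor's API.

```python
import io
import tarfile


def make_tar(entries: dict[str, bytes]) -> bytes:
    """Build an in-memory tar archive from a {path: content} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def focus(archive: bytes, path: str) -> bytes:
    """Lens 'get': return the content of one entry (KeyError if it is absent)."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        member = tar.extractfile(path)
        if member is None:
            raise KeyError(f"{path} is not a regular file in the archive")
        return member.read()


def refocus(archive: bytes, path: str, new_data: bytes) -> bytes:
    """Lens 'set': return a new archive with one entry replaced and all others kept."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as src, \
            tarfile.open(fileobj=out, mode="w") as dst:
        for member in src.getmembers():
            # Directories and links carry no data; regular files are copied byte-for-byte.
            data = src.extractfile(member).read() if member.isfile() else None
            if member.name == path:
                data = new_data
                member.size = len(new_data)
            dst.addfile(member, io.BytesIO(data) if data is not None else None)
    return out.getvalue()
```

For example, `refocus(archive, "a.txt", focus(archive, "a.txt").upper())` transforms one entry in context and returns the whole archive, which roughly corresponds to processing an entry and then re-focusing on "/".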



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (NIFIREG-31) Update Admin/Workflow tab

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFIREG-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204176#comment-16204176
 ] 

ASF GitHub Bot commented on NIFIREG-31:
---

GitHub user scottyaslan opened a pull request:

https://github.com/apache/nifi-registry/pull/21

[NIFIREG-31] update buckets data table to include search/filter capab…

…ilities and add bucket creation dialog

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scottyaslan/nifi-registry NIFIREG-31

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi-registry/pull/21.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21


commit 166a6364fa4953239f6d10f366bc68a9ff7ac334
Author: Scott Aslan 
Date:   2017-10-13T20:33:59Z

[NIFIREG-31] update buckets data table to include search/filter 
capabilities and add bucket creation dialog




> Update Admin/Workflow tab
> -
>
> Key: NIFIREG-31
> URL: https://issues.apache.org/jira/browse/NIFIREG-31
> Project: NiFi Registry
>  Issue Type: Sub-task
>Reporter: Scott Aslan
>
> Rename Workflow tab to buckets.
> Increase width of buckets container to match the users.
> Add user search type functionality to the buckets data table.





[GitHub] nifi-registry pull request #21: [NIFIREG-31] update buckets data table to in...

2017-10-13 Thread scottyaslan
GitHub user scottyaslan opened a pull request:

https://github.com/apache/nifi-registry/pull/21

[NIFIREG-31] update buckets data table to include search/filter capab…

…ilities and add bucket creation dialog

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scottyaslan/nifi-registry NIFIREG-31

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi-registry/pull/21.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21


commit 166a6364fa4953239f6d10f366bc68a9ff7ac334
Author: Scott Aslan 
Date:   2017-10-13T20:33:59Z

[NIFIREG-31] update buckets data table to include search/filter 
capabilities and add bucket creation dialog




---


[jira] [Commented] (NIFI-4471) Set flow limits at process group level

2017-10-13 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204100#comment-16204100
 ] 

Kevin Risden commented on NIFI-4471:


Currently in NiFi there is nothing stopping a user from setting the connection 
FlowFile queue object count to 10 million. The UI/API accepts basically any 
number there. This makes it easy for a user to destabilize the cluster simply 
by doing what the UI allows.

> Set flow limits at process group level
> --
>
> Key: NIFI-4471
> URL: https://issues.apache.org/jira/browse/NIFI-4471
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Core Framework
>Reporter: Haimo Liu
>
> In a multi-tenancy type of operational environment, as a NIFI admin user, I 
> want to be able to set some limits at the Process Group level, to prevent my 
> NIFI server from being stressed out.
> 1. I want to say "no connection's limit may be set higher than xxx MB."
> 2. "I can queue no more than xxx FFs at any connection."
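One way to read the two requirements above is as a check applied whenever a connection's back pressure thresholds are saved. The Python sketch below assumes hypothetical group-level caps (`GroupLimits`) and a hypothetical `validate_connection` hook. It only illustrates the requested rule and is not an existing NiFi API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupLimits:
    """Hypothetical caps an admin sets on a process group."""
    max_data_size_bytes: int  # "no connection's limit may be set higher than xxx MB"
    max_object_count: int     # "I can queue no more than xxx FFs at any connection"


def validate_connection(limits: GroupLimits, data_size_bytes: int,
                        object_count: int) -> list[str]:
    """Return violations for one connection's back pressure thresholds; empty means allowed."""
    violations = []
    if data_size_bytes > limits.max_data_size_bytes:
        violations.append(
            f"data size threshold {data_size_bytes} B exceeds group cap "
            f"{limits.max_data_size_bytes} B")
    if object_count > limits.max_object_count:
        violations.append(
            f"object threshold {object_count} exceeds group cap {limits.max_object_count}")
    return violations
```

With a cap of 10,000 FlowFiles, a request for a 10 million object threshold returns one violation instead of being silently accepted.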





[jira] [Commented] (NIFI-4472) add alerts when a node is disconnected in my NIFI cluster

2017-10-13 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204097#comment-16204097
 ] 

Kevin Risden commented on NIFI-4472:


Currently we use Ambari to manage HDF, and the only Ambari alert is for the 
NiFi process being down (pid not alive). We added a custom Ambari alert to 
check when an HDF node disconnects. 

> add alerts when a node is disconnected in my NIFI cluster
> -
>
> Key: NIFI-4472
> URL: https://issues.apache.org/jira/browse/NIFI-4472
> Project: Apache NiFi
>  Issue Type: New Feature
>  Components: Core Framework
>Reporter: Haimo Liu
>
> When a NIFI node is disconnected from my cluster, it would be nice to get 
> timely alerts/notifications.
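A custom disconnect alert could compare the number of connected nodes with the total number of nodes. The sketch below assumes that the JSON body returned by NiFi's `GET /nifi-api/flow/cluster/summary` REST endpoint contains `clusterSummary.connectedNodeCount` and `clusterSummary.totalNodeCount`. Verify these field names against your NiFi version. The HTTP call and the Ambari alert wiring are left out, and both helper names are hypothetical.

```python
import json
from typing import Optional


def disconnected_node_count(summary_body: str) -> int:
    """Number of nodes not connected, from a cluster-summary JSON body (assumed shape)."""
    summary = json.loads(summary_body)["clusterSummary"]
    return summary["totalNodeCount"] - summary["connectedNodeCount"]


def alert_message(summary_body: str) -> Optional[str]:
    """Return an alert string when any node is disconnected, else None."""
    missing = disconnected_node_count(summary_body)
    if missing > 0:
        return f"CRITICAL: {missing} NiFi node(s) disconnected from the cluster"
    return None
```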





[jira] [Updated] (NIFI-4481) Identify which processors have been configured for "primary node" only

2017-10-13 Thread Matt Gilman (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Gilman updated NIFI-4481:
--
Status: Patch Available  (was: In Progress)

> Identify which processors have been configured for "primary node" only
> --
>
> Key: NIFI-4481
> URL: https://issues.apache.org/jira/browse/NIFI-4481
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core UI
>Reporter: Rob Moran
>Assignee: Matt Gilman
>Priority: Minor
> Attachments: primary-processor-id.png
>
>
> Possibly identify on canvas components and in the Summary table 





[GitHub] nifi pull request #2210: NIFI-4481: Visualize Processors Running on Primary ...

2017-10-13 Thread mcgilman
GitHub user mcgilman opened a pull request:

https://github.com/apache/nifi/pull/2210

NIFI-4481: Visualize Processors Running on Primary Node

NIFI-4481:
- Adding support for visualizing if a component is scheduled for primary 
node only.


[Visualization](https://issues.apache.org/jira/secure/attachment/12891747/primary-processor-id.png)
 should only be available in cluster mode where the Execution Node 
configuration is available. In cluster mode, the nodes on the canvas should 
show the (P) icon when configured for Execution Node: Primary. Like Run Status, 
this indication will be available regardless of permission for the component. 
Additionally, this indication is available in the Summary Table.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mcgilman/nifi NIFI-4481

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2210.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2210


commit 3f0d9e05c7c1aea0bd03a999c16893fb308436ca
Author: Matt Gilman 
Date:   2017-10-12T17:58:50Z

NIFI-4481:
- Adding support for visualizing if a component is scheduled for primary 
node only.




---


[jira] [Commented] (NIFI-4481) Identify which processors have been configured for "primary node" only

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204008#comment-16204008
 ] 

ASF GitHub Bot commented on NIFI-4481:
--

GitHub user mcgilman opened a pull request:

https://github.com/apache/nifi/pull/2210

NIFI-4481: Visualize Processors Running on Primary Node

NIFI-4481:
- Adding support for visualizing if a component is scheduled for primary 
node only.


[Visualization](https://issues.apache.org/jira/secure/attachment/12891747/primary-processor-id.png)
 should only be available in cluster mode where the Execution Node 
configuration is available. In cluster mode, the nodes on the canvas should 
show the (P) icon when configured for Execution Node: Primary. Like Run Status, 
this indication will be available regardless of permission for the component. 
Additionally, this indication is available in the Summary Table.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mcgilman/nifi NIFI-4481

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2210.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2210


commit 3f0d9e05c7c1aea0bd03a999c16893fb308436ca
Author: Matt Gilman 
Date:   2017-10-12T17:58:50Z

NIFI-4481:
- Adding support for visualizing if a component is scheduled for primary 
node only.






[jira] [Resolved] (NIFI-4484) Update doc screenshots in User Guide section "Adding Controller Services for Reporting Tasks"

2017-10-13 Thread Matt Gilman (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Gilman resolved NIFI-4484.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

> Update doc screenshots in User Guide section "Adding Controller Services for 
> Reporting Tasks"
> -
>
> Key: NIFI-4484
> URL: https://issues.apache.org/jira/browse/NIFI-4484
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Documentation & Website
>Affects Versions: 1.4.0
>Reporter: Andrew Lim
>Assignee: Andrew Lim
>Priority: Minor
> Fix For: 1.5.0
>
>
> Per NIFI-3941, the tab name was changed from "Controller Services" to 
> "Reporting Task Controller Services".
> The relevant screenshots need to be updated.





[jira] [Commented] (NIFI-4444) Upgrade Jersey Versions

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16204000#comment-16204000
 ] 

ASF GitHub Bot commented on NIFI-:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2206


> Upgrade Jersey Versions
> ---
>
> Key: NIFI-
> URL: https://issues.apache.org/jira/browse/NIFI-
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.4.0
>Reporter: Matt Gilman
>Assignee: Matt Gilman
> Fix For: 1.5.0
>
> Attachments: NIFI-.xml
>
>
> Need to upgrade to a newer version of Jersey. The primary motivation is to 
> upgrade the version used within NiFi itself. However, there are a number of 
> extensions that also leverage it. Of those extensions, some utilize the older 
> version defined in dependencyManagement while others override explicitly 
> within their own bundle dependencyManagement. For this JIRA I propose 
> removing the Jersey artifacts from the root pom and allowing the version to be 
> specified on a bundle-by-bundle basis.





[jira] [Commented] (NIFI-4484) Update doc screenshots in User Guide section "Adding Controller Services for Reporting Tasks"

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203999#comment-16203999
 ] 

ASF GitHub Bot commented on NIFI-4484:
--

Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2209




[jira] [Commented] (NIFI-4484) Update doc screenshots in User Guide section "Adding Controller Services for Reporting Tasks"

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203997#comment-16203997
 ] 

ASF GitHub Bot commented on NIFI-4484:
--

Github user mcgilman commented on the issue:

https://github.com/apache/nifi/pull/2209
  
Thanks @andrewmlim! This has been merged to master.




[GitHub] nifi pull request #2209: NIFI-4484 Update screenshots in User Guide for Repo...

2017-10-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2209


---


[GitHub] nifi pull request #2206: NIFI-4444: Upgrade to Jersey 2.x

2017-10-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/2206


---


[GitHub] nifi issue #2209: NIFI-4484 Update screenshots in User Guide for Reporting T...

2017-10-13 Thread mcgilman
Github user mcgilman commented on the issue:

https://github.com/apache/nifi/pull/2209
  
Thanks @andrewmlim! This has been merged to master.


---


[jira] [Commented] (NIFI-4484) Update doc screenshots in User Guide section "Adding Controller Services for Reporting Tasks"

2017-10-13 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203996#comment-16203996
 ] 

ASF subversion and git services commented on NIFI-4484:
---

Commit 0e3d83c3b848bed5e32c93371375aa4514137986 in nifi's branch 
refs/heads/master from [~andrewmlim]
[ https://git-wip-us.apache.org/repos/asf?p=nifi.git;h=0e3d83c ]

NIFI-4484 Update screenshots in User Guide for Reporting Task Controller 
Services tab. This closes #2209




[jira] [Commented] (NIFI-4484) Update doc screenshots in User Guide section "Adding Controller Services for Reporting Tasks"

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203990#comment-16203990
 ] 

ASF GitHub Bot commented on NIFI-4484:
--

Github user mcgilman commented on the issue:

https://github.com/apache/nifi/pull/2209
  
Will review...




[GitHub] nifi issue #2209: NIFI-4484 Update screenshots in User Guide for Reporting T...

2017-10-13 Thread mcgilman
Github user mcgilman commented on the issue:

https://github.com/apache/nifi/pull/2209
  
Will review...


---


[jira] [Commented] (MINIFICPP-72) Add tar and compression support for MergeContent

2017-10-13 Thread bqiu (JIRA)

[ 
https://issues.apache.org/jira/browse/MINIFICPP-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203763#comment-16203763
 ] 

bqiu commented on MINIFICPP-72:
---

Aldrin,

Currently NiFi supports the tar and zip merge formats. I have already committed 
MergeContent for MiNiFi. This JIRA is to add the tar and zip merge formats to 
MiNiFi so that the MiNiFi MergeContent processor has feature parity with NiFi.
I will add a compress content processor for MiNiFi in a different JIRA.

> Add tar and compression support for MergeContent
> 
>
> Key: MINIFICPP-72
> URL: https://issues.apache.org/jira/browse/MINIFICPP-72
> Project: NiFi MiNiFi C++
>  Issue Type: New Feature
>Affects Versions: 1.0.0
>Reporter: bqiu
> Fix For: 1.0.0
>
>
> Add tar and compression support for MergeContent.
> Will use libarchive (https://www.libarchive.org).
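The tar, compressed tar, and zip merge formats all come down to writing each FlowFile as one archive entry. The Python sketch below uses the standard `tarfile` and `zipfile` modules instead of libarchive, which the issue plans to use. It assumes each FlowFile arrives as a hypothetical `(filename, content)` pair. The format names "tar", "tar.gz", and "zip" are illustrative and are not MergeContent property values.

```python
import io
import tarfile
import zipfile


def merge_content(flowfiles: list[tuple[str, bytes]], merge_format: str) -> bytes:
    """Bundle (filename, content) pairs into one archive.

    merge_format is "tar", "tar.gz" (tar plus gzip compression) or "zip".
    """
    buf = io.BytesIO()
    if merge_format in ("tar", "tar.gz"):
        mode = "w:gz" if merge_format == "tar.gz" else "w"
        with tarfile.open(fileobj=buf, mode=mode) as tar:
            for name, data in flowfiles:
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    elif merge_format == "zip":
        with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
            for name, data in flowfiles:
                zf.writestr(name, data)
    else:
        raise ValueError(f"unsupported merge format: {merge_format}")
    return buf.getvalue()
```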





[jira] [Commented] (MINIFICPP-39) Create FocusArchive processor

2017-10-13 Thread Andrew Christianson (JIRA)

[ 
https://issues.apache.org/jira/browse/MINIFICPP-39?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203718#comment-16203718
 ] 

Andrew Christianson commented on MINIFICPP-39:
--

[~calebj], it looks like we currently don't have that facility in the C++ unit 
test framework. So for now, the main options are to use PutFile and verify 
the output after it's written to disk, or to use the Docker system integration 
test framework. The docs on the Docker system integration test framework are at 
https://github.com/apache/nifi-minifi-cpp/blob/master/docker/test/integration/README.md.
To do this, you'd have to add support for these processors (simple Python 
classes), then create a custom OutputValidator (look at the 
SingleFileOutputValidator for an example).
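A custom validator of the kind suggested above might look roughly like the sketch below. The `OutputValidator` base class is a stand-in written for this sketch, and `MultiFileOutputValidator` is a hypothetical name. Check the real base class and `SingleFileOutputValidator` in the integration test framework for the actual method names and signatures.

```python
import os


class OutputValidator:
    """Stand-in base class; the real framework's interface may differ."""

    def validate(self, output_dir: str) -> bool:
        raise NotImplementedError


class MultiFileOutputValidator(OutputValidator):
    """Hypothetical validator: passes only if every expected file exists with the expected bytes."""

    def __init__(self, expected: dict[str, bytes]):
        self.expected = expected

    def validate(self, output_dir: str) -> bool:
        for name, content in self.expected.items():
            path = os.path.join(output_dir, name)
            if not os.path.isfile(path):
                return False
            with open(path, "rb") as f:
                if f.read() != content:
                    return False
        return True
```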



[jira] [Updated] (NIFI-4383) UpdateRecord - cannot update arrays elements

2017-10-13 Thread Pierre Villard (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-4383:
-
Status: Patch Available  (was: Open)

> UpdateRecord - cannot update arrays elements
> 
>
> Key: NIFI-4383
> URL: https://issues.apache.org/jira/browse/NIFI-4383
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.4.0, 1.3.0
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>  Labels: records
>
> At the moment, using UpdateRecord to update the elements of an array has no 
> effect.
> Input:
> {noformat}
> {
>   "numbers" : [ 1, null, 4 ]
> }
> {noformat}
> Parameters:
> ||Path||Value||Expected output||
> |{{/numbers[*]}}|{{8}}|{{"numbers" : [ 8, 8, 8 ]}}|
> |{{/numbers[1]}}|{{8}}|{{"numbers" : [ 1, 8, 4 ]}}|
> |{{/numbers[0..1]}}|{{8}}|{{"numbers" : [ 8, 8, 4 ]}}|
> |{{/numbers[0,2]}}|{{8}}|{{"numbers" : [ 8, null, 8 ]}}|
> When the elements of the array are records, it's possible to update fields of 
> the record but not the record itself as-is.
> Also, in the MultiArrayIndexPath implementation, the index of array elements is 
> not correctly provided. Because of that, the wrong elements of the array could 
> be updated.
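The expected outputs in the table follow from simple index semantics: `*` selects every element, `a..b` is an inclusive range, and `a,b` is a list of indices. The Python sketch below reproduces the four rows. It covers only the selector forms shown in the table and is not NiFi's RecordPath implementation, which is written in Java. `selected_indices` and `update_array` are hypothetical helper names.

```python
def selected_indices(selector: str, length: int) -> list[int]:
    """Resolve the bracketed part of a path such as '*', '1', '0..1' or '0,2'."""
    if selector == "*":
        return list(range(length))
    indices: list[int] = []
    for part in selector.split(","):
        if ".." in part:
            start, end = (int(x) for x in part.split(".."))
            indices.extend(range(start, end + 1))  # ranges are inclusive, per the table
        else:
            indices.append(int(part))
    return indices


def update_array(values: list, selector: str, new_value) -> list:
    """Return a copy of values with every selected element replaced by new_value."""
    result = list(values)
    for i in selected_indices(selector, len(values)):
        result[i] = new_value
    return result
```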





[jira] [Commented] (NIFI-4484) Update doc screenshots in User Guide section "Adding Controller Services for Reporting Tasks"

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203605#comment-16203605
 ] 

ASF GitHub Bot commented on NIFI-4484:
--

GitHub user andrewmlim opened a pull request:

https://github.com/apache/nifi/pull/2209

NIFI-4484 Update screenshots in User Guide for Reporting Task Controller 
Services tab



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewmlim/nifi NIFI-4484

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2209


commit 6baea8ccffe93e6ea6289cac8970f95e95f797bf
Author: Matt Gilman 
Date:   2017-10-02T21:01:31Z

NIFI-:
- Upgrading to Jersey 2.x.
- Updating NOTICE files where necessary.
- Fixing checkstyle issues.

This closes #2206.

Signed-off-by: Andy LoPresto 

commit f86148f6bce18a9b9b63f1527c868b96b12188e2
Author: Andrew Lim 
Date:   2017-10-12T20:01:54Z

NIFI-4484 Update screenshots in User Guide for Reporting Task Controller 
Services tab






[GitHub] nifi pull request #2209: NIFI-4484 Update screenshots in User Guide for Repo...

2017-10-13 Thread andrewmlim
GitHub user andrewmlim opened a pull request:

https://github.com/apache/nifi/pull/2209

NIFI-4484 Update screenshots in User Guide for Reporting Task Controller 
Services tab



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewmlim/nifi NIFI-4484

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2209


commit 6baea8ccffe93e6ea6289cac8970f95e95f797bf
Author: Matt Gilman 
Date:   2017-10-02T21:01:31Z

NIFI-:
- Upgrading to Jersey 2.x.
- Updating NOTICE files where necessary.
- Fixing checkstyle issues.

This closes #2206.

Signed-off-by: Andy LoPresto 

commit f86148f6bce18a9b9b63f1527c868b96b12188e2
Author: Andrew Lim 
Date:   2017-10-12T20:01:54Z

NIFI-4484 Update screenshots in User Guide for Reporting Task Controller 
Services tab




---


[jira] [Commented] (NIFI-4383) UpdateRecord - cannot update arrays elements

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203579#comment-16203579
 ] 

ASF GitHub Bot commented on NIFI-4383:
--

GitHub user pvillard31 opened a pull request:

https://github.com/apache/nifi/pull/2208

NIFI-4383 - Fix UpdateRecord when updating arrays elements

Thank you for submitting a contribution to Apache NiFi.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

### For all changes:
- [ ] Is there a JIRA ticket associated with this PR? Is it referenced 
 in the commit message?

- [ ] Does your PR title start with NIFI- where  is the JIRA number 
you are trying to resolve? Pay particular attention to the hyphen "-" character.

- [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?

- [ ] Is your initial contribution a single, squashed commit?

### For code changes:
- [ ] Have you ensured that the full suite of tests is executed via mvn 
-Pcontrib-check clean install at the root nifi folder?
- [ ] Have you written or updated unit tests to verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
- [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file under nifi-assembly?
- [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found under nifi-assembly?
- [ ] If adding new Properties, have you added .displayName in addition to 
.name (programmatic access) for each of the new properties?

### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered?

### Note:
Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pvillard31/nifi NIFI-4383

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2208


commit 5edcfa9ac7833de62ba8b3198c0ad32b16239035
Author: Pierre Villard 
Date:   2017-10-12T21:51:09Z

NIFI-4383 - Fix UpdateRecord when updating arrays elements






[GitHub] nifi pull request #2208: NIFI-4383 - Fix UpdateRecord when updating arrays e...

2017-10-13 Thread pvillard31
GitHub user pvillard31 opened a pull request:

https://github.com/apache/nifi/pull/2208

NIFI-4383 - Fix UpdateRecord when updating arrays elements



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pvillard31/nifi NIFI-4383

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/2208.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2208


commit 5edcfa9ac7833de62ba8b3198c0ad32b16239035
Author: Pierre Villard 
Date:   2017-10-12T21:51:09Z

NIFI-4383 - Fix UpdateRecord when updating arrays elements




---


[jira] [Updated] (NIFI-4383) UpdateRecord - cannot update arrays elements

2017-10-13 Thread Pierre Villard (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-4383:
-
Labels: records  (was: )

> UpdateRecord - cannot update arrays elements
> 
>
> Key: NIFI-4383
> URL: https://issues.apache.org/jira/browse/NIFI-4383
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>  Labels: records
>
> At the moment, using UpdateRecord to update the elements of an array has no 
> effect.
> Input:
> {noformat}
> {
>   "numbers" : [ 1, null, 4 ]
> }
> {noformat}
> Parameters:
> ||Path||Value||Expected output||
> |{{/numbers[*]}}|{{8}}|{{"numbers" : [ 8, 8, 8 ]}}|
> |{{/numbers[1]}}|{{8}}|{{"numbers" : [ 1, 8, 4 ]}}|
> |{{/numbers[0..1]}}|{{8}}|{{"numbers" : [ 8, 8, 4 ]}}|
> |{{/numbers[0,2]}}|{{8}}|{{"numbers" : [ 8, null, 8 ]}}|
> When the elements of the array are records, it's possible to update fields of 
> a record but not the record itself as a whole.
> Also, the MultiArrayIndexPath implementation does not provide the correct index 
> of array elements. Because of that, the wrong elements of the array could be 
> updated.
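A minimal sketch of the selection semantics in the table above, assuming the text inside the brackets is either *, a single index, an inclusive a..b range, or a comma-separated list of indices. This is not NiFi's RecordPath code; ArrayIndexSketch and its methods are hypothetical. It writes to the resolved array index itself, which is the behaviour the MultiArrayIndexPath note says was broken.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch: NOT NiFi's RecordPath implementation. It only shows which
// array positions each path form in the table selects and updates.
class ArrayIndexSketch {

    /** Resolves the text between '[' and ']' into concrete indices for an array of the given size. */
    static List<Integer> resolve(String selector, int size) {
        if (selector.equals("*")) {
            return IntStream.range(0, size).boxed().collect(Collectors.toList());
        }
        List<Integer> indices = new ArrayList<>();
        for (String part : selector.split(",")) {
            String p = part.trim();
            if (p.contains("..")) {
                // Inclusive range, e.g. "0..1" selects indices 0 and 1.
                String[] bounds = p.split("\\.\\.");
                int from = Integer.parseInt(bounds[0].trim());
                int to = Integer.parseInt(bounds[1].trim());
                for (int i = from; i <= to; i++) {
                    indices.add(i);
                }
            } else {
                indices.add(Integer.parseInt(p));
            }
        }
        return indices;
    }

    /** Returns a copy of the array with every selected position set to the new value. */
    static List<Integer> update(List<Integer> array, String selector, Integer value) {
        List<Integer> result = new ArrayList<>(array);
        // Use the resolved array index itself, not the position within the selection,
        // so that "[0,2]" touches elements 0 and 2 rather than 0 and 1.
        for (int index : resolve(selector, array.size())) {
            if (index >= 0 && index < result.size()) {
                result.set(index, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, null, 4);
        System.out.println(update(numbers, "*", 8));    // [8, 8, 8]
        System.out.println(update(numbers, "1", 8));    // [1, 8, 4]
        System.out.println(update(numbers, "0..1", 8)); // [8, 8, 4]
        System.out.println(update(numbers, "0,2", 8));  // [8, null, 8]
    }
}
```

Running main prints the four expected outputs from the table, in the same order.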





[jira] [Updated] (NIFI-4383) UpdateRecord - cannot update arrays elements

2017-10-13 Thread Pierre Villard (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-4383:
-
Affects Version/s: 1.4.0






[jira] [Updated] (NIFI-4383) UpdateRecord - cannot update arrays elements

2017-10-13 Thread Pierre Villard (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-4383:
-
Description: 
At the moment, using UpdateRecord to update the elements of an array has no 
effect.

Input:
{noformat}
{
  "numbers" : [ 1, null, 4 ]
}
{noformat}

Parameters:

||Path||Value||Expected output||
|{{/numbers[*]}}|{{8}}|{{"numbers" : [ 8, 8, 8 ]}}|
|{{/numbers[1]}}|{{8}}|{{"numbers" : [ 1, 8, 4 ]}}|
|{{/numbers[0..1]}}|{{8}}|{{"numbers" : [ 8, 8, 4 ]}}|
|{{/numbers[0,2]}}|{{8}}|{{"numbers" : [ 8, null, 8 ]}}|

When the elements of the array are records, it's possible to update fields of a 
record but not the record itself as a whole.

Also, the MultiArrayIndexPath implementation does not provide the correct index of 
array elements. Because of that, the wrong elements of the array could be 
updated.


  was:
At the moment, using UpdateRecord to update an array of simple fields (not 
records) has no effect.

Input:
{noformat}
{
  "numbers" : [ 1, null, 4 ]
}
{noformat}

Parameters:

||Path||Value||Expected output||
|{{/numbers[*]}}|{{8}}|{{"numbers" : [ 8, 8, 8 ]}}|
|{{/numbers[1]}}|{{8}}|{{"numbers" : [ 1, 8, 4 ]}}|
|{{/numbers[0..1]}}|{{8}}|{{"numbers" : [ 8, 8, 4 ]}}|
|{{/numbers[0,2]}}|{{8}}|{{"numbers" : [ 8, null, 8 ]}}|




> UpdateRecord - cannot update arrays elements
> 
>
> Key: NIFI-4383
> URL: https://issues.apache.org/jira/browse/NIFI-4383
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>
> At the moment, using UpdateRecord to update the elements of an array has no 
> effect.
> Input:
> {noformat}
> {
>   "numbers" : [ 1, null, 4 ]
> }
> {noformat}
> Parameters:
> ||Path||Value||Expected output||
> |{{/numbers[*]}}|{{8}}|{{"numbers" : [ 8, 8, 8 ]}}|
> |{{/numbers[1]}}|{{8}}|{{"numbers" : [ 1, 8, 4 ]}}|
> |{{/numbers[0..1]}}|{{8}}|{{"numbers" : [ 8, 8, 4 ]}}|
> |{{/numbers[0,2]}}|{{8}}|{{"numbers" : [ 8, null, 8 ]}}|
> When the elements of the array are records, it's possible to update fields of 
> a record but not the record itself as a whole.
> Also, the MultiArrayIndexPath implementation does not provide the correct index 
> of array elements. Because of that, the wrong elements of the array could be 
> updated.





[jira] [Updated] (NIFI-4383) UpdateRecord - cannot update arrays elements

2017-10-13 Thread Pierre Villard (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pierre Villard updated NIFI-4383:
-
Summary: UpdateRecord - cannot update arrays elements  (was: Fix 
UpdateRecord when updating arrays of simple fields)

> UpdateRecord - cannot update arrays elements
> 
>
> Key: NIFI-4383
> URL: https://issues.apache.org/jira/browse/NIFI-4383
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.3.0
>Reporter: Pierre Villard
>Assignee: Pierre Villard
>
> At the moment, using UpdateRecord to update an array of simple fields (not 
> records) has no effect.
> Input:
> {noformat}
> {
>   "numbers" : [ 1, null, 4 ]
> }
> {noformat}
> Parameters:
> ||Path||Value||Expected output||
> |{{/numbers[*]}}|{{8}}|{{"numbers" : [ 8, 8, 8 ]}}|
> |{{/numbers[1]}}|{{8}}|{{"numbers" : [ 1, 8, 4 ]}}|
> |{{/numbers[0..1]}}|{{8}}|{{"numbers" : [ 8, 8, 4 ]}}|
> |{{/numbers[0,2]}}|{{8}}|{{"numbers" : [ 8, null, 8 ]}}|





[jira] [Commented] (NIFI-4325) Create a new ElasticSearch processor that supports the JSON DSL

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203507#comment-16203507
 ] 

ASF GitHub Bot commented on NIFI-4325:
--

Github user MikeThomsen commented on the issue:

https://github.com/apache/nifi/pull/2113
  
@mattyb149 I'm going to leave this open, but I decided to refactor the heck 
out of it around a client service for ElasticSearch. The service only has one 
method for now, but I think it's the way to go so that in the future as 
services become injectable in scripts and such, it'll be more flexible.


> Create a new ElasticSearch processor that supports the JSON DSL
> ---
>
> Key: NIFI-4325
> URL: https://issues.apache.org/jira/browse/NIFI-4325
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike Thomsen
>Priority: Minor
>
> The existing ElasticSearch processors use the Lucene-style syntax for 
> querying, not the JSON DSL. A new processor is needed that can take a full 
> JSON query and execute it. It should also support aggregation queries in this 
> syntax. A user needs to be able to take a query as-is from Kibana and drop it 
> into NiFi and have it just run.
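A minimal sketch of a client service with a single search method that passes a JSON DSL query through unchanged. It assumes Java 11's java.net.http and Elasticsearch's REST _search endpoint, which accepts the full query DSL (including aggregations) as a JSON request body. SearchClientService and HttpSearchClientService are hypothetical names, not an existing NiFi API. The sketch only builds the request; sending it, for example with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), is left to the caller.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical sketch: the interface name, method signature and base URL handling
// are illustrative assumptions, not an existing NiFi API.
interface SearchClientService {
    /** Builds a request that runs the given JSON DSL query, unchanged, against an index. */
    HttpRequest buildSearchRequest(String index, String jsonQuery);
}

class HttpSearchClientService implements SearchClientService {
    private final String baseUrl;

    HttpSearchClientService(String baseUrl) {
        // Drop a trailing slash so the joined URL has exactly one separator.
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
    }

    @Override
    public HttpRequest buildSearchRequest(String index, String jsonQuery) {
        // The _search endpoint takes the query DSL (query, aggs, size, ...) as a JSON
        // body, so the body of a query copied from Kibana can be passed through as-is.
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonQuery))
                .build();
    }
}
```

Keeping the service to one method mirrors the comment above: processors (and, later, scripts) would depend only on the interface, while the HTTP details stay in one implementation.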





[GitHub] nifi issue #2113: NIFI-4325 Added new processor that uses the JSON DSL.

2017-10-13 Thread MikeThomsen
Github user MikeThomsen commented on the issue:

https://github.com/apache/nifi/pull/2113
  
@mattyb149 I'm going to leave this open, but I decided to refactor the heck 
out of it around a client service for ElasticSearch. The service only has one 
method for now, but I think it's the way to go so that in the future as 
services become injectable in scripts and such, it'll be more flexible.


---


[GitHub] nifi pull request #2199: NIFI-3248: Improvement of GetSolr Processor

2017-10-13 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144532208
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -66,42 +79,64 @@
 import org.apache.solr.common.SolrDocument;
 import org.apache.solr.common.SolrDocumentList;
 import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.CursorMarkParams;
 
-@Tags({"Apache", "Solr", "Get", "Pull"})
+@Tags({"Apache", "Solr", "Get", "Pull", "Records"})
 @InputRequirement(Requirement.INPUT_FORBIDDEN)
-@CapabilityDescription("Queries Solr and outputs the results as a FlowFile")
+@CapabilityDescription("Queries Solr and outputs the results as a FlowFile in the format of XML or using a Record Writer")
+@Stateful(scopes = {Scope.LOCAL}, description = "Stores latest date of Date Field so that the same data will not be fetched multiple times.")
--- End diff --

GetSolr used to use a local file to store lastEndDate. We need migration code 
so that lastEndDate is carried over to managed state when there is no state yet 
but the lastEndDate file exists.
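A sketch of the requested migration under stated assumptions: StateStore is a hypothetical stand-in for NiFi's StateManager, and the legacy file is assumed to be a java.util.Properties file holding a LastEndDate key (the layout actually used by older GetSolr versions should be checked before relying on this). The value is copied only when managed state is empty and the legacy file exists.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: StateStore stands in for NiFi's StateManager, and the legacy
// file format (Properties with a "LastEndDate" key) is an assumption.
interface StateStore {
    Map<String, String> get();
    void set(Map<String, String> state);
}

class LastEndDateMigration {
    static final String LAST_END_DATE = "LastEndDate";

    /** Copies lastEndDate from the legacy file into managed state if no state exists yet. */
    static boolean migrate(StateStore store, Path legacyFile) throws IOException {
        if (!store.get().isEmpty() || !Files.exists(legacyFile)) {
            return false; // state already managed, or nothing to migrate
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(legacyFile)) {
            props.load(in);
        }
        String lastEndDate = props.getProperty(LAST_END_DATE);
        if (lastEndDate == null) {
            return false; // file exists but holds no usable value
        }
        Map<String, String> state = new HashMap<>();
        state.put(LAST_END_DATE, lastEndDate);
        store.set(state);
        return true;
    }
}
```

Because the check runs against empty state, calling it on every schedule is harmless: once state has been written, later calls return false.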


---


[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203452#comment-16203452
 ] 

ASF GitHub Bot commented on NIFI-3248:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144533090
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -66,42 +79,64 @@
 import org.apache.solr.common.SolrDocument;
 import org.apache.solr.common.SolrDocumentList;
 import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.CursorMarkParams;
 
-@Tags({"Apache", "Solr", "Get", "Pull"})
+@Tags({"Apache", "Solr", "Get", "Pull", "Records"})
 @InputRequirement(Requirement.INPUT_FORBIDDEN)
-@CapabilityDescription("Queries Solr and outputs the results as a FlowFile")
+@CapabilityDescription("Queries Solr and outputs the results as a FlowFile in the format of XML or using a Record Writer")
+@Stateful(scopes = {Scope.LOCAL}, description = "Stores latest date of Date Field so that the same data will not be fetched multiple times.")
--- End diff --

State scope should be CLUSTER, I think. Also, the capability description should 
mention that this processor is designed to run on the Primary Node only. Please 
refer to the ListHDFS processor documentation.

Or does this processor work well in a distributed fashion, using multiple NiFi 
nodes against a Solr cluster?


> GetSolr can miss recently updated documents
> ---
>
> Key: NIFI-3248
> URL: https://issues.apache.org/jira/browse/NIFI-3248
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1, 
> 1.0.1
>Reporter: Koji Kawamura
>Assignee: Johannes Peter
> Attachments: nifi-flow.png, query-result-with-curly-bracket.png, 
> query-result-with-square-bracket.png
>
>
> GetSolr holds the last query timestamp so that it only fetches documents 
> that have been added or updated since the last query.
> However, GetSolr misses some of those updated documents, and once a 
> document's date field value becomes older than the last query timestamp, the 
> document can no longer be queried by GetSolr.
> This JIRA tracks the investigation of this behavior and the related 
> discussion.
> Here are possible causes of this behavior:
> |#|Short description|Should we address it?|
> |1|Timestamp range filter, curly or square bracket?|No|
> |2|Timezone difference between update and query|Additional docs might be helpful|
> |3|Lag comes from NearRealTime nature of Solr|Should be documented at least, add 'commit lag-time'?|
> h2. 1. Timestamp range filter, curly or square bracket?
> At first glance, mixing curly and square brackets looked strange 
> ([source code|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202]).
> But this difference has a meaning.
> The square bracket on the range query is inclusive and the curly bracket is 
> exclusive. If we used inclusive bounds on both sides and a document had a 
> timestamp exactly on the boundary, it could be returned in two consecutive 
> executions, and we only want it in one.
> This is intentional, and it should be as it is.
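The half-open window can be sketched locally, assuming the exclusive (curly) bound sits on the last end date and the inclusive (square) bound on the current query time. RangeWindowSketch is hypothetical; matches() applies the same bounds in memory so the boundary behaviour can be checked without a Solr instance.

```java
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, assuming the exclusive bound is on the last end date and the
// inclusive bound is on the current query time: dateField:{lastEnd TO now].
class RangeWindowSketch {

    /** Builds a Solr range filter with an exclusive lower and an inclusive upper bound. */
    static String filterQuery(String dateField, Instant lastEnd, Instant now) {
        return dateField + ":{" + lastEnd + " TO " + now + "]";
    }

    /** Applies the same bounds in memory: lastEnd < t <= now. */
    static List<Instant> matches(List<Instant> docTimes, Instant lastEnd, Instant now) {
        return docTimes.stream()
                .filter(t -> t.isAfter(lastEnd) && !t.isAfter(now))
                .collect(Collectors.toList());
    }
}
```

With consecutive windows (09:00, 10:00] and (10:00, 11:00], a document stamped exactly 10:00 matches only the first window, so it is emitted once.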
> h2. 2. Timezone difference between update and query
> Solr treats date fields as [UTC 
> representation|https://cwiki.apache.org/confluence/display/solr/Working+with+Dates|].
> If the date field's string value in an updated document represents a time 
> without a timezone, and NiFi is running in an environment whose timezone is not 
> UTC, GetSolr can't perform the date range query as users expect.
> Let's say NiFi is running with JST (UTC+9). A process adds a document to Solr 
> at 15:00 JST, but the date field has no timezone, so Solr indexes it 
> as 15:00 UTC. Then GetSolr performs a range query at 15:10 JST, targeting any 
> documents updated from 15:00 to 15:10 JST. GetSolr formats these dates in UTC, 
> i.e. 6:00 to 6:10 UTC, so the updated document won't match the date 
> range filter.
> To avoid this, updated documents must include a proper timezone in the date 
> field's string representation.
> If you use NiFi Expression Language to set the current timestamp in that date 
> field, the following NiFi expression can be used:
> {code}
> ${now():format("yyyy-MM-dd'T'HH:mm:ss.SSSZ")}
> {code}
> It will produce a result like:
> {code}
> 2016-12-27T15:30:04.895+0900
> {code}
> Solr will then index it with the correct UTC value, and GetSolr will query it 
> as expected.
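The JST example can be reproduced with java.time. This is an illustration only: TimezoneSketch is hypothetical, and it uses DateTimeFormatter with the same pattern letters as the expression above; for this pattern they behave the same way as in java.text.SimpleDateFormat, with Z rendering an offset such as +0900.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative sketch of the JST example; TimezoneSketch is hypothetical.
class TimezoneSketch {
    // Same pattern letters as the NiFi expression; 'Z' renders the offset as e.g. +0900.
    static final DateTimeFormatter WITH_OFFSET =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    /** Models how a timestamp written without a zone ends up being read as UTC. */
    static Instant indexedAsUtc(String localTimestamp) {
        return LocalDateTime.parse(localTimestamp).toInstant(ZoneOffset.UTC);
    }

    /** Formats a zoned time with an explicit offset so the instant is unambiguous. */
    static String withOffset(ZonedDateTime time) {
        return time.format(WITH_OFFSET);
    }
}
```

Reading the zone-less "2016-12-27T15:00:00" as UTC gives 15:00Z, which falls outside the 06:00Z to 06:10Z query window, while "2016-12-27T15:30:04.895+0900" resolves to the instant 2016-12-27T06:30:04.895Z.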
> h2. 3. Lag comes from NearRealTime nature 

[GitHub] nifi pull request #2199: NIFI-3248: Improvement of GetSolr Processor

2017-10-13 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144533090
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -66,42 +79,64 @@
 import org.apache.solr.common.SolrDocument;
 import org.apache.solr.common.SolrDocumentList;
 import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.CursorMarkParams;
 
-@Tags({"Apache", "Solr", "Get", "Pull"})
+@Tags({"Apache", "Solr", "Get", "Pull", "Records"})
 @InputRequirement(Requirement.INPUT_FORBIDDEN)
-@CapabilityDescription("Queries Solr and outputs the results as a FlowFile")
+@CapabilityDescription("Queries Solr and outputs the results as a FlowFile in the format of XML or using a Record Writer")
+@Stateful(scopes = {Scope.LOCAL}, description = "Stores latest date of Date Field so that the same data will not be fetched multiple times.")
--- End diff --

State scope should be CLUSTER, I think. Also, the capability description should 
mention that this processor is designed to run on the Primary Node only. Please 
refer to the ListHDFS processor documentation.

Or does this processor work well in a distributed fashion, using multiple NiFi 
nodes against a Solr cluster?


---


[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203447#comment-16203447
 ] 

ASF GitHub Bot commented on NIFI-3248:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144532208
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -66,42 +79,64 @@
 import org.apache.solr.common.SolrDocument;
 import org.apache.solr.common.SolrDocumentList;
 import org.apache.solr.common.SolrInputDocument;
+import org.apache.solr.common.params.CursorMarkParams;
 
-@Tags({"Apache", "Solr", "Get", "Pull"})
+@Tags({"Apache", "Solr", "Get", "Pull", "Records"})
 @InputRequirement(Requirement.INPUT_FORBIDDEN)
-@CapabilityDescription("Queries Solr and outputs the results as a FlowFile")
+@CapabilityDescription("Queries Solr and outputs the results as a FlowFile in the format of XML or using a Record Writer")
+@Stateful(scopes = {Scope.LOCAL}, description = "Stores latest date of Date Field so that the same data will not be fetched multiple times.")
--- End diff --

GetSolr used to use a local file to store lastEndDate. We need migration code 
so that lastEndDate is carried over to managed state when there is no state yet 
but the lastEndDate file exists.



[GitHub] nifi pull request #2199: NIFI-3248: Improvement of GetSolr Processor

2017-10-13 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144527126
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/test/resources/solr/testCollection/conf/schema.xml
 ---
@@ -16,6 +16,16 @@
 
 
 
+
 
 
+
+
+
+
+
+
+
+id
--- End diff --

What if the Solr doc doesn't have a uniqueKey? Does this processor still work 
without a uniqueKey?
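Some context for the uniqueKey question: Solr's cursorMark paging requires the sort to include the uniqueKey field, because paging needs a total order with no ties. The in-memory simulation below illustrates why, using keyset paging over (date, id). KeysetPagingSketch and Doc are hypothetical, and a real cursorMark is an opaque token rather than a (date, id) pair.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Local simulation of why a unique tie-breaker matters for paging. Solr's cursorMark
// is opaque, but it likewise requires the uniqueKey field in the sort.
// Doc is a hypothetical stand-in for a Solr document.
class KeysetPagingSketch {
    static final class Doc {
        final long date;
        final String id;
        Doc(long date, String id) { this.date = date; this.id = id; }
    }

    // Total order: by date, then by the unique id to break ties.
    static final Comparator<Doc> ORDER =
            Comparator.comparingLong((Doc d) -> d.date).thenComparing(d -> d.id);

    /** Returns the next page of docs strictly after 'after' in (date, id) order. */
    static List<Doc> nextPage(List<Doc> index, Doc after, int pageSize) {
        return index.stream()
                .filter(d -> after == null || ORDER.compare(d, after) > 0)
                .sorted(ORDER)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    /** Pages through the whole index, returning ids in the order they were fetched. */
    static List<String> fetchAll(List<Doc> index, int pageSize) {
        List<String> seen = new ArrayList<>();
        Doc cursor = null;
        while (true) {
            List<Doc> page = nextPage(index, cursor, pageSize);
            if (page.isEmpty()) {
                return seen;
            }
            page.forEach(d -> seen.add(d.id));
            cursor = page.get(page.size() - 1); // last (date, id) seen becomes the cursor
        }
    }
}
```

With dates 1, 2, 2, 2, 3 and a page size of 2, all five documents come back even though the date-2 documents span two pages.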


---


[GitHub] nifi pull request #2199: NIFI-3248: Improvement of GetSolr Processor

2017-10-13 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144526595
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/SolrProcessor.java
 ---
@@ -275,7 +275,7 @@ protected final boolean isBasicAuthEnabled() {
 }
 
 @Override
-protected final Collection<ValidationResult> customValidate(ValidationContext context) {
+protected Collection<ValidationResult> customValidate(ValidationContext context) {
--- End diff --

Shouldn't we add another protected method for subclasses to override?
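One way to follow this suggestion is the template-method pattern: the base customValidate stays final and calls a protected hook that subclasses override. In this sketch, String stands in for ValidationResult, a Map of property values stands in for ValidationContext, and the class names, property names and validation rules are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: String stands in for ValidationResult and a Map of property
// values stands in for ValidationContext. Names and rules are illustrative only.
abstract class SolrProcessorSketch {

    /** Shared checks stay final so subclasses cannot accidentally bypass them. */
    protected final Collection<String> customValidate(Map<String, String> props) {
        List<String> problems = new ArrayList<>();
        // Illustrative shared rule: credentials must be supplied together.
        if (props.containsKey("Username") != props.containsKey("Password")) {
            problems.add("Username and Password must be set together");
        }
        // Subclass-specific rules are appended through the hook below.
        problems.addAll(additionalCustomValidation(props));
        return problems;
    }

    /** Extension point for subclasses; returns no problems by default. */
    protected Collection<String> additionalCustomValidation(Map<String, String> props) {
        return Collections.emptyList();
    }
}

class GetSolrSketch extends SolrProcessorSketch {
    @Override
    protected Collection<String> additionalCustomValidation(Map<String, String> props) {
        // Illustrative rule tied to the new return type / record writer properties.
        if ("Records".equals(props.get("Return Type")) && !props.containsKey("Record Writer")) {
            return Collections.singletonList("Record Writer is required when Return Type is Records");
        }
        return Collections.emptyList();
    }
}
```

The benefit over dropping final is that a subclass can add rules but cannot silently skip the shared ones.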


---


[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203450#comment-16203450
 ] 

ASF GitHub Bot commented on NIFI-3248:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144526595
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/SolrProcessor.java
 ---
@@ -275,7 +275,7 @@ protected final boolean isBasicAuthEnabled() {
 }
 
 @Override
-protected final Collection<ValidationResult> customValidate(ValidationContext context) {
+protected Collection<ValidationResult> customValidate(ValidationContext context) {
--- End diff --

Shouldn't we add another protected method for subclasses to override?


> GetSolr can miss recently updated documents
> ---
>
> Key: NIFI-3248
> URL: https://issues.apache.org/jira/browse/NIFI-3248
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1, 
> 1.0.1
>Reporter: Koji Kawamura
>Assignee: Johannes Peter
> Attachments: nifi-flow.png, query-result-with-curly-bracket.png, 
> query-result-with-square-bracket.png
>
>
> GetSolr holds the last query timestamp so that it only fetches documents 
> that have been added or updated since the last query.
> However, GetSolr misses some of those updated documents, and once a 
> document's date field value becomes older than the last query timestamp, the 
> document can no longer be queried by GetSolr.
> This JIRA tracks the investigation of this behavior and the related 
> discussion.
> Here are possible causes of this behavior:
> |#|Short description|Should we address it?|
> |1|Timestamp range filter, curly or square bracket?|No|
> |2|Timezone difference between update and query|Additional docs might be helpful|
> |3|Lag comes from NearRealTime nature of Solr|Should be documented at least, add 'commit lag-time'?|
> h2. 1. Timestamp range filter, curly or square bracket?
> At first glance, mixing curly and square brackets looked strange 
> ([source code|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202]).
> But this difference has a meaning.
> The square bracket on the range query is inclusive and the curly bracket is 
> exclusive. If we used inclusive bounds on both sides and a document had a 
> timestamp exactly on the boundary, it could be returned in two consecutive 
> executions, and we only want it in one.
> This is intentional, and it should be as it is.
> h2. 2. Timezone difference between update and query
> Solr treats date fields as [UTC 
> representation|https://cwiki.apache.org/confluence/display/solr/Working+with+Dates|].
> If the date field's string value in an updated document represents a time 
> without a timezone, and NiFi is running in an environment whose timezone is not 
> UTC, GetSolr can't perform the date range query as users expect.
> Let's say NiFi is running with JST (UTC+9). A process adds a document to Solr 
> at 15:00 JST, but the date field has no timezone, so Solr indexes it 
> as 15:00 UTC. Then GetSolr performs a range query at 15:10 JST, targeting any 
> documents updated from 15:00 to 15:10 JST. GetSolr formats these dates in UTC, 
> i.e. 6:00 to 6:10 UTC, so the updated document won't match the date 
> range filter.
> To avoid this, updated documents must include a proper timezone in the date 
> field's string representation.
> If you use NiFi Expression Language to set the current timestamp in that date 
> field, the following NiFi expression can be used:
> {code}
> ${now():format("yyyy-MM-dd'T'HH:mm:ss.SSSZ")}
> {code}
> It will produce a result like:
> {code}
> 2016-12-27T15:30:04.895+0900
> {code}
> Solr will then index it with the correct UTC value, and GetSolr will query it 
> as expected.
> h2. 3. Lag comes from NearRealTime nature of Solr
> Solr provides Near Real Time search capability, which means recently 
> updated documents can be queried in near real time, but not in real time. 
> This latency can be controlled either on the client side, where the update 
> request specifies the "commitWithin" parameter, or on the Solr 
> server side, with "autoCommit" and "autoSoftCommit" in 
> [solrconfig.xml|https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-Commits].
> Since committing and updating the index can be costly, it's recommended to set 
> this interval as long as the maximum tolerable latency allows.
> However, this can be problematic with GetSolr. For instance, as shown in the 
> simple NiFi flow 
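A sketch of the 'commit lag-time' idea from the table, assuming the upper bound of each query trails the current time by a configurable lag at least as long as the commit interval. CommitLagWindow and its parameters are hypothetical, not GetSolr properties. Documents that are not yet searchable because of the commit interval then fall into a later window instead of behind the stored end date.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a 'commit lag-time' setting: each query's upper bound trails
// "now" by commitLag, so documents not yet visible because of the commit interval
// are picked up by a later window. Names are assumptions, not GetSolr properties.
class CommitLagWindow {
    final Instant start; // exclusive lower bound (the previous run's end)
    final Instant end;   // inclusive upper bound, stored as the next run's start

    CommitLagWindow(Instant start, Instant end) {
        this.start = start;
        this.end = end;
    }

    /** Returns the next window, or null when the lagged end has not moved past lastEnd. */
    static CommitLagWindow next(Instant lastEnd, Instant now, Duration commitLag) {
        Instant end = now.minus(commitLag);
        if (!end.isAfter(lastEnd)) {
            return null; // nothing that is safely visible yet
        }
        return new CommitLagWindow(lastEnd, end);
    }
}
```

The trade-off is added delivery latency equal to the lag, in exchange for not permanently skipping documents that were committed late.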

[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203448#comment-16203448
 ] 

ASF GitHub Bot commented on NIFI-3248:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144530918
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -172,157 +203,196 @@ protected void init(final ProcessorInitializationContext context) {
 
 @Override
 public void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) {
-lastEndDatedRef.set(UNINITIALIZED_LAST_END_DATE_VALUE);
+clearState.set(true);
--- End diff --

Perhaps we should clear state only when the following properties change? It 
would be bad UX if state were cleared when a user reconfigures the batch size.
- SOLR_TYPE
- SOLR_LOCATION
- COLLECTION
- SOLR_QUERY
- DATE_FIELD
- RETURN_FIELDS
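A sketch of the proposed behaviour, assuming plain property names stand in for NiFi PropertyDescriptors: only the listed properties, which change what is queried, set the clear-state flag, while edits to other properties such as batch size leave state alone. StateResetPolicy is hypothetical.

```java
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: plain strings stand in for NiFi PropertyDescriptors; the names
// mirror the constants listed above.
class StateResetPolicy {
    static final Set<String> STATE_AFFECTING = Set.of(
            "SOLR_TYPE", "SOLR_LOCATION", "COLLECTION",
            "SOLR_QUERY", "DATE_FIELD", "RETURN_FIELDS");

    final AtomicBoolean clearState = new AtomicBoolean(false);

    void onPropertyModified(String descriptorName, String oldValue, String newValue) {
        // Only properties that change which documents are fetched reset state;
        // unchanged values and properties such as batch size are ignored.
        if (STATE_AFFECTING.contains(descriptorName) && !Objects.equals(oldValue, newValue)) {
            clearState.set(true);
        }
    }
}
```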



> GetSolr can miss recently updated documents
> ---
>
> Key: NIFI-3248
> URL: https://issues.apache.org/jira/browse/NIFI-3248
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1, 
> 1.0.1
>Reporter: Koji Kawamura
>Assignee: Johannes Peter
> Attachments: nifi-flow.png, query-result-with-curly-bracket.png, 
> query-result-with-square-bracket.png
>
>
> GetSolr holds the last query timestamp so that it only fetches documents 
> those have been added or updated since the last query.
> However, GetSolr misses some of those updated documents, and once the 
> documents date field value becomes older than last query timestamp, the 
> document won't be able to be queried by GetSolr any more.
> This JIRA is for tracking the process of investigating this behavior, and 
> discussion on them.
> Here are things that can be a cause of this behavior:
> |#|Short description|Should we address it?|
> |1|Timestamp range filter, curly or square bracket?|No|
> |2|Timezone difference between update and query|Additional docs might be 
> helpful|
> |3|Lag comes from NearRealTIme nature of Solr|Should be documented at least, 
> add 'commit lag-time'?|
> h2. 1. Timestamp range filter, curly or square bracket?
> At the first glance, using curly and square bracket in mix looked strange 
> ([source 
> code|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202]).
>  But these difference has a meaning.
> The square bracket on the range query is inclusive and the curly bracket is 
> exclusive. If we use inclusive on both sides and a document has a time stamp 
> exactly on the boundary then it could be returned in two consecutive 
> executions, and we only want it in one.
> This is intentional, and it should be as it is.
> h2. 2. Timezone difference between update and query
> Solr treats date fields as [UTC 
> representation|https://cwiki.apache.org/confluence/display/solr/Working+with+Dates|].
>  If date field String value of an updated document represents time without 
> timezone, and NiFi is running on an environment using timezone other than 
> UTC, GetSolr can't perform date range query as users expect.
> Let's say NiFi is running with JST(UTC+9). A process added a document to Solr 
> at 15:00 JST. But the date field doesn't have timezone. So, Solr indexed it 
> as 15:00 UTC. Then GetSolr performs range query at 15:10 JST, targeting any 
> documents updated from 15:00 to 15:10 JST. GetSolr formatted dates using UTC, 
> i.e. 6:00 to 6:10 UTC. The updated document won't be matched with the date 
> range filter.
> To avoid this, updated documents must have proper timezone in date field 
> string representation.
> If one uses NiFi expression language to set current timestamp to that date 
> field, following NiFi expression can be used:
> {code}
> ${now():format("-MM-dd'T'HH:mm:ss.SSSZ")}
> {code}
> It will produce a result like:
> {code}
> 2016-12-27T15:30:04.895+0900
> {code}
> Then it will be indexed in Solr with UTC and will be queried by GetSolr as 
> expected.
> h2. 3. Lag comes from NearRealTIme nature of Solr
> Solr provides Near Real Time search: recently updated documents become 
> searchable shortly after the update, but not immediately. 
> This latency can be controlled either on the client side, by specifying the 
> "commitWithin" parameter on the update request, or on the Solr 
> server side, with "autoCommit" and "autoSoftCommit" in 
> 
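One reading of the 'commit lag-time' idea from the issue's table is to hold the upper bound of each query window back by a configured lag, so that documents whose timestamps fall in a window but which are not yet visible (not yet committed) are picked up by a later run. The class below is a hypothetical sketch of that bookkeeping only; it assumes every document becomes searchable within `commitLag` of its timestamp and does not talk to Solr.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical 'commit lag-time' bookkeeping; it only computes windows and never queries Solr.
// Assumption: every document becomes searchable within commitLag of its timestamp.
class CommitLagWindow {
    private final Duration commitLag;
    private Instant lastEnd; // exclusive end of the previous window (the stored state)

    CommitLagWindow(Instant initialStart, Duration commitLag) {
        this.lastEnd = initialStart;
        this.commitLag = commitLag;
    }

    // Returns {start, end} for the run at 'now' (start inclusive, end exclusive) and
    // remembers end for the next run. The end never moves backwards.
    Instant[] nextWindow(Instant now) {
        Instant end = now.minus(commitLag);
        if (end.isBefore(lastEnd)) {
            end = lastEnd; // empty window: nothing is old enough yet to be reliably visible
        }
        Instant[] window = {lastEnd, end};
        lastEnd = end;
        return window;
    }

    public static void main(String[] args) {
        CommitLagWindow w = new CommitLagWindow(Instant.parse("2016-12-27T06:00:00Z"), Duration.ofMinutes(1));
        Instant[] first = w.nextWindow(Instant.parse("2016-12-27T06:10:00Z"));
        Instant[] second = w.nextWindow(Instant.parse("2016-12-27T06:20:00Z"));
        System.out.println(first[0] + " .. " + first[1]);   // 2016-12-27T06:00:00Z .. 2016-12-27T06:09:00Z
        System.out.println(second[0] + " .. " + second[1]); // 2016-12-27T06:09:00Z .. 2016-12-27T06:19:00Z
    }
}
```

With a one-minute lag, a document stamped 06:09:30 but committed at 06:10:20 is outside the first window and inside the second, so it is still caught. The trade-off is that every document is delivered at least `commitLag` late.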

[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203449#comment-16203449
 ] 

ASF GitHub Bot commented on NIFI-3248:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144530989
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -138,10 +168,11 @@ protected void init(final 
ProcessorInitializationContext context) {
 descriptors.add(SOLR_TYPE);
 descriptors.add(SOLR_LOCATION);
 descriptors.add(COLLECTION);
+descriptors.add(RETURN_TYPE);
+descriptors.add(RECORD_WRITER);
 descriptors.add(SOLR_QUERY);
-descriptors.add(RETURN_FIELDS);
-descriptors.add(SORT_CLAUSE);
--- End diff --

Is it safe to remove an existing property? The existing code should not have 
sorted results anyway, or it would have needed to store the last sorted field value to paginate 
properly when documents with the same date span more than one page. So I think it's 
safe.
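The pagination concern can be illustrated with keyset ("search after") paging. In the hypothetical sketch below, an in-memory list stands in for Solr results; documents are ordered by (date, id) and each page starts strictly after the last (date, id) seen. Remembering only the last date would skip or repeat documents that share a date across a page boundary, which is why a unique tie-breaker such as the uniqueKey matters. Solr's own `cursorMark` deep paging similarly requires the sort to include the uniqueKey field.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of keyset ("search after") paging ordered by (date, id). The in-memory list is a
// hypothetical stand-in for Solr results; id plays the role of the uniqueKey tie-breaker.
class KeysetPagingSketch {

    static final class Doc {
        final Instant date;
        final String id;

        Doc(Instant date, String id) {
            this.date = date;
            this.id = id;
        }

        Instant date() { return date; }
        String id() { return id; }
    }

    // Total order: by date, then by unique id, so documents sharing a date keep a fixed order.
    static final Comparator<Doc> ORDER = Comparator.comparing(Doc::date).thenComparing(Doc::id);

    // Returns up to pageSize documents strictly after the cursor; a null cursor means the first page.
    static List<Doc> page(List<Doc> results, Doc cursor, int pageSize) {
        List<Doc> sorted = new ArrayList<>(results);
        sorted.sort(ORDER);
        List<Doc> page = new ArrayList<>();
        for (Doc d : sorted) {
            if (cursor == null || ORDER.compare(d, cursor) > 0) {
                page.add(d);
                if (page.size() == pageSize) {
                    break;
                }
            }
        }
        return page;
    }
}
```

With a page size of 2 and three documents sharing one date (ids a, b, c) plus a later document d, the pages are [a, b] and then [c, d]; a cursor holding only the date could not tell whether c had already been emitted.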



[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203446#comment-16203446
 ] 

ASF GitHub Bot commented on NIFI-3248:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144527126
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/test/resources/solr/testCollection/conf/schema.xml
 ---
@@ -16,6 +16,16 @@
 
 
 
+
 
 
+
+
+
+
+
+
+
+id
--- End diff --

What if a Solr document doesn't have a uniqueKey? Does this processor still work 
without one?


> GetSolr can miss recently updated documents
> ---
>
> Key: NIFI-3248
> URL: https://issues.apache.org/jira/browse/NIFI-3248
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Extensions
>Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1, 
> 1.0.1
>Reporter: Koji Kawamura
>Assignee: Johannes Peter
> Attachments: nifi-flow.png, query-result-with-curly-bracket.png, 
> query-result-with-square-bracket.png
>
>
> GetSolr holds the last query timestamp so that it only fetches documents 
> that have been added or updated since the last query.
> However, GetSolr misses some of those updated documents, and once a 
> document's date field value becomes older than the last query timestamp, the 
> document can no longer be queried by GetSolr.
> This JIRA tracks the investigation of this behavior and the 
> related discussion.
> Here are possible causes of this behavior:
> |#|Short description|Should we address it?|
> |1|Timestamp range filter, curly or square bracket?|No|
> |2|Timezone difference between update and query|Additional docs might be helpful|
> |3|Lag from the Near Real Time nature of Solr|Should be documented at least; add 'commit lag-time'?|
> h2. 1. Timestamp range filter, curly or square bracket?
> At first glance, mixing curly and square brackets looked strange 
> ([source 
> code|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202]).
>  But the difference is meaningful.
> A square bracket in a range query is inclusive and a curly bracket is 
> exclusive. If both sides were inclusive, a document whose timestamp falls 
> exactly on the boundary could be returned by two consecutive 
> executions, and we want it returned only once.
> This is intentional and should stay as it is.
> h2. 2. Timezone difference between update and query
> Solr treats date fields as [UTC 
> representation|https://cwiki.apache.org/confluence/display/solr/Working+with+Dates].
>  If the date field string value of an updated document has no 
> timezone and NiFi runs in an environment whose timezone is not 
> UTC, GetSolr's date range query won't behave as users expect.
> Say NiFi is running in JST (UTC+9). A process adds a document to Solr 
> at 15:00 JST, but the date field value has no timezone, so Solr indexes it 
> as 15:00 UTC. GetSolr then runs a range query at 15:10 JST, targeting 
> documents updated from 15:00 to 15:10 JST. GetSolr formats these dates in UTC, 
> i.e. 06:00 to 06:10 UTC, so the updated document (indexed as 15:00 UTC) does 
> not match the date range filter.
> To avoid this, updated documents must include the proper timezone in the date 
> field's string representation.
> If NiFi Expression Language is used to set the current timestamp on that date 
> field, the following expression can be used:
> {code}
> ${now():format("yyyy-MM-dd'T'HH:mm:ss.SSSZ")}
> {code}
> It will produce a result like:
> {code}
> 2016-12-27T15:30:04.895+0900
> {code}
> It will then be indexed in Solr as UTC and queried by GetSolr as 
> expected.
> h2. 3. Lag from the Near Real Time nature of Solr
> Solr provides Near Real Time search: recently updated documents become 
> searchable shortly after the update, but not immediately. 
> This latency can be controlled either on the client side, by specifying the 
> "commitWithin" parameter on the update request, or on the Solr 
> server side, with "autoCommit" and "autoSoftCommit" in 
> [solrconfig.xml|https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-Commits].
> Since committing and updating the index can be costly, it's recommended to set this 
> interval as long as the maximum tolerable latency allows.
> However, this can be problematic with GetSolr. For instance, in the 
> simple NiFi flow below, GetSolr can miss updated documents:
> {code}
> t1: GetSolr queried
> t2: 

[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents

2017-10-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16203451#comment-16203451
 ] 

ASF GitHub Bot commented on NIFI-3248:
--

Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144533800
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -172,157 +203,196 @@ protected void init(final 
ProcessorInitializationContext context) {
 
 @Override
 public void onPropertyModified(PropertyDescriptor descriptor, String 
oldValue, String newValue) {
-lastEndDatedRef.set(UNINITIALIZED_LAST_END_DATE_VALUE);
+clearState.set(true);
 }
 
-@OnStopped
-public void onStopped() {
-writeLastEndDate();
-}
+@OnScheduled
+public void onScheduled2(final ProcessContext context) throws 
IOException {
--- End diff --

Please rename the method to reflect what it does, such as 
`clearState`. The annotation already explains when it's called.



[GitHub] nifi pull request #2199: NIFI-3248: Improvement of GetSolr Processor

2017-10-13 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144530989
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -138,10 +168,11 @@ protected void init(final 
ProcessorInitializationContext context) {
 descriptors.add(SOLR_TYPE);
 descriptors.add(SOLR_LOCATION);
 descriptors.add(COLLECTION);
+descriptors.add(RETURN_TYPE);
+descriptors.add(RECORD_WRITER);
 descriptors.add(SOLR_QUERY);
-descriptors.add(RETURN_FIELDS);
-descriptors.add(SORT_CLAUSE);
--- End diff --

Is it safe to remove an existing property? The existing code should not have 
sorted results anyway, or it would have needed to store the last sorted field value to paginate 
properly when documents with the same date span more than one page. So I think it's 
safe.


---


[GitHub] nifi pull request #2199: NIFI-3248: Improvement of GetSolr Processor

2017-10-13 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144530918
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -172,157 +203,196 @@ protected void init(final 
ProcessorInitializationContext context) {
 
 @Override
 public void onPropertyModified(PropertyDescriptor descriptor, String 
oldValue, String newValue) {
-lastEndDatedRef.set(UNINITIALIZED_LAST_END_DATE_VALUE);
+clearState.set(true);
--- End diff --

Perhaps we should clear state only when the following properties 
change? It would be bad UX if state were cleared when a user reconfigures the batch 
size.
- SOLR_TYPE
- SOLR_LOCATION
- COLLECTION
- SOLR_QUERY
- DATE_FIELD
- RETURN_FIELDS
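A minimal, NiFi-independent sketch of that suggestion, which also names the scheduled method after what it does (`clearState`), as requested in the related review comment. Property names are plain strings mirroring the list above (plus a hypothetical `BATCH_SIZE` in the usage), and the `PropertyDescriptor` type, the `@OnScheduled` annotation and NiFi's state manager are left out so the example runs standalone; the `lastEndDate` field is a stand-in for the processor's stored state.

```java
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

// NiFi-independent sketch: only properties that change *what* is queried mark the stored
// state for clearing; tuning properties such as a batch size leave it alone.
class SelectiveStateReset {

    // Mirrors the property list in the review comment.
    static final Set<String> STATE_AFFECTING = Set.of(
            "SOLR_TYPE", "SOLR_LOCATION", "COLLECTION", "SOLR_QUERY", "DATE_FIELD", "RETURN_FIELDS");

    private final AtomicBoolean clearStatePending = new AtomicBoolean(false);
    private String lastEndDate = "2016-12-27T06:10:00Z"; // stand-in for the processor's stored state

    // Counterpart of onPropertyModified(PropertyDescriptor, String, String).
    void onPropertyModified(String property, String oldValue, String newValue) {
        if (STATE_AFFECTING.contains(property) && !Objects.equals(oldValue, newValue)) {
            clearStatePending.set(true);
        }
    }

    // Named for what it does; in the processor it would carry @OnScheduled.
    void clearState() {
        if (clearStatePending.getAndSet(false)) {
            lastEndDate = null; // the processor would clear its managed state here instead
        }
    }

    String lastEndDate() {
        return lastEndDate;
    }
}
```

Changing `BATCH_SIZE` and rescheduling keeps the stored end date; changing `SOLR_QUERY` and rescheduling clears it.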



---


[GitHub] nifi pull request #2199: NIFI-3248: Improvement of GetSolr Processor

2017-10-13 Thread ijokarumawak
Github user ijokarumawak commented on a diff in the pull request:

https://github.com/apache/nifi/pull/2199#discussion_r144533800
  
--- Diff: 
nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java
 ---
@@ -172,157 +203,196 @@ protected void init(final 
ProcessorInitializationContext context) {
 
 @Override
 public void onPropertyModified(PropertyDescriptor descriptor, String 
oldValue, String newValue) {
-lastEndDatedRef.set(UNINITIALIZED_LAST_END_DATE_VALUE);
+clearState.set(true);
 }
 
-@OnStopped
-public void onStopped() {
-writeLastEndDate();
-}
+@OnScheduled
+public void onScheduled2(final ProcessContext context) throws 
IOException {
--- End diff --

Please rename the method to reflect what it does, such as 
`clearState`. The annotation already explains when it's called.


---


[jira] [Created] (NIFI-4485) It should be possible to boot the NiFi engine via the Java API

2017-10-13 Thread Peter Horvath (JIRA)
Peter Horvath created NIFI-4485:
---

 Summary: It should be possible to boot the NiFi engine via the 
Java API
 Key: NIFI-4485
 URL: https://issues.apache.org/jira/browse/NIFI-4485
 Project: Apache NiFi
  Issue Type: Improvement
Reporter: Peter Horvath


Class {{org.apache.nifi.NiFi}} was not designed with extensibility or 
programmatic access in mind.

This class is the entry point of the engine; however, the current 
implementation does not allow
a potential caller (e.g. an integration test harness) to bootstrap the engine 
and then shut it down properly. Please change this so that a NiFi instance can 
be started via the Java API:

Introduce a separate class that allows the engine to be started in "embedded" 
mode. It should basically be an extension of the existing class 
{{org.apache.nifi.NiFi}}, with some enhancements: the constructor 
{{org.apache.nifi.NiFi#NiFi}} registers an {{UncaughtExceptionHandler}}, a JVM 
{{Shutdown Hook}} and changes logging framework settings. These should NOT 
happen in embedded mode. In addition, it should be possible to shut the 
engine down via the API.
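A hypothetical sketch of the requested shape: a class that starts the engine from a `Properties` object, reports startup problems as exceptions, installs no JVM-global handlers, and is stopped explicitly. `EmbeddedEngine` and its methods are illustrative names, not existing NiFi classes; the port check merely stands in for real startup validation, and actual engine startup is omitted.

```java
import java.util.Properties;

// Hypothetical embedded entry point (not an existing NiFi class): no UncaughtExceptionHandler,
// no JVM shutdown hook, no logging reconfiguration; failures are thrown; the caller stops it.
class EmbeddedEngine implements AutoCloseable {
    private final Properties properties;
    private volatile boolean running;

    private EmbeddedEngine(Properties properties) {
        this.properties = properties;
    }

    // Starts the engine and returns it; problems surface as exceptions instead of only being logged.
    static EmbeddedEngine start(Properties properties) {
        if (properties.getProperty("nifi.web.http.port") == null) { // stand-in for real validation
            throw new IllegalArgumentException("nifi.web.http.port is not set");
        }
        EmbeddedEngine engine = new EmbeddedEngine(properties);
        engine.running = true; // real engine startup omitted
        return engine;
    }

    String property(String key) {
        return properties.getProperty(key);
    }

    boolean isRunning() {
        return running;
    }

    // Shuts the engine down through the API rather than through a JVM shutdown hook.
    @Override
    public void close() {
        running = false;
    }
}
```

Implementing `AutoCloseable` lets a test harness write `try (EmbeddedEngine engine = EmbeddedEngine.start(props)) { ... }` and have the engine stopped when the test ends, without relying on a JVM shutdown hook.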



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (NIFI-4424) org.apache.nifi.NiFi does not allow programmatic access to the NiFi engine

2017-10-13 Thread Peter Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/NIFI-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Horvath updated NIFI-4424:

Description: 
Class {{org.apache.nifi.NiFi}} was not designed with extensibility or 
programmatic access in mind.

This class is the entry point of the engine; however, the current 
implementation does not allow
a potential caller (e.g. an integration test harness) to bootstrap the engine 
and then shut it down properly:


The main method {{org.apache.nifi.NiFi#main}} simply logs any exception. That 
is fine when NiFi is started from the command line, but it prevents programmatic 
usage and the detection of error conditions (exceptions), which is essential when 
accessing NiFi programmatically from an integration test.

The constructor {{org.apache.nifi.NiFi#NiFi}} registers an 
{{UncaughtExceptionHandler}}, 
a JVM {{Shutdown Hook}} and changes logging framework settings.

*Please change this behaviour:*

Expose *two* methods: one that accepts the command line arguments one would pass 
to the NiFi process, and another that accepts a NiFiProperties object. 
Both methods should return the {{NiFi}} object instance for further programmatic 
access.

The logic that registers the {{UncaughtExceptionHandler}} and the JVM Shutdown Hook 
and changes the logging framework settings should be extracted into a {{protected}} 
*instance* method so that a client can override this behaviour with a NO-OP.

A second class, e.g. {{org.apache.nifi.EmbeddedNiFi}}, could be introduced 
as a base class for this use case, where the engine is started through the Java 
API.

*Please note that these changes are baby steps towards the implementation of a 
NiFi integration test harness.*
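The refactoring described above can be sketched with a template method. In this hypothetical example, class and hook names are illustrative and the hooks only record that they ran instead of calling `Thread.setDefaultUncaughtExceptionHandler` or `Runtime.getRuntime().addShutdownHook`. Two `start` overloads accept either command-line style arguments, assumed here to be `key=value` pairs (not NiFi's actual argument format), or a `Properties` object standing in for `NiFiProperties`, and both return the instance. An embedded subclass overrides the JVM-global hooks with no-ops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch of the requested refactoring, not NiFi's actual code.
class BootstrapSketch {
    final List<String> hooksRun = new ArrayList<>(); // records which hooks ran
    private Properties properties;

    // Placeholder hooks: real ones would call e.g. Thread.setDefaultUncaughtExceptionHandler(...)
    // and Runtime.getRuntime().addShutdownHook(...), and reconfigure logging.
    protected void registerUncaughtExceptionHandler() { hooksRun.add("uncaughtExceptionHandler"); }
    protected void registerShutdownHook() { hooksRun.add("shutdownHook"); }
    protected void configureLogging() { hooksRun.add("logging"); }

    // Entry point 1: properties supplied directly; returns this instance for further access.
    BootstrapSketch start(Properties properties) {
        registerUncaughtExceptionHandler();
        registerShutdownHook();
        configureLogging();
        this.properties = properties; // actual engine startup omitted
        return this;
    }

    // Entry point 2: command-line style arguments, assumed here to be key=value pairs.
    BootstrapSketch start(String[] args) {
        Properties parsed = new Properties();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value but got: " + arg);
            }
            parsed.setProperty(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return start(parsed);
    }

    String property(String key) {
        return properties.getProperty(key);
    }
}

// Embedded variant: the JVM-global hooks become no-ops.
class EmbeddedBootstrapSketch extends BootstrapSketch {
    @Override protected void registerUncaughtExceptionHandler() { }
    @Override protected void registerShutdownHook() { }
    @Override protected void configureLogging() { }
}
```

Errors from argument parsing propagate to the caller instead of being logged and swallowed, which is what an integration test needs in order to detect a failed start.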

  was:
Class {{org.apache.nifi.NiFi}} was not designed with extensibility or 
programmatic access in mind.

This class is the entry point of the engine; however, the current 
implementation does not allow
a potential caller (e.g. an integration test harness) to bootstrap the engine 
and then shut it down properly:


The main method {{org.apache.nifi.NiFi#main}} simply logs any exception. That 
is fine when NiFi is started from the command line, but it prevents programmatic 
usage and the detection of error conditions (exceptions), which is essential when 
accessing NiFi programmatically from an integration test.

The constructor {{org.apache.nifi.NiFi#NiFi}} registers an 
{{UncaughtExceptionHandler}}, 
a JVM {{Shutdown Hook}} and changes logging framework settings.

*Please change this behaviour:*

Expose *two* methods: one that accepts the command line arguments one would pass 
to the NiFi process, and another that accepts a NiFiProperties object. 
Both methods should return the {{NiFi}} object instance for further programmatic 
access.

The logic that registers the {{UncaughtExceptionHandler}} and the JVM Shutdown Hook 
and changes the logging framework settings should be extracted into a {{protected}} 
*instance* method so that a client can override this behaviour with a NO-OP.

*Please note that these changes are baby steps towards the implementation of a 
NiFi integration test harness.*


> org.apache.nifi.NiFi does not allow programmatic access to the NiFi engine
> --
>
> Key: NIFI-4424
> URL: https://issues.apache.org/jira/browse/NIFI-4424
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Affects Versions: 1.3.0
>Reporter: Peter Horvath
>
> Class {{org.apache.nifi.NiFi}} was not designed with extensibility or 
> programmatic access in mind.
> This class is the entry point of the engine; however, the current 
> implementation does not allow
> a potential caller (e.g. an integration test harness) to bootstrap the engine 
> and then shut it down properly:
> The main method {{org.apache.nifi.NiFi#main}} simply logs any exception. 
> That is fine when NiFi is started from the command line, but it prevents 
> programmatic usage and the detection of error conditions (exceptions), which 
> is essential when accessing NiFi programmatically from an integration test.
> The constructor {{org.apache.nifi.NiFi#NiFi}} registers an 
> {{UncaughtExceptionHandler}}, 
> a JVM {{Shutdown Hook}} and changes logging framework settings.
> *Please change this behaviour:*
> Expose *two* methods: one that accepts the command line arguments one 
> would pass to the NiFi process, and another that accepts a NiFiProperties 
> object. Both methods should return the {{NiFi}} object instance for further 
> programmatic access.
> The logic that registers the {{UncaughtExceptionHandler}} and the JVM Shutdown Hook 
> and changes the logging framework settings should be extracted into a {{protected}} 
> *instance* method so that a client can override this behaviour with a NO-OP.
> A second class called e.g.