[GitHub] [nifi] emiliosetiadarma opened a new pull request, #7269: NIFI-11549: implemented AzureQueueStorage_v12 processors

2023-05-18 Thread via GitHub


emiliosetiadarma opened a new pull request, #7269:
URL: https://github.com/apache/nifi/pull/7269

   # Summary
   
   [NIFI-11549](https://issues.apache.org/jira/browse/NIFI-11549)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created
   
   ### Pull Request Tracking
   
   - [x] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-0`
   - [x] Pull Request commit message starts with Apache NiFi Jira issue number, such as `NIFI-0`
   
   ### Pull Request Formatting
   
   - [x] Pull Request based on current revision of the `main` branch
   - [x] Pull Request refers to a feature branch with one commit containing changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request creation.
   
   ### Build
   
   - [x] Build completed using `mvn clean install -P contrib-check`
     - [x] JDK 11
     - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (NIFI-11552) Support FlowFile attributes in PutIceberg's Table Name property

2023-05-18 Thread Matt Burgess (Jira)


 [ https://issues.apache.org/jira/browse/NIFI-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Burgess updated NIFI-11552:

Status: Patch Available  (was: In Progress)

> Support FlowFile attributes in PutIceberg's Table Name property
> ---
>
> Key: NIFI-11552
> URL: https://issues.apache.org/jira/browse/NIFI-11552
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.latest, 2.latest
>
>
> The documentation for PutIceberg's Table Name property says it doesn’t 
> support any Expression Language, but the code calls the evaluate method on the 
> property without passing in a FlowFile, so at the very least it supports 
> the Variable Registry. This Jira proposes to add EL support, including 
> FlowFile attributes, for the Table Name property and to update the 
> documentation to reflect the new behavior.
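A toy model of the distinction the issue describes follows. It is not NiFi's Expression Language engine: `TableNameEvaluation` and `evaluate` are hypothetical names, unresolved references become empty strings, and FlowFile attributes are assumed to take precedence over variables. Passing `null` for the attributes mirrors evaluating the property without a FlowFile (in NiFi terms, roughly `PropertyValue.evaluateAttributeExpressions()` versus `evaluateAttributeExpressions(flowFile)`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only (hypothetical): NOT NiFi's Expression Language implementation.
final class TableNameEvaluation {
    // Matches ${name} references and captures the name.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");

    // Resolves ${name} references; FlowFile attributes (if any) override variables.
    static String evaluate(String expression, Map<String, String> variables,
                           Map<String, String> flowFileAttributes) {
        Map<String, String> scope = new HashMap<>(variables);
        if (flowFileAttributes != null) {
            scope.putAll(flowFileAttributes);
        }
        Matcher matcher = REFERENCE.matcher(expression);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            // Assumption of this model: an unresolved reference becomes "".
            String value = scope.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

With `null` attributes, `${db.table}` can only resolve from variables, so every FlowFile targets the same table; once attributes are supplied, each FlowFile can name its own table.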



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [nifi] mattyb149 opened a new pull request, #7268: NIFI-11552: Support FlowFile Attributes in some PutIceberg properties

2023-05-18 Thread via GitHub


mattyb149 opened a new pull request, #7268:
URL: https://github.com/apache/nifi/pull/7268

   # Summary
   
   [NIFI-11552](https://issues.apache.org/jira/browse/NIFI-11552) This PR adds support for FlowFile attributes in Expression Language for PutIceberg properties such as Catalog Name and Table Name.
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created
   
   ### Pull Request Tracking
   
   - [x] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-11552`
   - [x] Pull Request commit message starts with Apache NiFi Jira issue number, such as `NIFI-11552`
   
   ### Pull Request Formatting
   
   - [x] Pull Request based on current revision of the `main` branch
   - [x] Pull Request refers to a feature branch with one commit containing changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
     - [x] JDK 11
     - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   





[jira] [Assigned] (NIFI-11552) Support FlowFile attributes in PutIceberg's Table Name property

2023-05-18 Thread Matt Burgess (Jira)


 [ https://issues.apache.org/jira/browse/NIFI-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Burgess reassigned NIFI-11552:
---

Assignee: Matt Burgess

> Support FlowFile attributes in PutIceberg's Table Name property
> ---
>
> Key: NIFI-11552
> URL: https://issues.apache.org/jira/browse/NIFI-11552
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.latest, 2.latest
>
>
> The documentation for PutIceberg's Table Name property says it doesn’t 
> support any Expression Language, but the code calls the evaluate method on the 
> property without passing in a FlowFile, so at the very least it supports 
> the Variable Registry. This Jira proposes to add EL support, including 
> FlowFile attributes, for the Table Name property and to update the 
> documentation to reflect the new behavior.





[GitHub] [nifi] exceptionfactory commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


exceptionfactory commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198284268


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.excel;
+
+import com.github.pjfanning.xlsx.StreamingReader;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.ss.usermodel.Sheet;
+import org.apache.poi.ss.usermodel.Workbook;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+public class RowIterator implements Iterator<Row>, Closeable {
+    private final Workbook workbook;
+    private final Iterator<Sheet> sheets;
+    private Sheet currentSheet;
+    private Iterator<Row> currentRows;
+    private final Map<String, Boolean> desiredSheets;
+    private final int firstRow;
+    private ComponentLog logger;
+    private boolean log;
+    private Row currentRow;
+
+    public RowIterator(InputStream in, List<String> desiredSheets, int firstRow) {
+        this(in, desiredSheets, firstRow, null);
+    }
+
+    public RowIterator(InputStream in, List<String> desiredSheets, int firstRow, ComponentLog logger) {
+        this.workbook = StreamingReader.builder()
+                .rowCacheSize(100)
+                .bufferSize(4096)
+                .open(in);
+        this.sheets = this.workbook.iterator();
+        this.desiredSheets = desiredSheets != null ? desiredSheets.stream()
+                .collect(Collectors.toMap(key -> key, value -> Boolean.FALSE)) : new HashMap<>();
+        this.firstRow = firstRow;
+        this.logger = logger;
+        this.log = logger != null;
+    }
+
+    @Override
+    public boolean hasNext() {
+        setCurrent();
+        boolean next = currentRow != null;
+        if(!next) {
+            String sheetsNotFound = getSheetsNotFound(desiredSheets);
+            if (!sheetsNotFound.isEmpty() && log) {
+                logger.warn("Excel sheet(s) not found: {}", sheetsNotFound);
+            }
+        }
+        return next;
+    }
+
+    private void setCurrent() {
+        currentRow = getNextRow();
+        if (currentRow != null) {
+            return;
+        }
+
+        currentSheet = null;
+        currentRows = null;
+        while (sheets.hasNext()) {
+            currentSheet = sheets.next();
+            if (isIterateOverAllSheets() || hasSheet(currentSheet.getSheetName())) {
+                currentRows = currentSheet.iterator();
+                currentRow = getNextRow();
+                if (currentRow != null) {
+                    return;
+                }
+            }
+        }
+    }
+
+    private Row getNextRow() {
+        while (currentRows != null && !hasExhaustedRows()) {
+            Row tempCurrentRow = currentRows.next();
+            if (!isSkip(tempCurrentRow)) {
+                return tempCurrentRow;
+            }
+        }
+        return null;
+    }
+
+    private boolean hasExhaustedRows() {
+        boolean exhausted = !currentRows.hasNext();
+        if (log && exhausted) {
+            logger.info("Exhausted all rows from sheet {}", currentSheet.getSheetName());
+        }
+        return exhausted;
+    }
+
+    private boolean isSkip(Row row) {
+        return row.getRowNum() < firstRow;
+    }
+
+    private boolean isIterateOverAllSheets() {
+        boolean iterateAllSheets = desiredSheets.isEmpty();
+        if (iterateAllSheets && log) {
+            logger.info("Advanced to sheet {}", currentSheet.getSheetName());
+        }
+        return iterateAllSheets;
+    }
+
+    private boolean hasSheet(String name) {
+        boolean sheetByName = !desiredSheets.isEmpty()
+                && desiredSheets.keySet().stream()
+                .anyMatch(desiredSheet -> desiredSheet.equalsIgnoreCase(name));
+        if (sheetByName) {
+            desiredSheets.put(name, Boolean.TRUE);
+        }
+        return sheetByName;
+    }
+
+ 
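The traversal in `RowIterator` (walk sheets in order, keep only the requested sheets, skip rows before `firstRow`) can be modeled without Apache POI. The following is a minimal sketch under stated assumptions: a workbook is an ordered map from sheet name to a list of non-null row strings, sheet names match case-insensitively, and the row index is the position within the sheet, whereas the diff compares `Row.getRowNum()`. `SheetRowIterator` is a hypothetical name, not code from the PR.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical POI-free model of the RowIterator traversal; not code from the PR.
// A workbook is an ordered map of sheet name -> rows, and a row is just a String.
final class SheetRowIterator implements Iterator<String> {
    private final Iterator<Map.Entry<String, List<String>>> sheets;
    private final Set<String> selected = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    private final int firstRow;
    private Iterator<String> rows;   // rows of the sheet being read, or null
    private int rowIndex;            // zero-based position within that sheet
    private String nextRow;          // look-ahead row; null once everything is consumed

    SheetRowIterator(Map<String, List<String>> workbook, List<String> selectedSheets, int firstRow) {
        this.sheets = workbook.entrySet().iterator();
        if (selectedSheets != null) {
            this.selected.addAll(selectedSheets);
        }
        this.firstRow = firstRow;
        advance();
    }

    // Moves the look-ahead to the next row at or after firstRow in a selected sheet.
    private void advance() {
        nextRow = null;
        while (true) {
            while (rows != null && rows.hasNext()) {
                String row = rows.next();
                if (rowIndex++ >= firstRow) {
                    nextRow = row;
                    return;
                }
            }
            if (!sheets.hasNext()) {
                return;
            }
            Map.Entry<String, List<String>> sheet = sheets.next();
            // An empty selection means "read every sheet".
            boolean wanted = selected.isEmpty() || selected.contains(sheet.getKey());
            rows = wanted ? sheet.getValue().iterator() : null;
            rowIndex = 0;
        }
    }

    @Override
    public boolean hasNext() {
        return nextRow != null; // side-effect free, so repeated calls never skip rows
    }

    @Override
    public String next() {
        if (nextRow == null) {
            throw new NoSuchElementException();
        }
        String row = nextRow;
        advance();
        return row;
    }
}
```

This sketch advances the look-ahead in `next()` rather than in `hasNext()`, which keeps `hasNext()` free of side effects as the `Iterator` contract expects.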

[GitHub] [nifi] dan-s1 commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


dan-s1 commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198279649


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##
@@ -0,0 +1,155 @@

[GitHub] [nifi] exceptionfactory commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


exceptionfactory commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198272978


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##
@@ -0,0 +1,155 @@

[GitHub] [nifi] exceptionfactory commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


exceptionfactory commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198269634


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/ExcelReader.java:
##
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.excel;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.AllowableValue;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.context.PropertyContext;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.expression.ExpressionLanguageScope;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaAccessStrategy;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.schema.inference.InferSchemaAccessStrategy;
+import org.apache.nifi.schema.inference.RecordSourceFactory;
+import org.apache.nifi.schema.inference.SchemaInferenceEngine;
+import org.apache.nifi.schema.inference.SchemaInferenceUtil;
+import org.apache.nifi.schema.inference.TimeValueInference;
+import org.apache.nifi.schemaregistry.services.SchemaRegistry;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.stream.io.NonCloseableInputStream;
+import org.apache.poi.ss.usermodel.Row;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicReferenceArray;
+import java.util.stream.IntStream;
+
+@Tags({"excel", "spreadsheet", "xlsx", "parse", "record", "row", "reader", "values", "cell"})
+@CapabilityDescription("Parses a Microsoft Excel document returning each row in each sheet as a separate record. "
+        + "This reader allows for inferring a schema either based on the first line of an Excel sheet if a 'header line' is "
+        + "present or from all the desired sheets, or providing an explicit schema "
+        + "for interpreting the values. See Controller Service's Usage for further documentation. "
+        + "This reader is currently only capable of processing .xlsx "
+        + "(XSSF 2007 OOXML file format) Excel documents and not older .xls (HSSF '97(-2007) file format) documents.)")
+public class ExcelReader extends SchemaRegistryService implements RecordReaderFactory {
+
+    private static final AllowableValue HEADER_DERIVED = new AllowableValue("excel-header-derived", "Use fields From Header",
+            "The first chosen row of the Excel sheet is a header row that contains the columns representative of all the rows " +
+                    "in the desired sheets. The schema will be derived by using those columns in the header.");
+    public static final PropertyDescriptor DESIRED_SHEETS = new PropertyDescriptor
+            .Builder().name("extract-sheets")
+            .displayName("Sheets to Extract")
+            .description("Comma separated list of Excel document sheet names whose rows should be extracted from the excel document. If this property" +
+                    " is left blank then all the rows from all the sheets will be extracted from the Excel document. The list of names is case in-sensitive. Any sheets not" +
+                    " specified in this value will be ignored. A bulletin will be generated if a specified sheet(s) are not found.")
+            .required(false)
+            .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+  
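The `DESIRED_SHEETS` description promises a comma-separated, case-insensitive list of names and a bulletin for requested sheets that are not found. A minimal sketch of that bookkeeping follows; `SheetSelection`, `accept`, and `notFound` are hypothetical names, not code from the PR. It keys entries by the lower-cased requested name (using `Locale.ROOT`, which approximates but is not identical to `equalsIgnoreCase` for every Unicode name), so a match on `DATA` marks a requested `Data` as found, and the not-found report keeps the spelling the user typed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the PR) for a "Sheets to Extract" style value.
final class SheetSelection {
    // Lower-cased name -> spelling as typed, in the order given.
    private final Map<String, String> requested = new LinkedHashMap<>();
    // Lower-cased names of requested sheets that have been seen in the workbook.
    private final Set<String> matched = new HashSet<>();

    SheetSelection(String propertyValue) {
        if (propertyValue != null) {
            Arrays.stream(propertyValue.split(","))
                    .map(String::trim)
                    .filter(name -> !name.isEmpty())
                    .forEach(name -> requested.putIfAbsent(name.toLowerCase(Locale.ROOT), name));
        }
    }

    // True if the sheet should be read; also records the match for reporting.
    boolean accept(String sheetName) {
        if (requested.isEmpty()) {
            return true; // blank property: read every sheet
        }
        String key = sheetName.toLowerCase(Locale.ROOT);
        if (requested.containsKey(key)) {
            matched.add(key);
            return true;
        }
        return false;
    }

    // Comma-separated requested names never seen; empty string if all were found.
    String notFound() {
        return requested.entrySet().stream()
                .filter(entry -> !matched.contains(entry.getKey()))
                .map(Map.Entry::getValue)
                .collect(Collectors.joining(", "));
    }
}
```

A non-empty `notFound()` result after the last sheet is where a reader would log the warning that surfaces as a bulletin.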

[jira] [Updated] (NIFI-5151) Patch Nifi with Upsert functions for PutDatabaseRecord processor

2023-05-18 Thread Matt Burgess (Jira)


 [ https://issues.apache.org/jira/browse/NIFI-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Burgess updated NIFI-5151:
---
Fix Version/s: 2.0.0
   1.22.0
   (was: 1.latest)
   (was: 2.latest)
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Patch Nifi with Upsert functions for PutDatabaseRecord processor
> 
>
> Key: NIFI-5151
> URL: https://issues.apache.org/jira/browse/NIFI-5151
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.7.0
>Reporter: Karl Amundsson
>Assignee: Lehel Boér
>Priority: Major
>  Labels: Processor
> Fix For: 2.0.0, 1.22.0
>
> Attachments: 
> 0001-NIFI-5151-Adding-support-for-UPSERT-in-PutDatabaseRe.patch, 
> 0001-NIFI-5151-Using-DatabaseAdapter-to-generate-INSERT-S.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Since Phoenix doesn't support the SQL statement INSERT, you have to use a 
> flow like: ConvertAttributesToJSON -> ConvertJSONToSQL in Insert 
> mode -> ReplaceText to replace "INSERT" with "UPSERT" -> PutSQL (see: 
> [https://community.hortonworks.com/questions/40561/nifi-phoenix-processor.html]).
> With this patch you can choose to use UPSERT directly.
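For reference, Apache Phoenix's `UPSERT VALUES` statement inserts a row, or updates the existing row with the same primary key, in place of the `INSERT` that Phoenix lacks. The sketch below only builds such a statement with JDBC `?` placeholders; `PhoenixUpsertSql` is a hypothetical helper, not the `DatabaseAdapter` code from the commit, and it assumes table and column names are trusted and need no quoting.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not NiFi's DatabaseAdapter API. Identifiers are assumed
// trusted; a real implementation would need to quote or validate them.
final class PhoenixUpsertSql {
    static String build(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        String columnList = String.join(", ", columns);
        // One placeholder per column, to be bound later via PreparedStatement.
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "UPSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")";
    }
}
```

Generating the `UPSERT` text directly removes the need for the ReplaceText step in the workaround flow described above.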





[jira] [Commented] (NIFI-5151) Patch Nifi with Upsert functions for PutDatabaseRecord processor

2023-05-18 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/NIFI-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724010#comment-17724010 ]

ASF subversion and git services commented on NIFI-5151:
---

Commit 6c70471cc6fefe00b9a69a6eeba8dbd9f0a5c7aa in nifi's branch refs/heads/main from Lehel Boér
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=6c70471cc6 ]

NIFI-5151: Add UPSERT support for Apache Phoenix

Signed-off-by: Matthew Burgess 

This closes #7263


> Patch Nifi with Upsert functions for PutDatabaseRecord processor
> 
>
> Key: NIFI-5151
> URL: https://issues.apache.org/jira/browse/NIFI-5151
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.7.0
>Reporter: Karl Amundsson
>Assignee: Lehel Boér
>Priority: Major
>  Labels: Processor
> Fix For: 1.latest, 2.latest
>
> Attachments: 
> 0001-NIFI-5151-Adding-support-for-UPSERT-in-PutDatabaseRe.patch, 
> 0001-NIFI-5151-Using-DatabaseAdapter-to-generate-INSERT-S.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Since Phoenix doesn't support the SQL statement INSERT, you have to use a 
> flow like: ConvertAttributesToJSON -> ConvertJSONToSQL in Insert 
> mode -> ReplaceText to replace "INSERT" with "UPSERT" -> PutSQL (see: 
> [https://community.hortonworks.com/questions/40561/nifi-phoenix-processor.html]).
> With this patch you can choose to use UPSERT directly.





[jira] [Commented] (NIFI-5151) Patch Nifi with Upsert functions for PutDatabaseRecord processor

2023-05-18 Thread ASF subversion and git services (Jira)


[ https://issues.apache.org/jira/browse/NIFI-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724009#comment-17724009 ]

ASF subversion and git services commented on NIFI-5151:
---

Commit 4fa47ecc2ad02fe266bcfee38b954d00e7aeddf8 in nifi's branch refs/heads/support/nifi-1.x from Lehel Boér
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=4fa47ecc2a ]

NIFI-5151: Add UPSERT support for Apache Phoenix

Signed-off-by: Matthew Burgess 


> Patch Nifi with Upsert functions for PutDatabaseRecord processor
> 
>
> Key: NIFI-5151
> URL: https://issues.apache.org/jira/browse/NIFI-5151
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.7.0
>Reporter: Karl Amundsson
>Assignee: Lehel Boér
>Priority: Major
>  Labels: Processor
> Fix For: 1.latest, 2.latest
>
> Attachments: 
> 0001-NIFI-5151-Adding-support-for-UPSERT-in-PutDatabaseRe.patch, 
> 0001-NIFI-5151-Using-DatabaseAdapter-to-generate-INSERT-S.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Since Phoenix doesn't support the SQL statement INSERT, you have to use a 
> flow like: ConvertAttributesToJSON -> ConvertJSONToSQL in Insert 
> mode -> ReplaceText to replace "INSERT" with "UPSERT" -> PutSQL (see: 
> [https://community.hortonworks.com/questions/40561/nifi-phoenix-processor.html]).
> With this patch you can choose to use UPSERT directly.





[GitHub] [nifi] dan-s1 commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


dan-s1 commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198252637


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/ExcelReader.java:
##
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.excel;
+
+import org.apache.commons.lang3.StringUtils;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnEnabled;
+import org.apache.nifi.components.AllowableValue;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.context.PropertyContext;
+import org.apache.nifi.controller.ConfigurationContext;
+import org.apache.nifi.expression.ExpressionLanguageScope;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.schema.access.SchemaAccessStrategy;
+import org.apache.nifi.schema.access.SchemaNotFoundException;
+import org.apache.nifi.schema.inference.InferSchemaAccessStrategy;
+import org.apache.nifi.schema.inference.RecordSourceFactory;
+import org.apache.nifi.schema.inference.SchemaInferenceEngine;
+import org.apache.nifi.schema.inference.SchemaInferenceUtil;
+import org.apache.nifi.schema.inference.TimeValueInference;
+import org.apache.nifi.schemaregistry.services.SchemaRegistry;
+import org.apache.nifi.serialization.DateTimeUtils;
+import org.apache.nifi.serialization.MalformedRecordException;
+import org.apache.nifi.serialization.RecordReader;
+import org.apache.nifi.serialization.RecordReaderFactory;
+import org.apache.nifi.serialization.SchemaRegistryService;
+import org.apache.nifi.serialization.record.RecordSchema;
+import org.apache.nifi.stream.io.NonCloseableInputStream;
+import org.apache.poi.ss.usermodel.Row;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicReferenceArray;
+import java.util.stream.IntStream;
+
+@Tags({"excel", "spreadsheet", "xlsx", "parse", "record", "row", "reader", "values", "cell"})
+@CapabilityDescription("Parses a Microsoft Excel document, returning each row in each sheet as a separate record. "
+        + "This reader allows for inferring a schema either based on the first line of an Excel sheet if a 'header line' is "
+        + "present or from all the desired sheets, or providing an explicit schema "
+        + "for interpreting the values. See Controller Service's Usage for further documentation. "
+        + "This reader is currently only capable of processing .xlsx "
+        + "(XSSF 2007 OOXML file format) Excel documents and not older .xls (HSSF '97(-2007) file format) documents.")
+public class ExcelReader extends SchemaRegistryService implements RecordReaderFactory {
+
+    private static final AllowableValue HEADER_DERIVED = new AllowableValue("excel-header-derived", "Use Fields From Header",
+            "The first chosen row of the Excel sheet is a header row that contains the columns representative of all the rows " +
+                    "in the desired sheets. The schema will be derived by using those columns in the header.");
+    public static final PropertyDescriptor DESIRED_SHEETS = new PropertyDescriptor
+            .Builder().name("extract-sheets")
+            .displayName("Sheets to Extract")
+            .description("Comma-separated list of Excel document sheet names whose rows should be extracted from the Excel document. If this property" +
+                    " is left blank then all the rows from all the sheets will be extracted from the Excel document. The list of names is case-insensitive. Any sheets not" +
+                    " specified in this value will be ignored. A bulletin will be generated if any of the specified sheets are not found.")
+            .required(false)
+            .expressionLanguageSupported(ExpressionLanguageScope.FLOWFILE_ATTRIBUTES)
+            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
+
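The `Sheets to Extract` property takes a comma-separated, case-insensitive list of sheet names, and a blank value means every sheet is read. A minimal sketch of how such a value could be turned into a lookup set follows; `SheetSelection`, `parse` and `isSelected` are hypothetical names for illustration and are not part of the PR.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper (not from the PR): interprets a "Sheets to Extract" style value.
public final class SheetSelection {
    private final Set<String> names; // lower-cased sheet names; empty means "all sheets"

    private SheetSelection(Set<String> names) {
        this.names = names;
    }

    // Splits on commas, trims whitespace, drops empty entries and lower-cases
    // each name so that later lookups ignore case.
    public static SheetSelection parse(String value) {
        if (value == null || value.isBlank()) {
            return new SheetSelection(Set.of());
        }
        Set<String> parsed = Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .map(name -> name.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        return new SheetSelection(parsed);
    }

    // A blank property selects every sheet; otherwise match ignoring case.
    public boolean isSelected(String sheetName) {
        return names.isEmpty() || names.contains(sheetName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        SheetSelection selection = SheetSelection.parse(" Sales, Inventory ,");
        System.out.println(selection.isSelected("SALES"));                    // true
        System.out.println(selection.isSelected("Summary"));                  // false
        System.out.println(SheetSelection.parse("").isSelected("Anything"));  // true
    }
}
```

Lower-casing once at parse time keeps each per-sheet check a constant-time set lookup instead of a scan with `equalsIgnoreCase`.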

[GitHub] [nifi] mattyb149 closed pull request #7263: NIFI-5151: Add UPSERT support for Apache Phoenix

2023-05-18 Thread via GitHub


mattyb149 closed pull request #7263: NIFI-5151: Add UPSERT support for Apache 
Phoenix
URL: https://github.com/apache/nifi/pull/7263


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (NIFI-5151) Patch Nifi with Upsert functions for PutDatabaseRecord processor

2023-05-18 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-5151:
---
Fix Version/s: 1.latest
   2.latest

> Patch Nifi with Upsert functions for PutDatabaseRecord processor
> 
>
> Key: NIFI-5151
> URL: https://issues.apache.org/jira/browse/NIFI-5151
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.7.0
>Reporter: Karl Amundsson
>Assignee: Lehel Boér
>Priority: Major
>  Labels: Processor
> Fix For: 1.latest, 2.latest
>
> Attachments: 
> 0001-NIFI-5151-Adding-support-for-UPSERT-in-PutDatabaseRe.patch, 
> 0001-NIFI-5151-Using-DatabaseAdapter-to-generate-INSERT-S.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Since Phoenix doesn't support the SQL INSERT statement, you have to use a 
> flow like ConvertAttributesToJSON -> ConvertJSONToSQL (Insert mode) -> 
> ReplaceText (replace "INSERT" with "UPSERT") -> PutSQL (see: 
> [https://community.hortonworks.com/questions/40561/nifi-phoenix-processor.html]).
> With this patch you can choose to use UPSERT directly.
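Phoenix uses `UPSERT INTO ... VALUES` where other databases use `INSERT`. The sketch below builds a parameterized Phoenix-style UPSERT from a table name and a column list so that it could be executed through a JDBC `PreparedStatement`. `PhoenixUpsert.buildUpsert` is a hypothetical helper, not the `DatabaseAdapter` code from the attached patches, and it assumes the identifiers are already safe (no quoting or escaping is done).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a parameterized Phoenix UPSERT statement.
public final class PhoenixUpsert {
    private PhoenixUpsert() {
    }

    public static String buildUpsert(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        String columnList = String.join(", ", columns);
        // One JDBC placeholder per column, to be bound later with PreparedStatement.setObject
        String placeholders = columns.stream()
                .map(column -> "?")
                .collect(Collectors.joining(", "));
        return "UPSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildUpsert("USERS", List.of("ID", "NAME")));
        // prints: UPSERT INTO USERS (ID, NAME) VALUES (?, ?)
    }
}
```

Because UPSERT inserts a new row or overwrites an existing row with the same primary key, the same statement covers both cases that the ReplaceText workaround above was emulating.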



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [nifi] mattyb149 commented on pull request #7263: NIFI-5151: Add UPSERT support for Apache Phoenix

2023-05-18 Thread via GitHub


mattyb149 commented on PR #7263:
URL: https://github.com/apache/nifi/pull/7263#issuecomment-1553571399

   +1 LGTM, tried on a live Phoenix/HBase instance, was able to add new rows as 
well as update existing rows using UPSERT. Thanks for the improvement! Merging 
to main.





[GitHub] [nifi] dan-s1 commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


dan-s1 commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198212598


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##
@@ -0,0 +1,155 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */package org.apache.nifi.excel;
+
+import com.github.pjfanning.xlsx.StreamingReader;
+import org.apache.nifi.logging.ComponentLog;
+import org.apache.poi.ss.usermodel.Row;
+import org.apache.poi.ss.usermodel.Sheet;
+import org.apache.poi.ss.usermodel.Workbook;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+public class RowIterator implements Iterator<Row>, Closeable {
+    private final Workbook workbook;
+    private final Iterator<Sheet> sheets;
+    private Sheet currentSheet;
+    private Iterator<Row> currentRows;
+    private final Map<String, Boolean> desiredSheets;
+    private final int firstRow;
+    private ComponentLog logger;
+    private boolean log;
+    private Row currentRow;
+
+    public RowIterator(InputStream in, List<String> desiredSheets, int firstRow) {
+        this(in, desiredSheets, firstRow, null);
+    }
+
+    public RowIterator(InputStream in, List<String> desiredSheets, int firstRow, ComponentLog logger) {
+        this.workbook = StreamingReader.builder()
+                .rowCacheSize(100)
+                .bufferSize(4096)
+                .open(in);
+        this.sheets = this.workbook.iterator();
+        this.desiredSheets = desiredSheets != null ? desiredSheets.stream()
+                .collect(Collectors.toMap(key -> key, value -> Boolean.FALSE)) : new HashMap<>();
+        this.firstRow = firstRow;
+        this.logger = logger;
+        this.log = logger != null;
+    }
+
+    @Override
+    public boolean hasNext() {
+        setCurrent();
+        boolean next = currentRow != null;
+        if (!next) {
+            String sheetsNotFound = getSheetsNotFound(desiredSheets);
+            if (!sheetsNotFound.isEmpty() && log) {
+                logger.warn("Excel sheet(s) not found: {}", sheetsNotFound);
+            }
+        }
+        return next;
+    }
+
+    private void setCurrent() {
+        currentRow = getNextRow();
+        if (currentRow != null) {
+            return;
+        }
+
+        currentSheet = null;
+        currentRows = null;
+        while (sheets.hasNext()) {
+            currentSheet = sheets.next();
+            if (isIterateOverAllSheets() || hasSheet(currentSheet.getSheetName())) {
+                currentRows = currentSheet.iterator();
+                currentRow = getNextRow();
+                if (currentRow != null) {
+                    return;
+                }
+            }
+        }
+    }
+
+    private Row getNextRow() {
+        while (currentRows != null && !hasExhaustedRows()) {
+            Row tempCurrentRow = currentRows.next();
+            if (!isSkip(tempCurrentRow)) {
+                return tempCurrentRow;
+            }
+        }
+        return null;
+    }
+
+    private boolean hasExhaustedRows() {
+        boolean exhausted = !currentRows.hasNext();
+        if (log && exhausted) {
+            logger.info("Exhausted all rows from sheet {}", currentSheet.getSheetName());
+        }
+        return exhausted;
+    }
+
+    private boolean isSkip(Row row) {
+        return row.getRowNum() < firstRow;
+    }
+
+    private boolean isIterateOverAllSheets() {
+        boolean iterateAllSheets = desiredSheets.isEmpty();
+        if (iterateAllSheets && log) {
+            logger.info("Advanced to sheet {}", currentSheet.getSheetName());
+        }
+        return iterateAllSheets;
+    }
+
+    private boolean hasSheet(String name) {
+        boolean sheetByName = !desiredSheets.isEmpty()
+                && desiredSheets.keySet().stream()
+                .anyMatch(desiredSheet -> desiredSheet.equalsIgnoreCase(name));
+        if (sheetByName) {
+            desiredSheets.put(name, Boolean.TRUE);
+        }
+        return sheetByName;
+    }
+
+
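Stripped of the POI streaming types, the iteration in `RowIterator` amounts to: walk the sheets in order, keep only the selected ones, skip rows whose index is below `firstRow`, and remember which requested sheets were seen. The plain-Java sketch below models a workbook as an ordered map from sheet name to row indices, which is an assumption made only for illustration; `SheetRowWalker` and `walk` are hypothetical. It records a match under the lower-cased requested name, so a sheet whose name differs only in case still marks that request as found.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the sheet/row filtering, without Apache POI.
public final class SheetRowWalker {

    // Returns "sheet:row" labels for every row that would be emitted and adds
    // any requested sheet names that never matched to notFound.
    public static List<String> walk(Map<String, List<Integer>> workbook,
                                    List<String> desiredSheets,
                                    int firstRow,
                                    Set<String> notFound) {
        // Track requested sheets by lower-cased name so matching ignores case.
        Map<String, Boolean> requested = new LinkedHashMap<>();
        for (String name : desiredSheets) {
            requested.put(name.toLowerCase(Locale.ROOT), Boolean.FALSE);
        }
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> sheet : workbook.entrySet()) {
            String key = sheet.getKey().toLowerCase(Locale.ROOT);
            boolean selected = requested.isEmpty() || requested.containsKey(key);
            if (!selected) {
                continue;
            }
            requested.replace(key, Boolean.TRUE); // no-op when reading all sheets
            for (int rowNum : sheet.getValue()) {
                if (rowNum >= firstRow) {         // rows below firstRow are skipped
                    emitted.add(sheet.getKey() + ":" + rowNum);
                }
            }
        }
        requested.forEach((name, found) -> {
            if (!found) {
                notFound.add(name);
            }
        });
        return emitted;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> workbook = new LinkedHashMap<>();
        workbook.put("Sales", List.of(0, 1, 2));
        workbook.put("Notes", List.of(0, 1));
        Set<String> notFound = new TreeSet<>();
        System.out.println(walk(workbook, List.of("SALES", "Missing"), 1, notFound));
        // prints: [Sales:1, Sales:2]
        System.out.println(notFound);
        // prints: [missing]
    }
}
```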
+

[GitHub] [nifi] exceptionfactory commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


exceptionfactory commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198210056


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/ExcelUtils.java:
##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.excel;
+
+import org.apache.poi.ss.usermodel.Row;
+
+public class ExcelUtils {

Review Comment:
   Thanks, it is fine for now, leaving it as is sounds good.






[GitHub] [nifi] dan-s1 commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


dan-s1 commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198208433


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/ExcelUtils.java:
##
@@ -0,0 +1,28 @@
+public class ExcelUtils {

Review Comment:
   @exceptionfactory What would you like me to do for this? Leave it as is, 
declare the method static in one class and have the others use it, or duplicate 
it in all 3 classes that use it?






[GitHub] [nifi] exceptionfactory commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


exceptionfactory commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198205592


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##

[GitHub] [nifi] exceptionfactory commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


exceptionfactory commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198205061


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##
@@ -0,0 +1,155 @@
+    @Override
+    public boolean hasNext() {
+        setCurrent();
+        boolean next = currentRow != null;
+        if (!next) {
+            String sheetsNotFound = getSheetsNotFound(desiredSheets);
+            if (!sheetsNotFound.isEmpty() && log) {
+                logger.warn("Excel sheet(s) not found: {}", sheetsNotFound);
+            }

Review Comment:
   The problem with the warning is that it is not actionable in terms of flow 
handling. Using the `record.count` attribute provides the opportunity to warn 
if a flow configuration expects to see records in all cases.






[jira] [Updated] (NIFI-11567) GeoEnrichIP processors should auto-reload the database file

2023-05-18 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-11567:

Status: Patch Available  (was: In Progress)

> GeoEnrichIP processors should auto-reload the database file
> ---
>
> Key: NIFI-11567
> URL: https://issues.apache.org/jira/browse/NIFI-11567
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
> Fix For: 1.latest, 2.latest
>
>
> Currently the GeoEnrichIP processors only load the database when the 
> processor is scheduled. This requires a processor restart if the database 
> file changes. Instead, the processors should auto-reload the database file 
> when it detects a change.
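One common way to notice that a file has changed is to compare its last-modified timestamp with the value recorded at the previous load and reload only when they differ. The plain-Java sketch below shows that idea with `java.nio.file.Files.getLastModifiedTime`. `ReloadableFile` and its loader function are hypothetical; this is not the `SynchronousFileWatcher` used by the linked PR, and a real GeoEnrichIP implementation probably also needs to replace the database reader safely while other threads are using it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.function.Function;

// Hypothetical sketch: reloads a value when the backing file's modification time changes.
public final class ReloadableFile<T> {
    private final Path path;
    private final Function<Path, T> loader;
    private FileTime lastLoaded;
    private T value;

    public ReloadableFile(Path path, Function<Path, T> loader) {
        this.path = path;
        this.loader = loader;
    }

    // Returns the current value, reloading first if the file was modified since the last load.
    public synchronized T get() throws IOException {
        FileTime modified = Files.getLastModifiedTime(path);
        if (value == null || !modified.equals(lastLoaded)) {
            value = loader.apply(path);
            lastLoaded = modified;
        }
        return value;
    }
}
```

On file systems with coarse timestamp resolution, two writes within the same tick are not distinguished, so a size or checksum comparison may be needed when updates can arrive in quick succession.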





[GitHub] [nifi] dan-s1 commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


dan-s1 commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198175910


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##

[GitHub] [nifi] dan-s1 commented on a diff in pull request #7194: NIFI-11167 - Add Excel Record Reader

2023-05-18 Thread via GitHub


dan-s1 commented on code in PR #7194:
URL: https://github.com/apache/nifi/pull/7194#discussion_r1198172969


##
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/excel/RowIterator.java:
##
@@ -0,0 +1,155 @@
+    @Override
+    public boolean hasNext() {
+        setCurrent();
+        boolean next = currentRow != null;
+        if (!next) {
+            String sheetsNotFound = getSheetsNotFound(desiredSheets);
+            if (!sheetsNotFound.isEmpty() && log) {
+                logger.warn("Excel sheet(s) not found: {}", sheetsNotFound);
+            }

Review Comment:
   I thought it might be useful for operators to know explicitly when a sheet did 
not exist. I personally think that is clearer than a record count, which could 
be misleading. 






[GitHub] [nifi] gresockj opened a new pull request, #7267: NIFI-11566: Adding updateTimeout argument to parameter commands in CLI

2023-05-18 Thread via GitHub


gresockj opened a new pull request, #7267:
URL: https://github.com/apache/nifi/pull/7267

   
   # Summary
   
   [NIFI-11566](https://issues.apache.org/jira/browse/NIFI-11566)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, 
such as `NIFI-0`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
 - [ ] JDK 11
 - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   





[GitHub] [nifi] mattyb149 commented on pull request #7263: NIFI-5151: Add UPSERT support for Apache Phoenix

2023-05-18 Thread via GitHub


mattyb149 commented on PR #7263:
URL: https://github.com/apache/nifi/pull/7263#issuecomment-1553405287

   Reviewing...





[jira] [Updated] (NIFI-11567) GeoEnrichIP processors should auto-reload the database file

2023-05-18 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess updated NIFI-11567:

Fix Version/s: 1.latest
   2.latest




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [nifi] mattyb149 opened a new pull request, #7266: NIFI-11567: Auto-reload database file in GeoEnrichIP processors

2023-05-18 Thread via GitHub


mattyb149 opened a new pull request, #7266:
URL: https://github.com/apache/nifi/pull/7266

   
   # Summary
   
   [NIFI-11567](https://issues.apache.org/jira/browse/NIFI-11567) This PR adds 
a SynchronousFileWatcher and retry logic to reload the specified database file 
if it has changed.
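The PR's SynchronousFileWatcher is not reproduced here. As a rough sketch of the general idea only, a processor could compare the database file's last-modified time on each use and reload when it changes. `ReloadingResource` and its loader function are hypothetical names, not NiFi APIs, and the retry logic mentioned above is omitted.

```java
import java.io.File;
import java.util.function.Function;

// Hypothetical sketch: reload a resource when the backing file's last-modified time changes.
// This is NOT NiFi's SynchronousFileWatcher; it only illustrates the polling idea.
final class ReloadingResource<T> {
    private final File file;
    private final Function<File, T> loader;
    private long lastModified = Long.MIN_VALUE;
    private T current;

    ReloadingResource(File file, Function<File, T> loader) {
        this.file = file;
        this.loader = loader;
    }

    // Returns the current resource, reloading first if the file changed since the last load.
    synchronized T get() {
        long modified = file.lastModified(); // 0L if the file does not exist
        if (current == null || modified != lastModified) {
            current = loader.apply(file);
            lastModified = modified;
        }
        return current;
    }
}
```

Polling `lastModified()` costs one file-system call per `get()`, and a replacement written within the file system's timestamp granularity would go unnoticed, so a production implementation would likely throttle checks and guard against partially written files.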
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [x] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [x] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [x] Pull Request commit message starts with Apache NiFi Jira issue number, 
as such `NIFI-0`
   
   ### Pull Request Formatting
   
   - [x] Pull Request based on current revision of the `main` branch
   - [x] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
 - [x] JDK 11
 - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   





[jira] [Updated] (NIFI-11557) Eliminate use of Files.walkFileTree for any performance-critical parts of application

2023-05-18 Thread Mark Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-11557:
--
Status: Patch Available  (was: Open)

> Eliminate use of Files.walkFileTree for any performance-critical parts of 
> application
> -
>
> Key: NIFI-11557
> URL: https://issues.apache.org/jira/browse/NIFI-11557
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework, Extensions
>Reporter: Mark Payne
>Assignee: Mark Payne
>Priority: Major
>  Labels: content-repo, content-repository, performance, slowness, 
> startup
> Fix For: 1.latest, 2.latest
>
>
> The FileSystemRepository (content repo implementation) as well as ListFile 
> both make use of the {{Files.walkFileTree}} method. Recently, I worked with a 
> user who had horribly long startup times. Thread dumps show that the time was 
> almost entirely in the FileSystemRepository's {{initializeRepository}} method 
> as it is walking the file tree in order to determine which archive files can 
> be cleaned up next. This is done during startup and again periodically in 
> background threads.
> I made a small modification locally to instead use the standard synchronous 
> IO method ({{File.listFiles}}). I used GenerateFlowFile to generate 
> 1-byte FlowFiles and set {{nifi.content.claim.max.appendable.size=1 B}} in 
> nifi.properties in order to generate a huge number of files (about 1.2 
> million files in the content repository) and restarted a few times. 
> I also added some log lines to show how long this part of the startup 
> process took.
> With the existing code, startup took 210 seconds (3.5 mins). With the new 
> implementation, it took 6.7 seconds. This appears to be due to the fact that 
> when using NIO.2, it does an individual disk access for every file to obtain 
> its attributes, while the File objects returned by the {{File.listFiles}} 
> method already have the necessary attributes. As a result, 
> the NIO.2 approach makes millions of unnecessary disk accesses. As 
> the number of files in the repository grows, the discrepancy also grows.
> We need to eliminate any use of {{Files.walkFileTree}} for any 
> performance-critical parts of the codebase.





[jira] [Updated] (NIFI-11557) Eliminate use of Files.walkFileTree for any performance-critical parts of application

2023-05-18 Thread Mark Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Payne updated NIFI-11557:
--
Labels: content-repo content-repository performance slowness startup  (was: 
)

> Eliminate use of Files.walkFileTree for any performance-critical parts of 
> application
> -
>
> Key: NIFI-11557
> URL: https://issues.apache.org/jira/browse/NIFI-11557
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework, Extensions
>Reporter: Mark Payne
>Assignee: Mark Payne
>Priority: Major
>  Labels: content-repo, content-repository, performance, slowness, 
> startup
> Fix For: 1.latest, 2.latest
>
>
> The FileSystemRepository (content repo implementation) as well as ListFile 
> both make use of the {{Files.walkFileTree}} method. Recently, I worked with a 
> user who had horribly long startup times. Thread dumps show that the time was 
> almost entirely in the FileSystemRepository's {{initializeRepository}} method 
> as it is walking the file tree in order to determine which archive files can 
> be cleaned up next. This is done during startup and again periodically in 
> background threads.
> I made a small modification locally to instead use the standard synchronous 
> IO method ({{File.listFiles}}). I used GenerateFlowFile to generate 
> 1-byte FlowFiles and set {{nifi.content.claim.max.appendable.size=1 B}} in 
> nifi.properties in order to generate a huge number of files (about 1.2 
> million files in the content repository) and restarted a few times. 
> I also added some log lines to show how long this part of the startup 
> process took.
> With the existing code, startup took 210 seconds (3.5 mins). With the new 
> implementation, it took 6.7 seconds. This appears to be due to the fact that 
> when using NIO.2, it does an individual disk access for every file to obtain 
> its attributes, while the File objects returned by the {{File.listFiles}} 
> method already have the necessary attributes. As a result, 
> the NIO.2 approach makes millions of unnecessary disk accesses. As 
> the number of files in the repository grows, the discrepancy also grows.
> We need to eliminate any use of {{Files.walkFileTree}} for any 
> performance-critical parts of the codebase.





[GitHub] [nifi] markap14 opened a new pull request, #7265: NIFI-11557: Avoid using the expensive and unnecessary Files.walkFileT…

2023-05-18 Thread via GitHub


markap14 opened a new pull request, #7265:
URL: https://github.com/apache/nifi/pull/7265

   …ree on startup and initialization of Content Repository. Also performed 
some code cleanup: IntelliJ flagged many warnings in the class, mostly around 
methods that are no longer used and potential NullPointerExceptions, so those 
were cleaned up. Additionally, removed the nifi property for max flowfiles per 
claim - this property was never implemented. It was referenced, but the way in 
which it was used curiously had nothing to do with what the property was 
intended to be used for or with how it was documented. Instead, it was used to 
limit the max number of claims that could remain writable. As a result, it was 
removed.
   
   # Summary
   
   [NIFI-0](https://issues.apache.org/jira/browse/NIFI-0)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, 
as such `NIFI-0`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
 - [ ] JDK 11
 - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   





[jira] [Commented] (NIFI-4298) NiFi allows users to remove critical Attributes that are needed by processors.

2023-05-18 Thread Michael W Moser (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723953#comment-17723953
 ] 

Michael W Moser commented on NIFI-4298:
---

NIFI-8971 partially resolves this, by fixing the specific MergeContent problem 
in the description.

> NiFi allows users to remove critical Attributes that are needed by processors.
> --
>
> Key: NIFI-4298
> URL: https://issues.apache.org/jira/browse/NIFI-4298
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Affects Versions: 1.2.0
>Reporter: Matthew Clarke
>Priority: Major
>
> The UpdateAttribute processor provides users with the ability to provide a 
> "Delete Attributes Expression".   
> While the FlowFile properties entryDate, lineageDate, fileSize, and uuid are 
> protected and cannot be removed, the FlowFile attributes path and filename can 
> be removed.
> Removal of these attributes has adverse effects on many other processors. 
> Any processor that writes a FlowFile out requires the filename attribute.
> In addition, I have found that the MergeContent processor (configured to use 
> FlowFileStreams as the merge strategy) also, for whatever reason, requires 
> that the path attribute exist on the FlowFile.
> If this attribute is missing, an NPE is thrown and the session is rolled back.
> 2017-08-14 19:27:00,156 ERROR [Timer-Driven Process Thread-7] 
> o.a.n.processors.standard.MergeContent 
> MergeContent[id=d7213e1f-0c03-1715-93cc-b1be9228ec36] Failed to process 
> bundle of 1 files due to java.lang.NullPointerException; rolling back 
> sessions: {}
> A stack trace is not produced even if DEBUG is enabled for this processor.
> NiFi needs to prevent users from being able to remove attributes which may be 
> "required" by other processors.
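One possible way to enforce this, sketched under assumptions: filter the attribute names matched by the delete expression against a protected set before removing anything. `AttributeDeletion` and its protected set (`filename`, `path`) are hypothetical and illustrative only, not UpdateAttribute's actual code.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch: never delete attributes that other processors rely on.
final class AttributeDeletion {
    // Illustrative protected set, based on the attributes named in the issue.
    static final Set<String> PROTECTED = Set.of("filename", "path");

    // Returns the attribute names fully matching the delete regex, excluding protected names.
    static Set<String> attributesToDelete(Map<String, String> attributes, String deleteRegex) {
        Pattern pattern = Pattern.compile(deleteRegex);
        Set<String> result = new TreeSet<>();
        for (String name : attributes.keySet()) {
            if (pattern.matcher(name).matches() && !PROTECTED.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

With this filter, even a catch-all expression such as `.*` removes only the non-protected attributes.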





[jira] [Commented] (NIFI-11557) Eliminate use of Files.walkFileTree for any performance-critical parts of application

2023-05-18 Thread Mark Payne (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-11557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723950#comment-17723950
 ] 

Mark Payne commented on NIFI-11557:
---

Looking further into this, I found that the logic that we have currently that 
scans through the content repo serves two purposes:
1. To count how many files are archived
2. To determine the timestamp of the oldest archived file.

The timestamp of the oldest archived file was meant to be a performance 
optimization: if no archived file was old enough to need cleanup due to age 
constraints, the background threads would not bother scanning.
Interestingly, this code was buggy: while it checked the last modified time of 
each file and compared it to 'oldestTimestamp', 'oldestTimestamp' was 
initialized to 0, and since no file's timestamp is older than 0, it would 
always remain 0. As a result, this code was very expensive and unneeded.
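The bug above follows a common pattern: a running minimum seeded with 0 can never change, because no non-negative last-modified time is smaller than 0. A minimal sketch, assuming the comparison was a `Math.min`-style update; the method names are hypothetical, not the FileSystemRepository code.

```java
// Hypothetical sketch contrasting a buggy and a corrected running-minimum seed.
final class OldestTimestamp {
    // Buggy: seeding with 0 means Math.min always returns 0
    // for non-negative timestamps, so the result never reflects the files.
    static long oldestBuggy(long[] lastModifiedTimes) {
        long oldest = 0L;
        for (long t : lastModifiedTimes) {
            oldest = Math.min(oldest, t);
        }
        return oldest;
    }

    // Fixed: seed with Long.MAX_VALUE so the first timestamp always replaces it.
    static long oldestFixed(long[] lastModifiedTimes) {
        long oldest = Long.MAX_VALUE;
        for (long t : lastModifiedTimes) {
            oldest = Math.min(oldest, t);
        }
        return oldest;
    }
}
```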

We only really need to count the number of files archived. This can be achieved 
MUCH more efficiently by simply performing a {{File.listFiles}} call on each 
archive directory. This will drastically improve startup performance in cases 
where there are millions of files archived.
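A minimal sketch of counting archived files with one `File.listFiles` call per archive directory. It assumes, for illustration, that each section directory under a content repository container holds archived files directly in an `archive` subdirectory; `ArchiveCounter` is a hypothetical name, not the FileSystemRepository implementation.

```java
import java.io.File;

// Hypothetical sketch: count archived files using only directory listings.
final class ArchiveCounter {
    // Sums the number of entries directly inside each section's "archive" subdirectory.
    // Archived files are counted from the array length, so no per-file attribute
    // lookups are needed for them.
    static long countArchivedFiles(File containerDir) {
        File[] sections = containerDir.listFiles(File::isDirectory);
        if (sections == null) { // not a directory, or an I/O error occurred
            return 0;
        }
        long total = 0;
        for (File section : sections) {
            File[] archived = new File(section, "archive").listFiles();
            if (archived != null) { // null if the section has no archive directory
                total += archived.length;
            }
        }
        return total;
    }
}
```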

> Eliminate use of Files.walkFileTree for any performance-critical parts of 
> application
> -
>
> Key: NIFI-11557
> URL: https://issues.apache.org/jira/browse/NIFI-11557
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework, Extensions
>Reporter: Mark Payne
>Assignee: Mark Payne
>Priority: Major
> Fix For: 1.latest, 2.latest
>
>
> The FileSystemRepository (content repo implementation) as well as ListFile 
> both make use of the {{Files.walkFileTree}} method. Recently, I worked with a 
> user who had horribly long startup times. Thread dumps show that the time was 
> almost entirely in the FileSystemRepository's {{initializeRepository}} method 
> as it is walking the file tree in order to determine which archive files can 
> be cleaned up next. This is done during startup and again periodically in 
> background threads.
> I made a small modification locally to instead use the standard synchronous 
> IO method ({{File.listFiles}}). I used GenerateFlowFile to generate 
> 1-byte FlowFiles and set {{nifi.content.claim.max.appendable.size=1 B}} in 
> nifi.properties in order to generate a huge number of files (about 1.2 
> million files in the content repository) and restarted a few times. 
> I also added some log lines to show how long this part of the startup 
> process took.
> With the existing code, startup took 210 seconds (3.5 mins). With the new 
> implementation, it took 6.7 seconds. This appears to be due to the fact that 
> when using NIO.2, it does an individual disk access for every file to obtain 
> its attributes, while the File objects returned by the {{File.listFiles}} 
> method already have the necessary attributes. As a result, 
> the NIO.2 approach makes millions of unnecessary disk accesses. As 
> the number of files in the repository grows, the discrepancy also grows.
> We need to eliminate any use of {{Files.walkFileTree}} for any 
> performance-critical parts of the codebase.





[jira] [Updated] (NIFI-11568) Remove Apache DS Test Dependency

2023-05-18 Thread David Handermann (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Handermann updated NIFI-11568:

Status: Patch Available  (was: Open)

> Remove Apache DS Test Dependency
> 
>
> Key: NIFI-11568
> URL: https://issues.apache.org/jira/browse/NIFI-11568
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions, NiFi Registry
>Reporter: David Handermann
>Assignee: David Handermann
>Priority: Minor
> Fix For: 1.latest, 2.latest
>
>
> With recent refactoring of LDAP test cases, the Apache DS dependency is no 
> longer used and should be removed.





[GitHub] [nifi] exceptionfactory commented on pull request #7257: NIFI-11555-Upgrade-apacheds-all-to-2.0.0.AM26

2023-05-18 Thread via GitHub


exceptionfactory commented on PR #7257:
URL: https://github.com/apache/nifi/pull/7257#issuecomment-1553242827

   Thanks for your work on this @jbalchan. It appears that more recent versions 
of the Apache DS All JAR introduced issues with manifest signatures. On further 
investigation, the library is no longer used, so I created a new Jira issue and 
a separate pull request to remove the references. This was helpful for 
highlighting the opportunity to remove the dependency.
   
   Closing in favor of #7264.





[jira] [Resolved] (NIFI-11555) Upgrade apacheds-all to 2.0.0.AM26

2023-05-18 Thread David Handermann (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Handermann resolved NIFI-11555.
-
  Assignee: David Handermann
Resolution: Workaround

Recent LDAP test refactoring removed the need for the Apache DS dependency, so 
this issue is superseded by NIFI-11568.

> Upgrade apacheds-all to 2.0.0.AM26
> --
>
> Key: NIFI-11555
> URL: https://issues.apache.org/jira/browse/NIFI-11555
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike R
>Assignee: David Handermann
>Priority: Major
>
> Upgrade apacheds-all to 2.0.0.AM26





[jira] [Updated] (NIFI-11555) Upgrade apacheds-all to 2.0.0.AM26

2023-05-18 Thread David Handermann (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Handermann updated NIFI-11555:

Affects Version/s: (was: 1.21.0)

> Upgrade apacheds-all to 2.0.0.AM26
> --
>
> Key: NIFI-11555
> URL: https://issues.apache.org/jira/browse/NIFI-11555
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike R
>Priority: Major
>
> Upgrade apacheds-all to 2.0.0.AM26





[GitHub] [nifi] exceptionfactory opened a new pull request, #7264: NIFI-11568 Remove Apache DS Test Dependency

2023-05-18 Thread via GitHub


exceptionfactory opened a new pull request, #7264:
URL: https://github.com/apache/nifi/pull/7264

   # Summary
   
   [NIFI-11568](https://issues.apache.org/jira/browse/NIFI-11568) Removes the 
Apache Directory Server test dependency from Registry and NiFi LDAP Provider 
modules. Recent test refactoring using the UnboundID LDAP SDK eliminated the need 
for Apache DS.
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [X] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [X] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [X] Pull Request commit message starts with Apache NiFi Jira issue number, 
as such `NIFI-0`
   
   ### Pull Request Formatting
   
   - [X] Pull Request based on current revision of the `main` branch
   - [X] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
 - [ ] JDK 11
 - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   





[jira] [Created] (NIFI-11568) Remove Apache DS Test Dependency

2023-05-18 Thread David Handermann (Jira)
David Handermann created NIFI-11568:
---

 Summary: Remove Apache DS Test Dependency
 Key: NIFI-11568
 URL: https://issues.apache.org/jira/browse/NIFI-11568
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions, NiFi Registry
Reporter: David Handermann
Assignee: David Handermann
 Fix For: 1.latest, 2.latest


With recent refactoring of LDAP test cases, the Apache DS dependency is no 
longer used and should be removed.





[GitHub] [nifi] exceptionfactory commented on pull request #7231: [NIFI-2964] Added ability for AttributeToJSON to handle nested JSON when either outputting to a flow file or an attribute.

2023-05-18 Thread via GitHub


exceptionfactory commented on PR #7231:
URL: https://github.com/apache/nifi/pull/7231#issuecomment-1553212554

   > @exceptionfactory After much deliberation, we realized you were right that 
NiFi has other processors to handle the JSON transformations needed. We would 
still like, though, the ability to handle attributes as nested objects, as you 
suggested:
   > 
   > > If we go forward with this change, it seems better to have a simple 
property like JSON Handling Strategy with values of Escaped String or Nested 
Object. That way, anything detected as JSON would be treated the same way, 
without the potential complexity of pattern-matching on FlowFile attribute 
names.
   > 
   > This would clearly highlight the fact that the processor can handle nested 
objects. Is that still okay?
   
   Thanks for evaluating the options. Yes, I think the Handling Strategy 
approach with the two options provides clarity on both the implementation side 
and the usability side. If you can make those adjustments to the PR, that would 
be great.
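To make the two proposed JSON Handling Strategy values concrete, here is a minimal sketch of how one attribute might be rendered under each. `JsonHandlingStrategy` and `AttributeJson` are hypothetical names. JSON detection is omitted, so the sketch assumes the caller already knows the value is valid JSON.

```java
// Hypothetical names; not the AttributesToJSON implementation.
enum JsonHandlingStrategy { ESCAPED_STRING, NESTED_OBJECT }

final class AttributeJson {
    // Renders one "name":value pair. Assumes jsonValue is already valid JSON text.
    static String render(String name, String jsonValue, JsonHandlingStrategy strategy) {
        String value = strategy == JsonHandlingStrategy.NESTED_OBJECT
                ? jsonValue                          // embed as a nested JSON value
                : "\"" + escape(jsonValue) + "\"";   // embed as a JSON string
        return "\"" + escape(name) + "\":" + value;
    }

    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters become a 4-digit hex escape.
                        sb.append('\\').append('u').append(String.format("%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

For an attribute `a` with value `{"b":1}`, `NESTED_OBJECT` yields `"a":{"b":1}` while `ESCAPED_STRING` yields `"a":"{\"b\":1}"`.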





[GitHub] [nifi] mosermw commented on pull request #6560: NIFI-10676 USE_SPECIFIED_OR_COMPATIBLE_OR_GHOST on flow load from bytes

2023-05-18 Thread via GitHub


mosermw commented on PR #6560:
URL: https://github.com/apache/nifi/pull/6560#issuecomment-1553196696

   @genehynson @mh013370 If there is still interest in this PR, then the Admin 
Guide should be updated to describe this new property and its options, while 
stressing that changing the default is an experimental and risky feature.





[jira] [Commented] (NIFI-5151) Patch Nifi with Upsert functions for PutDatabaseRecord processor

2023-05-18 Thread Jira


[ 
https://issues.apache.org/jira/browse/NIFI-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723931#comment-17723931
 ] 

Lehel Boér commented on NIFI-5151:
--

new PR created: https://github.com/apache/nifi/pull/7263

> Patch Nifi with Upsert functions for PutDatabaseRecord processor
> 
>
> Key: NIFI-5151
> URL: https://issues.apache.org/jira/browse/NIFI-5151
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.7.0
>Reporter: Karl Amundsson
>Assignee: Lehel Boér
>Priority: Major
>  Labels: Processor
> Attachments: 
> 0001-NIFI-5151-Adding-support-for-UPSERT-in-PutDatabaseRe.patch, 
> 0001-NIFI-5151-Using-DatabaseAdapter-to-generate-INSERT-S.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Since Phoenix doesn't support the SQL statement INSERT, you have to use a 
> flow like: ConvertAttributesToJSON -> ConvertJSONToSQL in Insert 
> mode -> ReplaceText to replace "INSERT" with "UPSERT" -> PutSQL (see: 
> [https://community.hortonworks.com/questions/40561/nifi-phoenix-processor.html])
> With this patch you can choose to use UPSERT directly.
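Phoenix's `UPSERT INTO table (columns) VALUES (...)` statement takes the place of INSERT, which is what the ReplaceText step above works around. A minimal sketch of building a parameterized Phoenix UPSERT in Java. `PhoenixUpsert.upsertSql` is a hypothetical helper, not the DatabaseAdapter API used by the patch, and it does no identifier quoting.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of NiFi.
final class PhoenixUpsert {
    // Builds a parameterized Phoenix UPSERT statement for use with a PreparedStatement.
    // Identifiers are not quoted or validated here; a real implementation should do both.
    static String upsertSql(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String columnList = String.join(", ", columns);
        String placeholders = columns.stream().map(c -> "?").collect(Collectors.joining(", "));
        return "UPSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")";
    }
}
```

For example, `upsertSql("USERS", List.of("ID", "NAME"))` returns `UPSERT INTO USERS (ID, NAME) VALUES (?, ?)`.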





[GitHub] [nifi] Lehel44 opened a new pull request, #7263: NIFI-5151: Add UPSERT support for Apache Phoenix

2023-05-18 Thread via GitHub


Lehel44 opened a new pull request, #7263:
URL: https://github.com/apache/nifi/pull/7263

   
   # Summary
   
   [NIFI-5151](https://issues.apache.org/jira/browse/NIFI-5151)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, 
as such `NIFI-0`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
 - [ ] JDK 11
 - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   





[jira] [Assigned] (NIFI-5151) Patch Nifi with Upsert functions for PutDatabaseRecord processor

2023-05-18 Thread Jira


 [ 
https://issues.apache.org/jira/browse/NIFI-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lehel Boér reassigned NIFI-5151:


Assignee: Lehel Boér

> Patch Nifi with Upsert functions for PutDatabaseRecord processor
> 
>
> Key: NIFI-5151
> URL: https://issues.apache.org/jira/browse/NIFI-5151
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Affects Versions: 1.7.0
>Reporter: Karl Amundsson
>Assignee: Lehel Boér
>Priority: Major
>  Labels: Processor
> Attachments: 
> 0001-NIFI-5151-Adding-support-for-UPSERT-in-PutDatabaseRe.patch, 
> 0001-NIFI-5151-Using-DatabaseAdapter-to-generate-INSERT-S.patch
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Since Phoenix doesn't support the SQL statement INSERT, you have to use a 
> flow like: ConvertAttributesToJSON -> ConvertJSONToSQL in Insert 
> mode -> ReplaceText to replace "INSERT" with "UPSERT" -> PutSQL (see: 
> [https://community.hortonworks.com/questions/40561/nifi-phoenix-processor.html])
> With this patch you can choose to use UPSERT directly.





[GitHub] [nifi] dan-s1 commented on pull request #7231: [NIFI-2964] Added ability for AttributeToJSON to handle nested JSON when either outputting to a flow file or an attribute.

2023-05-18 Thread via GitHub


dan-s1 commented on PR #7231:
URL: https://github.com/apache/nifi/pull/7231#issuecomment-1553177346

   @exceptionfactory After much deliberation, we realized you were right that 
NiFi has other processors to handle the JSON transformations needed. We would 
still like, though, the ability to handle attributes as nested objects, as you 
suggested:
   
   > If we go forward with this change, it seems better to have a simple 
property like JSON Handling Strategy with values of Escaped String or Nested 
Object. That way, anything detected as JSON would be treated the same way, 
without the potential complexity of pattern-matching on FlowFile attribute 
names.
   
   This would clearly highlight the fact that the processor can handle nested 
objects. Is that still okay?





[jira] [Assigned] (NIFI-11567) GeoEnrichIP processors should auto-reload the database file

2023-05-18 Thread Matt Burgess (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Burgess reassigned NIFI-11567:
---

Assignee: Matt Burgess

> GeoEnrichIP processors should auto-reload the database file
> ---
>
> Key: NIFI-11567
> URL: https://issues.apache.org/jira/browse/NIFI-11567
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Extensions
>Reporter: Matt Burgess
>Assignee: Matt Burgess
>Priority: Major
>
> Currently the GeoEnrichIP processors only load the database when the 
> processor is scheduled. This requires a processor restart if the database 
> file changes. Instead, the processors should auto-reload the database file 
> when it detects a change.





[jira] [Created] (NIFI-11567) GeoEnrichIP processors should auto-reload the database file

2023-05-18 Thread Matt Burgess (Jira)
Matt Burgess created NIFI-11567:
---

 Summary: GeoEnrichIP processors should auto-reload the database 
file
 Key: NIFI-11567
 URL: https://issues.apache.org/jira/browse/NIFI-11567
 Project: Apache NiFi
  Issue Type: Improvement
  Components: Extensions
Reporter: Matt Burgess


Currently the GeoEnrichIP processors only load the database when the processor 
is scheduled. This requires a processor restart if the database file changes. 
Instead, the processors should auto-reload the database file when it detects a 
change.





[jira] [Created] (NIFI-11566) CLI set-param command can timeout in some cases

2023-05-18 Thread Joe Gresock (Jira)
Joe Gresock created NIFI-11566:
--

 Summary: CLI set-param command can timeout in some cases
 Key: NIFI-11566
 URL: https://issues.apache.org/jira/browse/NIFI-11566
 Project: Apache NiFi
  Issue Type: Bug
  Components: Tools and Build
Affects Versions: 1.21.0
Reporter: Joe Gresock
Assignee: Joe Gresock


In cases where controller services take a long time to disable or enable, the NiFi 
CLI command 'set-param' can time out, causing problems for client code. The 
command currently has a hard-coded 60-second timeout, which should be 
configurable. An update timeout should be added as an argument to the command.
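One way to express the proposed change is to replace the fixed 60-second wait with a timeout parameter that defaults to the current value. The sketch below is a Python illustration only. `wait_for`, `timeout_seconds`, and `poll_interval_seconds` are hypothetical names. The issue does not specify the NiFi CLI's actual argument name or polling logic.

```python
import time
from typing import Callable

DEFAULT_UPDATE_TIMEOUT_SECONDS = 60.0  # the hard-coded value described in the issue


def wait_for(condition: Callable[[], bool],
             timeout_seconds: float = DEFAULT_UPDATE_TIMEOUT_SECONDS,
             poll_interval_seconds: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout_seconds` elapses.

    Returns True if the condition was met, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_seconds, remaining))
```

Keeping 60 seconds as the default preserves today's behavior for callers that do not pass the new argument.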



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (NIFI-11562) ValidateRecord doesn't work correctly; routes all to 'valid'

2023-05-18 Thread crissaegrim (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

crissaegrim resolved NIFI-11562.

Resolution: Not A Bug

Oops, I was using 'reader's schema' rather than the writer's schema. Closing.

> ValidateRecord doesn't work correctly; routes all to 'valid'
> 
>
> Key: NIFI-11562
> URL: https://issues.apache.org/jira/browse/NIFI-11562
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.20.0
> Environment: linux, docker
>Reporter: crissaegrim
>Priority: Major
> Attachments: image-2023-05-17-17-32-04-404.png
>
>
> Expecting the `age` field (an int in the schema) to cause a failure, but it does 
> not. Replicate this with the following steps:
>  # Generate Flow File
> id,name,age,employer
> 1,joe,30,mlp
> 2,bob,thirty,google
> 3,linda,32.123,yahoo
> 4,anne,31,
>  # ValidateRecord
>  # Connect `invalid` and `valid` to downstream to visualize
> Validate against this Avro schema:
>  
> protocol Test {
>     record TestEmployer {
>         int id;
>         string name;
>         int age;
>         string? employer = null;
>     }
> }
> {
>   "type" : "record",
>   "name" : "TestEmployer",
>   "fields" : [ {
>     "name" : "id",
>     "type" : "int"
>   }, {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "age",
>     "type" : "int"
>   }, {
>     "name" : "employer",
>     "type" : [ "null", "string" ],
>     "default" : null
>   } ]
> }
> See that all records are routed to `valid`.  Why?
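The sketch below applies a strict reading of the schema above to the sample CSV. It shows which records the reporter expected to be routed to `invalid` (ids 2 and 3). This is a hypothetical Python illustration, not ValidateRecord's logic. NiFi applies its own type-coercion rules, and according to the resolution above, the observed routing came from the schema selection rather than a bug. Only `id` and `age` are checked: `name` is a plain string, and `employer` is nullable.

```python
import csv
import io
from typing import List, Tuple

CSV_TEXT = """id,name,age,employer
1,joe,30,mlp
2,bob,thirty,google
3,linda,32.123,yahoo
4,anne,31,
"""


def is_int(value: str) -> bool:
    # Python's int() rejects non-integer text such as "thirty" and "32.123".
    try:
        int(value)
        return True
    except ValueError:
        return False


def partition(text: str) -> Tuple[List[str], List[str]]:
    """Split record ids into (valid_ids, invalid_ids) under a strict reading of the schema."""
    valid: List[str] = []
    invalid: List[str] = []
    for row in csv.DictReader(io.StringIO(text)):
        # id and age are "int"; name is "string"; employer is ["null", "string"],
        # so an empty employer value is accepted.
        ok = is_int(row["id"]) and is_int(row["age"])
        (valid if ok else invalid).append(row["id"])
    return valid, invalid
```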



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-11565) Upgrade Jackson-databind to 2.15.1

2023-05-18 Thread Mike R (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike R updated NIFI-11565:
--
Affects Version/s: 1.21.0

> Upgrade Jackson-databind to 2.15.1
> --
>
> Key: NIFI-11565
> URL: https://issues.apache.org/jira/browse/NIFI-11565
> Project: Apache NiFi
>  Issue Type: Improvement
>Affects Versions: 1.21.0
>Reporter: Mike R
>Assignee: Mike R
>Priority: Minor
>
> Upgrade Jackson-databind to 2.15.1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (NIFI-11565) Upgrade Jackson-databind to 2.15.1

2023-05-18 Thread Mike R (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike R updated NIFI-11565:
--
Priority: Minor  (was: Major)

> Upgrade Jackson-databind to 2.15.1
> --
>
> Key: NIFI-11565
> URL: https://issues.apache.org/jira/browse/NIFI-11565
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike R
>Assignee: Mike R
>Priority: Minor
>
> Upgrade Jackson-databind to 2.15.1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (NIFI-11565) Upgrade Jackson-databind to 2.15.1

2023-05-18 Thread Mike R (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike R reassigned NIFI-11565:
-

Assignee: Mike R

> Upgrade Jackson-databind to 2.15.1
> --
>
> Key: NIFI-11565
> URL: https://issues.apache.org/jira/browse/NIFI-11565
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike R
>Assignee: Mike R
>Priority: Major
>
> Upgrade Jackson-databind to 2.15.1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-11565) Upgrade Jackson-databind to 2.15.1

2023-05-18 Thread Mike R (Jira)
Mike R created NIFI-11565:
-

 Summary: Upgrade Jackson-databind to 2.15.1
 Key: NIFI-11565
 URL: https://issues.apache.org/jira/browse/NIFI-11565
 Project: Apache NiFi
  Issue Type: Improvement
Reporter: Mike R


Upgrade Jackson-databind to 2.15.1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [nifi] mr1716 opened a new pull request, #7262: NIFI-11564 Update msal4j to 1.13.8

2023-05-18 Thread via GitHub


mr1716 opened a new pull request, #7262:
URL: https://github.com/apache/nifi/pull/7262

   
   
   
   
   
   
   
   
   
   
   
   
   
   # Summary
   
   [NIFI-11564](https://issues.apache.org/jira/browse/NIFI-11564)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [X] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI-11564) 
issue created
   
   ### Pull Request Tracking
   
   - [X] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [X] Pull Request commit message starts with Apache NiFi Jira issue number, 
such as `NIFI-0`
   
   ### Pull Request Formatting
   
   - [X] Pull Request based on current revision of the `main` branch
   - [X] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
 - [ ] JDK 11
 - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (NIFI-11564) Update msal4j to 1.13.8

2023-05-18 Thread Mike R (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike R reassigned NIFI-11564:
-

Assignee: Mike R

> Update msal4j to 1.13.8
> ---
>
> Key: NIFI-11564
> URL: https://issues.apache.org/jira/browse/NIFI-11564
> Project: Apache NiFi
>  Issue Type: Improvement
>Reporter: Mike R
>Assignee: Mike R
>Priority: Minor
>
> Update msal4j to 1.13.8



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-11564) Update msal4j to 1.13.8

2023-05-18 Thread Mike R (Jira)
Mike R created NIFI-11564:
-

 Summary: Update msal4j to 1.13.8
 Key: NIFI-11564
 URL: https://issues.apache.org/jira/browse/NIFI-11564
 Project: Apache NiFi
  Issue Type: Improvement
Reporter: Mike R


Update msal4j to 1.13.8



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [nifi] gresockj opened a new pull request, #7261: NIFI-11563: Allowing source connectables to be restarted on new conne…

2023-05-18 Thread via GitHub


gresockj opened a new pull request, #7261:
URL: https://github.com/apache/nifi/pull/7261

   …ctions in the StandardVersionedComponentSynchronizer
   
   # Summary
   
   [NIFI-11563](https://issues.apache.org/jira/browse/NIFI-11563)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, 
such as `NIFI-0`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
 - [ ] JDK 11
 - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (NIFI-11562) ValidateRecord doesn't work correctly; routes all to 'valid'

2023-05-18 Thread crissaegrim (Jira)


 [ 
https://issues.apache.org/jira/browse/NIFI-11562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

crissaegrim updated NIFI-11562:
---
Summary: ValidateRecord doesn't work correctly; routes all to 'valid'  
(was: ValidateRecord doesn't work correctly; routes all to 'valid' for invalid 
records)

> ValidateRecord doesn't work correctly; routes all to 'valid'
> 
>
> Key: NIFI-11562
> URL: https://issues.apache.org/jira/browse/NIFI-11562
> Project: Apache NiFi
>  Issue Type: Bug
>Affects Versions: 1.20.0
> Environment: linux, docker
>Reporter: crissaegrim
>Priority: Major
> Attachments: image-2023-05-17-17-32-04-404.png
>
>
> Replicate this with.  Expecting `age` field (int in schema) to cause failure. 
>  But it's not failing.
>  # Generate Flow File
> id,name,age,employer
> 1,joe,30,mlp
> 2,bob,thirty,google
> 3,linda,32.123,yahoo
> 4,anne,31,
>  # ValidateRecord
>  # Connect `invalid` and `valid` to downstream to visualize
> Validate against this avro schema
>  
> protocol Test {
>     record TestEmployer {
>         int id;
>         string name;
>         int age;
>         string? employer = null;
>     }
> }
> {
>   "type" : "record",
>   "name" : "TestEmployer",
>   "fields" : [ {
>     "name" : "id",
>     "type" : "int"
>   }, {
>     "name" : "name",
>     "type" : "string"
>   }, {
>     "name" : "age",
>     "type" : "int"
>   }, {
>     "name" : "employer",
>     "type" : [ "null", "string" ],
>     "default" : null
>   } ]
> }
> See that all records are routed to `valid`.  Why?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (NIFI-11563) Versioned component synchronizer does not stop source connectable for new connection

2023-05-18 Thread Joe Gresock (Jira)
Joe Gresock created NIFI-11563:
--

 Summary: Versioned component synchronizer does not stop source 
connectable for new connection
 Key: NIFI-11563
 URL: https://issues.apache.org/jira/browse/NIFI-11563
 Project: Apache NiFi
  Issue Type: Bug
  Components: Core Framework
Affects Versions: 1.21.0
Reporter: Joe Gresock
Assignee: Joe Gresock


Although this cannot be reproduced directly in NiFi, if an external tool uses 
the StandardVersionedComponentSynchronizer to synchronize a flow, adding a 
connection whose source is already running throws an exception. This is because 
the synchronizer stops and restarts the source connectables of existing 
connections, but not those of new connections.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [nifi-minifi-cpp] adamdebreceni opened a new pull request, #1576: MINIFICPP-2121 - Use std::atomic_flag instead of semaphore

2023-05-18 Thread via GitHub


adamdebreceni opened a new pull request, #1576:
URL: https://github.com/apache/nifi-minifi-cpp/pull/1576

   Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
   
   In order to streamline the review of the contribution we ask you
   to ensure the following steps have been taken:
   
   ### For all changes:
   - [ ] Is there a JIRA ticket associated with this PR? Is it referenced
in the commit message?
   
   - [ ] Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
   
   - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically main)?
   
   - [ ] Is your initial contribution a single, squashed commit?
   
   ### For code changes:
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the LICENSE file?
   - [ ] If applicable, have you updated the NOTICE file?
   
   ### For documentation related changes:
   - [ ] Have you ensured that the format looks appropriate for the output in which 
it is rendered?
   
   ### Note:
   Please ensure that once the PR is submitted, you check GitHub Actions CI 
results for build issues and submit an update to your PR as soon as possible.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [nifi] simonbence opened a new pull request, #7260: NIFI-11559 Increase poll time in test to avoid breaking test on slowe…

2023-05-18 Thread via GitHub


simonbence opened a new pull request, #7260:
URL: https://github.com/apache/nifi/pull/7260

   …r environments
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   # Summary
   
   [NIFI-0](https://issues.apache.org/jira/browse/NIFI-0)
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, 
such as `NIFI-0`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
 - [ ] JDK 11
 - [ ] JDK 17
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org