[ https://issues.apache.org/jira/browse/NIFI-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007716#comment-15007716 ]
ASF GitHub Bot commented on NIFI-1156:
--------------------------------------

Github user olegz commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/124#discussion_r45008293

    --- Diff: nifi-nar-bundles/nifi-html-bundle/nifi-html-processors/pom.xml ---
    @@ -0,0 +1,59 @@
    +<?xml version="1.0" encoding="UTF-8"?>
    +<!--
    +  Licensed to the Apache Software Foundation (ASF) under one or more
    +  contributor license agreements. See the NOTICE file distributed with
    +  this work for additional information regarding copyright ownership.
    +  The ASF licenses this file to You under the Apache License, Version 2.0
    +  (the "License"); you may not use this file except in compliance with
    +  the License. You may obtain a copy of the License at
    +      http://www.apache.org/licenses/LICENSE-2.0
    +  Unless required by applicable law or agreed to in writing, software
    +  distributed under the License is distributed on an "AS IS" BASIS,
    +  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +  See the License for the specific language governing permissions and
    +  limitations under the License.
    +-->
    +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    +    <modelVersion>4.0.0</modelVersion>
    +
    +    <parent>
    +        <groupId>org.apache.nifi</groupId>
    +        <artifactId>nifi-html-bundle</artifactId>
    +        <version>0.4.0-SNAPSHOT</version>
    +    </parent>
    +
    +    <artifactId>nifi-html-processors</artifactId>
    +    <description>Support for parsing HTML documents</description>
    +
    +    <dependencies>
    +        <dependency>
    +            <groupId>org.jsoup</groupId>
    +            <artifactId>jsoup</artifactId>
    +            <version>1.8.3</version>
    +        </dependency>
    +        <dependency>
    +            <groupId>org.apache.nifi</groupId>
    +            <artifactId>nifi-api</artifactId>
    +        </dependency>
    +        <dependency>
    +            <groupId>org.apache.nifi</groupId>
    +            <artifactId>nifi-processor-utils</artifactId>
    +        </dependency>
    +        <dependency>
    +            <groupId>org.apache.nifi</groupId>
    +            <artifactId>nifi-mock</artifactId>
    +            <scope>test</scope>
    +        </dependency>
    +        <dependency>
    +            <groupId>org.slf4j</groupId>
    +            <artifactId>slf4j-simple</artifactId>
    +            <scope>test</scope>
    +        </dependency>
    +        <dependency>
    +            <groupId>junit</groupId>
    +            <artifactId>junit</artifactId>
    +            <version>4.11</version>
    --- End diff --

    Does the main POM declare JUnit version 4.12? Can we use that one instead?


> HTML Parsing Processors Bundle
> ------------------------------
>
>                 Key: NIFI-1156
>                 URL: https://issues.apache.org/jira/browse/NIFI-1156
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Core Framework
>            Reporter: Jeremy Dyer
>            Priority: Minor
>
> NiFi provides the ability to ingest HTML but lacks the convenience to easily
> interact with that HTML once it has entered the flow. There should be an HTML
> Processing Bundle that provides mechanisms for manipulating and interacting
> with HTML data once it has entered the flow.
> Jsoup (http://jsoup.org/) seems
> like a logical tool to use since it is mature and has an MIT license, which
> would allow it to be incorporated into NiFi.
> “GetHTMLElement” should use the CSS selector-syntax
> (http://www.w3schools.com/cssref/css_selectors.asp) built into Jsoup to
> extract 0-N HTML elements from the original HTML input. This processor should
> support a delimited string of selectors, allowing the user to build compound
> HTML element output. Each HTML element (or compound element result) extracted
> will create a new Flowfile where the element will be in either the Flowfile
> content or an attribute, depending on the user configuration.
> “ModifyHTMLElement” should provide the ability to modify the original input
> HTML and overwrite any existing element values. The HTML element that will be
> modified can be selected by using the CSS selector-syntax.
> “PutHTMLElement” should provide the ability to put a new HTML element
> anywhere in the original input HTML, using CSS selector-syntax to indicate the
> position where the new HTML element should be placed.
> There seems to be potential for adding more processors, but this seems like
> a good start. Since there is a dependency on Jsoup and a potential for more
> processors to come, I think it makes sense to add this logic as its own nar
> bundle, but I could be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
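
On the JUnit question raised in the review comment above: a minimal sketch of how the dependency might look if the version is inherited rather than pinned in this module. It assumes that a parent POM (nifi-html-bundle or a POM above it) manages the JUnit version in a dependencyManagement section, which this thread does not confirm. The test scope is also an assumption, mirroring the other test dependencies in the diff.

```xml
<!-- Sketch only: assumes a parent POM manages the JUnit version
     in <dependencyManagement>; not confirmed in this thread. -->
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <!-- No <version> element: Maven would take it from the parent's managed version. -->
    <scope>test</scope>
</dependency>
```

If no parent actually manages JUnit, Maven rejects a dependency declared without a version, so in that case the explicit version element would have to stay.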