[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread jskora
GitHub user jskora opened a pull request:

https://github.com/apache/nifi/pull/556

NIFI-615 - Create a processor to extract WAV file characteristics

* Create new ExtractMediaMetadata processor using Apache Tika Detector and
  Parser.
* Refactored nifi-image-bundle, nifi-image-nar, and nifi-image-processors
  to nifi-media-bundle, nifi-media-nar, and nifi-media-processors to reflect
  the bundle's broader media-related purpose.
* Preserved existing ExtractImageMetadata and ResizeImage processors as
  well as existing ImageViewerController components to prevent impact on
  existing uses.
* Resolved collision between ExtractImage and ExtractMedia processors due
  to common dependency on Drew Noakes' Metadata Extractor project.
  - Updated bundle's Tika dependency from 1.7 to 1.8 and Drew Noakes'
    Metadata Extractor from 2.7.2 to 2.8.0.
  - Adjusted ExtractImageMetadata tests for enhanced attribute names in the
    new version of Drew Noakes' Metadata Extractor.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jskora/nifi NIFI-615-v2b

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/556.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #556


commit 236266c9e4ed89b4c78438f36408b5f6e0b0c488
Author: Joe Skora 
Date:   2016-02-26T20:33:40Z

NIFI-615 - Create a processor to extract WAV file characteristics.
* Create new ExtractMediaMetadata processor using Apache Tika Detector and
  Parser.
* Refactored nifi-image-bundle, nifi-image-nar, and nifi-image-processors
  to nifi-media-bundle, nifi-media-nar, and nifi-media-processors to reflect
  the bundle's broader media-related purpose.
* Preserved existing ExtractImageMetadata and ResizeImage processors as
  well as existing ImageViewerController components to prevent impact on
  existing uses.
* Resolved collision between ExtractImage and ExtractMedia processors due
  to common dependency on Drew Noakes' Metadata Extractor project.
  - Updated bundle's Tika dependency from 1.7 to 1.8 and Drew Noakes'
    Metadata Extractor from 2.7.2 to 2.8.0.
  - Adjusted ExtractImageMetadata tests for enhanced attribute names in the
    new version of Drew Noakes' Metadata Extractor.

commit 8b06ae648b6b541735c950b200dd61a097a3dd4a
Author: Joe Skora 
Date:   2016-06-21T19:32:10Z

* Fix assembly POM to remove duplicate reference to site-to-site nar and 
change nifi-image-nar reference to nifi-media-nar.

commit 85387332e7a25ea5b264d070fadaac14543c9725
Author: Joe Skora 
Date:   2016-06-21T19:56:37Z

* Fix artifactId in media nar pom.

commit 5fc18b157d8b4d9cf617dcafadf5f62104214306
Author: Joe Skora 
Date:   2016-06-21T20:56:20Z

* Fix pom references within the nifi-media-bundle to match bundle rename.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
GitHub user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67962838
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
+        + "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
+        + "extracts a long list of metadata types for media files including audio, video, and print media "
+        + "formats."
+        + "For the more details and the list of supported file types, visit the library's website "
+        + "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
+        + "will be inserted with the attribute name \".\", or \"\" if "
+        + "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+    static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new PropertyDescriptor.Builder()
+            .name("Max Number of Attributes")
+            .description("Specify the max number of attributes to add to the flowfile. There is no guarantee in what order"
+                    + " the tags will be processed. By default it will process all of them.")
+            .required(false)
--- End diff --

Same comment as on PR-252: Should this be required? How many attributes 
could Tika potentially create?



[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
GitHub user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67963044
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
+        + "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
+        + "extracts a long list of metadata types for media files including audio, video, and print media "
+        + "formats."
+        + "For the more details and the list of supported file types, visit the library's website "
+        + "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
+        + "will be inserted with the attribute name \".\", or \"\" if "
+        + "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+    static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new PropertyDescriptor.Builder()
+            .name("Max Number of Attributes")
+            .description("Specify the max number of attributes to add to the flowfile. There is no guarantee in what order"
+                    + " the tags will be processed. By default it will process all of them.")
+            .required(false)
+            .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+            .build();
+
+    private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new PropertyDescriptor.Builder()
+            .name("Max At

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
GitHub user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67963320
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
+        + "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
+        + "extracts a long list of metadata types for media files including audio, video, and print media "
+        + "formats."
+        + "For the more details and the list of supported file types, visit the library's website "
+        + "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
+        + "will be inserted with the attribute name \".\", or \"\" if "
+        + "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+    static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new PropertyDescriptor.Builder()
+            .name("Max Number of Attributes")
+            .description("Specify the max number of attributes to add to the flowfile. There is no guarantee in what order"
+                    + " the tags will be processed. By default it will process all of them.")
+            .required(false)
+            .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+            .build();
+
+    private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new PropertyDescriptor.Builder()
+            .name("Max At

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
GitHub user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67963557
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
+        + "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
+        + "extracts a long list of metadata types for media files including audio, video, and print media "
+        + "formats."
+        + "For the more details and the list of supported file types, visit the library's website "
+        + "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
+        + "will be inserted with the attribute name \".\", or \"\" if "
+        + "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+    static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new PropertyDescriptor.Builder()
+            .name("Max Number of Attributes")
+            .description("Specify the max number of attributes to add to the flowfile. There is no guarantee in what order"
+                    + " the tags will be processed. By default it will process all of them.")
+            .required(false)
+            .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+            .build();
+
+    private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new PropertyDescriptor.Builder()
+            .name("Max At

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
GitHub user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67964066
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
+        + "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
+        + "extracts a long list of metadata types for media files including audio, video, and print media "
+        + "formats."
+        + "For the more details and the list of supported file types, visit the library's website "
+        + "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
--- End diff --

Here it has "." making it seem like a "." is added automatically, but the
property descriptor says the "." or "-" is not automatically added. I agree
that we shouldn't lock the user into using ".", "-", etc., so this should be
changed to reflect that.
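
To illustrate the behavior the property descriptor is said to describe, here is a minimal sketch in which the prefix is prepended exactly as configured and no separator is inserted. The class and method names are hypothetical and not taken from the pull request, and plain maps stand in for the Tika metadata and flowfile attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not code from the pull request.
final class PrefixedAttributes {

    private PrefixedAttributes() {
    }

    // Prepends the prefix exactly as given. No "." or "-" is added, so a user
    // who wants a separator includes it in the prefix itself.
    static Map<String, String> withPrefix(String prefix, Map<String, String> metadata) {
        String effectivePrefix = (prefix == null) ? "" : prefix;
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            attributes.put(effectivePrefix + entry.getKey(), entry.getValue());
        }
        return attributes;
    }
}
```

Under this sketch a prefix of "media." yields names such as "media.Content-Type", a prefix of "media" yields "mediaContent-Type", and a missing prefix leaves the metadata key unchanged.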




[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread jskora
GitHub user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67964654
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
+        + "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
+        + "extracts a long list of metadata types for media files including audio, video, and print media "
+        + "formats."
+        + "For the more details and the list of supported file types, visit the library's website "
+        + "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
+        + "will be inserted with the attribute name \".\", or \"\" if "
+        + "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+    static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new PropertyDescriptor.Builder()
+            .name("Max Number of Attributes")
+            .description("Specify the max number of attributes to add to the flowfile. There is no guarantee in what order"
+                    + " the tags will be processed. By default it will process all of them.")
+            .required(false)
--- End diff --

Since this is generically processing any media file, I don't have any idea 
how many attributes will be created.  This provides a safety net in case of the 
unexpected.
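
As a rough sketch of that safety net, assuming an unset "Max Number of Attributes" means "no cap" (as the property description states) and using plain maps in place of Tika metadata and flowfile attributes, the limit could be applied like this. The class and method names are hypothetical and this is not the processor's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not code from the pull request.
final class AttributeLimiter {

    private AttributeLimiter() {
    }

    // Copies entries until the optional cap is reached.
    // A null cap means the property was not set, so every entry is kept.
    static Map<String, String> limit(Map<String, String> metadata, Integer maxAttributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (maxAttributes != null && result.size() >= maxAttributes) {
                break; // cap reached; remaining metadata is dropped
            }
            result.put(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

Which entries survive the cap depends on the iteration order of the input, which matches the description's caveat that there is no guarantee in what order the tags are processed.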



[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread jskora
GitHub user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67965642
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will process all of them.")
+.required(false)
+.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.build();
+
+private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new PropertyDescriptor.Builder()
+.name("Max Attrib

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67965763
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67965881
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67966203
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread joewitt
Github user joewitt commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67966608
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67967967
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67968149
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67969095
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/test/java/org/apache/nifi/processors/image/ExtractImageMetadataTest.java
 ---
@@ -37,7 +37,7 @@
 private static String BMP_HEADER = "BMP Header.";
 private static String JPEG_HEADER = "JPEG.";
 private static String GIF_HEADER = "GIF Header.";
-private static String PNG_HEADER = "PNG.";
+private static String PNG_HEADER = "PNG-";
--- End diff --

Did the PNG header change when the version of Metadata Extractor changed? If
so, does that break backwards compatibility?
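
To illustrate the concern, here is a minimal sketch of how a renamed attribute prefix could affect a downstream consumer that looks attributes up by name. The diff only shows the test prefix changing from "PNG." to "PNG-"; the full key names below, the class `PrefixChangeSketch`, and the fallback `lookup` helper are hypothetical and are not part of this pull request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; the key names are assumptions, not confirmed attribute names.
final class PrefixChangeSketch {

    static final String LEGACY_KEY = "PNG.Image Width";       // assumed name before the upgrade
    static final String RENAMED_KEY = "PNG-IHDR.Image Width"; // assumed name after the upgrade

    // Returns the value stored under preferredKey, or under fallbackKey if that is absent.
    static String lookup(Map<String, String> attributes, String preferredKey, String fallbackKey) {
        String value = attributes.get(preferredKey);
        return value != null ? value : attributes.get(fallbackKey);
    }

    public static void main(String[] args) {
        // Attributes as they might look once only the renamed key is written.
        Map<String, String> afterUpgrade = new HashMap<>();
        afterUpgrade.put(RENAMED_KEY, "100");

        // A consumer still using the old name finds nothing.
        System.out.println(afterUpgrade.get(LEGACY_KEY));

        // A consumer that tries both names keeps working across the rename.
        System.out.println(lookup(afterUpgrade, RENAMED_KEY, LEGACY_KEY));
    }
}
```

If attribute names did change in this way, any existing flow that references the old names would need updating, which is the backwards-compatibility question raised above.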


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67969885
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/test/java/org/apache/nifi/processors/media/TestExtractMediaMetadata.java
 ---
@@ -0,0 +1,450 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.util.MockFlowFile;
+import org.apache.nifi.util.TestRunner;
+import org.apache.nifi.util.TestRunners;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+import java.util.Set;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+public class TestExtractMediaMetadata {
+
+    @Test
+    public void testProperties() {
+        final TestRunner runner = TestRunners.newTestRunner(new ExtractMediaMetadata());
+        ProcessContext context = runner.getProcessContext();
+        Map<PropertyDescriptor, String> propertyValues = context.getProperties();
+        assertEquals(6, propertyValues.size());
+    }
+
+    @Test
+    public void testRelationships() {
+        final TestRunner runner = TestRunners.newTestRunner(new ExtractMediaMetadata());
+        ProcessContext context = runner.getProcessContext();
+        Set<Relationship> relationships = context.getAvailableRelationships();
+        assertEquals(2, relationships.size());
+        assertTrue(relationships.contains(ExtractMediaMetadata.SUCCESS));
+        assertTrue(relationships.contains(ExtractMediaMetadata.FAILURE));
+    }
+
+    @Test
+    public void testTextBytes() throws IOException {
+        final TestRunner runner = TestRunners.newTestRunner(new ExtractMediaMetadata());
+        runner.setProperty(ExtractMediaMetadata.MIME_TYPE_FILTER, "text/.*");
+        runner.setProperty(ExtractMediaMetadata.METADATA_KEY_FILTER, "");
+        runner.setProperty(ExtractMediaMetadata.METADATA_KEY_PREFIX, "txt.");
+        runner.assertValid();
+
+        final Map<String, String> attrs = new HashMap<>();
+        attrs.put("filename", "test1.txt");
+        runner.enqueue("test1".getBytes(), attrs);
+        runner.run();
+
+        runner.assertAllFlowFilesTransferred(ExtractMediaMetadata.SUCCESS, 1);
+        runner.assertTransferCount(ExtractMediaMetadata.FAILURE, 0);
+
+        final List<MockFlowFile> successFiles = runner.getFlowFilesForRelationship(ExtractMediaMetadata.SUCCESS);
+        MockFlowFile flowFile0 = successFiles.get(0);
+        flowFile0.assertAttributeExists("filename");
+        flowFile0.assertAttributeEquals("filename", "test1.txt");
+        flowFile0.assertAttributeExists("txt.Content-Type");
+        assertTrue(flowFile0.getAttribute("txt.Content-Type").startsWith("text/plain"));
+        flowFile0.assertAttributeExists("txt.X-Parsed-By");
+        assertTrue(flowFile0.getAttribute("txt.X-Parsed-By").contains("org.apache.tika.parser.DefaultParser"));
+        assertTrue(flowFile0.getAttribute("txt.X-Parsed-By").contains("org.apache.tika.parser.txt.TXTParser"));
+        flowFile0.assertAttributeExists("txt.Content-Encoding");
+        flowFile0.assertAttributeEquals("txt.Content-Encoding", "ISO-8859-1");
+        flowFile0.assertContentEquals("test1".getBytes("UTF-8"));
+    }
+
+    @Test
+    public void testNoFlowFile() throws IOException {
+        final TestRunner runner = TestRunners.newTestRunner(new ExtractMediaMetadata());
+        runner.setProperty(ExtractMediaMetadata.MIME_TYPE_FILTER, "text/.*");
+        runner.setProperty(ExtractMediaMetadata.METADATA_KEY_FIL

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67982895
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
+        + "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
+        + "extracts a long list of metadata types for media files including audio, video, and print media "
+        + "formats."
+        + "For the more details and the list of supported file types, visit the library's website "
+        + "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
+        + "will be inserted with the attribute name \".\", or \"\" if "
+        + "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+    static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new PropertyDescriptor.Builder()
+            .name("Max Number of Attributes")
+            .description("Specify the max number of attributes to add to the flowfile. There is no guarantee in what order"
+                    + " the tags will be processed. By default it will process all of them.")
+            .required(false)
+            .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+            .build();
+
+    private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new PropertyDescriptor.Builder()
+            .name("Max At

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-21 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r67982943
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68039891
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68041058
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/test/java/org/apache/nifi/processors/image/ExtractImageMetadataTest.java
 ---
@@ -37,7 +37,7 @@
 private static String BMP_HEADER = "BMP Header.";
 private static String JPEG_HEADER = "JPEG.";
 private static String GIF_HEADER = "GIF Header.";
-private static String PNG_HEADER = "PNG.";
+private static String PNG_HEADER = "PNG-";
--- End diff --

Yes, [Drew Noakes' Metadata Extractor](https://drewnoakes.com/code/exif/) 
appears to have changed.  The Tika parsers can't work with the older version, 
and the two don't coexist well in the same NAR bundle.

The only option I see for these to coexist without this change is to 
revert nifi-image-bundle to what it was and create nifi-media-bundle as a 
whole new bundle, but that seems like it will be confusing down the road.  
What do you think?




[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68041375
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68041597
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/test/java/org/apache/nifi/processors/media/TestExtractMediaMetadata.java
 ---
@@ -0,0 +1,450 @@

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread joewitt
Github user joewitt commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68041768
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/test/java/org/apache/nifi/processors/image/ExtractImageMetadataTest.java
 ---
@@ -37,7 +37,7 @@
 private static String BMP_HEADER = "BMP Header.";
 private static String JPEG_HEADER = "JPEG.";
 private static String GIF_HEADER = "GIF Header.";
-private static String PNG_HEADER = "PNG.";
+private static String PNG_HEADER = "PNG-";
--- End diff --

When composing NARs, people often think about how various processors do 
similar things and make sense as a bundle of like things from a user 
perspective.  However, NARs are really only about classloader isolation, so 
the thinking should be more purely about their dependencies and whether they 
can/should coexist nicely.  This sounds like such a case to me: we're 
thinking intuitively that these processors should be bundled because they do 
similar (media processing) things, and then we're trying to wrangle their 
dependencies.

This is just a food-for-thought comment.  I'm not suggesting we have to 
change this one.  It may be correct as is.

For this specific case, could we perhaps bridge the header/attribute used 
so that it maps from the new name to the old name?
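
A minimal sketch of what such a bridge might look like, assuming the only 
difference is the attribute-name prefix (the test constant changed from 
`"PNG."` to `"PNG-"`). The class `AttributeNameBridge`, its rename table, and 
the sample attribute names are hypothetical and are not part of this pull 
request; the real legacy names would have to be checked against the output of 
the older Metadata Extractor version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of the PR): copies attributes whose names
 * start with a new-style prefix to the corresponding legacy-style name.
 */
final class AttributeNameBridge {

    // Assumed rename table: new-style prefix -> legacy prefix.
    private final Map<String, String> prefixRenames;

    AttributeNameBridge(Map<String, String> prefixRenames) {
        this.prefixRenames = new LinkedHashMap<>(prefixRenames);
    }

    /**
     * Returns a copy of the given attributes in which every key beginning
     * with a new-style prefix is also present under the legacy prefix.
     * Keys that already exist are never overwritten.
     */
    Map<String, String> bridge(Map<String, String> attributes) {
        final Map<String, String> result = new LinkedHashMap<>(attributes);
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            for (Map.Entry<String, String> rename : prefixRenames.entrySet()) {
                final String newPrefix = rename.getKey();
                if (attr.getKey().startsWith(newPrefix)) {
                    final String legacyKey = rename.getValue()
                            + attr.getKey().substring(newPrefix.length());
                    result.putIfAbsent(legacyKey, attr.getValue());
                }
            }
        }
        return result;
    }
}
```

A processor could run the extracted metadata map through `bridge(...)` before 
writing attributes, so flows that reference the old names keep working while 
the new names are also available. Emitting both names increases the attribute 
count, which matters if an attribute limit is configured.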




[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68041847
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
+
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.build();
+
+private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new 
PropertyDescriptor.Builder()
+.name("Max Attrib

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread joewitt
Github user joewitt commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68041998
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
+
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.build();
+
+private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new 
PropertyDescriptor.Builder()
+.name("Max Attri

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68042461
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
+
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.build();
+
+private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new 
PropertyDescriptor.Builder()
+.name("Max Attrib

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68042651
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
+
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.build();
+
+private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new 
PropertyDescriptor.Builder()
+.name("Max Attrib

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68042689
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
--- End diff --

Good point, fixed.




[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68042751
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
--- End diff --

Removed.



[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68042865
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
+
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.build();
+
+private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new 
PropertyDescriptor.Builder()
+.name("Max Attrib

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68044015
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/test/java/org/apache/nifi/processors/image/ExtractImageMetadataTest.java
 ---
@@ -37,7 +37,7 @@
 private static String BMP_HEADER = "BMP Header.";
 private static String JPEG_HEADER = "JPEG.";
 private static String GIF_HEADER = "GIF Header.";
-private static String PNG_HEADER = "PNG.";
+private static String PNG_HEADER = "PNG-";
--- End diff --

If by "bridge the header/attribute" you mean adjust keys/values to match 
the old version, I don't have a list identifying what changed.  This particular 
change was caught by chance because of a unit test.

I'm open to suggestions, but the only option I see that doesn't break 
backward compatibility is to isolate the processors back into separate NARs.
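
One way such a list might be assembled, sketched under assumptions: capture the attribute names produced for the same sample files under each library version, then diff the two sets. The class, method, and key names below are hypothetical and the sample keys are illustrative, not real extractor output.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares attribute-name snapshots taken with the old
// and new Metadata Extractor versions to list what disappeared or appeared.
class AttributeKeyDiff {

    // Keys present in 'reference' but absent from 'other', in sorted order.
    static Set<String> missingFrom(Set<String> reference, Set<String> other) {
        Set<String> diff = new TreeSet<>(reference);
        diff.removeAll(other);
        return diff;
    }

    public static void main(String[] args) {
        // Illustrative snapshots; real ones would come from running the
        // processor against the same test images under each version.
        Set<String> oldKeys = new TreeSet<>(Arrays.asList("PNG.Example Tag", "JPEG.Example Tag"));
        Set<String> newKeys = new TreeSet<>(Arrays.asList("PNG-Example Tag", "JPEG.Example Tag"));

        System.out.println("removed: " + missingFrom(oldKeys, newKeys));
        System.out.println("added:   " + missingFrom(newKeys, oldKeys));
    }
}
```

This only covers file types that have sample files in the test resources, so it would not be a complete list.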




[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread joewitt
Github user joewitt commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68044679
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/test/java/org/apache/nifi/processors/image/ExtractImageMetadataTest.java
 ---
@@ -37,7 +37,7 @@
 private static String BMP_HEADER = "BMP Header.";
 private static String JPEG_HEADER = "JPEG.";
 private static String GIF_HEADER = "GIF Header.";
-private static String PNG_HEADER = "PNG.";
+private static String PNG_HEADER = "PNG-";
--- End diff --

Ok.  Even that punts the problem because it just means we'll stick with an 
older dependency.  My vote here would be we keep it as-is.  Update the 
migration guide to call this issue out when going to 0.7 (and 1.0).  And we 
update the processor documentation to highlight that the attributes extracted 
by this processor are not stable and subject to change across upgrades.  It 
isn't great but it is honest.




[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68048957
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
--- End diff --

And returned.



[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68052320
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
+
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.build();
+
+private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new 
PropertyDescriptor.Builder()
+.name("Max At

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68052582
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
--- End diff --

Sorry if I was confusing, I really like this property and was wondering if 
it should be made required.



[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68055976
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
+
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.build();
+
+private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new 
PropertyDescriptor.Builder()
+.name("Max Attrib

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68056970
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/test/java/org/apache/nifi/processors/media/TestExtractMediaMetadata.java
 ---
@@ -0,0 +1,450 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.util.MockFlowFile;
+import org.apache.nifi.util.TestRunner;
+import org.apache.nifi.util.TestRunners;
+import org.junit.Test;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+import java.util.Set;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+public class TestExtractMediaMetadata {
+
+    @Test
+    public void testProperties() {
+        final TestRunner runner = TestRunners.newTestRunner(new ExtractMediaMetadata());
+        ProcessContext context = runner.getProcessContext();
+        Map<PropertyDescriptor, String> propertyValues = context.getProperties();
+        assertEquals(6, propertyValues.size());
+    }
+
+    @Test
+    public void testRelationships() {
+        final TestRunner runner = TestRunners.newTestRunner(new ExtractMediaMetadata());
+        ProcessContext context = runner.getProcessContext();
+        Set<Relationship> relationships = context.getAvailableRelationships();
+        assertEquals(2, relationships.size());
+        assertTrue(relationships.contains(ExtractMediaMetadata.SUCCESS));
+        assertTrue(relationships.contains(ExtractMediaMetadata.FAILURE));
+    }
+
+    @Test
+    public void testTextBytes() throws IOException {
+        final TestRunner runner = TestRunners.newTestRunner(new ExtractMediaMetadata());
+        runner.setProperty(ExtractMediaMetadata.MIME_TYPE_FILTER, "text/.*");
+        runner.setProperty(ExtractMediaMetadata.METADATA_KEY_FILTER, "");
+        runner.setProperty(ExtractMediaMetadata.METADATA_KEY_PREFIX, "txt.");
+        runner.assertValid();
+
+        final Map<String, String> attrs = new HashMap<>();
+        attrs.put("filename", "test1.txt");
+        runner.enqueue("test1".getBytes(), attrs);
+        runner.run();
+
+        runner.assertAllFlowFilesTransferred(ExtractMediaMetadata.SUCCESS, 1);
+        runner.assertTransferCount(ExtractMediaMetadata.FAILURE, 0);
+
+        final List<MockFlowFile> successFiles = runner.getFlowFilesForRelationship(ExtractMediaMetadata.SUCCESS);
+        MockFlowFile flowFile0 = successFiles.get(0);
+        flowFile0.assertAttributeExists("filename");
+        flowFile0.assertAttributeEquals("filename", "test1.txt");
+        flowFile0.assertAttributeExists("txt.Content-Type");
+        assertTrue(flowFile0.getAttribute("txt.Content-Type").startsWith("text/plain"));
+        flowFile0.assertAttributeExists("txt.X-Parsed-By");
+        assertTrue(flowFile0.getAttribute("txt.X-Parsed-By").contains("org.apache.tika.parser.DefaultParser"));
+        assertTrue(flowFile0.getAttribute("txt.X-Parsed-By").contains("org.apache.tika.parser.txt.TXTParser"));
+        flowFile0.assertAttributeExists("txt.Content-Encoding");
+        flowFile0.assertAttributeEquals("txt.Content-Encoding", "ISO-8859-1");
+        flowFile0.assertContentEquals("test1".getBytes("UTF-8"));
+    }
+
+@Test
+public void testNoFlowFile() throws IOException {
+final TestRunner runner = TestRunners.newTestRunner(new 
ExtractMediaMetadata());
+runner.setProperty(ExtractMediaMetadata.MIME_TYPE_FILTER, 
"text/.*");
+runner.setProperty(ExtractMediaMetadata.METADATA_KEY_FIL

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread bbende
Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68061144
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", 
"document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles 
containing audio, video, image, and other file "
++ "types.  This processor relies on the Apache Tika project for 
file format detection and parsing.  It "
++ "extracts a long list of metadata types for media files 
including audio, video, and print media "
++ "formats."
++ "For the more details and the list of supported file types, 
visit the library's website "
++ "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
++ "will be inserted with the attribute name \".\", or \"\" if "
++ "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new 
PropertyDescriptor.Builder()
+.name("Max Number of Attributes")
+.description("Specify the max number of attributes to add to 
the flowfile. There is no guarantee in what order"
++ " the tags will be processed. By default it will 
process all of them.")
+.required(false)
+
.addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+.build();
+
+private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new 
PropertyDescriptor.Builder()
+.name("Max Attrib

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68070698
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
+        + "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
+        + "extracts a long list of metadata types for media files including audio, video, and print media "
+        + "formats."
+        + "For the more details and the list of supported file types, visit the library's website "
+        + "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
+        + "will be inserted with the attribute name \".\", or \"\" if "
+        + "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+    static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new PropertyDescriptor.Builder()
+            .name("Max Number of Attributes")
+            .description("Specify the max number of attributes to add to the flowfile. There is no guarantee in what order"
+                    + " the tags will be processed. By default it will process all of them.")
+            .required(false)
--- End diff --

I've added it back with a default value, so it will be an explicit choice 
to delete the limit.
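
To illustrate the behavior described here, the following is a minimal, NiFi-free sketch of capping the number of extracted attributes and the length of each value. It is not the code from the pull request: the class name `AttributeLimiter`, the method `limit`, and the constant `DEFAULT_MAX_ATTRIBUTES` (including its value of 100) are hypothetical, and the thread does not state the default that was actually chosen. A `null` cap stands in for the user explicitly deleting the property value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the pull request.
final class AttributeLimiter {

    // Assumed default for illustration only; the real default is not given in this thread.
    static final int DEFAULT_MAX_ATTRIBUTES = 100;

    /**
     * Copies metadata entries into a new map, prefixing each key.
     * Stops once maxAttributes entries have been added (null means no limit)
     * and truncates each value to at most maxLength characters.
     */
    static Map<String, String> limit(Map<String, String> metadata, String prefix,
                                     Integer maxAttributes, int maxLength) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            if (maxAttributes != null && result.size() >= maxAttributes) {
                break; // cap reached; remaining entries are dropped
            }
            String value = entry.getValue();
            if (value == null) {
                continue; // nothing to store for this key
            }
            if (value.length() > maxLength) {
                value = value.substring(0, maxLength);
            }
            result.put(prefix + entry.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("channels", "2");
        metadata.put("samplerate", "44100.0");
        metadata.put("encoding", "PCM_SIGNED");

        // Default cap applies unless the caller passes null to remove it.
        System.out.println(limit(metadata, "wav.", DEFAULT_MAX_ATTRIBUTES, 5));
        System.out.println(limit(metadata, "wav.", 2, 5));
        System.out.println(limit(metadata, "wav.", null, 5));
    }
}
```

Because the order in which metadata keys arrive is not guaranteed, which entries survive a cap depends on iteration order; the sketch uses insertion order only to keep the output predictable.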



[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68073301
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---
@@ -0,0 +1,311 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.media;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.regex.Pattern;
+
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.SupportsBatching;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.annotation.lifecycle.OnScheduled;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.components.ValidationContext;
+import org.apache.nifi.components.ValidationResult;
+import org.apache.nifi.components.Validator;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.logging.ProcessorLog;
+import org.apache.nifi.processor.AbstractProcessor;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.ProcessorInitializationContext;
+import org.apache.nifi.processor.Relationship;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.util.ObjectHolder;
+
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.TikaInputStream;
+import org.apache.tika.metadata.Metadata;
+import org.apache.tika.parser.AutoDetectParser;
+import org.apache.tika.sax.BodyContentHandler;
+import org.xml.sax.SAXException;
+
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"media", "file", "format", "metadata", "audio", "video", "image", "document", "pdf"})
+@CapabilityDescription("Extract the content metadata from flowfiles containing audio, video, image, and other file "
+        + "types.  This processor relies on the Apache Tika project for file format detection and parsing.  It "
+        + "extracts a long list of metadata types for media files including audio, video, and print media "
+        + "formats."
+        + "For the more details and the list of supported file types, visit the library's website "
+        + "at http://tika.apache.org/.")
+@WritesAttributes({@WritesAttribute(attribute = ".", description = "The extracted content metadata "
+        + "will be inserted with the attribute name \".\", or \"\" if "
+        + "\"Metadata Key Prefix\" is not provided.")})
+@SupportsBatching
+public class ExtractMediaMetadata extends AbstractProcessor {
+
+    static final PropertyDescriptor MAX_NUMBER_OF_ATTRIBUTES = new PropertyDescriptor.Builder()
+            .name("Max Number of Attributes")
+            .description("Specify the max number of attributes to add to the flowfile. There is no guarantee in what order"
+                    + " the tags will be processed. By default it will process all of them.")
+            .required(false)
+            .addValidator(StandardValidators.NON_NEGATIVE_INTEGER_VALIDATOR)
+            .build();
+
+    private static final PropertyDescriptor MAX_ATTRIBUTE_LENGTH = new PropertyDescriptor.Builder()
+            .name("Max Attrib

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread JPercivall
Github user JPercivall commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68074121
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread bbende
Github user bbende commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68075384
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/556#discussion_r68076407
  
--- Diff: 
nifi-nar-bundles/nifi-media-bundle/nifi-media-processors/src/main/java/org/apache/nifi/processors/media/ExtractMediaMetadata.java
 ---

[GitHub] nifi pull request #556: NIFI-615 - Create a processor to extract WAV file ch...

2016-06-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/556


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---