[ https://issues.apache.org/jira/browse/TIKA-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223878#comment-17223878 ]

Hudson commented on TIKA-3216:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #24 (See [https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/24/])
TIKA-3216 -- Add FileProfiler (tallison: [https://github.com/apache/tika/commit/65c318358db36e64afa477d2b90713f77ec73c4c])
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/FileProfiler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/TikaEvalCLI.java
* (edit) tika-core/src/main/java/org/apache/tika/detect/FileCommandDetector.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/DBWriter.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/batch/FileProfilerBuilder.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/MimeBuffer.java
* (add) tika-eval/src/main/resources/tika-eval-file-profiler-config.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/EvalConsumerBuilder.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/XMLErrorLogUpdater.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractComparer.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/JDBCUtil.java


> Add FileProfiler to tika-eval
> -----------------------------
>
>                 Key: TIKA-3216
>                 URL: https://issues.apache.org/jira/browse/TIKA-3216
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.25
>
>
> So far, tika-eval has been focused on processing "extracts", that is, the
> result of Tika or another text extractor. I think it would be useful to add
> a basic FileProfiler that handles the raw input files only but does not parse
> them. This is useful as a first step when profiling a directory of files
> before going through the costly process of parsing.
> Without parsing, we can get file length, digest and file type detection.

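For readers unfamiliar with the idea, the three parse-free measurements named in the description (file length, digest, file type detection) can be sketched roughly as follows. This is only an illustration using plain JDK calls, not the FileProfiler code from the commit: the class name FileProfileSketch, the Profile holder and its fields, and the choice of SHA-256 are all assumptions, and Files.probeContentType stands in for whatever detector tika-eval actually uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; not the FileProfiler added in TIKA-3216.
class FileProfileSketch {

    /** Hypothetical result holder; the real tika-eval columns may differ. */
    static final class Profile {
        final long length;
        final String sha256;
        final String mime; // may be null if the platform cannot guess a type

        Profile(long length, String sha256, String mime) {
            this.length = length;
            this.sha256 = sha256;
            this.mime = mime;
        }
    }

    /** Collects length, digest and a detected type without parsing the file. */
    static Profile profile(Path file) throws IOException, NoSuchAlgorithmException {
        long length = Files.size(file);

        // Stream the bytes through the digest so large files are not held in memory.
        MessageDigest md = MessageDigest.getInstance("SHA-256"); // assumed algorithm
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }

        // JDK-level detection; platform dependent and may return null.
        String mime = Files.probeContentType(file);
        return new Profile(length, hex.toString(), mime);
    }

    public static void main(String[] args) throws Exception {
        Profile p = profile(Paths.get(args[0]));
        System.out.println(p.length + "\t" + p.sha256 + "\t" + p.mime);
    }
}
```

Because none of these steps parses the content, a pass like this over a directory is cheap compared with running the parsers, which is the motivation given in the issue.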
-- This message was sent by Atlassian Jira (v8.3.4#803005)