[ https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187225#comment-16187225 ]
ASF GitHub Bot commented on TIKA-2400:
--------------------------------------

smadha commented on a change in pull request #208: Fix for TIKA-2400 Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r142016934

##########
File path: tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
##########
@@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
         for (RecognisedObject object : objects) {
             if (object instanceof CaptionObject) {
                 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-                LOG.debug("Add {}", object);
-                String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-                        object.getLabel(), object.getConfidence());
-                metadata.add(MD_KEY_IMG_CAP, mdValue);
-                acceptedObjects.add(object);
+                String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());

 Review comment:
   As of now, to get the label and the confidence, people have to split the
   metadata value. I think traversing two arrays in a single loop would be
   easier than that, and we can ensure that the two arrays are of the same
   length. Also, if you want JSON, why not store serialised JSON in one
   metadata key? It looks bad, but it is better than a single String with a
   space-separated label and confidence. I'll leave it up to you guys. :+1:

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
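A minimal, JDK-only sketch of the three metadata layouts weighed in the review comment. The first is the packed "label (confidence)" value from the diff, along with the split a consumer has to do to recover both parts. The second is the suggested pair of parallel label/confidence arrays, read in a single loop with a length check. The third is a single serialised-JSON value. The class and method names (CaptionMetadataSketch, packed, unpack, readParallel, toJson) are hypothetical. Plain String arrays stand in for the values a multi-valued Tika Metadata key would hold. This is an illustration of the trade-off, not what PR #208 implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch comparing three ways to expose caption labels and
 * confidences as metadata. Plain String arrays stand in for the values a
 * multi-valued metadata key would hold; none of these names are Tika API.
 */
public class CaptionMetadataSketch {

    /** Layout in the diff: one value per caption, e.g. "a dog (0.87500)". */
    static String packed(String label, double confidence) {
        return String.format(Locale.ENGLISH, "%s (%.5f)", label, confidence);
    }

    /**
     * What a consumer has to do with the packed layout: split it back apart.
     * Splits on the last " (" so labels that contain parentheses still work.
     */
    static String[] unpack(String mdValue) {
        int open = mdValue.lastIndexOf(" (");
        if (open < 0 || !mdValue.endsWith(")")) {
            throw new IllegalArgumentException("Unexpected format: " + mdValue);
        }
        String label = mdValue.substring(0, open);
        String confidence = mdValue.substring(open + 2, mdValue.length() - 1);
        return new String[] {label, confidence};
    }

    /**
     * Suggested layout: labels and confidences under two keys, written in the
     * same order, so one loop reads both. The length check guards the
     * "same length" invariant mentioned in the review.
     */
    static List<String> readParallel(String[] labels, String[] confidences) {
        if (labels.length != confidences.length) {
            throw new IllegalStateException("label/confidence arrays differ in length");
        }
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            double confidence = Double.parseDouble(confidences[i]);
            rows.add(labels[i] + " -> " + confidence);
        }
        return rows;
    }

    /**
     * Alternative from the review: all captions serialised into one JSON
     * string under a single key. Escapes only backslashes and double quotes;
     * a real implementation should use a JSON library.
     */
    static String toJson(String[] labels, double[] confidences) {
        if (labels.length != confidences.length) {
            throw new IllegalStateException("label/confidence arrays differ in length");
        }
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < labels.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            String escaped = labels[i].replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append(String.format(Locale.ENGLISH,
                    "{\"label\":\"%s\",\"confidence\":%.5f}", escaped, confidences[i]));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String[] parts = unpack(packed("a dog (puppy) on a beach", 0.875));
        System.out.println(parts[0] + " | " + parts[1]);

        String[] labels = {"a dog on a beach", "a cat on a sofa"};
        for (String row : readParallel(labels, new String[] {"0.87500", "0.50000"})) {
            System.out.println(row);
        }
        System.out.println(toJson(labels, new double[] {0.875, 0.5}));
    }
}
```

On JDK 11 or later this can be run directly with `java CaptionMetadataSketch.java`. The parallel layout removes string parsing on the consumer side but relies on both keys staying in step. The JSON layout keeps everything under one key at the cost of requiring a JSON parser to read it.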
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Standardizing current Object Recognition REST parsers
> -----------------------------------------------------
>
>                 Key: TIKA-2400
>                 URL: https://issues.apache.org/jira/browse/TIKA-2400
>             Project: Tika
>          Issue Type: Sub-task
>          Components: parser
>            Reporter: Thejan Wijesinghe
>            Priority: Minor
>             Fix For: 1.17
>
>
> # This involves adding apiBaseUris and refactoring the current Object Recognition
> REST parsers.
> # Refactoring the Dockerfiles related to those parsers.
> # Moving the logic related to checking minimum confidence into the servers.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)