[ https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338663#comment-17338663 ]
ASF GitHub Bot commented on TIKA-94:
------------------------------------

lewismc commented on pull request #406:
URL: https://github.com/apache/tika/pull/406#issuecomment-831595625

   @tballison I know you and I spoke about refactoring this as a simple parser interface... I would like to merge it for the time being, and I can begin work on the refactoring in a separate ticket.

> Speech-to-text transcription
> ----------------------------
>
>                 Key: TIKA-94
>                 URL: https://issues.apache.org/jira/browse/TIKA-94
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>              Labels: new-parser
>
> Like OCR for image files (TIKA-93), we could try using speech recognition to
> extract text content (where available) from audio (and video!) files.
> The CMU Sphinx engine (http://cmusphinx.sourceforge.net/) looks promising and
> comes with a friendly license.