Hi Ken,

I used Nutch's LanguageProfiler to produce the language profile. You can find more about this here: http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/authors.html (this is not self-promotion!). Download the sources; using the ant task you'll be able to create a language profile. If you need any help, please do not hesitate to ask.

BR,
Oleg.

2010/8/24 Jan Høydahl (JIRA) <[email protected]>

> [ https://issues.apache.org/jira/browse/TIKA-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901900#action_12901900 ]
>
> Jan Høydahl commented on TIKA-492:
> ----------------------------------
>
> I'm in the process of gathering enough text content for the profiles.
>
> I also posted a question to the user list to ask what tool/process you use
> to generate profiles but did not see an answer yet.
>
> > Add language identification support for North Sami, Lule Sami and South Sami
> > ----------------------------------------------------------------------------
> >
> >                 Key: TIKA-492
> >                 URL: https://issues.apache.org/jira/browse/TIKA-492
> >             Project: Tika
> >          Issue Type: New Feature
> >          Components: languageidentifier
> >    Affects Versions: 0.7
> >            Reporter: Jan Høydahl
> >            Assignee: Ken Krugler
> >            Priority: Minor
> >
> > We need added support for Sami languages.
> > According to document "Requirements for support for Sami languages in
> > data processing" (http://www.samit.no/01-850-51.pdf) Tika will get "Basic
> > Level" support by detecting North Sami, Lule Sami and South Sami.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.

--
Best regards,
Oleg.
