Hi Ken,
I used Nutch's LanguageProfiler to produce the language profiles.
You can find more about this here:
http://www.ibm.com/developerworks/opensource/tutorials/os-apache-tika/authors.html
(This is not self-promotion!)
Download the sources; using the Ant task, you'll be able to create a language profile.
If you need any help, please do not hesitate to ask.


BR,
Oleg.

2010/8/24 Jan Høydahl (JIRA) <[email protected]>

>
>    [
> https://issues.apache.org/jira/browse/TIKA-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901900#action_12901900]
>
> Jan Høydahl commented on TIKA-492:
> ----------------------------------
>
> I'm in the process of gathering enough text content for the profiles.
>
> I also posted a question to the user list to ask what tool/process you use
> to generate profiles but did not see an answer yet.
>
> > Add language identification support for North Sami, Lule Sami and South
> Sami
> >
> ----------------------------------------------------------------------------
> >
> >                 Key: TIKA-492
> >                 URL: https://issues.apache.org/jira/browse/TIKA-492
> >             Project: Tika
> >          Issue Type: New Feature
> >          Components: languageidentifier
> >    Affects Versions: 0.7
> >            Reporter: Jan Høydahl
> >            Assignee: Ken Krugler
> >            Priority: Minor
> >
> > We need added support for Sami languages.
> > According to document "Requirements for support for Sami languages in
> data processing" (http://www.samit.no/01-850-51.pdf) Tika will get "Basic
> Level" support by detecting North Sami, Lule Sami and South Sami.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>


-- 
Best regards, Oleg.
