[
https://issues.apache.org/jira/browse/HIVEMALL-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077682#comment-16077682
]
Makoto Yui edited comment on HIVEMALL-130 at 7/7/17 6:24 AM:
-------------------------------------------------------------
{code:sql}
set hivevar:dicturl =
"https://raw.githubusercontent.com/atilika/kuromoji/d0700ab6dd489aaf0fcb1e4e78ce2f682be9f255/kuromoji-core/src/test/resources/userdict.txt";
select tokenize_ja(text, mode, stopWords, stopTags, ${dicturl}) from src;
{code}
> Support user-defined dictionary for `tokenize_ja`
> -------------------------------------------------
>
> Key: HIVEMALL-130
> URL: https://issues.apache.org/jira/browse/HIVEMALL-130
> Project: Hivemall
> Issue Type: Improvement
> Reporter: Takuya Kitazawa
> Assignee: Takuya Kitazawa
>
> Add a new argument, "userDict", to `tokenize_ja`. Its type would be
> List<String>, and each element defines a new word in the following format:
> <word>,<result>,<read>,<class>
> Example dictionary file:
> https://github.com/atilika/kuromoji/blob/d0700ab6dd489aaf0fcb1e4e78ce2f682be9f255/kuromoji-core/src/test/resources/userdict.txt
> Reference for adding a user dictionary via the Lucene API (in Japanese):
> http://d.hatena.ne.jp/Kazuhira/20130616/1371390716
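
To make the proposed entry format concrete, here is a minimal sketch of how one might parse `<word>,<result>,<read>,<class>` lines before handing them to a tokenizer. It is not Hivemall's or Kuromoji's implementation. The names `UserDictEntry` and `parse_user_dict` are hypothetical, and the sketch assumes that `<result>` and `<read>` are space-separated lists of equal length and that lines starting with `#` are comments. The sample entry is illustrative only.

```python
from typing import Iterable, List, NamedTuple


class UserDictEntry(NamedTuple):
    """One user dictionary entry (hypothetical type, for illustration only)."""
    word: str             # <word>: the text to match in the input
    segments: List[str]   # <result>: tokens to emit for that text
    readings: List[str]   # <read>: one reading per emitted token
    pos_class: str        # <class>: part-of-speech label for the tokens


def parse_user_dict(lines: Iterable[str]) -> List[UserDictEntry]:
    """Parse '<word>,<result>,<read>,<class>' lines into entries.

    Assumptions (not confirmed by the issue text): blank lines and lines
    starting with '#' are skipped, and <result>/<read> are split on spaces.
    """
    entries = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            raise ValueError(
                f"line {line_no}: expected 4 comma-separated fields, got {len(fields)}"
            )
        word, result, read, pos_class = fields
        segments = result.split()
        readings = read.split()
        if len(segments) != len(readings):
            raise ValueError(
                f"line {line_no}: {len(segments)} segments but {len(readings)} readings"
            )
        entries.append(UserDictEntry(word, segments, readings, pos_class))
    return entries


if __name__ == "__main__":
    # Illustrative input: a comment line followed by one entry.
    sample = [
        "# custom compound noun",
        "関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞",
    ]
    for entry in parse_user_dict(sample):
        print(entry.word, entry.segments, entry.readings, entry.pos_class)
```

Validating the field count and the segment/reading alignment up front would let a malformed element of the List<String> argument fail with a clear message instead of surfacing later inside the tokenizer.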