Re: Where is the source for the .dat files in Kuromoji?
On Dec 3, 2013, at 8:38 AM, Benson Margulies ben...@basistech.com wrote:

> I'm not clear that there's anything that anyone would complain of. The question is, are the .dat files part of the source bundle that is the 'official release'? I just fetched from git, not from the official release, so I don't know.

I’d say the .dat files are part of the source bundle, which is the ‘official release’, but PMCs feel free to chime in...

The .dat files have been checked into SVN in binary form to make Lucene easy to build and they’re also rather modest in size thanks to squeezing work done by Robert and Uwe.

Best,
Christian

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Where is the source for the .dat files in Kuromoji?
There are a handful of binary files in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames ending in .dat.

Trailing around in the source, it seems as if at least one of these derives from a source file named unk.def. In turn, this file comes from a dependency. Should the build generate the file rather than having it in the tree and shipped as part of the source release?
RE: Where is the source for the .dat files in Kuromoji?
Hi Benson,

If you run ant regenerate, it downloads the source files (which is ant download-dict) and then rebuilds (ant build-dict) the FSTs and other binary stuff stored in the dat file. See also the ivy.xml.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-Original Message-
From: Benson Margulies [mailto:ben...@basistech.com]
Sent: Monday, December 02, 2013 6:12 PM
To: java-user@lucene.apache.org; Christian Moen
Subject: Where is the source for the .dat files in Kuromoji?

> There are a handful of binary files in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames ending in .dat.
>
> Trailing around in the source, it seems as if at least one of these derives from a source file named unk.def. In turn, this file comes from a dependency. Should the build generate the file rather than having it in the tree and shipped as part of the source release?
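The regeneration step described above could be scripted roughly as follows. This is a minimal sketch, assuming `ant` is on the PATH and that the targets are run from the kuromoji module directory of a Lucene checkout (lucene/analysis/kuromoji is an assumed location). The helper `run_ant` and its `dry_run` flag are hypothetical names for illustration, not part of Lucene's build.

```python
import subprocess
from pathlib import Path


def run_ant(target, module_dir, dry_run=False):
    """Run one Ant target in module_dir; with dry_run, only return the command."""
    cmd = ["ant", target]
    if not dry_run:
        # check=True raises CalledProcessError if Ant exits with a non-zero status.
        subprocess.run(cmd, cwd=Path(module_dir), check=True)
    return cmd


if __name__ == "__main__":
    # Assumed location of the kuromoji module inside a Lucene checkout.
    module = Path("lucene/analysis/kuromoji")
    # Per the message above, "regenerate" covers download-dict and build-dict.
    print(run_ant("regenerate", module, dry_run=True))
```

Dropping `dry_run=True` would actually invoke Ant, so the dictionary sources would be downloaded and the .dat files in the working tree rebuilt.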
Re: Where is the source for the .dat files in Kuromoji?
Thanks.

On Mon, Dec 2, 2013 at 12:21 PM, Uwe Schindler u...@thetaphi.de wrote:

> Hi Benson,
>
> If you run ant regenerate, it downloads the source files (which is ant download-dict) and then rebuilds (ant build-dict) the FSTs and other binary stuff stored in the dat file. See also the ivy.xml.
>
> Uwe
Re: Where is the source for the .dat files in Kuromoji?
Hello Benson,

The sources for the .dat files are available from

https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz

and a range of other places.

I’m not sure I follow what you’re saying regarding unk.def -- it’s to my knowledge used as-is from the above sources when the binary .dat files are made. (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.)

Perhaps I’m missing something. Could you clarify how you think things should be done?

Many thanks,

Christian Moen
アティリカ株式会社
http://www.atilika.com

On Dec 3, 2013, at 2:11 AM, Benson Margulies ben...@basistech.com wrote:

> There are a handful of binary files in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames ending in .dat.
>
> Trailing around in the source, it seems as if at least one of these derives from a source file named unk.def. In turn, this file comes from a dependency. Should the build generate the file rather than having it in the tree and shipped as part of the source release?
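For anyone who wants to confirm that a file such as unk.def ships in the dictionary tarball, a check along these lines might help. It is a minimal sketch, assuming the archive has already been downloaded locally; the function name `members_named` is made up for illustration, and nothing here is taken from the Lucene tools.

```python
import tarfile


def members_named(archive_path, filename):
    """Return paths inside a tar archive whose base name equals filename."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        return [m.name for m in tar.getmembers()
                if m.isfile() and m.name.rsplit("/", 1)[-1] == filename]
```

Calling `members_named("mecab-ipadic-2.7.0-20070801.tar.gz", "unk.def")` lists any matching entries without extracting the archive; an empty list means no file with that name is present.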
Re: Where is the source for the .dat files in Kuromoji?
On Mon, Dec 2, 2013 at 6:27 PM, Christian Moen c...@atilika.com wrote:

> Hello Benson,
>
> The sources for the .dat files are available from
>
> https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
> http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz
>
> and a range of other places.
>
> I’m not sure I follow what you’re saying regarding unk.def -- it’s to my knowledge used as-is from the above sources when the binary .dat files are made. (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.)
>
> Perhaps I’m missing something. Could you clarify how you think things should be done?

I'm not clear that there's anything that anyone would complain of. The question is, are the .dat files part of the source bundle that is the 'official release'? I just fetched from git, not from the official release, so I don't know.

> Many thanks,
>
> Christian Moen
> アティリカ株式会社
> http://www.atilika.com