Re: Where is the source for the .dat files in Kuromoji?

2013-12-03 Thread Christian Moen
On Dec 3, 2013, at 8:38 AM, Benson Margulies ben...@basistech.com wrote:

> I'm not clear that there's anything that anyone would complain of. The
> question is, are the .dat files part of the source bundle that is the
> 'official release'? I just fetched from git, not from the official release,
> so I don't know.

I’d say the .dat files are part of the source bundle, which is the ‘official
release’, but PMC members, feel free to chime in...

The .dat files have been checked into SVN in binary form to make Lucene easy to
build, and they’re also rather modest in size thanks to the squeezing work done
by Robert and Uwe.

Best,
Christian


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Benson Margulies
There are a handful of binary files
in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames
ending in .dat.

Trawling around in the source, it seems as if at least one of these derives
from a source file named unk.def.  In turn, this file comes from a
dependency. Should the build generate the file rather than having it in the
tree and shipped as part of the source release?


RE: Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Uwe Schindler
Hi Benson,

If you run ant regenerate, it downloads the dictionary source files (the ant
download-dict target) and then rebuilds (ant build-dict) the FSTs and other
binary data stored in the .dat files. See also the ivy.xml.
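
For anyone following along, the regenerate step described above could be wrapped roughly as follows. This is only a sketch under assumptions: the function name, the checkout argument, and the ANT override variable are made up for illustration, and the module directory lucene/analysis/kuromoji is inferred from the path mentioned elsewhere in this thread rather than confirmed here.

```shell
#!/bin/sh
# Sketch: regenerate the Kuromoji .dat files from the upstream dictionary.
# Assumptions (not confirmed in this thread): the ant targets are run from
# lucene/analysis/kuromoji inside a Lucene checkout; ANT is a hypothetical
# override for the ant executable.
regenerate_kuromoji_dict() {
    checkout="$1"
    ant_cmd="${ANT:-ant}"
    module_dir="$checkout/lucene/analysis/kuromoji"

    if [ ! -d "$module_dir" ]; then
        echo "kuromoji module not found: $module_dir" >&2
        return 1
    fi

    # "regenerate" runs download-dict and then build-dict, per the message above.
    (cd "$module_dir" && "$ant_cmd" regenerate)
}
```

Calling regenerate_kuromoji_dict /path/to/lucene-checkout would then run the download and rebuild in one go; the subshell keeps the caller's working directory unchanged.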

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de







Re: Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Benson Margulies
Thanks.






Re: Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Christian Moen
Hello Benson,

The sources for the .dat files are available from

https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz

http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz

and a range of other places.

I’m not sure I follow what you’re saying regarding unk.def -- it’s to my 
knowledge used as-is from the above sources when the binary .dat files are 
made.  (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.)
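
One way to check that unk.def really ships in the upstream tarball is to list the archive members. A minimal sketch, assuming the tarball has already been downloaded from one of the URLs above; the function name list_unk_def is made up for illustration.

```shell
#!/bin/sh
# Sketch: list entries named unk.def inside a mecab-ipadic source tarball.
# Assumption: the tarball is a local gzip-compressed tar file.
list_unk_def() {
    tarball="$1"
    # -t lists archive members, -z handles gzip; keep only paths ending in unk.def
    tar -tzf "$tarball" | grep -E '(^|/)unk\.def$'
}
```

For example, list_unk_def mecab-ipadic-2.7.0-20070801.tar.gz prints any matching member paths and returns non-zero if none are found.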

Perhaps I’m missing something.  Could you clarify how you think things should 
be done?

Many thanks,

Christian Moen
アティリカ株式会社
http://www.atilika.com






Re: Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Benson Margulies
On Mon, Dec 2, 2013 at 6:27 PM, Christian Moen c...@atilika.com wrote:

> Hello Benson,
>
> The sources for the .dat files are available from
>
> https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
>
> http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz
>
> and a range of other places.
>
> I’m not sure I follow what you’re saying regarding unk.def -- it’s to my
> knowledge used as-is from the above sources when the binary .dat files are
> made.  (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.)
>
> Perhaps I’m missing something.  Could you clarify how you think things
> should be done?


I'm not clear that there's anything that anyone would complain of. The
question is, are the .dat files part of the source bundle that is the
'official release'? I just fetched from git, not from the official release,
so I don't know.







