Number of Times 1 Field Has Occurred in a Document within a Given Time Range
Hello. This might be a long mail, but I have explained everything as clearly as I can so that I can get the assistance I need.

Indexing:
I have a use case in which I am indexing two fields.
Field 1: Value, e.g. 1, 2, 3, 4, 5 etc.
Field 2: Time in long format (yyyyMMddHHmmss), e.g. 20131203010005, 20131203132332 etc.
Both field values are extracted from a number of documents, and each document contains N such entries. Indexing is not a problem; I am able to index both fields properly.

Search:
Construction of Query: During searching, given a value for Field 1 (e.g. 2), I want the count of occurrences of 2 in each hour in the given index, i.e. from 20131203000000 to 20131203005959, then from 20131203010000 to 20131203015959, and so on. I gave the first field as a TermQuery, and for the second I used a NumericRangeQuery, creating a query for each of the 24 hourly time slots in a day. I then created a BooleanQuery with the TermQuery and the NumericRangeQuery as two MUST clauses and executed it.

Execution/Result: The query returns the documents in which the value 2 and the given range are present. With the current implementation, I need to iterate through each document found, get all the values of the value field (matching the input value 2), then apply an IF condition for the range and increment a counter every time the IF succeeds. This works, but I am looking for a shorter method.

A. Is it possible to get the count of occurrences itself from the first query? I think .search always returns the number of documents.
B. If this is not possible, then having obtained the documents in which the given input might be present, is it possible to execute a query on each such document itself and find the occurrences of the given input for the given time range?
C. I tried storing the count of occurrences of a given value in the index during the indexing phase. But since a TIME CROSSOVER can also happen inside the same file, the count stored during indexing is not accurate. Hence I don't think I can store the count of occurrences during the indexing phase itself.

Please assist. Let me know if any point is not clear and I will clarify it.

--
Regards
Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what lies within us"

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
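The per-hour counting described under "Execution/Result" can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not Lucene API. It assumes the (value, time) entries have already been read back from the matching documents. How they are stored and retrieved is not shown. `Entry`, `slotStart`, `slotEnd` and `countPerHour` are hypothetical names.

Instead of testing each entry against 24 ranges with IF conditions, the sketch reads the hour from the HH digits of the yyyyMMddHHmmss value and increments one bucket. As a result, entries from different hours in the same file each land in their own bucket. `slotStart`/`slotEnd` return the inclusive bounds of an hourly slot, which could serve as the min/max of the 24 range clauses, for example `NumericRangeQuery.newLongRange(field, min, max, true, true)`.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Entry and the helper methods are hypothetical names, not Lucene API.
// Times use the yyyyMMddHHmmss long format from the question, e.g. 20131203010005L.
class HourlyOccurrenceCounter {

    /** One extracted (value, time) pair; hypothetical holder type. */
    static final class Entry {
        final long value;
        final long time;

        Entry(long value, long time) {
            this.value = value;
            this.time = time;
        }
    }

    /** Inclusive start of an hourly slot: day 20131203, hour 1 -> 20131203010000. */
    static long slotStart(long day, int hour) {
        return day * 1_000_000L + hour * 10_000L;
    }

    /** Inclusive end of an hourly slot: day 20131203, hour 1 -> 20131203015959. */
    static long slotEnd(long day, int hour) {
        return slotStart(day, hour) + 5_959L;
    }

    /**
     * Counts occurrences of value per hour of day in a single pass:
     * the hour is taken from the HH digits rather than from 24 range checks.
     */
    static int[] countPerHour(List<Entry> entries, long value, long day) {
        int[] counts = new int[24];
        for (Entry e : entries) {
            if (e.value != value || e.time / 1_000_000L != day) {
                continue; // different value, or a timestamp on another day
            }
            int hour = (int) ((e.time / 10_000L) % 100L); // extracts HH
            counts[hour]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry(2, 20131203010005L));
        entries.add(new Entry(2, 20131203013000L));
        entries.add(new Entry(3, 20131203011500L)); // other value: ignored
        entries.add(new Entry(2, 20131203132332L));
        entries.add(new Entry(2, 20131204000100L)); // next day: ignored

        int[] counts = countPerHour(entries, 2, 20131203L);
        for (int h = 0; h < 24; h++) {
            if (counts[h] > 0) {
                System.out.println("[" + slotStart(20131203L, h) + ", "
                        + slotEnd(20131203L, h) + "]: " + counts[h]);
            }
        }
    }
}
```

Running `main` prints `[20131203010000, 20131203015959]: 2` and `[20131203130000, 20131203135959]: 1`. The value-3 entry and the entry dated 20131204 are skipped.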
Re: Where is the source for the .dat files in Kuromoji?
On Mon, Dec 2, 2013 at 6:27 PM, Christian Moen wrote: > Perhaps I’m missing something. Could you clarify how you think things > should be done? > I'm not clear that there's anything that anyone would complain of. The question is, are the .dat files part of the source bundle that is the 'official release'? I just fetched from git, not from the official release, so I don't know.
Re: Where is the source for the .dat files in Kuromoji?
Hello Benson, The sources for the .dat files are available from https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz and a range of other places. I’m not sure I follow what you’re saying regarding unk.def -- it’s to my knowledge used as-is from the above sources when the binary .dat files are made. (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.) Perhaps I’m missing something. Could you clarify how you think things should be done? Many thanks, Christian Moen アティリカ株式会社 http://www.atilika.com On Dec 3, 2013, at 2:11 AM, Benson Margulies wrote: > There are a handful of binary files in > ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames ending in > .dat. > > Trailing around in the source, it seems as if at least one of these derives > from a source file named "unk.def". In turn, this file comes from a > dependency. should the build generate the file rather than having it in the > tree and shipped as part of the source release?
Re: Where is the source for the .dat files in Kuromoji?
Thanks. On Mon, Dec 2, 2013 at 12:21 PM, Uwe Schindler wrote: > Hi Benson, > > If you run "ant regenerate", it downloads the source files (which is "ant > download-dict") and then rebuilds ("ant build-dict") the FSTs and other > binary stuff stored in the dat file. See also the ivy.xml. > > Uwe
RE: Where is the source for the .dat files in Kuromoji?
Hi Benson, If you run "ant regenerate", it downloads the source files (which is "ant download-dict") and then rebuilds ("ant build-dict") the FSTs and other binary stuff stored in the dat file. See also the ivy.xml. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Benson Margulies [mailto:ben...@basistech.com] > Sent: Monday, December 02, 2013 6:12 PM > To: java-user@lucene.apache.org; Christian Moen > Subject: Where is the source for the .dat files in Kuromoji? > > There are a handful of binary files > in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames > ending in .dat. > > Trailing around in the source, it seems as if at least one of these derives > from > a source file named "unk.def". In turn, this file comes from a dependency. > should the build generate the file rather than having it in the tree and > shipped as part of the source release?
Where is the source for the .dat files in Kuromoji?
There are a handful of binary files in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames ending in .dat. Trawling around in the source, it seems as if at least one of these derives from a source file named "unk.def". In turn, this file comes from a dependency. Should the build generate the file rather than having it in the tree and shipping it as part of the source release?
Re: JLemmaGen project
One important update about the project: .lem files are NOT licensed under the Apache License 2.0 and are restricted to use in non-commercial software only. thanks, miso On Monday 11 November 2013 13:28:21 Michal Hlavac wrote: > Hi, > > I changed the license to Apache License 2.0 to be more compatible with Lucene. > > m. > > On Monday 04 November 2013 19:14:57 Lance Norskog wrote: > > This is very cool! Lemmatization is an important tool for making search > > work better. > > > > Would you consider changing the licensing to the Apache 2.0 license? > > > > On 10/23/2013 08:17 AM, Michal Hlavac wrote: > > > Hi, > > > > > > I rewrote the lemmatizer project LemmaGen (http://lemmatise.ijs.si/) in Java. > > > It was originally written in C#. > > > The Lemmagen project uses rules to lemmatize words. The algorithm is described > > > here: > > > http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf > > > > > > The project is released under GPLv3. Sources are located on the Bitbucket server: > > > https://bitbucket.org/hlavki/jlemmagen > > > > > > There is also the Lemmagen4j project, which uses more memory and has no > > > prebuilt trees. > > > > > > I also obtained licensed dictionaries to build rule trees for 15 > > > languages. The dictionaries are licensed, but the prebuilt trees are not. > > > But you can also build your own dictionary. > > > > > > The project also contains a TokenFilter for Lucene/Solr. > > > The project is not stable, but any feedback is appreciated. > > > > > > Supported languages are: > > > mlteast-bg - Bulgarian > > > mlteast-cs - Czech > > > mlteast-en - English > > > mlteast-et - Estonian > > > mlteast-fr - French > > > mlteast-hu - Hungarian > > > mlteast-mk - Macedonian > > > mlteast-pl - Polish > > > mlteast-ro - Romanian > > > mlteast-ru - Russian > > > mlteast-sk - Slovak > > > mlteast-sl - Slovene > > > mlteast-sr - Serbian > > > mlteast-uk - Ukrainian > > > > > > thanks, miso
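The announcement says only that Lemmagen "uses rules to lemmatize words". To show roughly what a suffix rule means, here is a toy longest-suffix-match lemmatizer in Java. It is NOT LemmaGen's rule-tree algorithm (the linked paper describes that) and NOT the JLemmaGen API. The class name `ToySuffixLemmatizer` and the three English rules are made up for illustration. Because the longest matching suffix wins, a more specific rule such as "ss" acts as an exception to the general "s" rule.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of rule-based lemmatization by suffix replacement.
// Hypothetical class; not LemmaGen's algorithm and not the JLemmaGen API.
class ToySuffixLemmatizer {

    // suffix -> replacement, e.g. "ies" -> "y"
    private final Map<String, String> rules = new HashMap<>();

    /** Adds a rule: a word ending in suffix has that suffix replaced by replacement. */
    void addRule(String suffix, String replacement) {
        rules.put(suffix, replacement);
    }

    /** Applies the rule with the longest matching suffix; returns the word unchanged if none match. */
    String lemmatize(String word) {
        String bestSuffix = null;
        for (String suffix : rules.keySet()) {
            if (word.endsWith(suffix)
                    && (bestSuffix == null || suffix.length() > bestSuffix.length())) {
                bestSuffix = suffix;
            }
        }
        if (bestSuffix == null) {
            return word;
        }
        String stem = word.substring(0, word.length() - bestSuffix.length());
        return stem + rules.get(bestSuffix);
    }

    public static void main(String[] args) {
        ToySuffixLemmatizer lem = new ToySuffixLemmatizer();
        lem.addRule("s", "");     // cats -> cat
        lem.addRule("ies", "y");  // studies -> study
        lem.addRule("ss", "ss");  // glass -> glass (exception to the "s" rule)
        for (String w : new String[] {"cats", "studies", "glass", "run"}) {
            System.out.println(w + " -> " + lem.lemmatize(w));
        }
    }
}
```

Running `main` prints `cats -> cat`, `studies -> study`, `glass -> glass` and `run -> run`. A real rule-based lemmatizer learns a much larger, language-specific rule set from a dictionary rather than relying on hand-written rules.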