Number of times a field value has occurred in a document within a given time range

2013-12-02 Thread Ankit Murarka

Hello.

This may be a long mail, but I have described everything in detail 
so that I can get the assistance I need.


Indexing:
I have a use case in which I index two fields.

Field 1: Value, e.g. 1, 2, 3, 4, 5.

Field 2: Time in long format, e.g. 20131203010005, 20131203132332.

Both field values are extracted from a number of documents. Each 
document contains N such entries.


Indexing is not a problem. I am able to index both fields properly.

Search:

Construction of Query:
When searching, given a value for field 1 (e.g. 2), I want the count of 
occurrences of 2 for each hour in the given index, i.e. from 
20131203000000 to 20131203005959, then from 20131203010000 to 
20131203015959, and so on.


I used a TermQuery for the first field and a NumericRangeQuery for the 
second, creating one range query for each of the 24 hourly slots in a day.
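
The bounds of the 24 hourly slots can be derived arithmetically rather than 
written out by hand. The following is only a sketch in plain Java with no 
Lucene dependency. It assumes the time field is laid out as yyyyMMddHHmmss 
(as the sample values suggest), and the names HourSlots and slotsForDay are 
hypothetical, not part of any library.

```java
// Hypothetical helper, not part of Lucene: builds the inclusive bounds
// of the 24 hourly slots for one day.
class HourSlots {

    /**
     * @param day the day as yyyyMMdd, e.g. 20131203 (assumed layout)
     * @return 24 rows of {lower, upper}, both inclusive, as yyyyMMddHHmmss
     */
    static long[][] slotsForDay(long day) {
        long[][] slots = new long[24][2];
        for (int hour = 0; hour < 24; hour++) {
            long start = (day * 100L + hour) * 10000L; // yyyyMMddHH0000
            slots[hour][0] = start;
            slots[hour][1] = start + 5959L;            // yyyyMMddHH5959
        }
        return slots;
    }

    public static void main(String[] args) {
        // Print every slot of the example day used in this mail.
        for (long[] slot : slotsForDay(20131203L)) {
            System.out.println(slot[0] + " - " + slot[1]);
        }
    }
}
```

Each {lower, upper} pair could then serve as the inclusive bounds of one 
range query on the time field.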


I created a BooleanQuery with the TermQuery and the NumericRangeQuery as 
two MUST clauses and executed it.


Execution/Result:

The query returns the documents that contain both the value 2 and a time 
in the given range. With the current implementation, I need to iterate 
through each document found, get all of its value fields (matching the 
input value 2), check each one against the range with an IF condition, 
and increment a counter every time the condition holds.
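
The counting loop described above can be reduced to a single pass that 
fills all hour buckets at once, instead of one pass per range. This is a 
minimal sketch in plain Java, assuming the (value, time) entries of the 
matching documents have already been read into memory and that the time is 
laid out as yyyyMMddHHmmss, so that integer division by 10000 yields the 
hour bucket yyyyMMddHH. The names HourlyCounter, Entry and countPerHour are 
hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, independent of Lucene: counts how often a value
// occurs in each hour, given (value, time) entries already in memory.
class HourlyCounter {

    /** One entry; time is assumed to be a long in yyyyMMddHHmmss layout. */
    static final class Entry {
        final String value;
        final long time;

        Entry(String value, long time) {
            this.value = value;
            this.time = time;
        }
    }

    /** Returns occurrence counts of 'wanted', keyed by hour (yyyyMMddHH). */
    static Map<Long, Integer> countPerHour(Iterable<Entry> entries, String wanted) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Entry e : entries) {
            if (!e.value.equals(wanted)) {
                continue; // a different value, not counted
            }
            long hour = e.time / 10000L; // drop minutes and seconds
            counts.merge(hour, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("2", 20131203010005L),
                new Entry("2", 20131203015959L),
                new Entry("3", 20131203010500L),
                new Entry("2", 20131203132332L));
        System.out.println(countPerHour(entries, "2"));
    }
}
```

Hours in which the value never occurs are simply absent from the map, so a 
caller that needs all 24 slots would have to treat a missing key as zero.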


This works, but I am looking for a shorter method.

A. Is it possible to get the occurrence count directly from the first 
query? As far as I can tell, .search always returns the number of docs.
B. If this is not possible, can I, having obtained a document in which 
the given input is present, execute another query against that document 
alone and find the occurrences of the given input within the given time 
range?
C. I tried storing the occurrence count of the given value in the index 
during the indexing phase. But since a TIME CROSSOVER can also happen 
inside the same file, the count stored during indexing is not correct. 
Hence I don't think I can store the occurrence count during the indexing 
phase.


Please assist. Let me know if any point is not clear and I will clarify 
it again.


--
Regards

Ankit Murarka

"What lies behind us and what lies before us are tiny matters compared with what 
lies within us"


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Benson Margulies
On Mon, Dec 2, 2013 at 6:27 PM, Christian Moen  wrote:

> Hello Benson,
>
> The sources for the .dat files are available from
>
> https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
>
> http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz
>
> and a range of other places.
>
> I’m not sure I follow what you’re saying regarding unk.def -- it’s to my
> knowledge used as-is from the above sources when the binary .dat files are
> made.  (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.)
>
> Perhaps I’m missing something.  Could you clarify how you think things
> should be done?
>

I'm not clear that there's anything that anyone would complain of. The
question is, are the .dat files part of the source bundle that is the
'official release'? I just fetched from git, not from the official release,
so I don't know.

>
> Many thanks,
>
> Christian Moen
> アティリカ株式会社
> http://www.atilika.com
>
> On Dec 3, 2013, at 2:11 AM, Benson Margulies  wrote:
>
> > There are a handful of binary files in
> > ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames ending
> > in .dat.
> >
> > Trailing around in the source, it seems as if at least one of these
> > derives from a source file named "unk.def".  In turn, this file comes from
> > a dependency. Should the build generate the file rather than having it in
> > the tree and shipped as part of the source release?
> >
> >
>
>


Re: Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Christian Moen
Hello Benson,

The sources for the .dat files are available from

https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz

http://atilika.com/releases/mecab-ipadic/mecab-ipadic-2.7.0-20070801.tar.gz

and a range of other places.

I’m not sure I follow what you’re saying regarding unk.def -- it’s to my 
knowledge used as-is from the above sources when the binary .dat files are 
made.  (See lucene/analysis/kuromoji/src/tools in the Lucene code tree.)

Perhaps I’m missing something.  Could you clarify how you think things should 
be done?

Many thanks,

Christian Moen
アティリカ株式会社
http://www.atilika.com

On Dec 3, 2013, at 2:11 AM, Benson Margulies  wrote:

> There are a handful of binary files in 
> ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames ending in 
> .dat.
> 
> Trailing around in the source, it seems as if at least one of these derives 
> from a source file named "unk.def".  In turn, this file comes from a 
> dependency. should the build generate the file rather than having it in the 
> tree and shipped as part of the source release?
> 
> 





Re: Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Benson Margulies
Thanks.


On Mon, Dec 2, 2013 at 12:21 PM, Uwe Schindler  wrote:

> Hi Benson,
>
> If you run "ant regenerate", it downloads the source files (which is "ant
> download-dict") and then rebuilds ("ant build-dict") the FSTs and other
> binary stuff stored in the dat file. See also the ivy.xml.
>
> Uwe
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -Original Message-
> > From: Benson Margulies [mailto:ben...@basistech.com]
> > Sent: Monday, December 02, 2013 6:12 PM
> > To: java-user@lucene.apache.org; Christian Moen
> > Subject: Where is the source for the .dat files in Kuromoji?
> >
> > There are a handful of binary files
> > in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames
> > ending in .dat.
> >
> > Trailing around in the source, it seems as if at least one of these
> > derives from a source file named "unk.def".  In turn, this file comes
> > from a dependency. Should the build generate the file rather than having
> > it in the tree and shipped as part of the source release?


RE: Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Uwe Schindler
Hi Benson,

If you run "ant regenerate", it downloads the source files (which is "ant 
download-dict") and then rebuilds ("ant build-dict") the FSTs and other binary 
stuff stored in the dat file. See also the ivy.xml.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Benson Margulies [mailto:ben...@basistech.com]
> Sent: Monday, December 02, 2013 6:12 PM
> To: java-user@lucene.apache.org; Christian Moen
> Subject: Where is the source for the .dat files in Kuromoji?
> 
> There are a handful of binary files
> in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames
> ending in .dat.
> 
> Trailing around in the source, it seems as if at least one of these
> derives from a source file named "unk.def".  In turn, this file comes from
> a dependency. Should the build generate the file rather than having it in
> the tree and shipped as part of the source release?





Where is the source for the .dat files in Kuromoji?

2013-12-02 Thread Benson Margulies
There are a handful of binary files
in ./src/resources/org/apache/lucene/analysis/ja/dict/ with filenames
ending in .dat.

Trailing around in the source, it seems as if at least one of these derives
from a source file named "unk.def".  In turn, this file comes from a
dependency. Should the build generate the file rather than having it in the
tree and shipped as part of the source release?


Re: JLemmaGen project

2013-12-02 Thread Michal Hlavac
One important update about the project!

The .lem files are NOT licensed under the Apache License 2.0; their use is 
restricted to non-commercial software.

thanks, miso

On Monday 11 November 2013 13:28:21 Michal Hlavac wrote:
> Hi,
> 
> I changed the license to Apache License 2.0 to be more compatible with Lucene.
> 
> m.
> 
> On Monday 04 November 2013 19:14:57 Lance Norskog wrote:
> > This is very cool! Lemmatization is an important tool for making search 
> > work better.
> > 
> > Would you consider changing the licensing to the Apache 2.0 license?
> > 
> > On 10/23/2013 08:17 AM, Michal Hlavac wrote:
> > > Hi,
> > >
> > > I rewrote the lemmatizer project LemmaGen (http://lemmatise.ijs.si/) in 
> > > Java. It was originally written in C#.
> > > The LemmaGen project uses rules to lemmatize words. The algorithm is 
> > > described here:
> > > http://lemmatise.ijs.si/Download/File/Documentation%23JournalPaper.pdf
> > >
> > > The project is released under GPLv3. Sources are hosted on Bitbucket:
> > > https://bitbucket.org/hlavki/jlemmagen
> > >
> > > There is also the Lemmagen4j project, which uses more memory and comes 
> > > without prebuilt trees.
> > >
> > > I also obtained licensed dictionaries to build rule trees for 15 
> > > languages. The dictionaries are licensed, but the prebuilt trees are not.
> > > But you can also build your own dictionary.
> > >
> > > The project also contains a TokenFilter for Lucene/Solr.
> > > The project is not yet stable, but any feedback is appreciated.
> > >
> > > Supported languages are:
> > > mlteast-bg - Bulgarian
> > > mlteast-cs - Czech
> > > mlteast-en - English
> > > mlteast-et - Estonian
> > > mlteast-fr - French
> > > mlteast-hu - Hungarian
> > > mlteast-mk - Macedonian
> > > mlteast-pl - Polish
> > > mlteast-ro - Romanian
> > > mlteast-ru - Russian
> > > mlteast-sk - Slovak
> > > mlteast-sl - Slovene
> > > mlteast-sr - Serbian
> > > mlteast-uk - Ukrainian
> > >
> > > thanks, miso