Re: Semantic indexing in Lucene

2011-05-23 Thread Paul Libbrecht
Diego,

The semanticvectors project has a mailing list, and its author, Dominic Widdows,
responds actively there.

paul




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Semantic indexing in Lucene

2011-05-23 Thread Sujit Pal
I meant to check out the Semantic Vectors project, but never got around
to it, so there is nothing on the blog (sujitpal.blogspot.com) that
talks about semantic vectors at the moment. It's on my (rather long) todo
list, though... Sorry about that...

-sujit






Re: Semantic indexing in Lucene

2011-05-23 Thread Diego Cavalcanti
Sorry, I thought the blog was yours! I will read the post and see if it
helps me. Thank you!

About the Semantic Vectors project, I certainly know how to get its source
code. What I meant is that I cannot use it through its API alone, because the
Javadoc does not show all the methods. I really do not want to change the
project's source code. Well... this is not important for this list!

If anyone has another idea about how to implement semantic indexing in
Lucene, I would be grateful!

[]s,
--
Diego




Re: Semantic indexing in Lucene

2011-05-23 Thread Yiannis Gkoufas
It's not my blog! :D
I used some of the ideas in that article
http://sujitpal.blogspot.com/2009/03/vector-space-classifier-using-lucene.html
in order to perform classification with Lucene for my tasks.
You can get full access to the source code of the project by typing on the
command line:

svn checkout http://semanticvectors.googlecode.com/svn/trunk/ semanticvectors-read-only

Or you can access the trunk directly at the URL
http://semanticvectors.googlecode.com/svn/trunk/



Re: Semantic indexing in Lucene

2011-05-23 Thread Diego Cavalcanti
Hi Yiannis,

Thank you for your reply.

Yes, I'm referring to the Semantic Vectors project. Before sending the
previous email, I read the project's API docs and noticed that most of its
classes don't contain public methods, so we cannot use the project
programmatically (only via the command line).

I've looked at your blog, but I haven't found any post about semantic
indexing in Lucene. Can you point me to it, please?

Thanks,
--
Diego




Re: Semantic indexing in Lucene

2011-05-23 Thread Yiannis Gkoufas
Hi Diego,

Are you referring to this project -->
http://code.google.com/p/semanticvectors/ ?
If yes, then the documentation is here:
http://semanticvectors.googlecode.com/svn/javadoc/latest-stable/index.html
Also, I think this blog might interest you --> http://sujitpal.blogspot.com/ and
the project related to it --> http://jtmt.sf.net/

BR,
Yiannis



Semantic indexing in Lucene

2011-05-23 Thread Diego Cavalcanti
Hello,

I have a project which indexes and scores documents using Lucene. However,
I'd like to do that using semantic indexing (LSI, LSA, or Semantic Vectors).

I've read old posts, and some people said that Semantic Vectors plays well
with Lucene. However, I noticed that its classes are usable only from the
command line (through the main method) rather than via an API.

So, I'd like to know if anyone can suggest another approach that would let
me use semantic indexing with Lucene.

Thanks,
Diego
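For what it's worth, a main()-only tool can still be driven from application code by handing it an argv array, exactly as the JVM launcher would. The sketch below is a generic illustration of that workaround; the tool class and its flag are made up, not the real Semantic Vectors API (check its docs for the actual class and option names).

```java
// A main()-only tool can be invoked programmatically by passing it an argv
// array. The nested IndexTool is a toy stand-in; real Semantic Vectors
// class and flag names may differ.
public class MainInvoker {

    // Toy stand-in for a command-line indexing tool.
    static class IndexTool {
        static String lastArgs;
        public static void main(String[] args) {
            lastArgs = String.join(" ", args); // pretend to build an index here
        }
    }

    /** Invoke the tool as if from the shell and report what it received. */
    public static String runTool(String... argv) {
        IndexTool.main(argv);
        return IndexTool.lastArgs;
    }

    public static void main(String[] args) {
        // Equivalent of: java IndexTool -luceneindexpath /path/to/index
        System.out.println(runTool("-luceneindexpath", "/path/to/index"));
    }
}
```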


FastVectorHighlighter - can FieldFragList expose fragInfo?

2011-05-23 Thread Sujit Pal
Hello,

My version: Lucene 3.1.0

I've had to customize the snippet for highlighting based on our
application requirements. Specifically, instead of the snippet being a
set of relevant fragments in the text, I need it to be the first
sentence where a match occurs, with a fixed size from the beginning of
the sentence.

For this, I built (in my application code, using the Lucene jars) a custom
fragments builder, subclassing SimpleFragmentsBuilder and overriding
createFragment(IndexReader reader, int docId, String fieldName,
FieldFragList fieldFragList).

However, FieldFragList does not allow access to its
List<WeightedFragInfo> member variable. I changed this locally to be
public so my subclass can access it, i.e.:

public List<WeightedFragInfo> fragInfos = new
ArrayList<WeightedFragInfo>();

Once this is done, my createFragment method can get at the fragInfos
from the passed-in fieldFragList and iterate through its
WeightedFragInfo.SubInfo.Toffs entries to get the term offsets, which I then
use to calculate and highlight my snippet (I can provide the code if it
makes things clearer, but that's the gist).

So my question is - would it be feasible to make the
FieldFragList.fragInfos variable public in a future release?

If not, is there some other way that I should do what I need to do?

Thanks very much,
Sujit
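As an aside, the snippet rule described above (first sentence containing a match, fixed length from the sentence start) can be sketched independently of Lucene. This is a toy model under simplifying assumptions: the real version would take its match offsets from the WeightedFragInfo.SubInfo.Toffs entries, and a '.'/'!'/'?' scan is a crude stand-in for proper sentence detection.

```java
public class SentenceSnippet {
    /**
     * Return a fixed-size snippet starting at the beginning of the sentence
     * that contains the first match offset. A sentence boundary is taken to
     * be the last '.', '!' or '?' before the match (a simplification).
     */
    public static String snippet(String text, int matchStart, int maxLen) {
        int sentenceStart = 0;
        // scan backwards from the match for the previous sentence terminator
        for (int i = matchStart - 1; i >= 0; i--) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                sentenceStart = i + 1;
                break;
            }
        }
        // skip whitespace following the boundary
        while (sentenceStart < text.length()
                && Character.isWhitespace(text.charAt(sentenceStart))) {
            sentenceStart++;
        }
        int end = Math.min(text.length(), sentenceStart + maxLen);
        return text.substring(sentenceStart, end);
    }
}
```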






Re: QueryParser/StopAnalyzer question

2011-05-23 Thread Mindaugas Žakšauskas
Hi Erick,

I think the answer to this question depends on which hat you put on.

If you put on a search-engine hat (or do similar things in, e.g., Google),
the results will be the same as what Lucene does at the moment. And
that's fair enough - in the search-engine world, getting more results is
almost always better than getting fewer. Even if a bunch of slightly
irrelevant results is returned, nobody cares.

But if you put on a database hat, the world view suddenly changes. I am
sure there are plenty of people who use Lucene in situations where
they need exact matches and any excess results are undesirable.

The root of the evil here is that stopwords are not indexed, so
reasonable defaults have to be assumed in different situations. Thinking
about it, returning all data for a stopword-only query would probably be
the least expected behaviour, and I don't disagree with your argument
about the mixed case either.

This probably leaves me with a single option, which is not to use
stopwords at all, giving me the best of both worlds. Does anyone have
experience with roughly how much of an increase in index size I can
expect?

Regards,
Mindaugas



Re: QueryParser/StopAnalyzer question

2011-05-23 Thread Erick Erickson
Hmmm, somehow I missed this days ago

Anyway, the Lucene query parsing process isn't quite Boolean logic.
I encourage you to think in terms of "required", "optional", and
"prohibited".

Both queries are equivalent, to see this try attaching &debugQuery=on
to your URL and look at the "parsed query" in the debug info

Anyway, to your question:
+foo:bar +baz:"there is"

reads that "bar" must appear in the field "foo". So far so good.
But it's also required that baz contain the empty clause, which
is different than saying baz must be empty. One can argue that
any field contains, by definition, nothing.

But imagine the impact of what you're requesting. If all stop words
get removed, then no query would ever match yours. Which
would be very counter-intuitive IMO. Your users have no clue
that you've removed stopwords, so they'll sit there saying "Look, I
KNOW that "bar" was in foo and I KNOW that "there is" was in
baz, why the heck didn't this cursed system find my doc?

Anyway, I don't think you really want this behavior in the
stopword removal case. If you can post some use-cases where this
would be desirable, maybe we can noodle about a solution

Best
Erick
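To make the clause-dropping concrete, here is a toy model (not Lucene code) of what happens when every term of a required phrase is a stopword: the analyzer empties the phrase, the resulting empty BooleanQuery clause is discarded, and the remaining required clauses behave as if the phrase were never there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopwordClauses {
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("there", "is", "the", "a"));

    /**
     * Toy model of QueryParser: each clause is "field:terms". If stopword
     * removal leaves a phrase with no terms, the clause silently vanishes
     * from the BooleanQuery instead of making the query match nothing.
     */
    public static String parse(String... clauses) {
        List<String> kept = new ArrayList<>();
        for (String clause : clauses) {
            String terms = clause.substring(clause.indexOf(':') + 1).replace("\"", "");
            boolean allStop = true;
            for (String t : terms.toLowerCase().split("\\s+")) {
                if (!STOPWORDS.contains(t)) { allStop = false; break; }
            }
            if (!allStop) kept.add("+" + clause);  // empty clauses are dropped
        }
        return String.join(" ", kept);
    }
}
```

Under this model, `parse("foo:bar", "baz:\"there is\"")` reduces to `+foo:bar`, which mirrors the behaviour Mindaugas observed.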






Re: # search in Query

2011-05-23 Thread Ian Lea
Are you sure that it isn't working?  If you use the same analyzer at
both indexing and query time you should end up with consistent
results.

Read up on exactly what your analyzer is doing by looking at the javadocs.

Google will find you lots of info on analysis, or get hold of a copy
of Lucene in Action, 2nd edition, to learn all about Lucene. And use
Luke to see what is being indexed.


--
Ian.
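To illustrate the analyzer point, here is a simplified model (not the actual Lucene analyzers, whose rules are more involved): a StandardAnalyzer-style tokenizer keeps only letter/digit runs, so "1#" is indexed and queried as "1", while a WhitespaceAnalyzer-style tokenizer preserves the "#".

```java
import java.util.Arrays;
import java.util.List;

public class HashTokenDemo {
    /** StandardAnalyzer-like behaviour (simplified): keep letter/digit runs only. */
    public static List<String> letterDigitTokens(String s) {
        return Arrays.asList(s.toLowerCase().split("[^\\p{L}\\p{N}]+"));
    }

    /** WhitespaceAnalyzer-like behaviour: split on whitespace only, '#' survives. */
    public static List<String> whitespaceTokens(String s) {
        return Arrays.asList(s.split("\\s+"));
    }
}
```

With the first tokenizer, "1#abcd" becomes the tokens "1" and "abcd", so a query for "1#" can never match; with the second, "1#abcd" stays a single token and the "#" is searchable, provided the same analyzer is used at index and query time.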






Re: FastVectorHighlighter StringIndexOutofBounds bug

2011-05-23 Thread Koji Sekiguchi
(11/05/23 14:36), Weiwei Wang wrote:
> 1. source string: 7
> 2. WhitespaceTokenizer + EGramTokenFilter
> 3. FastVectorHighlighter,
> 4. debug info:  subInfos=(777((8,11))777((5,8))777((2,5)))/3.0(2,102),
> srcIndex is not correctly computed for the second loop of the outer for-loop
> 

What does your query look like?
And what is EGramTokenFilter? Is it NGramTokenFilter?
If so, what are the min and max gram sizes?
Note that FVH has a restriction - the min and max gram sizes must be equal
(i.e. min=1 and max=3 cannot be supported by FVH).

koji
-- 
http://www.rondhuit.com/en/




# search in Query

2011-05-23 Thread Yogesh Dabhi
 

I have the values below in a Lucene index field:

1#abcd
2#test wer
3# testing rty

I write the query like below:

+fieldname:1#

After query parsing, I see the query string has become:

+fieldname:1

Is there a way to search for the given string?

 


Thanks & Regards 

Yogesh



Re: QueryParser/StopAnalyzer question

2011-05-23 Thread Mindaugas Žakšauskas
Not much luck so far :(

Just in case if anyone wants to earn some virtual dosh, I have added
some 50 bonus points to this question on StackOverflow:

http://stackoverflow.com/questions/6044061/lucene-query-parsing-behaviour-joining-query-parts-with-and

I also promise to post a solution here if anything satisfactory turns up.

m.

2011/5/17 Mindaugas Žakšauskas :
> Hi,
>
> Let's say we have an index having few documents indexed using
> StopAnalyzer.ENGLISH_STOP_WORDS_SET. The user issues two queries:
> 1) foo:bar
> 2) baz:"there is"
>
> Let's assume that the first query yields some results because there
> are documents matching that query.
>
> The second query contains two stopwords ("there" and "is") and yields
> 0 results. The reason for this is because when baz:"there is" is
> parsed, it ends up as a void query as both "there" and "is" are
> stopwords (technically speaking, this is converted to an empty
> BooleanQuery having no clauses). So far so good.
>
> However, any of the following combined queries
>
> +foo:bar +baz:"there is"
> foo:bar AND baz:"there is"
>
> behave exactly the same way as query +foo:bar, that is, brings back
> some results. The second AND part which is supposed to yield no
> results is completely ignored.
>
> One might argue that when ANDing both conditions have to be met, that
> is, documents having foo=bar and baz being empty have to be retrieved,
> as when issued seperately, baz:"there is" yields 0 results.
>
> It seem contradictory as an atomic query component has different
> impact on the overall query depending on the context. Is there any
> logical explanation for this? Can this be addressed in any way,
> preferably without writing own QueryAnalyzer?
>
> If this makes any difference, observed behaviour happens under Lucene v3.0.2.
>
> Regards,
> Mindaugas
>




Re: stop the search

2011-05-23 Thread liat oren
Thanks a lot.

I tried to debug a long query and see when it gets to the collector.

I thought it would be better to catch the "stop" action in the search
itself rather than in the top-docs collector, as I would assume the search
takes a long time to finish, whereas once we get to the top-docs collector
it returns immediately (I take only the top 100 results).

I saw that it gets there after a long time - it first "gets stuck" on a wait
function.

I use MultiSearcher - any idea why that happens?

Many Thanks,
Liat

On 23 May 2011 02:48, Simon Willnauer wrote:

> The simplest way would be a CollectorDelegate that wraps an existing
> collector and checks a boolean before calling the delegates collect
> method.
>
> simon
>
> On Mon, May 23, 2011 at 8:09 AM, liat oren  wrote:
> > Thank you very much.
> >
> > So the best solution would be to implement the collector with a stop
> > function.
> > Do you happen to have an example for that?
> >
> > Many thanks,
> > Liat
> >
> > On 22 May 2011 13:19, Simon Willnauer 
> > wrote:
> >>
> >> On Sun, May 22, 2011 at 4:48 PM, Devon H. O'Dell  >
> >> wrote:
> >> > I have my own collector, but implemented this functionality by running
> >> > the search in a thread pool and terminating the FutureTask running the
> >> > job if it took longer than some configurable amount of time. That
> >> > seemed to do the trick for me. (In my case, the IndexReader is
> >> > explicitly opened readonly, so I'm not too worried about it).
> >>
> >> This can be super dangerous if you use Future.cancel(), i.e.
> >> Thread.interrupt(). If the interrupt is called while you are reading
> >> from an NIO FileDescriptor, the channel will be closed and Lucene
> >> cannot recover from that state if the file has already been merged away.
> >> Your Reader will get ClosedChannelException for any
> >> subsequent access. You should prevent this.
> >> See the FSDirectory Javadoc:
> >>
> >>
> http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/store/FSDirectory.html
> >>
> >> simon
> >> >
> >> > --dho
> >> >
> >> > 2011/5/22 Simon Willnauer :
> >> >> you can impl. your own collector and notify the collector to stop
> >> >> if you need to.
> >> >> simon
> >> >>
> >> >> On Sun, May 22, 2011 at 12:06 PM, liat oren 
> >> >> wrote:
> >> >>> Hi Everyone,
> >> >>>
> >> >>> Is there a way to stop a multi search in the middle?
> >> >>>
> >> >>> Thanks a lot,
> >> >>> Liat
> >> >>>
> >> >>
> >> >> -
> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >>
> >> >>
> >> >
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >
> >
>