Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-09 Thread Aditya Varun Chadha
Thank you!

On Sun 7. Aug 2022 at 07:57, Tomoko Uchida 
wrote:

> Understood. I added your account to the mapping:
>
> https://github.com/apache/lucene-jira-archive/commit/bbaf23ddc329470346546f2acac5b80eda503aca
> If you open issues or make comments in Jira before the issue migration,
> your contribution will be associated with your GH account.
>
> Tomoko
>
>
> On Sun, Aug 7, 2022 at 14:41, Aditya Varun Chadha wrote:
>
> > Thanks Tomoko,
> > There is no activity in JIRA from me as far as I can recall. This is the
> > correct and only account though.
> >
> > On Sun 7. Aug 2022 at 05:50, Tomoko Uchida  >
> > wrote:
> >
> > > Hi Aditya,
> > > I found a Jira user "adichad" but this account has no activities in
> > Lucene
> > > Jira. See:
> > > https://issues.apache.org/jira/secure/ViewProfile.jspa?name=adichad
> > >
> > > I wonder if you have multiple Jira accounts and you use another account for
> > > Lucene? For example, there is a Jira user "abakle
> > > <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=abakle>" -
> > > this has activities in LUCENE.
> > >
> > > Tomoko
> > >
> > >
> > > On Sun, Aug 7, 2022 at 5:56, Aditya Varun Chadha wrote:
> > >
> > > > JIRA: adichad
> > > > GitHub: adichad
> > > >
> > > > Thank you!
> > > >
> > > > On Sat 6. Aug 2022 at 20:37, Glen Newton 
> > wrote:
> > > >
> > > > > jira: gnewton
> > > > > github: gnewton  (github.com/gnewton)
> > > > >
> > > > > Thanks,
> > > > > Glen
> > > > >
> > > > >
> > > > >
> > > > > On Sat, 6 Aug 2022 at 14:11, Tomoko Uchida <
> > > tomoko.uchida.1...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi everyone.
> > > > > >
> > > > > > I wanted to let you know that we'll extend the deadline until the
> > > date
> > > > > the
> > > > > > migration is started (the date is not fixed yet).
> > > > > > Please let us know your Jira/Github usernames if you don't see
> > > > mapping(s)
> > > > > > for your account in this file:
> > > > > >
> > > > > > https://github.com/apache/lucene-jira-archive/blob/main/migration/mappings-data/account-map.csv.20220722.verified
> > > > > >
> > > > > > Tomoko
> > > > > >
> > > > > >
> > > > > > On Sun, Aug 7, 2022 at 1:36, Baris Kazar wrote:
> > > > > >
> > > > > > > Thank You Thank You
> > > > > > > Best regards
> > > > > > > 
> > > > > > > From: Michael McCandless 
> > > > > > > Sent: Saturday, August 6, 2022 11:29:25 AM
> > > > > > > To: Baris Kazar 
> > > > > > > Cc: java-user@lucene.apache.org 
> > > > > > > Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub
> > account
> > > > ids
> > > > > > > before Thursday August 4 midnight (in your local time)
> > > > > > >
> > > > > > > OK done:
> > > > > > >
> > > > > > > https://github.com/apache/lucene-jira-archive/commit/13fa4cb46a1a6d609448240e4f66c263da8b3fd1
> > > > > > >
> > > > > > > Mike McCandless
> > > > > > >
> > > > > > > http://blog.mikemccandless.com

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Aditya Varun Chadha
Thanks Tomoko,
There is no activity in JIRA from me as far as I can recall. This is the
correct and only account though.

On Sun 7. Aug 2022 at 05:50, Tomoko Uchida 
wrote:

> Hi Aditya,
> I found a Jira user "adichad" but this account has no activities in Lucene
> Jira. See:
> https://issues.apache.org/jira/secure/ViewProfile.jspa?name=adichad
>
> I wonder if you have multiple Jira accounts and you use another account for
> Lucene? For example, there is a Jira user "abakle
> <https://issues.apache.org/jira/secure/ViewProfile.jspa?name=abakle>" -
> this has activities in LUCENE.
>
> Tomoko
>
>
> On Sun, Aug 7, 2022 at 5:56, Aditya Varun Chadha wrote:
>
> > JIRA: adichad
> > GitHub: adichad
> >
> > Thank you!
> >
> > On Sat 6. Aug 2022 at 20:37, Glen Newton  wrote:
> >
> > > jira: gnewton
> > > github: gnewton  (github.com/gnewton)
> > >
> > > Thanks,
> > > Glen
> > >
> > >
> > >
> > > On Sat, 6 Aug 2022 at 14:11, Tomoko Uchida <
> tomoko.uchida.1...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi everyone.
> > > >
> > > > I wanted to let you know that we'll extend the deadline until the
> date
> > > the
> > > > migration is started (the date is not fixed yet).
> > > > Please let us know your Jira/Github usernames if you don't see
> > mapping(s)
> > > > for your account in this file:
> > > >
> > > > https://github.com/apache/lucene-jira-archive/blob/main/migration/mappings-data/account-map.csv.20220722.verified
> > > >
> > > > Tomoko
> > > >
> > > >
> > > > On Sun, Aug 7, 2022 at 1:36, Baris Kazar wrote:
> > > >
> > > > > Thank You Thank You
> > > > > Best regards
> > > > > 
> > > > > From: Michael McCandless 
> > > > > Sent: Saturday, August 6, 2022 11:29:25 AM
> > > > > To: Baris Kazar 
> > > > > Cc: java-user@lucene.apache.org 
> > > > > Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account
> > ids
> > > > > before Thursday August 4 midnight (in your local time)
> > > > >
> > > > > OK done:
> > > > >
> > > > > https://github.com/apache/lucene-jira-archive/commit/13fa4cb46a1a6d609448240e4f66c263da8b3fd1
> > > > >
> > > > > Mike McCandless
> > > > >
> > > > > http://blog.mikemccandless.com
> > > > >
> > > > >
> > > > > On Sat, Aug 6, 2022 at 10:29 AM Baris Kazar <
> baris.ka...@oracle.com
> > > > > <mailto:baris.ka...@oracle.com>> wrote:
> > > > > I think so.
> > > > > Best regards
> > > > > 
> > > > > From: Michael McCandless  > > > > luc...@mikemccandless.com>>
> > > > > Sent: Saturday, August 6, 2022 10:12 AM
> > > > > To: java-user@lucene.apache.org<mailto:java-user@lucene.apache.org
> >
> > <
> > > > > java-user@lucene.apache.org<mailto:java-user@lucene.apache.org>>
> > > > > Cc: Baris Kazar  > baris.ka...@oracle.com
> > > >>
> > > > > Subject: Re: [HELP] Link your Apache Lucene Jira and GitHub account
> > ids
> > > > > before Thursday August 4 midnight (in your local time)
> > > > >
> > > > > Thanks Baris,
> > > > >
> > > > > And your Jira ID is bkazar right?
> > > > >
> > > > > Mike
> > > > >
> > > > > On Sat, Aug 6, 2022 at 10:05 AM Baris Kazar <
> baris.ka...@oracle.com
> > > > > <mailto:baris.ka...@oracle.com>> wrote:
> > > > > My github username is bmk

Re: [HELP] Link your Apache Lucene Jira and GitHub account ids before Thursday August 4 midnight (in your local time)

2022-08-06 Thread Aditya Varun Chadha
JiraName,GitHubAccount,JiraDispName
> > > > shahrs87, shahrs87, Rushabh Shah
> > > >
> > > > Thank you Tomoko and Mike for all of your hard work.
> > > >
> > > >
> > > >
> > > >
> > > > On Sun, Jul 31, 2022 at 3:08 AM Michael McCandless <
> > > > luc...@mikemccandless.com<mailto:luc...@mikemccandless.com>> wrote:
> > > >
> > > >> Hello Lucene users, contributors and developers,
> > > >>
> > > >> If you have used Lucene's Jira and you have a GitHub account as
> well,
> > > >> please check whether your user id mapping is in this file:
> > > >>
> > > >> https://github.com/apache/lucene-jira-archive/blob/main/migration/mappings-data/account-map.csv.20220722.verified
> > > >>
> > > >> If not, please reply to this email and we will try to add you.
> > > >>
> > > >> Please forward this email to anyone you know might be impacted and
> who
> > > >> might not be tracking the Lucene lists.
> > > >>
> > > >>
> > > >> Full details:
> > > >>
> > > >> The Lucene project will soon migrate from Jira to GitHub for issue
> > > >> tracking.
> > > >>
> > > >> There have been discussions, votes, a migration tool created /
> > iterated
> > > >> (thanks to Tomoko Uchida's incredibly hard work), all iterating on
> > > Lucene's
> > > >> dev list.
> > > >>
> > > >> When we run the migration, we would like to map Jira users to the
> > right
> > > >> GitHub users to properly @-mention the right person and make it
> easier
> > > for
> > > >> you to find issues you have engaged with.
> > > >>
> > > >> Mike McCandless
> > > >>
> > > >>
> > > >> http://blog.mikemccandless.com
> > > >>
> > > > --
> > > Mike McCandless
> > >
> > > http://blog.mikemccandless.com
> > > --
> > > Mike McCandless
> > >
> > > http://blog.mikemccandless.com
> > >
> >
>
-- 
Aditya


Re: How to ignore a match if a given keyword is before/after another given keyword?

2021-04-14 Thread Aditya Varun Chadha
maybe you want (abstractly):

bool(must(term("f", "positive"), mustNot(phrase("f", "negative positive",
slop=1)))

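In concrete Lucene terms that could look roughly like the sketch below (field and term names are placeholders, and the sloppy phrase only covers the "negative shortly before positive" case; adjust the slop and ordering for your exact rule):

PhraseQuery negativePositive = new PhraseQuery.Builder()
    .add(new Term("f", "negative"))
    .add(new Term("f", "positive"))
    .setSlop(1)
    .build();

Query query = new BooleanQuery.Builder()
    .add(new TermQuery(new Term("f", "positive")), BooleanClause.Occur.MUST)
    .add(negativePositive, BooleanClause.Occur.MUST_NOT)
    .build();
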
On Thu, Apr 15, 2021 at 7:27 AM Jean Morissette 
wrote:

> Hi all,
>
> Does someone know if it's possible to search documents containing a given
> keyword only if this keyword is not followed or preceded or another given
> keyword?
>
> Thanks,
> Jean
>


-- 
Aditya


Re: Storing Json field in Lucene

2020-04-21 Thread Aditya Varun Chadha
During indexing, you can add the JSON string as a stored-only field (not
indexed, no doc values) on each document.

At query time you can then retrieve the JSON field's value only for the top
K results. This field should not be used for matching or scoring.

The point is that if you ever do want to use Lucene for its strengths
(text/multidimensional indexing and search), you should extract those
values from your JSON document (like you extract Type and id, I guess) and
_also_ add them as separate fields with indexing/doc values enabled,
depending on the use cases for each field.

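A minimal sketch of that (field names are just examples, Lucene 8.x-style field classes):

// index time: keep the raw JSON as a stored-only field
Document doc = new Document();
doc.add(new StoredField("json", jsonString));
doc.add(new StringField("type", type, Field.Store.NO));   // plus separately indexed fields
writer.addDocument(doc);

// query time: fetch the JSON only for the top K hits
TopDocs top = searcher.search(query, 10);
for (ScoreDoc sd : top.scoreDocs) {
    String json = searcher.doc(sd.doc).get("json");
    // deserialize / return json
}
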
On Wed, Apr 22, 2020 at 7:01 AM ganesh m 
wrote:

> Hi
> I am currently storing indexed field and stored field in separate
> database. In stored field database, Document Id, Type and Json string of
> metadata will be stored. Basically i am using it as key-value pair
> database. For every document to be indexed, we have three different
> metadata structure to be stored. That is the reason, we have Document Id
> and Type, so that we can query and retrieve stored field based on type. We
> have to depend on Lucene as we don't have any other database to store data.
>
> Is it good idea to store complete Json as string to Lucene DB. If we store
> as separate fields then we have around 30 fields. There will be 30 seeks to
> get complete stored fields. If we store it as Json then it is a one seek to
> retrieve the data. Since it is Json, field name and its value will be
> stored for every record and it may bloat index size.
>
> Could you guide me what is the better approach. To store as Json or as
> individual fields.
>
> RegardsGanesh
>


-- 
Aditya Varun Chadha | http://www.adichad.net | +49 (0) 152 25914008 (M)


Re: Use custom score in ConstantScoreQuery

2019-12-09 Thread Aditya Varun Chadha
By wrapping the constant score query inside a BoostQuery?

That’s how elasticsearch handles boosts on arbitrary queries, for example.

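A one-line sketch of what I mean (0.5f standing in for whatever constant you want):

Query halfScore = new BoostQuery(new ConstantScoreQuery(innerQuery), 0.5f);
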
On Mon 9. Dec 2019 at 10:42, Stamatis Zampetakis  wrote:

> Thanks for you reply Adrien!
> Can you clarify what is the second way?
> At the moment I haven't found a way (apart from creating my own Query
> classes) to say that a query will always return a score of 0.5 for each
> document.
>
> On Mon, Dec 9, 2019 at 8:16 AM Adrien Grand  wrote:
>
> > Hi Stamatis,
> >
> > I personally like the current way things work. If we added the ability
> > to set a custom score on ConstantScoreQuery, then we'd end up with two
> > ways to do the same thing, which I like to avoid whenever possible.
> >
> > On Sun, Dec 8, 2019 at 10:07 PM Stamatis Zampetakis 
> > wrote:
> > >
> > > Small reminder. Any input on this?
> > >
> > > Thanks,
> > > Stamatis
> > >
> > > On Mon, Dec 2, 2019 at 12:10 PM Stamatis Zampetakis  >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > Currently ConstantScoreQuery [1] returns a constant score equal to 1
> > for
> > > > every document that matches the query.
> > > >
> > > > I would like to use the ConstantScoreQuery but with a different score
> > > > value that I can pass explicitly (via the constructor for instance).
> > > >
> > > > This change may also benefit some other parts of Lucene where a
> > > > ConstantScoreQuery is wrapped around a BoostQuery simply for
> returning
> > a
> > > > score of zero [2][3].
> > > >
> > > > Does this change make sense? Shall I create a JIRA for it?
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1]
> > > >
> >
> https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/ConstantScoreQuery.java
> > > > [2]
> > > >
> >
> https://github.com/apache/lucene-solr/blob/1d238c844e45f088a942aec14750c186c7a66d92/lucene/core/src/java/org/apache/lucene/search/BooleanQuery.java#L253
> > > > [3]
> > > >
> >
> https://github.com/apache/lucene-solr/blob/1d238c844e45f088a942aec14750c186c7a66d92/lucene/core/src/java/org/apache/lucene/search/BoostQuery.java#L97
> > > >
> >
> >
> >
> > --
> > Adrien
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>
-- 
Aditya Varun Chadha | http://www.adichad.com | +49 (0) 152 25914008 (M)


FST Util's - Will TopNSearcher work for non-weighted FST with CharSequence Outputs as weight.

2015-09-26 Thread Aditya Tripathi
Hi,

Question:
Looking at the code I got slightly confused about whether TopNSearcher would
work for a non-weighted FST with CharSequence outputs. I am trying to use a
scale-up model and accommodate many tenants on one machine, and hence was not
planning to use a Pair output. It would have been great if the path output
could be considered as the cost and TopNSearcher could use the cost of the
whole path while deciding on a NO_OUTPUT arc. However, it goes into an
infinite loop for the code snippet given below.

Background: (X of XY problem)
I build an FST with input from one index field and output as
concatenation of two other stored fields.

I wanted to get suggestions from this FST using TopNSearcher search method.
As experimental code I just wanted to see if shortestPaths would work on
this FST with CharSequence outputs as the cost.

It does not work. It goes into an infinite loop.

I think (I haven't dug much though, and am not at all sure) the problem lies
in the fact that while finding the minimum-weight arc (no weight here - the
weight is the output), the old path is not considered and only the current arc
is compared against NO_OUTPUT. And then it keeps copying this NO_OUTPUT arc
back into the current arc later. This spins it into an infinite loop.

In TopNSearcher search() method, the following line. Line 464 in
org/apache/lucene/util/fst/Util.java
if (comparator.compare(NO_OUTPUT, path.arc.output) == 0)

And then copying the NO_OUTPUT arc back to the current arc spoils the fun
here in line:490
path.arc.copyFrom(scratchArc);

The sample code to reproduce this along with Sysout's for seeing how the
FST is formed is given below.

 (If I use PositiveIntOutputs it works fine. Commented lines.)



public static void main(String[] args) throws IOException {

String inputValues[] = {"aafish4","abcat","abcmonkey6" , "abcdog",
"abcdogs"};
long outputValues[] = {14,5, 16, 7, 12};
CharsRef[] outputValuesString = {new CharsRef("pqrfish4"),new
CharsRef("pqcat"),new CharsRef("pqsmonkey6") ,new CharsRef( "pqrsdog"), new
CharsRef("pqrdogs")};

PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
CharSequenceOutputs outputsO = CharSequenceOutputs.getSingleton();
//Builder<Long> builder = new Builder<Long>(INPUT_TYPE.BYTE1, outputs);
Builder<CharsRef> builder = new Builder<CharsRef>(INPUT_TYPE.BYTE4, outputsO);
BytesRef scratchBytes = new BytesRef();

IntsRef scratchInts = new IntsRef();

for (int i = 0; i < inputValues.length; i++) {
  scratchBytes.copyChars(inputValues[i]);
  //builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]);
  builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValuesString[i]);

}
//FST<Long> fst = builder.finish();
FST<CharsRef> fst = builder.finish();
Arc<CharsRef> arc;
//Arc<Long> arc;

//Arc<Long> firstArc = fst.getFirstArc(new Arc<Long>());
Arc<CharsRef> firstArc = fst.getFirstArc(new Arc<CharsRef>());
arc = firstArc;
System.out.println("firstArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);

BytesReader reader = fst.getBytesReader();

Arc<CharsRef> firstTargetArc = fst.readFirstTargetArc(firstArc, new Arc<CharsRef>(), reader);
//Arc<Long> firstTargetArc = fst.readFirstTargetArc(firstArc, new Arc<Long>(), reader);
arc = firstTargetArc;
System.out.println("firstTargetArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);

//Arc<Long> lastTargetArc = fst.readLastTargetArc(firstArc, new Arc<Long>(), reader);
Arc<CharsRef> lastTargetArc = fst.readLastTargetArc(firstArc, new Arc<CharsRef>(), reader);
arc = lastTargetArc;
System.out.println("lastTargetArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);

//Arc<Long> nextArc = fst.readNextArc(firstTargetArc, reader);
Arc<CharsRef> nextArc = fst.readNextArc(firstTargetArc, reader);
arc = nextArc;
System.out.println("nextArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);


System.out.println("");
int a=0;
arc = firstTargetArc;
while(true) {
try {

System.out.println("nextArc: " +arc + "  isLastArch:"+
arc.isLast()+"   isFinal:"+arc.isFinal()+"  label:"+arc.label+"
 output:"+arc.output + "  target:"+arc.target);

Re: Migrating lucene index to Elastic Search

2014-09-26 Thread Aditya
Hi Akshay

It is better to post the question in the Elasticsearch group.

If you have the data, it is better to directly create the index from
Elasticsearch.

Regards
Aditya
www.findbestopensource.com



On Fri, Sep 26, 2014 at 12:34 PM, akshay.jain akshay.j...@orkash.com
wrote:

 Hi,

 Is there any way to migrate a Lucene index to Elastic Search? Read
 somewhere that it is possible through a Java app which would read the
 documents from the Lucene index and then write it to the ES cluster. Is
 that possible and if yes, how? Still a newbie with all this :p

 Thanks in advance,
 Akshay Jain

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: Is it wrong to create index writer on each query request.

2014-06-05 Thread Aditya
Hi Rajendra

You should NOT create an index writer for every request.

"Whether it is time consuming to update index writer when new document
will come." - No, it is not. See the sketch below.
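
A rough sketch of the usual pattern, assuming Lucene 4.x-style APIs (paths and names are placeholders): open one IndexWriter for the life of the application, feed new documents to it as they arrive, and serve queries from a SearcherManager that is refreshed rather than recreated.

// created once at startup, shared by all requests
Directory dir = FSDirectory.open(new File("/path/to/index"));
IndexWriter writer = new IndexWriter(dir,
    new IndexWriterConfig(Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48)));
SearcherManager manager = new SearcherManager(writer, true, new SearcherFactory());

// when a new document arrives
writer.addDocument(doc);
manager.maybeRefresh();        // make it visible to new searches

// per query request
IndexSearcher searcher = manager.acquire();
try {
    TopDocs hits = searcher.search(query, 10);
} finally {
    manager.release(searcher);
}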

Regards
Aditya
www.findbestopensource.com



On Thu, Jun 5, 2014 at 12:24 PM, Rajendra Rao rajendra@launchship.com
wrote:

 I have system in which documents and Query comes  frequently  .I am
 creating index writer in memory every time for each query I request . I
 want to know Is it good to separate Index Writing and loading  and Query
 request ?  Whether It is good to save index writer on hard disk .Whether it
 is time consuming to update index writer when new document will come.



Re: How to approach indexing source code?

2014-06-05 Thread Aditya
Just keep it simple. Index the entire source file: one source file is one
document. While indexing, preserve dot (.), hyphen (-) and other special
characters. You could use a whitespace analyzer; see the sketch below.
I hope it helps

Regards
Aditya
www.findbestopensource.com


On Wed, Jun 4, 2014 at 3:29 PM, Johan Tibell johan.tib...@gmail.com wrote:

 The the majority of queries will be look-ups of functions/types by fully
 qualified name. For example, the query [Data.Map.insert] will find the
 definition and all uses of the `insert` function defined in the `Data.Map`
 module. The corpus is all Haskell open source code on hackage.haskell.org.

 Being able to support qualified name queries is the main benefit of
 indexing the output of the compiler (which has resolved unqualified names
 to qualified names) rather than using a simple text-based indexing.

 There are three levels of name qualification I want to support in queries:

  * Unqualified: myFunction
  * Module qualified: MyModule.myFunction
  * Package and module qualified: mypackage-MyModule.myFunction

 I expect the middle one to be used the most. The last form is sometimes
 needed for disambiguation and the first is nice to support as a shorthand
 when the function name is unlikely to be ambiguous.

 For scoring I'd like to have a couple of attributes available. The most
 important one is whether a term represents a use site or a definition site.
 This would allow the definition of a function to appear as the first search
 result.

 Is this precise enough? Naturally the scope will grow over time, but this
 is the core of what I'm trying to do.

 -- Johan


 On Wed, Jun 4, 2014 at 8:02 AM, Aditya findbestopensou...@gmail.com
 wrote:

  Hi Johan,
 
  How you want to search, What is your search requirement and according to
  that you need to index. You could check duckduckgo or github code search.
 
  The easiest approach would be to have a parser which will read each
 source
  file and indexes as a single document. When you search, you will have a
  single search field which will search the index and retrieves the result.
  The search field accepts any text in the source file. It could be
 function
  name, class name, comments or variables etc.
 
  Another approach is to have different search fields for Functions,
 Classes,
  Package etc.  You need to parse the file, identify comments, function
 name,
  class name etc and index it in a separate field.
 
 
  Regards
  Aditya
  www.findbestopensource.com
 
 
 
 
  On Wed, Jun 4, 2014 at 7:02 AM, Johan Tibell johan.tib...@gmail.com
  wrote:
 
   Hi,
  
   I'd like to index (Haskell) source code. I've run the source code
  through a
   compiler (GHC) to get rich information about each token (its type,
 fully
   qualified name, etc) that I want to index (and later use when ranking).
  
   I'm wondering how to approach indexing source code. I can see two
  possible
   approaches:
  
* Create a file containing all the metadata and write a custom
   tokenizer/analyzer that processes the file. The file could use a simple
   line-based format:
  
   myFunction,1:12-1:22,my-package,defined-here,more-metadata
   myFunction,5:11-5:21,my-package,used-here,more-metadata
   ...
  
   The tokenizer would use CharTermAttribute to write the function name,
   OffsetAttribute to write the source span, etc.
  
* Use and IndexWriter to create a Document directly, as done here:
  
  
 
 http://www.onjava.com/pub/a/onjava/2006/01/18/using-lucene-to-search-java-source.html?page=3
  
   I'm new to Lucene so I can't quite tell which approach is more likely
 to
   work well. Which way would you recommend?
  
   Other things I'd like to do that might influence the answer:
  
- Index several tokens at the same position, so I can index both the
  fully
   qualified name (e.g. module.myFunction) and unqualified name (e.g.
   myFunction) for a term.
  
   -- Johan
  
 



Re: How to approach indexing source code?

2014-06-05 Thread Aditya
It is up to your requirement. You could either index source file or
compiler output. Try doing some proof of concept. You will get some idea of
how to move forward.

Regards
Aditya
www.findbestopensource.com




On Thu, Jun 5, 2014 at 2:48 PM, Johan Tibell johan.tib...@gmail.com wrote:

 By index the entire source file do you mean don't index the compiler
 output? If so, that doesn't sound very appealing as it loses most of the
 benefit of having a search engine built for searching source code.


 On Thu, Jun 5, 2014 at 11:11 AM, Aditya findbestopensou...@gmail.com
 wrote:

  Just keep it simple. Index the entire source file. One source file is one
  document. While indexing preserve dot (.), Hypen(-) and other special
  characters. You could use whitespace analyzer.
 
  I hope it helps
 
  Regards
  Aditya
  www.findbestopensource.com
 
 
  On Wed, Jun 4, 2014 at 3:29 PM, Johan Tibell johan.tib...@gmail.com
  wrote:
 
   The the majority of queries will be look-ups of functions/types by
 fully
   qualified name. For example, the query [Data.Map.insert] will find the
   definition and all uses of the `insert` function defined in the
  `Data.Map`
   module. The corpus is all Haskell open source code on
  hackage.haskell.org.
  
   Being able to support qualified name queries is the main benefit of
   indexing the output of the compiler (which has resolved unqualified
 names
   to qualified names) rather than using a simple text-based indexing.
  
   There are three levels of name qualification I want to support in
  queries:
  
* Unqualified: myFunction
* Module qualified: MyModule.myFunction
* Package and module qualified: mypackage-MyModule.myFunction
  
   I expect the middle one to be used the most. The last form is sometimes
   needed for disambiguation and the first is nice to support as a
 shorthand
   when the function name is unlikely to be ambiguous.
  
   For scoring I'd like to have a couple of attributes available. The most
   important one is whether a term represents a use site or a definition
  site.
   This would allow the definition of a function to appear as the first
  search
   result.
  
   Is this precise enough? Naturally the scope will grow over time, but
 this
   is the core of what I'm trying to do.
  
   -- Johan
  
  
   On Wed, Jun 4, 2014 at 8:02 AM, Aditya findbestopensou...@gmail.com
   wrote:
  
Hi Johan,
   
How you want to search, What is your search requirement and according
  to
that you need to index. You could check duckduckgo or github code
  search.
   
The easiest approach would be to have a parser which will read each
   source
file and indexes as a single document. When you search, you will
 have a
single search field which will search the index and retrieves the
  result.
The search field accepts any text in the source file. It could be
   function
name, class name, comments or variables etc.
   
Another approach is to have different search fields for Functions,
   Classes,
Package etc.  You need to parse the file, identify comments, function
   name,
class name etc and index it in a separate field.
   
   
Regards
Aditya
www.findbestopensource.com
   
   
   
   
On Wed, Jun 4, 2014 at 7:02 AM, Johan Tibell johan.tib...@gmail.com
 
wrote:
   
 Hi,

 I'd like to index (Haskell) source code. I've run the source code
through a
 compiler (GHC) to get rich information about each token (its type,
   fully
 qualified name, etc) that I want to index (and later use when
  ranking).

 I'm wondering how to approach indexing source code. I can see two
possible
 approaches:

  * Create a file containing all the metadata and write a custom
 tokenizer/analyzer that processes the file. The file could use a
  simple
 line-based format:

 myFunction,1:12-1:22,my-package,defined-here,more-metadata
 myFunction,5:11-5:21,my-package,used-here,more-metadata
 ...

 The tokenizer would use CharTermAttribute to write the function
 name,
 OffsetAttribute to write the source span, etc.

  * Use and IndexWriter to create a Document directly, as done here:


   
  
 
 http://www.onjava.com/pub/a/onjava/2006/01/18/using-lucene-to-search-java-source.html?page=3

 I'm new to Lucene so I can't quite tell which approach is more
 likely
   to
 work well. Which way would you recommend?

 Other things I'd like to do that might influence the answer:

  - Index several tokens at the same position, so I can index both
 the
fully
 qualified name (e.g. module.myFunction) and unqualified name (e.g.
 myFunction) for a term.

 -- Johan

   
  
 



Re: How to approach indexing source code?

2014-06-04 Thread Aditya
Hi Johan,

How you want to search and what your search requirements are determine how
you need to index. You could check DuckDuckGo or GitHub code search.

The easiest approach would be to have a parser which reads each source
file and indexes it as a single document. When you search, you will have a
single search field which searches the index and retrieves the result.
The search field accepts any text in the source file: it could be a function
name, class name, comments, variables, etc.

Another approach is to have different search fields for functions, classes,
packages, etc. You need to parse the file, identify comments, function names,
class names, etc., and index each in a separate field; see the sketch below.
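
A sketch of that multi-field variant (field names are just examples, and the joined* strings are whatever your parser extracts):

Document doc = new Document();
doc.add(new StringField("path", sourceFile.getPath(), Field.Store.YES));
doc.add(new TextField("functions", joinedFunctionNames, Field.Store.NO));
doc.add(new TextField("classes", joinedClassNames, Field.Store.NO));
doc.add(new TextField("comments", joinedComments, Field.Store.NO));
writer.addDocument(doc);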


Regards
Aditya
www.findbestopensource.com




On Wed, Jun 4, 2014 at 7:02 AM, Johan Tibell johan.tib...@gmail.com wrote:

 Hi,

 I'd like to index (Haskell) source code. I've run the source code through a
 compiler (GHC) to get rich information about each token (its type, fully
 qualified name, etc) that I want to index (and later use when ranking).

 I'm wondering how to approach indexing source code. I can see two possible
 approaches:

  * Create a file containing all the metadata and write a custom
 tokenizer/analyzer that processes the file. The file could use a simple
 line-based format:

 myFunction,1:12-1:22,my-package,defined-here,more-metadata
 myFunction,5:11-5:21,my-package,used-here,more-metadata
 ...

 The tokenizer would use CharTermAttribute to write the function name,
 OffsetAttribute to write the source span, etc.

  * Use and IndexWriter to create a Document directly, as done here:

 http://www.onjava.com/pub/a/onjava/2006/01/18/using-lucene-to-search-java-source.html?page=3

 I'm new to Lucene so I can't quite tell which approach is more likely to
 work well. Which way would you recommend?

 Other things I'd like to do that might influence the answer:

  - Index several tokens at the same position, so I can index both the fully
 qualified name (e.g. module.myFunction) and unqualified name (e.g.
 myFunction) for a term.

 -- Johan



Re: Associated values for a field and its value

2013-10-03 Thread Aditya
Hi

You need to expand the field as below. Store the document and its
associated values as one document.

Document  Field-A   Stored-Field
D1   a1 1,2
D2   a2  3,10

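A sketch of that expansion (the "assoc" field name is made up; Lucene 4.x-style field classes):

// document D expanded into one Lucene document per (field, value) pair
Document d1 = new Document();
d1.add(new StringField("A", "a1", Field.Store.YES));
d1.add(new StoredField("assoc", "1,2"));     // associated values for (A,a1), retrieve-only
writer.addDocument(d1);

Document d2 = new Document();
d2.add(new StringField("A", "a2", Field.Store.YES));
d2.add(new StoredField("assoc", "3,10"));    // associated values for (A,a2), retrieve-only
writer.addDocument(d2);

// a query on A=a1 now hits d1, and searcher.doc(hit).get("assoc") returns "1,2"
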
Another alternative approach is to store these values external to Lucene,
maybe in a database or key-value store, and fetch them on demand.

Regards
Aditya
www.findbestopensource.com -- we have collection of more than 1 million
open source projects



On Thu, Oct 3, 2013 at 4:42 AM, Alice Wong airwayw...@gmail.com wrote:

 Hello,

 We would like to index some documents. Each field of a document may have
 multiple values. And for each (field,value) pair there are some associated
 values. These associated values are just for retrieving, not searching.

 For example, a document D could have a field named A. This field has two
 values a1 and a2.

 It is easy to index D, adding term a1 and a2 to field A, so either query
 A=a1 or A=a2 will return D.

 Assuming we have other values associated with (A,a1) and (A,a2) for D. We
 would like to retrieve these associated values depending on whether A=a1
 or A=a2 is queried.

 For example, if query A=a1 returns D, we would like to return values 1
 and 2. And if query A=a2 returns D, we want to return values 3 and 10.

 Is it possible to do this with Lucene? Initially we want to hack postings
 to return associated values, but this seems quite complex.

 Thanks!



Re: Lucene Concurrent Search

2013-09-05 Thread Aditya
Hi

If you want to use a REST service for your search, my advice would be to
use Solr, as it has built-in REST API functionality.

If you want to use Lucene then below are my comments:
1. In the doSearch function, you are creating the reader object. If this call
is invoked for every query then it will be very expensive. You need to create
it once globally and reopen it only when the index is modified. It's better to
use SearcherManager; see the sketch below.
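
A rough sketch of that, using Lucene 4.3-style APIs to match your snippet (the refresh policy is up to you):

// created once (e.g. in a ServletContextListener), shared by all requests
Directory dir = FSDirectory.open(new File(DIRECTORY));
SearcherManager manager = new SearcherManager(dir, new SearcherFactory());

// per REST request
IndexSearcher searcher = manager.acquire();
try {
    TopDocs hits = searcher.search(q, MAX_HITS);
    // build the response from hits
} finally {
    manager.release(searcher);
}

// periodically, or right after the index has been updated
manager.maybeRefresh();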

Regards
Aditya
www.findbestopensource.com - Search from 1 Million open source projects.



On Thu, Sep 5, 2013 at 6:46 AM, David Miranda david.b.mira...@gmail.comwrote:

 Hi,

 I'm developing a web application, that contains a REST service in the
 Tomcat, that receives several requests per second.
 The REST requests do research in a Lucene index, to do this i use the
 IndexSearch.

 My questions are:
 - There are concurrency problems in multiple research?
 - What the best design pattern to do this?

 public class IndexResearch(){
private static int MAX_HITS = 500;
private static String DIRECTORY = indexdir;
private IndexSearcher searcher;
private StandardAnalyzer analyzer;
 



public IndexResearch(){
}
public String doSearch(String text){
   analyzer = new StandardAnalyzer(Version.LUCENE_43);
   topic = QueryParser.escape(topic);
   Query q = new QueryParser(Version.LUCENE_43, field, analyzer
  ).parse(text);
   File indexDirectory = new File(DIRECTORY);
   IndexReader reader;
   reader = DirectoryReader.open(FSDirectory.open(indexDirectory));
   searcher = new IndexSearcher(reader);
 
 /*more code*/

 }
  }


 Can I create, in the servlet, one object of this class per client request
 (Is that the best design pattern)?

 Thanks in advance.



Re: Removing Indexed field data.

2012-10-22 Thread Aditya
You need to modify / re-index the document. You cannot delete a particular
field value in place. Re-index / update the document with a blank / null value
for the field.

1. Retrieve the document
2. Set a blank value for the particular field
3. Update the document (see the sketch below)
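
A rough sketch, assuming each document carries a unique "id" field and Lucene 3.x-style APIs (only stored fields survive the round trip, so any non-stored fields have to be re-supplied):

Document doc = searcher.doc(luceneDocId);              // 1. retrieve the stored document
doc.removeFields("Person");                            // 2. drop the unwanted value
doc.add(new Field("Person", "", Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.updateDocument(new Term("id", idValue), doc);   // 3. re-index under the same id
writer.commit();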

Regards
Aditya
www.findbestopensource.com


On Mon, Oct 22, 2012 at 12:27 PM, sagar.gole8 sagar.go...@gmail.com wrote:

 Hi,
 I want to remove the data from indexed field, not documents that containing
 that data.
 i.e.
 Suppose I have a field person containing some person names and I want to
 remove some un-named data from that person field.

 e.g.(O/p from luke file)

 No.RankField Text
 1   466 Person   Mahatma Gandhi
 2   080 Person   Zokgh
 3   069 Person   PARBHANI

 See the image for details:
 http://lucene.472066.n3.nabble.com/file/n4015054/Solr-Index.png

 From above data I want remove the entry for Zokgh value from Person
 field.

 Please anyone can suggest?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Removing-Indexed-field-data-tp4015054.html
 Sent from the Lucene - Java Users mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: short search terms

2012-09-27 Thread Aditya
Hi

You are searching with 3 characters but the items actually indexed have 4
characters. Use Luke and analyze the index.

If searching for ABC has to match ABCD then you need to do a wildcard or
prefix search. Add * at the end of the search query (ABC*), for example:
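
A small sketch (the field name is a placeholder):

// match any indexed term in the key field that starts with the 3-character value
Query q = new PrefixQuery(new Term("keyField", "abc"));
TopDocs hits = searcher.search(q, 32);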

Regards
Aditya
www.findbestopensource.com


On Thu, Sep 27, 2012 at 3:05 AM, Edward W. Rouse ero...@comsquared.comwrote:

 I have an index and one of the items to search for is an identifier that
 will always be 3 characters, like ABC or XYZ. If I do a search for ABC I
 get
 no matches. If I add 1 more character so that ABC becomes ABCD and search
 for ABCD, it matches. I have been looking through the code (I inherited and
 the original coder is no longer with the company) to see if there is any
 place where he might have put a limitation in, but testing indicates that
 it
 is creating a query. Some code below:

 QueryParser parser = new QueryParser(Version.LUCENE_34,
 TaskRecord.BaseFields.PUBLIC_DEFAULT_FIELD.getName(), getAnalyzer());
 parser.setDefaultOperator(Operator.AND);
 Query query = parser.parse(qstring);

 qstring is the search text and getAnalyser returns a StandardAnalyzer.

 The Query is then used to search using the following code:

  public List<Long> search(Query query) throws IOException
  {
    IndexReader reader = null;
    try
    {
      reader = IndexReader.open(getRoot(), true);
      IndexSearcher searcher = new IndexSearcher(reader);

      // Do the search with an artificial limit of 32 results
      TopDocs hits = searcher.search(query, 32);

      // If the search actually has more hits, then run it again with correct max
      if(hits.totalHits > 32)
      {
        if(log.isDebugEnabled())
        {
          log.debug("Rerunning query with max size of " + hits.totalHits + " " + query);
        }

        hits = searcher.search(query, hits.totalHits);
      }

      // Create task ID list and return
      if(hits.totalHits < 1)
      {
        if(log.isDebugEnabled())
          log.debug("Query has no hits " + query);

        return Collections.emptyList();
      }
      else
      {
        if(log.isDebugEnabled())
          log.debug("Query has " + hits.totalHits + " hits " + query);

        List<Long> taskIds = new ArrayList<Long>(hits.totalHits);
        for(ScoreDoc doc: hits.scoreDocs)
        {
          taskIds.add(Long.valueOf(searcher.doc(doc.doc).get("task")));
        }
        return taskIds;
      }
    }
    finally
    {
      try
      {
        if(reader != null)
          reader.close();
      }
      catch(IOException e)
      {
      }
    }
  }


  -Original Message-
  From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
  Sent: Wednesday, September 26, 2012 5:18 PM
  To: java-user@lucene.apache.org
  Subject: Re: short search terms
 
 
  : I have a key field that will only ever have a length of 3 characters.
  I am
  : using a StandardAnalyzer and a QueryParser to create the Query
  : (parser.parse(string)), and an IndexReader and IndexSearcher to
  execute the
  : query (searcher(query)). I can't seem to find a setter to allow for a
  3
  : character search string. There is one setMinWordLen, but it isn't
  applicable
 
  there's a lot of missing information here ... what do you mean allow
  for
  a 3 character search string .. the query parser doesn't have anything
  in
  it that would prevent a 3 (or 3, or 1) character search string, so i
  suspect that's not really the question you mean to ask.
 
  what is problem you are actaully seeing?  do you have a query that
  isn't
  matching the docs you think it should? what query? what docs? what does
  the code look like?
 
  can you explain more what this 3 character ifeld represents, and how
  you
  want to use it?
 
  https://people.apache.org/~hossman/#xyproblem
  Your question appears to be an XY Problem ... that is: you are
  dealing
  with X, you are assuming Y will help you, and you are asking about
  Y
  without giving more details about the X so that we can understand the
  full issue.  Perhaps the best solution doesn't involve Y at all?
  See Also: http://www.perlmonks.org/index.pl?node_id=542341
 
  -Hoss
 
  -
  To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-user-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: Find documents contained in search term

2012-08-20 Thread Aditya
Hi

You need to use prefix query for your requirement. Below are my thoughts
and hope it helps.

Say "Hello World" is your phrase.

1. Do a phrase query with your phrase (Hello World)
2. If not found, then strip the last character and do a prefix query
(Hello Worl)
3. Continue step 2 until you get a result or the phrase is empty (see the
sketch below).

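A rough sketch of that loop (the field name is a placeholder, and the terms are assumed to be lower-cased the same way the field was analyzed; note the prefix step really only makes sense against an untokenized, keyword-style field):

String field = "contents";
PhraseQuery phrase = new PhraseQuery();
for (String w : "hello world".split(" ")) {
    phrase.add(new Term(field, w));
}
TopDocs hits = searcher.search(phrase, 10);            // step 1: exact phrase

String prefix = "hello world";
while (hits.totalHits == 0 && prefix.length() > 0) {   // steps 2-3: shrink and retry
    prefix = prefix.substring(0, prefix.length() - 1);
    hits = searcher.search(new PrefixQuery(new Term(field, prefix)), 10);
}
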
If you give examples of some sample documents in the index and search
phrase then it will help others to give better response.

Regards
Aditya
www.findbestopensource.com - Search from more than 200,000 open source
projects.


On Fri, Aug 17, 2012 at 9:25 PM, davidbrai davidb...@gmail.com wrote:

 I was hoping I didn't have to iterate through the short documents.
 I have about ~1M of them currently and this process needs to be very fast.
 So I understand there is not such functionality available in lucene.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Find-documents-contained-in-search-term-tp4001663p4001867.html
 Sent from the Lucene - Java Users mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: Getting terms from unstored fields, doc-wise

2012-07-27 Thread Aditya
Hi

If the data is not stored then it cannot be retrieved in its original form.
Using IndexReader, as you listed, you can retrieve the list of terms
available in each doc, but they will be the analyzed forms, so you may not get
back the exact original data. A sketch of that walk is below.
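
A rough sketch of the brute-force walk you describe, using Lucene 3.5's TermEnum/TermDocs APIs (the field name is a placeholder):

String field = "myField";
TermEnum terms = reader.terms(new Term(field, ""));
try {
    do {
        Term t = terms.term();
        if (t == null || !field.equals(t.field())) break;   // walked past the target field
        TermDocs docs = reader.termDocs(t);
        while (docs.next()) {
            int docId = docs.doc();
            // record the (docId, t.text()) pair
        }
        docs.close();
    } while (terms.next());
} finally {
    terms.close();
}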

Regards
Aditya
www.findbestopensource.com

On Fri, Jul 27, 2012 at 1:34 AM, Phanindra R phani...@gmail.com wrote:

 Thanks for the reply Abdul.

 I was exploring the API and I think we can retrieve all those words by
 using a brute-force approach.

 1) Get all the terms using indexReader.terms()

 2) Process the term only if it belongs to the target field.

 3) Get all the docs using indexReader.termDocs(term);

 4) So, we have the term-doc pairs at this point.

 Is there any better approach other than the above forever-taking procedure?

 Thanks,
 Phanindra



 On Thu, Jul 26, 2012 at 11:46 AM, in.abdul in.ab...@gmail.com wrote:

  No , it's not possible to get the data which not stored ..
  On Jul 26, 2012 10:27 PM, Phanindra R [via Lucene]
  ml-node+s472066n3997487h23@n3.nabble
  
   Hi,
I've an index to analyze (manually). Unfortunately, I cannot
 rebuild
   the index. Some of the fields are 'unstored'. I was wondering whether
   there's any way to get the terms from an unstored field for each doc.
   Positional information is not necessary. Lucene version is 3.5.
  
   The reason am trying to get those terms is that I can add that field to
  my
   own index for every doc. And, yes, there's another id-type-field which
   allows me to recognize the document in both indices.
  
   Any guidance is highly appeciated.
  
   Thanks,
   Phani
  
  
  -
  THANKS AND REGARDS,
  SYED ABDUL KATHER
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Getting-terms-from-unstored-fields-doc-wise-tp3997487p3997510.html
  Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: Lucene vs SQL.

2012-07-27 Thread Aditya
Check out these articles on this topic. Hope it helps.
http://www.findbestopensource.com/article-detail/lucene-solr-as-nosql-db
http://www.lucidimagination.com/blog/2010/04/30/nosql-lucene-and-solr/

In a nutshell, it is fine to use Lucene as a NoSQL store, but it is better to
also have your data stored in some persistent store like a file system or a
database.

Regards
Aditya
www.findbestopensource.com


On Thu, Jul 26, 2012 at 10:45 PM, Hank Williams hank...@gmail.com wrote:

 If I want to set up a database that is totally flat with no joins, is
 there any reason not to use lucene. The reasons I would be curious about
 are things like insert performance and whether there are any queries that
 either don't work in lucene or perform better in MySQL/postgres.

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: Auto commit when flush

2012-06-28 Thread Aditya
Hi Ram,

I guess setMaxBufferedDocs (on IndexWriterConfig) will help; see the sketch below.
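
A small sketch, assuming Lucene 3.6-style config (note this only controls when a flush happens; commits still have to be issued explicitly):

IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_36, analyzer);
cfg.setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);
cfg.setMaxBufferedDocs(1000);      // flush every 1000 buffered documents
IndexWriter writer = new IndexWriter(dir, cfg);
// ... writer.addDocument(doc) calls ...
writer.commit();                   // call commit yourself at the points you care about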

Regards
Aditya
www.findbestopensource.com


On Wed, Jun 27, 2012 at 11:25 AM, Ramprakash Ramamoorthy 
youngestachie...@gmail.com wrote:

 Dear,

I am using Lucene for my log search tool. Is there a way I can
 automatically perform a commit operation on my IndexWriter when a
 particular set of docs is flushed from memory to the disk. My RamBufferSize
 is 24Mb and MergeFactor is 10.

Or is calling commit in manually calculated frequent intervals
 irrespective of the flushes the only way? I wish the autocommit  feature
 was not deprecated.

 --
 With Thanks and Regards,
 Ramprakash Ramamoorthy,
 Engineer Trainee,
 Zoho Corporation.
 +91 9626975420



Re: ORM for Android + Lucene

2012-05-28 Thread Aditya
Hello GuenterR,

I guess you might be using Java on Android. You could very well use the Lucene
API directly; no ORM exists for it.

Regards
Aditya
www.findbestopensource.com

On Sun, May 27, 2012 at 2:26 PM, GuenterR gunt...@gmail.com wrote:

 Hello All!
 I would like to use Lucene library on any Android device with a geo DB in
 sqlite file. As ORM I want to use any ORM tool except Hibernate Search -
 this tool is perfect for PC, but too heavy for Android. Could you advice me
 any example of this usage of some ORM tool?  All examples that I have
 already found so far use HS :-(

 Thanks in advance.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/ORM-for-Android-Lucene-tp3986286.html
 Sent from the Lucene - Java Users mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: Performance of storing data in Lucene vs other (No)SQL Databases

2012-05-23 Thread Aditya
Agreed.

Here the discussion is whether Lucene could be considered for storing data?
Whether Lucene could be used as NoSQL?  The Answer is YES.

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 2:12 PM, Konstantyn Smirnov inject...@yahoo.comwrote:

 simple

 what is the speed of indexing of document with stored fields? what is the
 retrieval rate? how good can it scale? How good performs the MongoDB and
 other within the same discipline?

 Has anyone conducted such comparison-tests? To dump like 1 mio documents
 into the index (with the single indexed field) and into the mongo, and then
 read random 10k docs? what the performance would be?

 The bare idea to split the index from the storage is good, but without
 performance figures is not sufficient.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Performance-of-storing-data-in-Lucene-vs-other-No-SQL-Databases-tp3984704p3985331.html
 Sent from the Lucene - Java Users mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org




Accepted: Free Webinar - Apache Lucene 2.9: Technical Overview of New Features

2009-09-19 Thread Aditya


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Using IN to retrieve data after lucene search.

2009-07-08 Thread Aditya R

Hi all,

I am new to Lucene. In my sample application I have used Lucene to index my
17-field DB table. I have stored only the primary key of the table in the
Lucene index and indexed the other 16 fields without storing them. The primary
keys matching the searched keyword are then retrieved. The primary key string
is then queried in the database like this:
  String quer = "from Doctors where id IN " + primaryKeys;
where 'primaryKeys' will be something like (23,32,44,56).
Is this the right way to use Lucene? Or do you suggest that I store all the
fields in the Lucene index and retrieve them from there?

Thanks,
Aditya
-- 
View this message in context: 
http://www.nabble.com/Using-IN-to-retrieve-data-after-lucene-search.-tp24404198p24404198.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: how to get the word before and the word after the matched Term?

2009-05-18 Thread Aditya
Continuing from what Matt said, the answer to your question is that there is no
direct library call that gives you this.
Also try the sandbox-based highlighter code base.

Best Regards,
Aditya


-Original Message-
From: Matthew Hall [mailto:mh...@informatics.jax.org] 
Sent: Monday, May 18, 2009 6:58 PM
To: java-user@lucene.apache.org
Subject: Re: how to get the word before and the word after the matched Term?

Well, when you get the Document object, you have access to the fields in 
that document, including the text that was searched against.

You could simply retrieve this string, and then use simple java String 
manipulation to get what you want.

Matt

Kamal Najib wrote:
 Hi all,
 I want to  get the word before and the word after  the matched Term.For
Example if i have the Text  The drug was freshly prepared at 4-hour
intervals . Eleven courses were administered to seven patients at this dose
level and no patient experienced nausea or vomiting and the matched Term
for example patient i want to get the word level and the word
experienced(and and no are stop words, therefore i d'ont want to get
them.).I have looked at the Class Termposition but in this Class i can only
get the position of the matched Term, how can i get the word before and
after it, any suggestion?. 
 Thank you in advance.
 Kamal
   
 


 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org


-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mh...@informatics.jax.org
(207) 288-6012



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: Lucene index on iPhone

2009-05-06 Thread Aditya
Hi,

You can try CLucene @ http://sourceforge.net/projects/clucene/ - it is based on
an older version of Java Lucene but should be okay.
I was able to port it to Symbian and Windows Mobile with some effort.

Best Regards,
Aditya


-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Wednesday, May 06, 2009 7:16 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene index on iPhone

http://www.lucidimagination.com/search/?q=Objective+C+port+of+Lucene   
suggests there is an Objective C port.  Maybe that works?  I haven't  
done any iPhone dev.

On May 6, 2009, at 5:06 AM, Paul Libbrecht wrote:

 Shashi,

 the only java I know for iphone is with Cydia on jailbroken iphones.
 Is this the type of things you're looking at?

 paul


 Le 06-mai-09 à 12:08, Shashi Kant a écrit :
 I am working on an iPhone application where the Lucene index needs to
 reside on-device (for multiple reasons). Has anyone figured out a way
 to do that?
 As you might know the iPhone contains SQLite - could an index be
 embedded inside SQLite? or could it be resident separately as a file?




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



REPOST from another list: Question related to improving search results

2009-05-02 Thread Aditya
Hi,

New to this group.

Question:

Generally, sites like Wikipedia have a template and every page follows it.
These templates contain words that occur on every page.

For example, the Wikipedia template has the list of languages in the left
panel. Now these words get indexed every time, since they are not (and cannot
be) stop words.

If a user, for example, searches for Galego, every Wikipedia page will be in
the search result, which is wrong, as not every Wikipedia page talks about
Galego.

Any takes on this one for how to solve this problem?

 

Best Regards,

Aditya

 



Using Lucene to index Meta-data from txt, html, PDF etc files.

2006-09-14 Thread Aditya Gollakota
Hi Guys,

 

Just wondering how you would go about indexing metadata from files. I've
used the demo package's IndexHTML.java and have updated HTMLDocument.java
with the following:

 

DataInput input = new DataInputStream(new BufferedInputStream(new
FileInputStream(f)));

Content content = Content.read(input);

Reader contentReader = new ArrayFile.Reader(new LocalFileSystem(null),new
File(f.getPath(), Content.DIR_NAME).toString(), null);



System.out.println(content);

ParseData parseData = ParseData.read(input);

Metadata metadata = parseData.getContentMeta();

 

doc.add(new Field(keywords, metadata.KEYWORDS, Field.Store.YES,
Field.Index.NO));

 

I'm using the nutch-0.8.jar for the Metadata Class and have used the jars of
nutch to resolve any exceptions and also Lucene-2.0.0

 

While compiling this code, I'm getting the following error:

 

A record version mismatch occurred. Expecting v1, found v118.

 

Any help would be much appreciated.

 

Regards,

 

Aditya Gollakota
Support Engineer | CustomWare Asia Pacific | www.customware.net
T: +61 2 9900 5742 | F: +61 2 9475 0100 | M: +61 405 033 951
E: [EMAIL PROTECTED]

 



RE: search pdf

2006-04-17 Thread Aditya Liviandi

Please take a moment to learn java and how to use java APIs.

After that, re-read the emails you just sent us, and answer your own
question.

-Original Message-
From: Shajahan [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 17, 2006 2:22 PM
To: java-user@lucene.apache.org
Subject: Re: search pdf


Hi,
thankyou for your replay.
i am very sorry for asking again, but i am new to this Lucene. please
tell
me how to run this code. i downloaded this LuceneInAction zip file. and
i
didnot find any readme file for instructions. and i am also downloaded
the
lucene-1.4.3 also.

so please tell me how to run this code.

thanking you,
Shajahan
--
View this message in context:
http://www.nabble.com/search-pdf-t1457831.html#a3946467
Sent from the Lucene - Java Users forum at Nabble.com.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Hi Experts

2006-03-29 Thread Aditya Liviandi
Well, you'll have to index the internet.
Then, when you've done that, you can try going up against Google.
Oh, and you'll have to update that index every now and then to keep your
index of the internet current.

Good luck.




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Hi Experts

2006-03-28 Thread Aditya Liviandi
The way lucene works is you need to have the index first.
Only then you can search it.

So if you want to search within a given URL, you need to somehow create
the index of all the webpages within that URL. If the webserver linked
to that URL is also yours, then that would not be a big deal.

But if it is an external URL, then you would need to have a crawler
(which basically collects all the documents linked under that URL). However,
you will not be able to get all the documents under the URL: those that are
not linked from any other document will not be reached by the crawler unless
you manually supply their URLs to it; otherwise I don't see how you can figure
out that such a document even exists.




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: serializable RAMDirectory

2006-03-20 Thread Aditya Liviandi
Because I'm embedding the index inside another file...
So that file is self-contained, containing both the payload (which might
not be text) and the index...

But I figured out how to do it already... I just made RAMDirectory and
RAMFile Serializable and created my own build of Lucene...

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, March 20, 2006 7:21 PM
To: java-user@lucene.apache.org
Subject: Re: serializable RAMDirectory


On Mar 20, 2006, at 1:05 AM, Aditya Liviandi wrote:

 Is there any implementation of lucene that allows the index to be
 portable? It seems pointless that I have to do the indexing 
 operation to
 a directory with FSDirectory, and then copy the directory over to the
 portable file, and unpack the file whenever I want to search the
 directory at another place...

Could you be more specific about what you want that Lucene does not 
already provide?

FSDirectory is essentially a serialized RAMDirectory.  What do you 
mean by unpack the file?  There is nothing special needed to move 
an index from one machine to another, simply copy the entire 
directory and use your searching code to refer to its location.

Erik


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



serializable RAMDirectory

2006-03-19 Thread Aditya Liviandi
Is there any implementation of lucene that allows the index to be
portable? It seems pointless that I have to do the indexing operation to
a directory with FSDirectory, and then copy the directory over to the
portable file, and unpack the file whenever I want to search the
directory at another place...

Can anyone help me?




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



question...

2006-03-13 Thread Aditya Liviandi

Hi all,

If I want to embed the index files into another file (say of extension *.luc,
so now all the index files are flattened inside this new file), can I still use
the index without having to extract out the index files to a temp folder?

aditya
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]