Query did not return results

2009-04-24 Thread blazingwolf7

Hi,

I created a query that should find a phrase match inside documents. An
example of the text to match: "terror india"
Documents containing this exact phrase do exist.

My generated query looks like this: (title:"terror india"^4 content:"terror
india"^3 site:"terror india")
Why does it not return any results?
Can anyone help me with this? Thanks in advance.
-- 
View this message in context: 
http://www.nabble.com/Query-did-not-return-results-tp23227963p23227963.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Query did not return results

2009-04-24 Thread blazingwolf7

I am using the standard analyzer. 

This problem only happens when I set the query to BooleanClause.Occur.SHOULD
instead of BooleanClause.Occur.MUST while creating the query.

John Wang wrote:
> 
> What analyzers are you using for both query and indexing? Can you also post
> some code showing how you indexed?
> 
> -John
> 
> On Fri, Apr 24, 2009 at 8:02 PM, blazingwolf7
> wrote:
> 
>>
>> Hi,
>>
>> I created a query that will find a match inside documents. Example of
>> text
>> match "terror india"
>> And documents with this exact match does exists.
>>
>> My query generated is like this: (title:"terror india"^4 content:"terror
>> india"^3 site:"terror india")
>> But why does it not return any results?
>> can anyone help me with this? Thanks in advance
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Query-did-not-return-results-tp23227963p23229158.html



Why Lucene phrase searching fail?

2009-04-27 Thread blazingwolf7

Hi,

I am trying to perform a search using Lucene. The keyword: "national india"
This phrase exists inside the content. I tried searching for it using Lucene,
and it failed to return any results. Then I tried searching for it in Luke,
with the quotes, and that also failed to return results.

Why is that happening? Can anyone advise me on this?
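
For context on the question: an exact phrase query only matches when the
analyzed query terms sit at consecutive positions in the indexed field, so
the tokens produced at index time and at query time have to line up. The
sketch below is a toy model in plain Java, not Lucene's implementation; the
class PhraseMatchSketch and its tokenize helper are hypothetical names made
up for illustration. It shows a phrase matching when the tokens are adjacent,
and failing when another token sits between them or when the field was
indexed as one untokenized value. Whether either of these is the cause here
is an assumption; the thread does not confirm it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PhraseMatchSketch {

    // Hypothetical analyzer: lowercase, then split on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if the phrase tokens occur at consecutive positions in the field tokens.
    static boolean matchesPhrase(List<String> fieldTokens, List<String> phraseTokens) {
        if (phraseTokens.isEmpty()) {
            return false;
        }
        int len = phraseTokens.size();
        for (int start = 0; start + len <= fieldTokens.size(); start++) {
            if (fieldTokens.subList(start, start + len).equals(phraseTokens)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> phrase = tokenize("national india");
        // Adjacent tokens: the phrase matches.
        System.out.println(matchesPhrase(tokenize("The National India summit"), phrase));
        // Both words present but not adjacent: no phrase match.
        System.out.println(matchesPhrase(tokenize("national parks of india"), phrase));
        // Field kept as a single untokenized value: one token, so no phrase match.
        System.out.println(matchesPhrase(Arrays.asList("the national india summit"), phrase));
    }
}
```

Comparing the tokens actually stored for the field (Luke can list them) with
the terms in the printed query is a quick way to check whether they line up.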
-- 
View this message in context: 
http://www.nabble.com/Why-Lucene-phrase-searching-fail--tp23253549p23253549.html



Re: Why Lucene phrase searching fail?

2009-04-27 Thread blazingwolf7

When I print out the query, it looks like this:
(url:"terror india"^2.0 anchor:"terror india"^0.0 content:"terror india"
title:"terror india"^1.5 host:"terror india"^2.0 site:"terror india"^10.0)

I don't understand this at all. Only the phrase query has a problem; even
my slop query works without problems. I have an exact match in my content,
so it should return at least 1 document.
-- 
View this message in context: 
http://www.nabble.com/Why-Lucene-phrase-searching-fail--tp23253549p23268494.html



Generating Query for Multiple Clauses in a Single Field

2009-07-28 Thread blazingwolf7

Hi,

I am currently creating a search engine and need to generate a query
like the following:
title:(+chemistry +"national curriculum")

It is mentioned that this can be done using the QueryParser, but
unfortunately I can't find any reference on how to use it. Can anyone help
me with this?

Thanks
-- 
View this message in context: 
http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-tp24694748p24694748.html



Re: Generating Query for Multiple Clauses in a Single Field

2009-07-29 Thread blazingwolf7

I am trying to create a query that will first return a set of results and
then give a boost to the results that contain all the keywords entered by
the user.


Ahmet Arslan wrote:
> 
> 
>> generate a query like the following:
>> title:(+chemistry +"national curriculum")
> 
> I didn't understand exactly what you are asking, but the query string is
> already well-formed. You can pass this string directly to the parse
> method of QueryParser. The following four examples yield the same Query
> object.
> 
> String[] ar = {"title:(+chemistry +\"national curriculum\")"};
> org.apache.lucene.queryParser.QueryParser.main(ar);
> 
> String[] ar1 = {"title:(chemistry AND \"national curriculum\")"};
> org.apache.lucene.queryParser.QueryParser.main(ar1);
> 
> QueryParser qp = new QueryParser("title", new StandardAnalyzer());
> Query q = qp.parse("chemistry AND \"national curriculum\"");
> System.out.println(q.toString());
> 
> qp.setDefaultOperator(QueryParser.AND_OPERATOR);
> q = qp.parse("chemistry \"national curriculum\"");
> System.out.println(q.toString());
> 
>> its mention that it can be done using the QueryParser but
>> unfortunately I can't find any reference in how to used it. 
> 
> http://lucene.apache.org/java/2_4_1/queryparsersyntax.html
> Just prepare a String according to descriptions in here, and pass it to
> the parse method of QueryParser.

-- 
View this message in context: 
http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-tp24694748p24733084.html



Re: Generating Query for Multiple Clauses in a Single Field

2009-07-30 Thread blazingwolf7

Yes, before this I used default Lucene, but I don't know what went
wrong: some results with only a single matching word went to the top of the
results.

I assumed this was due to the score of those results being too high. That's
why I am trying to add an additional boost.


Ahmet Arslan wrote:
> 
> 
> : I am trying to create a query, that first will return a set
> : of results, then
> : it will give a boost to the results that have all the
> : keyword entered by the user.
> 
> If I understand you correctly: the user will enter multiple keywords, let's
> say a b c d, and you want documents that contain all of the keywords
> (a b c d) to get higher scores (boosted). In other words, if there are some
> documents in the collection that have all of (a b c d), you want to see them
> at the top of the result set, while the result set may still retrieve
> documents that have only one or two of the keywords at the end of the list.
> Am I correct?
> 
> If that's what you want, you don't need to do anything special. Lucene does
> it by default. Use the default operator OR. The more query terms appear in a
> document, the more relevant that document is to the query.

-- 
View this message in context: 
http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-tp24694748p24734379.html



Re: Generating Query for Multiple Clauses in a Single Field

2009-07-30 Thread blazingwolf7

Thanks a lot. It is indeed caused by the length normalization. I followed
your suggestion and changed it to return 1.0f. Now it works properly.

Thanks again
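
A rough sketch of why the change helped, assuming the default length
normalization is 1/sqrt(numTerms), which is what DefaultSimilarity is
understood to use in the 2.x line. The toy score below (matching terms times
length norm) is a deliberate simplification and leaves out tf, idf, coord and
boosts; the class and method names are hypothetical.

```java
public class LengthNormSketch {

    // Default-style length normalization: 1 / sqrt(number of terms in the field).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The override discussed in the thread: ignore the field length entirely.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    // Toy score: number of matching query terms scaled by the length norm.
    static float toyScore(int matchingTerms, float lengthNorm) {
        return matchingTerms * lengthNorm;
    }

    public static void main(String[] args) {
        int shortDocTerms = 4;    // short document, matches 1 query term
        int longDocTerms = 400;   // long document, matches 3 query terms

        System.out.println("default, short doc: " + toyScore(1, defaultLengthNorm(shortDocTerms)));
        System.out.println("default, long doc:  " + toyScore(3, defaultLengthNorm(longDocTerms)));
        System.out.println("flat, short doc:    " + toyScore(1, flatLengthNorm(shortDocTerms)));
        System.out.println("flat, long doc:     " + toyScore(3, flatLengthNorm(longDocTerms)));
    }
}
```

With the default-style norm the 4-term document outranks the 400-term one
even though it matches fewer query terms; with the flat norm the order flips.
Norms are written at indexing time, so a change to lengthNorm would normally
only take effect for documents indexed after the change.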


Ahmet Arslan wrote:
> 
> 
>> yah, before this i used default lucene...but i dont know
>> what end up wrong...some results with only single word matching when to
>> the top of the results. 
> 
> Hmm. Interesting. It seems that length normalization is causing this. Very
> short documents with only a single matching word get a high score due to
> length normalization. The documents containing all of the query terms are
> probably very long and get a lower score. Lucene punishes long
> documents and favors short documents.
> 
> Can you verify/confirm my guess looking at the document lengths of the
> result set? Also org.apache.lucene.search.Explanation describes the score
> computation for document and query.
> 
> There is an excellent publication [1] [2] (in section 4.1 and 4.2) about
> lucene score modification. SweetSpotSimilarity [3] with the appropriate
> parameters (steepness, min, and max) can solve your problem.
> 
> Alternatively if your requirement is very important (you don't care about
> long documents taking over) then you can try to extend the
> DefaultSimilarity so that it will ignore the document length. Just return
> 1.
> 
> public float lengthNorm(String fieldName, int numTerms) {
>   return 1.0f;
> }
> 
> 
>> This i assumed is due to the score of the result being to
>> high. Tat's why i am trying to add additional boost
> 
> I don't think there exists such a boosting mechanism.
> 
> Ahmet
> 
> [1]
> http://wiki.apache.org/lucene-java/TREC_2007_Million_Queries_Track_-_IBM_Haifa_Team
> [2]http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
> [3]http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/misc/SweetSpotSimilarity.html

-- 
View this message in context: 
http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-tp24694748p24750660.html



Storing image with Lucene

2009-12-02 Thread blazingwolf7

Hi,

As per the title: is it possible to store an image using Lucene? And if it
is possible, how can I do that?

Thanks
-- 
View this message in context: 
http://old.nabble.com/Storing-image-with-Lucene-tp26620107p26620107.html



Re: Storing image with Lucene

2009-12-02 Thread blazingwolf7

Thanks for the reply. Yes, I am trying to create an image search, and I did
create something similar to your suggestion of storing only the links. But
due to some limitations imposed on me, I have to find a way to store the
image itself.

Maybe I could try the transform idea.

Anshum-2 wrote:
> 
> Hi,
> Lucene supports string/int literals for indexing and searching. In other
> words, anything that can be transformed into a string/int can be consumed
> by the Lucene API. Moreover, are you trying to implement an image search?
> In that case perhaps you'd have to either figure out a transform or try
> something else.
> If it is mere storage of an image in Lucene (for a particular doc, to be
> fetched) you may as well do what is done by other similar engines behind
> the scenes, e.g. for blob objects. Just store a path/link to the actual
> image instead of the image in the index and fetch it at runtime (in the
> wrapper code).
> 
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
> 
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw
> 
> 
> On Thu, Dec 3, 2009 at 8:02 AM, blazingwolf7 
> wrote:
> 
>>
>> Hi,
>>
>> As per title...is it possible to store image using Lucene? And if its
>> possible...how can I do that?
>>
>> Thanks
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Storing-image-with-Lucene-tp26620107p26620344.html



Re: Storing image with Lucene

2009-12-02 Thread blazingwolf7

Oh, thanks for the suggestion. I will try the idea. If it works I will
let you all know.


Anshum-2 wrote:
> 
> Hi Vaijanath,
> Just wanted to know if you can perform a search on the binary field (as I
> haven't tried this ever) ?
> --Original Message--
> From: Rao, Vaijanath
> To: java-user@lucene.apache.org
> ReplyTo: java-user@lucene.apache.org
> Subject: RE: Storing image with Lucene
> Sent: Dec 3, 2009 08:27
> 
> Hi,
> 
> Yes you can, create a binary field which you can use to store the image
> in.
> Field(String name, Reader reader)  Use this to store your image and use
> binaryValue() to get the image back.
> 
> You can also look at storing the features of the image into the index in
> similar way.
> 
> --Thanks and Regards
> Vaijanath N. Rao

-- 
View this message in context: 
http://old.nabble.com/Storing-image-with-Lucene-tp26620107p26621546.html



Re: Storing image with Lucene

2009-12-02 Thread blazingwolf7

I have found a solution already.

It is to convert the image's byte array into a string, and then store that
string in the index. But beware: the bytes have to be encoded as Base64,
or the retrieved image will be corrupted (meaning the picture is totally
ruined).
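
A minimal sketch of the round trip, using java.util.Base64 (available from
Java 8 on, so an assumption relative to the JDK in use at the time). The
class name and the sample bytes are made up for illustration; the Lucene
part (adding the string as a stored field) is left out. The second half
shows why going through a plain charset conversion ruins the image.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class ImageBase64Sketch {

    // Encode raw image bytes as an ASCII-safe string that can be stored as text.
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    // Recover the original bytes from the stored string.
    static byte[] decode(String stored) {
        return Base64.getDecoder().decode(stored);
    }

    public static void main(String[] args) {
        // Stand-in for real image data: a PNG signature plus two extra bytes.
        byte[] image = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A, (byte) 0xFF, 0x00};

        byte[] viaBase64 = decode(encode(image));
        System.out.println("Base64 round trip intact: " + Arrays.equals(image, viaBase64));

        // Treating arbitrary bytes as UTF-8 text replaces invalid sequences,
        // so the bytes that come back differ from the original.
        byte[] viaUtf8 = new String(image, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip intact:  " + Arrays.equals(image, viaUtf8));
    }
}
```

Base64 grows the data by roughly a third. Lucene 2.x also appears to offer a
stored binary field that takes a byte[] directly, which would avoid the
encoding step; check the Field javadoc for the version in use.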




-- 
View this message in context: 
http://old.nabble.com/Storing-image-with-Lucene-tp26620107p26621710.html



How Lucene Search

2008-06-26 Thread blazingwolf7

Hi,

I am fairly new to Lucene and am currently going over its source code. I have
read through the code a few times, mapping it out, but I seem to be
facing a problem. I could follow it all the way to the calculation of the
score for each result obtained, but strangely I did not manage to locate the
part where Lucene opens the index and checks for the matching terms.

What I mean is that I want to see how Lucene actually opens the index
and performs the search. I went through all the methods in IndexReader,
IndexSearcher and some other related classes but still failed to locate the
method responsible.

Could anyone help me with this? Thanks
-- 
View this message in context: 
http://www.nabble.com/How-Lucene-Search-tp18127970p18127970.html



Re: How Lucene Search

2008-06-26 Thread blazingwolf7

Thanks for the reply. I have already tried starting a new project. As I
mentioned, I actually went through the code from the start of the application
to the end, where the scoring is done.

But unfortunately, I still fail to locate the part where Lucene opens the
index to perform the search. Does anyone have any idea how to find this?
-- 
View this message in context: 
http://www.nabble.com/How-Lucene-Search-tp18127970p18146253.html



.fdt file

2008-07-09 Thread blazingwolf7

Hi, 

I recently found out that Lucene retrieves the content of a document
from a ".fdt" file. I am trying to retrieve the entire file in one go
instead of retrieving it by document number. Can it be done?
-- 
View this message in context: 
http://www.nabble.com/.fdt-file-tp18373913p18373913.html



Re: .fdt file

2008-07-09 Thread blazingwolf7

Sorry, but I am still quite new to Lucene. What exactly is "cp"?


Yonik Seeley wrote:
> 
> On Wed, Jul 9, 2008 at 9:01 PM, blazingwolf7 <[EMAIL PROTECTED]>
> wrote:
>> I had recently found out that Lucene will retrieve the content of a
>> document
>> from a file ".fdt". I am trying to retrieve the entire file in one go
>> instead of retrieving it based on document number. can it be done?
> 
> "cp" can retrieve the file on one go ;-)
> 
> Other than that, the format is documented here:
> http://lucene.apache.org/java/docs/fileformats.html
> But I'm not sure why retrieving by document number won't work for you.
> 
> -Yonik

-- 
View this message in context: 
http://www.nabble.com/.fdt-file-tp18373913p18375077.html



Re: .fdt file

2008-07-09 Thread blazingwolf7

Well, I am trying to extract the URL and contentLength from the ".fdt" file.
I am planning to use both of these values in a filter to remove certain
links from being displayed in the search results. The problem is, I have been
told not to use the IndexReader to retrieve these values for each document
that matches the query.

So now, instead, I will have to retrieve the entire .fdt file, extract both
values and store them in an ArrayList, which will be used later. I am
having problems extracting the entire file without using all the seek()
calls to determine the position of each document.

Any suggestions?


Yonik Seeley wrote:
> 
> On Wed, Jul 9, 2008 at 11:13 PM, blazingwolf7 <[EMAIL PROTECTED]>
> wrote:
>> Sorry,but I am still quite new to Lucene. What exactly is "cp"?
> 
> The unix command for copy (hence the smiley).
> 
> Some of your recent questions seem to be suffering from an XY problem:
> http://www.perlmonks.org/index.pl?node_id=542341
> You may get more help by explaining what you are trying to do.
> 
> -Yonik

-- 
View this message in context: 
http://www.nabble.com/.fdt-file-tp18373913p18376301.html



Re: .fdt file

2008-07-10 Thread blazingwolf7

Thanks. I think I will follow the advice. But just for the sake of curiosity,
can what I suggested be done?


Yonik Seeley wrote:
> 
> On Thu, Jul 10, 2008 at 1:42 AM, blazingwolf7 <[EMAIL PROTECTED]>
> wrote:
>> Well, I am trying to extract the URL and contentLength from the ".fdt"
>> file.
>> I am planning to use both of these values in a filter to remove certain
>> links to be display in the search result. The problem is, I am told not
>> to
>> use the IndexReader to retrieve these values for each document found
>> matching with the query.
>>
>> So now, instead, I will have to retrieve the entire .fdt file, extract
>> both
>> the values and store it into an arraylist which will be use later.  I am
>> having problem extracting the entire file without using all the seek()
>> method to determine the position of the document.
>>
>> Any suggestion?
> 
> You're trying to do things at too low of a level (bypassing Lucene's
> public APIs)
> I suggested earlier that you index the URL untokenized, and then use
> the FieldCache.  That will allow you to easily retrieve a String[] of
> all the URLs.
> 
> -Yonik

-- 
View this message in context: 
http://www.nabble.com/.fdt-file-tp18373913p18394786.html



Re: .fdt file

2008-07-10 Thread blazingwolf7

Well, according to him, using the reader to access the index every time a
document is found, in order to retrieve certain values, is inefficient.
Meaning that if there are 500k documents, the index will be accessed 500k
times. It might affect the performance of the search.

So I have been instructed to retrieve all the necessary values at the
beginning of the search and store them. Later the values will be retrieved
from there. I am racking my brain trying to do that %-|


Grant Ingersoll-6 wrote:
> 
> 
> On Jul 10, 2008, at 1:42 AM, blazingwolf7 wrote:
> 
>>
>> Well, I am trying to extract the URL and contentLength from the  
>> ".fdt" file.
>> I am planning to use both of these values in a filter to remove  
>> certain
>> links to be display in the search result. The problem is, I am told  
>> not to
>> use the IndexReader to retrieve these values for each document found
>> matching with the query.
> 
> Are you implying that using the IR would solve your problem, but for
> some reason your architect (or whatever you call the person making
> the decisions) told you not to?  If so, can you explain more of the
> reasoning?
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ

-- 
View this message in context: 
http://www.nabble.com/.fdt-file-tp18373913p18395519.html



ArrayList or HashMap

2008-07-14 Thread blazingwolf7

Hi, 

I am working on extracting information from around 2 to 3 million documents
and placing it in memory so it can be retrieved for filtering search results.
The application will have to extract the information and store it for every
search. I am wondering what the best way to store this information would be.

ArrayList or HashMap? Which is the faster of the two? I will have to
directly access the information at a particular position by using some
identifier, which both of these are capable of doing. So I just need to know
which one will be faster. And what is the maximum size of an ArrayList and
a HashMap?
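
A small sketch of the two options, assuming the identifier is a dense
integer document number (0..n-1); the class name and the example.com URLs are
placeholders. With dense integer ids an ArrayList (or a plain array) can use
the id as the index, which is a direct array access. A HashMap also gives
constant-time lookups on average, but it hashes and boxes the key and carries
more memory per entry, so it is mainly worth it when the ids are sparse or
are not integers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DocValueLookupSketch {

    // Dense integer ids 0..n-1: the list index doubles as the key.
    static List<String> buildList(int n) {
        List<String> urls = new ArrayList<>(n);
        for (int docId = 0; docId < n; docId++) {
            urls.add("http://example.com/" + docId);
        }
        return urls;
    }

    // Same data keyed explicitly; useful when ids are sparse or not integers.
    static Map<Integer, String> buildMap(int n) {
        Map<Integer, String> urls = new HashMap<>(2 * n);
        for (int docId = 0; docId < n; docId++) {
            urls.put(docId, "http://example.com/" + docId);
        }
        return urls;
    }

    public static void main(String[] args) {
        int n = 100_000;
        List<String> byIndex = buildList(n);
        Map<Integer, String> byKey = buildMap(n);

        // Both lookups return the same value for the same document id.
        System.out.println(byIndex.get(4242));
        System.out.println(byKey.get(4242));
    }
}
```

On size limits: both collections report their size as an int, so neither can
usefully hold more than about Integer.MAX_VALUE entries, and in practice the
available heap is reached long before that. A few million entries is well
within range for either.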

thanks
-- 
View this message in context: 
http://www.nabble.com/ArrayList-or-HashMap-tp18440507p18440507.html



Storing information

2008-07-21 Thread blazingwolf7

Hi, 

I am using Lucene to perform searching. I have certain information that is
loaded every time a search is run. This means that if multiple users are
running searches at the same time, the information will be loaded multiple
times.

This is not efficient at all, so I was wondering whether there is any way to
load those values only once, so that every search performed after that, no
matter how many users there are, simply refers to the preloaded values?
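
One common way to do this in plain Java, sketched under the assumption that
all searches run inside the same JVM: keep the values in a static holder
class, which the JVM initializes lazily and exactly once even when several
threads ask for it at the same time. The class name, the load() body and the
LOAD_COUNT counter are hypothetical stand-ins for the real loading code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSearchData {

    // Counts how often the expensive load actually runs (for demonstration only).
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    // The JVM initializes Holder on first use and only once, even under
    // concurrent access (initialization-on-demand holder idiom).
    private static class Holder {
        static final Map<Integer, String> VALUES = load();
    }

    // Placeholder for the real loading code, e.g. reading values from the index.
    private static Map<Integer, String> load() {
        LOAD_COUNT.incrementAndGet();
        Map<Integer, String> values = new HashMap<>();
        for (int docId = 0; docId < 1000; docId++) {
            values.put(docId, "http://example.com/" + docId);
        }
        return Collections.unmodifiableMap(values);
    }

    // Every search calls this; only the first call pays for the load.
    public static Map<Integer, String> get() {
        return Holder.VALUES;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] searches = new Thread[8];
        for (int i = 0; i < searches.length; i++) {
            searches[i] = new Thread(() -> get().get(42));
            searches[i].start();
        }
        for (Thread t : searches) {
            t.join();
        }
        System.out.println("loads performed: " + LOAD_COUNT.get());
    }
}
```

The map is wrapped as unmodifiable so concurrent searches can read it safely.
If the index changes, the preloaded values would need to be rebuilt, which
this sketch does not handle.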

Thanks
-- 
View this message in context: 
http://www.nabble.com/Storing-information-tp18581695p18581695.html



Index of Lucene

2008-08-17 Thread blazingwolf7

Hi,

I am currently using Lucene for indexing. After I index a file, I use
Luke to open it and check the index. There is one part that I am curious
about. In Luke, under the Document tab, I randomly select a document and
display it. At the bottom there are 4 columns: Field, ITSVopLBC, Norm and
String Value.

I am wondering, what is Norm for? And where is it created at indexing
time? Which method calculates it?

Could anyone advise me on this? Thanks for the help
-- 
View this message in context: 
http://www.nabble.com/Index-of-Lucene-tp19025490p19025490.html



Re: Index of Lucene

2008-08-17 Thread blazingwolf7

Thanks for the info. But do you know where this is actually performed in
Lucene? I mean the method involved, which calculates the value before
storing it in the index. I tracked it to a method called lengthNorm() in
DefaultSimilarity.java, but its value is different from what is stored in
the index.
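
One likely reason for the difference, following the point in the quoted reply
that the norm is compressed into a single byte: the float from lengthNorm()
(times any index-time boosts) is encoded lossily, so the value read back is
only an approximation. The sketch below is a re-implementation written from
memory of what Lucene's SmallFloat.floatToByte315 / byte315ToFloat are
understood to do; treat the exact bit layout as an assumption and compare it
with the source of the version in use. The class and method names here are
made up.

```java
public class NormEncodingSketch {

    // Assumed to mirror SmallFloat.floatToByte315: keep a few high-order bits
    // of the float (exponent plus top mantissa bits) and drop the rest.
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            // Zero and negative values map to 0; tiny positive values to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Assumed to mirror SmallFloat.byte315ToFloat: rebuild a float from the byte.
    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Length norm for a 10-term field under the 1/sqrt(numTerms) rule.
        float computed = (float) (1.0 / Math.sqrt(10));
        byte stored = encodeNorm(computed);
        System.out.println("computed: " + computed);
        System.out.println("stored byte: " + (stored & 0xff));
        System.out.println("decoded: " + decodeNorm(stored));
    }
}
```

Under these assumptions a computed norm of about 0.316 comes back as 0.3125,
which is the kind of mismatch described above between lengthNorm() and the
value shown for the index.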


Doron Cohen-2 wrote:
> 
> Norms information comes mainly from lengths of documents - allowing the
> search time scoring to take into account the effect of document lengths
> (actually field length within a document). In practice, norms stored
> within the index may include other information, such as index time boosts
> - for a document, for a field. A single byte is stored for each field, so
> for this the actual value is compressed. At search time, norms are loaded
> into memory, and so consume 1 byte for each document.
> It is possible to disable norms for a field while indexing. This is
> explained better in the javadoc for Similarity, and here:
>  http://lucene.apache.org/java/2_3_2/scoring.html
> 
> Doron
> 
> On Mon, Aug 18, 2008 at 5:59 AM, blazingwolf7
> <[EMAIL PROTECTED]>wrote:
> 
>>
>> Hi,
>>
>> I am currently using Lucene for indexing. After a index a file, I will
>> use
>> LUKE to open it and check the index. And there is 1 part that I am
>> curious
>> about. In Luke, under the Document tab, I randomly select a document and
>> display it. At the bottom will be 4 columns, Field, ITSVopLBC, Norm and
>> String Value.
>>
>> I am wondering, what is Norm for? And where is it created during indexing
>> time? Which method calculates it?
>>
>> Could anyone advise me on this? Thanks for the help
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Index-of-Lucene-tp19025490p19025890.html



Score Boosting

2008-08-18 Thread blazingwolf7

Hi, 

I am currently working on the score calculation part of Lucene, and I
encountered a part that I do not understand:
return raw * Similarity.decodeNorm(norms[doc]); // normalize for field

As can be seen from the code above, the Similarity method decodeNorm() is
called to decode the byte-formatted value back into a float
value. This value actually represents the normalization value for the fields:
title, url, content, host and anchor.

I would like to know how it actually selects the field to be included in the
calculation. When I print out the value, I notice that only one of the fields
is selected. Can anyone advise me on this? How is the field selected,
and why are not all the fields used in the calculation?
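
A possible explanation, offered as an assumption rather than something
confirmed in this thread: that line appears to sit in a scorer that handles a
single term in a single field, so the norms array it sees belongs to that one
field. A query over several fields is then made of several such scorers whose
results are combined. The toy model below illustrates the idea in plain Java;
the class, the Clause type and the numbers are hypothetical and it is not
Lucene's scoring code.

```java
import java.util.HashMap;
import java.util.Map;

public class PerFieldNormSketch {

    // One query clause: a raw (unnormalized) score for a term in one field.
    static final class Clause {
        final String field;
        final float rawScore;

        Clause(String field, float rawScore) {
            this.field = field;
            this.rawScore = rawScore;
        }
    }

    // Each clause is scaled by the norm of its own field only; the document
    // score in this toy model is the sum over all clauses.
    static float score(Clause[] clauses, Map<String, Float> fieldNorms) {
        float total = 0f;
        for (Clause c : clauses) {
            total += c.rawScore * fieldNorms.get(c.field);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Float> norms = new HashMap<>();
        norms.put("title", 0.5f);
        norms.put("content", 0.25f);

        Clause[] query = {
            new Clause("title", 2.0f),
            new Clause("content", 4.0f)
        };
        System.out.println(score(query, norms));
    }
}
```

Seen this way, printing the norm inside one scorer would show one field's
value per call, while the other fields are handled by the scorers of the
other clauses.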

Thanks
-- 
View this message in context: 
http://www.nabble.com/Score-Boosting-tp19043489p19043489.html