Re: Doubt Regarding Lucene matchscore

2014-07-21 Thread Rajendra Rao
+Priyanka Tufchi

Hello Kumaran,

We are using version 4.1.

We indexed 1900 documents and set the page size (hits per page) to 1900.
We expected scores for all 1900 documents, but we are getting only 1400
records in ScoreDoc[] hits.
So where are the remaining 500 records? Should we consider them not matched?

Also, 900 records have matchIndex 0. How should we interpret that? As no match?

I hope this makes the question clearer.


Sample code is below.

-
String QueryStr = query;

// return array list of ranked result

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);

// 1. create the index
Directory index = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_41, analyzer);

IndexWriter w = new IndexWriter(index, config);
int counter = 0;

// define array of field names which are active
int arrIndex = 0;

// add nearly 1900 documents in a while loop
// (assumes lstDocBean is a List of beans, read one element per iteration)
while (counter < lstDocBean.size()) {

    String criteria = lstDocBean.get(counter).getCriteria();
    String docID = lstDocBean.get(counter).getDocID();

    Document doc = new Document();
    doc.add(new TextField("DocID", docID, Field.Store.YES));
    doc.add(new TextField("Criteria", criteria, Field.Store.YES));

    w.addDocument(doc);

    counter++;
}

w.close();

// here add weight in query

// here write file write code

// Query queryparser = new QueryParser(Version.LUCENE_41, "",
// analyzer).parse(QueryParser.escape(QueryStr));

Query queryparser = new MultiFieldQueryParser(Version.LUCENE_41,
        new String[] { "Criteria" }, analyzer).parse(QueryParser.escape(QueryStr));

int hitsperpage = 1900;
IndexReader reader = IndexReader.open(index);

IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsperpage, true);
searcher.search(queryparser, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

// refineLuceneTextSearch(Requirement, hits, searcher, ReqId);

// old code: display

for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;

    Document d = searcher.doc(docId);
    String name1 = d.get("DocID");
    System.out.println("DocID: " + name1);
}






On Fri, Jul 18, 2014 at 11:45 PM, Kumaran R <kums@gmail.com> wrote:

> Provide some more information like lucene version, sample code,
> parameters involved in indexing and searching.
>
> --
> Kumaran R
>
> On 18-Jul-2014, at 6:52 pm, Priyanka Tufchi
> <priyanka.tuf...@launchship.com> wrote:
>
>> Hi All
>>
>> I am matching and ranking two sets of docs using Apache Lucene, and I
>> pass a page-hit count of 1000. But the result shows only 200. Why?
>>
>> Does it mean that the remaining 800 are not matched? And if so, how
>> should we treat a match that gets a score of 0.00?
>>
>> Waiting for reply
>>
>> Thanks
>> Priyanka
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: More Like This query is not working.

2014-07-21 Thread Rajendra Rao
Hello Ian,

I am using version 4.1.


I am expecting the IDs of the documents whose SearchedText is similar to
QueryStr, but ScoreDoc[] hits comes back with size 0.

Below is the code I am using.

___

My parameters are:
an ArrayList which contains DocID and Criteria text (the collection of
documents to be passed for indexing), and
the query text in QueryStr.


StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);

// 1. create the index
Directory index = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_41, analyzer);

IndexWriter w = new IndexWriter(index, config);
int counter = 0;

// (assumes lstDocBean is a List of beans, read one element per iteration)
while (counter < lstDocBean.size()) {

    String searchedText = lstDocBean.get(counter).getText();
    String docID = lstDocBean.get(counter).getDocID();

    Document doc = new Document();
    doc.add(new TextField("DocID", docID, Field.Store.YES));
    doc.add(new TextField(searchedText, searchedText, Field.Store.YES));

    w.addDocument(doc);

    counter++;
}

w.close();

int hitsperpage = 10;
IndexReader reader = IndexReader.open(index);

IndexSearcher searcher = new IndexSearcher(reader);

// get similar doc
// Reader sReader = new StringReader();
MoreLikeThis mlt = new MoreLikeThis(reader);
mlt.setAnalyzer(analyzer);

mlt.setFieldNames(SearchedText);

Reader reader1 = new StringReader(queryStr);
Query Searchedquery = mlt.like(reader1, null);

// --

TopDocs results = searcher.search(Searchedquery, 10);
ScoreDoc[] hits = results.scoreDocs;

for (int i = 0; i < hits.length; ++i) {
    int docId = hits[i].doc;

    Document d = searcher.doc(docId);
    String sys_DocID = d.get("DocID");
    double Score = hits[i].score;
}










On Fri, Jul 18, 2014 at 7:34 PM, Ian Lea <ian@gmail.com> wrote:

> You need to supply more info.  Tell us what version of lucene you are
> using and provide a very small completely self-contained example or
> test case showing exactly what you expect to happen and what is
> happening instead.
>
> --
> Ian.
>
> On Fri, Jul 18, 2014 at 11:50 AM, Rajendra Rao
> <rajendra@launchship.com> wrote:
>> Hello
>>
>> I am using a More Like This query, but the size of the ScoreDocs I am
>> getting is 0. I found that after
>> Query Searchedquery = mlt.like(reader1, criteria);
>>
>> the query object contains the following values:
>> boost 1.0
>> all clauses element data is null
>>
>> I used the following code:
>> MoreLikeThis mlt = new MoreLikeThis(reader);
>> //
>> mlt.setAnalyzer(analyzer);
>>
>> Reader reader1 = new StringReader(Requirement);
>> Query Searchedquery = mlt.like(reader1, criteria);
>>
>> Please guide me.





Re: More Like This query is not working.

2014-07-21 Thread Ian Lea
That's not completely self-contained which means that we can't compile
and test it ourselves.

This looks very dodgy:

doc.add(new TextField(searchedText, ...

mlt.setFieldNames(SearchedText);
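
If those two names are meant to refer to the same field, note that they differ in case, and Lucene matches field names as exact strings. The sketch below is a toy per-field term map in plain Java, not Lucene itself; `FieldNameDemo`, `add` and `terms` are hypothetical names used only to illustrate why a lookup under a differently cased field name finds no terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy per-field term index; a hypothetical stand-in, not Lucene's implementation. */
public class FieldNameDemo {
    // field name -> terms indexed under exactly that field name
    private final Map<String, List<String>> termsByField = new HashMap<>();

    /** Splits text on whitespace and stores lower-cased terms under the given field name. */
    public void add(String field, String text) {
        List<String> terms = termsByField.computeIfAbsent(field, k -> new ArrayList<>());
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
    }

    /** Returns the terms stored under the exact field name, or an empty list if there are none. */
    public List<String> terms(String field) {
        return termsByField.getOrDefault(field, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        FieldNameDemo index = new FieldNameDemo();
        index.add("searchedText", "specialty gas production");

        System.out.println(index.terms("searchedText").size()); // prints 3
        System.out.println(index.terms("SearchedText").size()); // prints 0: different case, different field
    }
}
```

In the same way, a query built from a field that holds no terms has no clauses, which would be consistent with the empty result described in this thread.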


--
Ian.




Re: More Like This query is not working.

2014-07-21 Thread Rajendra Rao
Hello Ian,

The code is below; please let us know where we have gone wrong.



public void FindSimilarDocExe()
        throws IOException, ParseException, ClassNotFoundException {

    // ResultSet rs=null;
    // Specify the analyzer for tokenizing text.
    // The same analyzer should be used for indexing and searching

    // return array list of ranked result
    // ArrayList<ScoredDocumentBean> retval = new
    // ArrayList<ScoredDocumentBean>();

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);

    // 1. create the index
    Directory index = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_41, analyzer);

    IndexWriter w = new IndexWriter(index, config);
    int counter = 0;

    // define array of field names which are active
    int arrIndex = 0;

    String queryStr = "Airgas is the United States largest distributor of industrial, "
            + "medical, and specialty gases and related equipment, "
            + "safety supplies and MRO products and services to industrial and commercial markets. "
            + "The client had the need to build a production planning module in order to have one "
            + "comprehensive system for specialty gas production and to consolidate the other multiple "
            + "production related systems.";

    Document doc1 = new Document();

    doc1.add(new TextField("DocID", "1", Field.Store.YES));
    doc1.add(new TextField(
            "discription",
            "Airgas is the United States largest distributor of industrial, "
            + "medical, and specialty gases and related equipment, "
            + "safety supplies and MRO products and services to industrial and commercial markets. "
            + "The client had the need to build a production planning module in order to have one "
            + "comprehensive system for specialty gas production and to consolidate the other multiple "
            + "production related systems.", Field.Store.YES));

    w.addDocument(doc1);

    doc1 = new Document();

    doc1.add(new TextField("DocID", "2", Field.Store.YES));
    doc1.add(new TextField(
            "discription",
            "Form based document retrieval addresses the exact "
            + "syntactic properties of a text, comparable to substring matching in string searches. "
            + "The text is generally unstructured and not necessarily in a natural language, "
            + "the system could for example be used to process large sets of chemical representations "
            + "in molecular biology. A suffix tree algorithm is an example for "
            + "form based indexing.", Field.Store.YES));

    w.addDocument(doc1);

    doc1.add(new TextField("DocID", "3", Field.Store.YES));
    doc1.add(new TextField(
            "discription",
            "A signature file is a technique that creates a quick "
            + "and dirty filter, for example a Bloom filter, that will keep all the documents "
            + "that match to the query and hopefully a few ones that do not. The way this is done is "
            + "by creating for each file a signature, typically a hash coded version. One method is "
            + "superimposed coding. A post-processing step is done to discard the false alarms. Since in most "
            + "cases this structure is inferior to inverted files in terms of speed, size and functionality, it "
            + "is not used widely. However, with proper parameters it can beat the inverted files in certain "
            + "environments.", Field.Store.YES));

    w.addDocument(doc1);

    doc1.add(new TextField("DocID", "4", Field.Store.YES));
    doc1.add(new TextField(
            "discription",
            "Novell is the leading global provider of security information "
            + "management and compliance monitoring solutions. This module includes development of components "
            + "called Collectors. Collectors are used to collect and normalize events from security devices "
            + "and programs. These normalized events are then sent to sentinel for use in correlation, "
            + "reporting, and incident response. Different types of collectors are developed to read "
            + "events/logs from different types of network devices using different types of connection "
            + "methods.  Documentation is also provided with each collector which describes how to configure "
            + "the collector and the corresponding network device/product. Reports are also developed "
            + "specific to each collector which can be used to analyze the data pumped by a collector. "
            + "And correlation rules are also shipped with each collector to correlate the events pumped by "
            + "a collector.", Field.Store.YES));

    w.addDocument(doc1);

    doc1.add(new TextField("DocID", "5", Field.Store.YES));
    doc1.add(new TextField(
            "discription",
            "Airgas is the United States largest distributor of industrial, "
            + "medical, and specialty gases and related equipment, "
            + "safety supplies and MRO products and services to industrial and commercial markets. "
            + "The client had the need to build a production planning module in order to have one "
            + "comprehensive system for specialty gas production and to consolidate the other multiple "
            + "production related systems.", Field.Store.YES));

    w.addDocument(doc1);

    w.close();

    // here add weight in query

    // here write file write code

    int hitsperpage = 10;
    IndexReader reader = IndexReader.open(index);

    IndexSearcher searcher = new IndexSearcher(reader);

    // get similar doc
    // Reader sReader = new StringReader(query);