RE: Test failure question

2006-06-16 Thread Pasha Bizhan
Hi,

 testBarelyCloseEnough(), testExact(), testMulipleTerms(), 
 etc?  If so, then NUnit is not doing this.  I tested by 
 outputting to stdout.

NUnit calls the [SetUp] method before each test and the [TearDown] method
after each test. Add Console.WriteLine calls and see the result.

Let me show:
--
[TestFixture]
public class TestPhraseQuery
{
    // Runs before every [Test] method
    [SetUp]
    protected void SetUp()
    {
        directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
        ...
        Console.WriteLine("set up");
    }

    // Runs after every [Test] method
    [TearDown]
    protected void TearDown()
    {
        searcher.Close();
        directory.Close();
        Console.WriteLine("tear down");
    }

    [Test]
    public void TestNotCloseEnough()
    {
        query.SetSlop(2);
        ...
        MockAssert.AreEqual(0, hits.Length());
        Console.WriteLine("not close");
    }
}
--
The output:
---
set up
barely
tear down

set up
tear down
...
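
The same call order can be reproduced without any test framework. Below is a minimal Java sketch of the per-test lifecycle (Java rather than C#, since the original Lucene tests are JUnit tests). It is not NUnit or JUnit code: LifecycleSketch, runTest and runAll are hypothetical names, and the test bodies only record a line of output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a test fixture; not NUnit or JUnit code.
class LifecycleSketch {
    // Records the order of calls so it can be inspected afterwards.
    static final List<String> log = new ArrayList<>();

    static void setUp()    { log.add("set up"); }
    static void tearDown() { log.add("tear down"); }

    static void testBarelyCloseEnough() { log.add("barely"); }
    static void testNotCloseEnough()    { log.add("not close"); }

    // Runs one test the way a runner does: setUp, the test, then tearDown.
    static void runTest(Runnable test) {
        setUp();
        try {
            test.run();
        } finally {
            tearDown(); // still runs if the test throws
        }
    }

    // Runs both tests and returns a copy of the recorded call order.
    static List<String> runAll() {
        log.clear();
        runTest(LifecycleSketch::testBarelyCloseEnough);
        runTest(LifecycleSketch::testNotCloseEnough);
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        for (String line : runAll()) {
            System.out.println(line);
        }
    }
}
```

Each test gets its own "set up" / "tear down" pair, so state created in the set-up step is never shared between tests.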


Pasha Bizhan


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Benchmarking results

2006-04-04 Thread Pasha Bizhan
Hi, 

 From: Marvin Humphrey [mailto:[EMAIL PROTECTED] 


 The test corpus was Reuters-21578, Distribution 1.0.  
 Reuters-21578 is available from David D. Lewis' professional 
 home page, currently:
 
  http://www.research.att.com/~lewis

The correct link is
http://www.daviddlewis.com/resources/testcollections/reuters21578/
 
Pasha Bizhan





RE: Test corpus

2006-04-01 Thread Pasha Bizhan
Hi, 

 From: Marvin Humphrey [mailto:[EMAIL PROTECTED] 
 
 I'm looking for a test corpus to use for some benchmarking 
 and parsing tests.  I can whip one up myself, but it would be 
 nice to use something standardized.  I'd like something that 
 doesn't require a license/fee, so that other people can run 
 the same tests.  At least 1000 docs, a few hundred words 
 each.  Any suggestions?

See Corpora section at http://wiki.apache.org/jakarta-lucene/Resources

Pasha Bizhan






LUCENE-460

2005-12-23 Thread Pasha Bizhan
Hi,

A question about the latest CVS changes to hash codes:
http://issues.apache.org/jira/browse/LUCENE-460

Could anybody explain the magic numbers, 0x6634D93C, 0x2742E74A, and the others?
Do they have any special meaning? Is this documented anywhere?

Pasha 





RE: Advanced query language

2005-12-03 Thread Pasha Bizhan
Hi, 

 From: Erik Hatcher [mailto:[EMAIL PROTECTED] 

 MoreLikeThis minNumberShouldMatch=3
  maxQueryTerms=30
 
 We're back to MoreLikeThis - it's not currently a Query subclass.   
 How do you envision this sort of thing fitting in if it's not a Query?

But the MoreLikeThis class produces a Query. It's similar to Google's define:
search.
I think Google handles such queries specially and then redirects the search
somewhere else. QueryParser could handle such searches too and use
alternative logic to create the Query.

For example, we can extend QueryParser with special (syntax) handlers
that create the Query.

Something like this:
--
class LikeHandler {};
LikeHandler likeHandler = new LikeHandler(...); 
string queryString = "like:(red quick fox)"; 
Query q = QueryParser.parse(queryString, analyzer, likeHandler);
--

QueryParser scans the input, finds a special command (like:), and then looks
up the handler for that command.
If the handler exists, QP calls it to create the Query.
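
The dispatch step described here can be sketched in a few lines of Java. This is only an illustration under stated assumptions: HandlerDispatchSketch, register and parse are hypothetical names, nothing here exists in Lucene's QueryParser, and a plain String stands in for a real Query object so that the sketch runs without Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these names exist in Lucene's QueryParser.
class HandlerDispatchSketch {
    // Maps a command name such as "like" to a handler that builds a query.
    // A String stands in for a real Query to keep the sketch self-contained.
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    void register(String command, Function<String, String> handler) {
        handlers.put(command, handler);
    }

    String parse(String queryString) {
        int colon = queryString.indexOf(':');
        if (colon > 0) {
            String command = queryString.substring(0, colon).trim();
            Function<String, String> handler = handlers.get(command);
            if (handler != null) {
                String body = queryString.substring(colon + 1).trim();
                // Strip one pair of surrounding parentheses, if present.
                if (body.startsWith("(") && body.endsWith(")")) {
                    body = body.substring(1, body.length() - 1);
                }
                return handler.apply(body);
            }
        }
        // No handler registered: fall back to the default parsing path.
        return "DefaultQuery[" + queryString + "]";
    }

    // Builds a parser with one "like" handler and parses the given string.
    static String demo(String queryString) {
        HandlerDispatchSketch parser = new HandlerDispatchSketch();
        parser.register("like", text -> "MoreLikeThisQuery[" + text + "]");
        return parser.parse(queryString);
    }

    public static void main(String[] args) {
        System.out.println(demo("like:(red quick fox)"));
        System.out.println(demo("title:fox"));
    }
}
```

One trade-off is visible even in this small sketch: a registered command name shadows any index field with the same name.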

This approach has its disadvantages, of course.

Pasha Bizhan






RE: Advanced query language

2005-12-03 Thread Pasha Bizhan
Hi, 

 From: markharw00d [mailto:[EMAIL PROTECTED] 
 Re: MoreLikeThis queries.
 Yes, they can be usefully wrapped as queries (see attached simple 
 example). In fact it was my attempts at bastardising QueryParser to 
 support them that brought home its limitations. I ended up with a 
 subclass hack that (mis)used the field name to parse a query string 
 like:123 where 123 was a doc id. With the QueryParser 
 syntax I was not  able to pass other parameters which MoreLikeThis could 
 usefully use to  control the behaviour of this query type eg choice of 
 fieldname(s) used,  max number of terms generated, minNumberShouldTerms to
match etc etc.

With the _current_ QP syntax. 

With the syntax handlers from my previous message, you would be able to
pass the parameters to the handler.

string query = "like(param1, param2,...): (bla-bla-bla)";

The syntax of the parameters isn't significant to QP. QP does not need to know
anything about the parameters' syntax.

string query = "like(percentTermsToMatch=0.25f,docId=44,...):...";
Or
string query = "like(0.25f,44): ...";
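
To make the split of responsibilities concrete, here is a minimal Java sketch. The parser side only cuts the string into a command name, an opaque parameter string, and a body. Interpreting the parameters is left to the handler. All names (HandlerParamsSketch, Command, split, parseNamedParams) are hypothetical, and the key=value format is just one possible handler-side syntax.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not part of Lucene's QueryParser.
class HandlerParamsSketch {
    // What the parser extracts. It never looks inside rawParams.
    static final class Command {
        final String name;
        final String rawParams;
        final String body;

        Command(String name, String rawParams, String body) {
            this.name = name;
            this.rawParams = rawParams;
            this.body = body;
        }
    }

    // Parser side: splits "name(params): body".
    // Returns null when the input does not have that shape.
    static Command split(String query) {
        int open = query.indexOf('(');
        if (open <= 0) {
            return null;
        }
        int close = query.indexOf(')', open);
        if (close < 0 || close + 1 >= query.length() || query.charAt(close + 1) != ':') {
            return null;
        }
        String name = query.substring(0, open).trim();
        String rawParams = query.substring(open + 1, close);
        String body = query.substring(close + 2).trim();
        return new Command(name, rawParams, body);
    }

    // Handler side: one possible syntax, comma-separated key=value pairs.
    static Map<String, String> parseNamedParams(String rawParams) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : rawParams.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                params.put(kv[0].trim(), kv[1].trim());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Command c = split("like(percentTermsToMatch=0.25f,docId=44): red quick fox");
        System.out.println(c.name + " | " + c.rawParams + " | " + c.body);
        System.out.println(parseNamedParams(c.rawParams));
    }
}
```

A handler that prefers positional parameters, as in like(0.25f,44), would replace only parseNamedParams. The split step stays the same.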


 This is not unusual, each query type has potentially multiple optional
 parameters that tweak its behaviour. If I don't have a query language
 that names the parameters explicitly (say, XML) I end up having to
 define what looks like a function with a long list of parameters:
 like(123,,,4,,,). Ack.
 
Exactly. 
 
Pasha Bizhan





[jira] Commented: (LUCENE-474) High Frequency Terms/Phrases at the Index level

2005-11-28 Thread Pasha Bizhan (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-474?page=comments#action_12358629 ] 

Pasha Bizhan commented on LUCENE-474:
-

Look at the HighFreqTerms class in the contrib area:
http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/miscellaneous/src/java/org/apache/lucene/misc/HighFreqTerms.java?rev=164963&view=log

 High Frequency Terms/Phrases at the Index level
 ---

  Key: LUCENE-474
  URL: http://issues.apache.org/jira/browse/LUCENE-474
  Project: Lucene - Java
 Type: New Feature
 Versions: 1.4
 Reporter: Suri Babu B


 We should be able to find all the high-frequency terms/phrases (where 
 frequency is the search criterion / benchmark)

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira





[jira] Commented: (LUCENE-474) High Frequency Terms/Phrases at the Index level

2005-11-28 Thread Pasha Bizhan (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-474?page=comments#action_12358643 ] 

Pasha Bizhan commented on LUCENE-474:
-

I understand what high-frequency terms are, but what are high-frequency phrases?
Could you please explain your index structure?


 High Frequency Terms/Phrases at the Index level
 ---

  Key: LUCENE-474
  URL: http://issues.apache.org/jira/browse/LUCENE-474
  Project: Lucene - Java
 Type: New Feature
 Versions: 1.4
 Reporter: Suri Babu B


 We should be able to find all the high-frequency terms/phrases (where 
 frequency is the search criterion / benchmark)




RE: class for delete/add access to an index

2005-05-27 Thread Pasha Bizhan
Hi, 

 From: Daniel Naber [mailto:[EMAIL PROTECTED] 

 What do you think? If this gets accepted, it also needs a better name.

Please also add an API for searching, like this one:
http://searchblackbox.com/sdk/api/SearchBlackBox.SearchEngine.ExecuteSearch.html

Pasha Bizhan





Re: Questions about DeleteFile method

2005-05-03 Thread Pasha Bizhan
Hi,

 George Aroush [EMAIL PROTECTED] wrote:
 All: Speaking of my port work for 1.9 RC1, I don't have a clear idea
 what to do about java.util.zip.  There is no equivalent in .NET and it
 is being used in Lucene 1.9 RC1 for Index.FieldsWriter and
 Index.FieldsReader.  Any suggestion?

SharpZipLib. We use it for our port :)) The current compatibility tests
work well, but we do not have final results yet.

Pasha Bizhan
http://lucenedotnet.com


Re: java.util.zip (was Questions about DeleteFile method)

2005-05-03 Thread Pasha Bizhan
Hi,

 Monsur Hossain [EMAIL PROTECTED] wrote:
 Hmm, but upon first look I don't see a direct analog to
 the Inflater/Deflater methods.

--
using ICSharpCode.SharpZipLib.Zip;
using ICSharpCode.SharpZipLib.Zip.Compression;

// Create the compressor with the highest level of compression
Deflater compressor = new Deflater();
compressor.SetLevel(Deflater.BEST_COMPRESSION);
--
and so on.
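
For comparison, here is a minimal Java sketch of the java.util.zip side that a port has to stay compatible with: a Deflater/Inflater round trip at the highest compression level. It is not the actual FieldsWriter/FieldsReader code. ZipRoundTripSketch, compress and decompress are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch of a java.util.zip round trip; not Lucene code.
class ZipRoundTripSketch {
    static byte[] compress(byte[] input) {
        // Create the compressor with the highest level of compression.
        Deflater compressor = new Deflater();
        compressor.setLevel(Deflater.BEST_COMPRESSION);
        compressor.setInput(input);
        compressor.finish();

        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buf = new byte[1024];
        while (!compressor.finished()) {
            int n = compressor.deflate(buf);
            out.write(buf, 0, n);
        }
        compressor.end(); // release native resources
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        Inflater decompressor = new Inflater();
        decompressor.setInput(compressed);

        ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length);
        byte[] buf = new byte[1024];
        try {
            while (!decompressor.finished()) {
                int n = decompressor.inflate(buf);
                if (n == 0 && decompressor.needsInput()) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            decompressor.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "stored field text, stored field text".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(data));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

A compatibility test for a port could feed bytes produced by compress to the ported Inflater and compare the result with the original input.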
Pasha Bizhan
http://lucenedotnet.com