Re: Controlling Hits

2006-11-26 Thread Nadav Har'El
On Fri, Nov 24, 2006, Otis Gospodnetic wrote about Controlling Hits:
 Hi,
 
 Could we make Hits non-final, or at least expose something in Hits to control 
 the number of Documents it reads from disk?
...
 Or maybe the answer is: Use the search method that returns TopDocs if you 
 want more control...?

In an application I was writing, I faced similar issues: Hits was fine
for a short Lucene demo, but when it came to a real application, it didn't
give me enough control: it reran the search too many times when you wanted
to see, e.g., the 20th result page, and wouldn't let me add a HitCollector,
which I needed. I started by modifying Hits (which wasn't just final - much
of its functionality was private), but then realized: there's simply no
reason to use Hits! IndexSearcher.search(), which returns TopDocs, already
gives you full control, and frankly isn't that much harder to use.

In fact, I fail to see a situation where Hits's concept of random access
to the results (you can ask for result #30 and then #70) even makes sense.
In all search applications I'm familiar with, at the time you call search(),
you already know how many results you want to display - and you don't need
someone to guess for you that you need 50 results, and if that's not enough
then you need 100 results, and then 200, and so on.
And since this concept of random access is what differentiates Hits from
TopDocs, perhaps we don't need Hits at all?
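
To make the cost of that guessing concrete, here is a minimal sketch that only
models the doubling described above (50, then 100, then 200, ...). It does not
call Lucene at all; the class name, the method name and the starting size of 50
are assumptions taken from the wording of this message, not from the Hits
source.

```java
// Sketch only: models the fetch-size doubling described in the message above.
// It does not use Lucene; HitsRefetchSketch and searchesWithDoubling are
// hypothetical names introduced for illustration.
final class HitsRefetchSketch {

    /**
     * Number of searches a doubling cache has to run before the result at
     * the given 0-based index is available.
     */
    static int searchesWithDoubling(int index, int initialSize) {
        if (index < 0 || initialSize <= 0) {
            throw new IllegalArgumentException("index must be >= 0 and initialSize > 0");
        }
        int searches = 1;            // the first search fetches initialSize results
        int fetched = initialSize;
        while (index >= fetched) {
            fetched *= 2;            // cache too small: rerun, asking for twice as many
            searches++;
        }
        return searches;
    }

    public static void main(String[] args) {
        int pageSize = 10;
        int page = 20;                        // the "20th result page" example
        int lastIndex = page * pageSize - 1;  // 0-based index of the last hit on that page
        System.out.println("doubling cache: "
                + searchesWithDoubling(lastIndex, 50) + " searches");
        System.out.println("explicit top-N: 1 search for "
                + (lastIndex + 1) + " results");
    }
}
```

Under these assumptions, an application that already knows it wants the first
200 results can ask for exactly that in one search, which is the argument for
the TopDocs-returning search method.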

So, how about deprecating Hits altogether, and recommending the TopDocs
alternatives instead?

-- 
Nadav Har'El                 |   Sunday, Nov 26 2006, 5 Kislev 5767
IBM Haifa Research Lab       |-----------------------------------------
                             |God created the world out of nothing, but
http://nadav.harel.org.il    |the nothingness still shows through.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Storing GData Feeds -- need your help!

2006-11-26 Thread Simon Willnauer

On 11/25/06, Vic Bancroft [EMAIL PROTECTED] wrote:

Simon Willnauer wrote:

 I'm actually looking for alternatives and suggestions about this
 topic, I know you guys have your own schedule but I would really
 appreciate some help with that.

Perhaps the derby would be appropriate,


I was looking at Derby already, but storing XML in an RDBMS is quite
tricky if you want reasonable performance, as an RDBMS stores
data in tables, not in a tree like XML does. Fetching data should be
really fast, and storing the whole XML as a BLOB would require parsing
the XML after the SQL query returns. Parsing is likely to be a
bottleneck in this scenario. I will give Otis's suggestion another go.
Berkeley DB could offer quite good performance for that case.

cheers simon



http://db.apache.org/derby/

It is fairly fast, somewhat scalable and looks like a relational store . . .

 If any of you have any idea about the license stuff
 (http://www.gossamer-threads.com/lists/lucene/java-dev/42253), whom I
 have to contact, and whether it is OK for the PMC, a short answer would be
 great.

Derby is available under the Apache License, Version 2.0
http://www.apache.org/licenses/ . . .

more,
l8r,
v

--
The future is here. It's just not evenly distributed yet.
 -- William Gibson, quoted by Whitfield Diffie





[jira] Commented: (LUCENE-669) finalize()-methods of FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput try to close already closed file

2006-11-26 Thread Michael McCandless (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-669?page=comments#action_12453434 ]

Michael McCandless commented on LUCENE-669:
---

Hmmm.  Michael, how does the exception in this unit test tie into this issue?  
I.e., I thought this issue was that only finalize would be doing a double-close?  
I'm confused about how the two are connected (it's awesome that your patch fixes 
this, but I'd like to understand why!).

 finalize()-methods of FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput 
 try to close already closed file
 -

 Key: LUCENE-669
 URL: http://issues.apache.org/jira/browse/LUCENE-669
 Project: Lucene - Java
  Issue Type: Bug
  Components: Store
Reporter: Michael Busch
 Assigned To: Michael Busch
Priority: Trivial
 Attachments: FSDirectory_close_file2.patch


 Hi all,
 I found a small problem in FSDirectory: The finalize()-methods of 
 FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput try to close the 
 underlying file. This is not a problem unless the file has been closed before 
 by calling the close() method. If it has been closed before, the finalize 
 method throws an IOException saying that the file is already closed. Usually 
 this IOException would go unnoticed, because the GarbageCollector, which 
 calls finalize(), just eats it. However, if I use the Eclipse debugger the 
 execution of my code will always be suspended when this exception is thrown.
 Even though this exception probably won't cause problems during normal 
 execution of Lucene, the code becomes cleaner if we apply this small patch. 
 Might this IOException also have a performance impact, if it is thrown very 
 frequently?
 I attached the patch which applies cleanly on the current svn HEAD. All 
 testcases pass and I verified with the Eclipse debugger that the IOException 
 is no longer thrown.
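
One common way to avoid the double close described above is to guard the close
with an "is open" flag, so that a later cleanup call becomes a no-op. The
following is a minimal sketch under that assumption; it is not the code of
FSDirectory nor the attached patch. StrictHandle, GuardedInput and cleanup()
are hypothetical names, and cleanup() merely stands in for what a finalize()
method would call.

```java
import java.io.IOException;

// Sketch only: a hypothetical stand-in for a file handle that, as in the
// report, throws an IOException when it is closed a second time.
final class StrictHandle {
    private boolean closed;

    void close() throws IOException {
        if (closed) {
            throw new IOException("already closed");
        }
        closed = true;
    }
}

// Sketch only: guards the underlying close with a flag so that closing
// twice (explicitly, then from a cleanup hook) does not throw.
final class GuardedInput {
    private final StrictHandle handle = new StrictHandle();
    private boolean isOpen = true;

    /** Explicit close; safe to call more than once. */
    void close() throws IOException {
        if (isOpen) {
            isOpen = false;   // flip first so a second call skips the handle
            handle.close();
        }
    }

    /** Stand-in for a finalizer-style hook: delegates to the guarded close(). */
    void cleanup() throws IOException {
        close();
    }

    public static void main(String[] args) throws IOException {
        GuardedInput in = new GuardedInput();
        in.close();     // the application closes the input
        in.cleanup();   // later cleanup finds it already closed and does nothing
        System.out.println("no exception on second close");
    }
}
```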

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira






Re: Storing GData Feeds -- need your help!

2006-11-26 Thread karl wettin


On 22 Nov 2006, at 15:17, Simon Willnauer wrote:


Is that what you were pointing to?!


Indeed. I must have misunderstood your question.


There is still the problem with saving the data. Any further ideas?!


Could you just use the plain old file system?
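
As a rough illustration of what "just use the file system" could look like,
here is a minimal sketch that keeps one directory per feed and one XML file per
entry. The layout, the class name FileEntryStore and its methods are all
assumptions made for this example; nothing here comes from the GData server
code, and the ids are assumed to be safe to use as file names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: one directory per feed, one XML file per entry.
// All names are hypothetical; ids are assumed to be valid file names.
final class FileEntryStore {
    private final Path root;

    FileEntryStore(Path root) {
        this.root = root;
    }

    private Path pathFor(String feedId, String entryId) {
        return root.resolve(feedId).resolve(entryId + ".xml");
    }

    /** Stores the entry XML as-is, creating the feed directory if needed. */
    void put(String feedId, String entryId, String xml) throws IOException {
        Path file = pathFor(feedId, entryId);
        Files.createDirectories(file.getParent());
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the stored XML unchanged, so serving it requires no parsing. */
    String get(String feedId, String entryId) throws IOException {
        return new String(Files.readAllBytes(pathFor(feedId, entryId)),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        FileEntryStore store =
                new FileEntryStore(Files.createTempDirectory("gdata-sketch"));
        store.put("feed1", "entry1", "<entry><title>hello</title></entry>");
        System.out.println(store.get("feed1", "entry1"));
    }
}
```

The appeal of this approach is that an entry can be streamed back exactly as it
was stored; anything that needs querying would still have to live in an index
or a database.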





[jira] Created: (LUCENE-728) Remove or deprecate contrib/similarity

2006-11-26 Thread Otis Gospodnetic (JIRA)
Remove or deprecate contrib/similarity
--

 Key: LUCENE-728
 URL: http://issues.apache.org/jira/browse/LUCENE-728
 Project: Lucene - Java
  Issue Type: Task
  Components: Search
Reporter: Otis Gospodnetic
 Assigned To: Otis Gospodnetic
Priority: Minor
 Fix For: 2.0.1


Classes under contrib/similarity seem to be duplicates of classes under 
contrib/queries.
I'd like to remove *.java from contrib/similarity without bothering with 
deprecation, since the same functionality exists in contrib/queries.
Does anyone mind?





Re: Duplicate MoreLikeThis.java

2006-11-26 Thread Otis Gospodnetic
I'll remove stuff in contrib/similarity later this week, to give people time to 
object, should they feel like it.  It was Thanksgiving in the U.S., so a lot of 
people are out chasing turkeys and not staring at the black box.

Otis

----- Original Message -----
From: markharw00d [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Friday, November 24, 2006 7:36:28 PM
Subject: Re: Duplicate MoreLikeThis.java

I believe they are the same but the one to keep is in contrib/queries.

The queries directory was suggested as a better location for 
organising contrib code - see here:
http://www.gossamer-threads.com/lists/lucene/java-dev/32872#32872

I chose to copy MoreLikeThis to contrib/queries and not remove 
contrib/similarity at the time to avoid breaking any dependencies but I 
suspect the time is right to remove contrib/similarity now.

Any objections?






[jira] Assigned: (LUCENE-721) Code coverage reports

2006-11-26 Thread Grant Ingersoll (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-721?page=all ]

Grant Ingersoll reassigned LUCENE-721:
--

Assignee: Grant Ingersoll  (was: Michael Busch)

 Code coverage reports
 -

 Key: LUCENE-721
 URL: http://issues.apache.org/jira/browse/LUCENE-721
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Other
Reporter: Michael Busch
 Assigned To: Grant Ingersoll
Priority: Minor
 Attachments: clover.patch, code_coverage.patch, emma_report.zip


 Hi all,
 We should be able to measure the code coverage of our unit testcases. I 
 believe it would be very helpful for the committers if they could verify, 
 before committing a patch, that it does not reduce the coverage. 
 Furthermore, people could look at the code coverage reports to figure 
 out where work needs to be done, i.e. where additional testcases are 
 necessary. It would be nice if we could add a page to the Lucene website 
 showing the report, generated by the nightly build. Maybe you could add that 
 to your preview page (LUCENE-707), Grant?
 I attach a patch here that uses the tool EMMA to generate the code coverage 
 reports. EMMA is a very nice open-source tool released under the CPL (same 
 license as JUnit). The patch adds three targets to common-build.xml: 
 - emma-check: verifies that both emma.jar and emma_ant.jar are in the ant 
 classpath 
 - emma-instrument: instruments the compiled code 
 - generate-emma-report: generates an html code coverage report 
 The following steps are necessary in order to generate a code coverage 
 report:
 - add emma.jar and emma_ant.jar to your ant classpath (download emma from 
 http://emma.sourceforge.net/)
 - execute ant target 'emma-instrument' (depends on compile-test, so it will 
 compile all core and test classes)
 - execute ant target 'test' to run the unit tests
 - execute ant target 'generate-emma-report'
 To view the emma report open build/test/emma/index.html




[jira] Commented: (LUCENE-707) Lucene Java Site docs

2006-11-26 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-707?page=comments#action_12453461 ]

Grant Ingersoll commented on LUCENE-707:


OK, this has been committed and people.apache.org has been updated (but give it 
30-60 minutes to update).   I will leave this issue open for a few days so 
people can post any problems w/ the new site w/o creating a new issue.

I deprecated ant docs in the build.xml (the target still exists, but it now 
does nothing, so if anyone has automated dependencies on this, they will want 
to update them).  I removed lucene/site in favor of keeping the docs 
directory b/c I didn't know how people.apache.org would slurp up the website 
from a different directory and I didn't want to confuse people updating the 
website.  The instructions at 
http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite have been modified 
accordingly.

We now have a favicon (yipee) that is checked in under the site src 
(lucene/src/site/src/documentation/content/xdocs/images) which is just the L 
part of the Lucene image file.

I have not hooked in code coverage (Issue 721) or nightly builds (Issue 708) 
yet but hope to get to that soon.  I have not reverted the api docs to the last 
release, but will try to get to that soon as well.

As always, let me know of any issues.  Future changes should be in the form of 
patches.

Thanks,
Grant

 Lucene Java Site docs
 -

 Key: LUCENE-707
 URL: http://issues.apache.org/jira/browse/LUCENE-707
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Website
 Environment: N/A
Reporter: Grant Ingersoll
 Assigned To: Grant Ingersoll
Priority: Minor

 It would be really nice if the Java site docs were consistent with the rest 
 of the Lucene family (namely, with navigation tabs, etc.) so that one can 
 easily go between Nutch, Hadoop, etc.




[jira] Closed: (LUCENE-726) Remove usage of deprecated method Document.fields()

2006-11-26 Thread Michael Busch (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-726?page=all ]

Michael Busch closed LUCENE-726.


Fix Version/s: 2.1
   Resolution: Fixed

Thanks Otis for committing this!

 Remove usage of deprecated method Document.fields()
 ---

 Key: LUCENE-726
 URL: http://issues.apache.org/jira/browse/LUCENE-726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
 Assigned To: Michael Busch
Priority: Trivial
 Fix For: 2.1

 Attachments: deprecation.patch


 The classes DocumentWriter, FieldsWriter, and ParallelReader use the 
 deprecated method Document.fields(). This simple patch changes these three 
 classes to use Document.getFields() instead.
 All unit tests pass.




Re: [jira] Updated: (LUCENE-721) Code coverage reports

2006-11-26 Thread Chris Hostetter

: Here it is, Grant. This new patch uses Clover to generate code coverage
: reports. Simply add clover.jar to the ant classpath, do a clean and
: run the target test. During compiling Clover will automatically
: instrument all classes under src/java.

haven't had a chance to look at the patch, but i have two questions about
this:

1) is there any way to explicitly disable the instrumentation (ie: with a
system property set in the build.properties, or on the command line) in
case people get into a situation where they are suspicious of the
instrumentation and want to run the test without it?

2) what is the behavior of the report generation after a test failure?
Does Clover know about Ant failures?  would the report reflect the fact
that the tests failed in its summary info?




-Hoss

