Re: Controlling Hits
On Fri, Nov 24, 2006, Otis Gospodnetic wrote about Controlling Hits: Hi, Could we make Hits non-final, or at least expose something in Hits to control the number of Documents it reads from disk? ... Or maybe the answer is: Use the search method that returns TopDocs if you want more control...? In an application I was writing, I faced similar issues: Hits was fine for a short demo of Lucene, but in a real application it didn't give me enough control. It reran the search too many times when you wanted to see, e.g., the 20th result page, and it wouldn't let me add a HitCollector, which I needed. I started by modifying Hits (which wasn't just final; much of its functionality was private), but then realized: there's simply no reason to use Hits! IndexSearcher.search(), which returns TopDocs, already gives you full control, and frankly isn't that much harder to use. In fact, I fail to see a situation where Hits's concept of random access to the results (you can ask for result #30 and then #70) even makes sense. In all the search applications I'm familiar with, at the time you call search() you already know how many results you want to display, and you don't need someone to guess for you that you need 50 results, and if that's not enough then 100 results, then 200, and so on. And since this concept of random access is what differentiates Hits from TopDocs, perhaps we don't need Hits at all? So, how about deprecating Hits altogether and recommending the TopDocs alternatives instead? -- Nadav Har'El | Sunday, Nov 26 2006, 5 Kislev 5767 | IBM Haifa Research Lab | http://nadav.harel.org.il | "God created the world out of nothing, but the nothingness still shows through." - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
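The argument that you already know how many results you need when you call search() can be illustrated without Lucene itself. The following is a minimal, self-contained Java sketch, not Lucene code: `PagingSketch`, `TopNCollector`, and `ScoredDoc` are hypothetical stand-ins loosely modeled on the roles of a HitCollector and TopDocs/ScoreDoc, and an array of scores stands in for a query's matches (a score of 0 means "no match"). It collects exactly the (page + 1) * pageSize best hits in one pass with a bounded min-heap. For comparison, it also counts how many searches a fetch-more-by-doubling strategy (50, then 100, then 200, as described in the message) would run. That doubling model is an assumption taken from the message, not a description of Hits' actual internals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: these classes are illustrative stand-ins, not Lucene's
// HitCollector, TopDocs, or ScoreDoc.
final class PagingSketch {

    /** A (doc id, score) pair, similar in spirit to a ScoreDoc. */
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Keeps the n best hits seen so far in a bounded min-heap. */
    static final class TopNCollector {
        // "Worst first": lowest score, and for equal scores the higher doc id.
        private static final Comparator<ScoredDoc> WORST_FIRST =
            Comparator.<ScoredDoc>comparingDouble(d -> d.score)
                      .thenComparingInt(d -> -d.doc);

        private final int n;
        private final PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(WORST_FIRST);
        private int totalHits = 0;

        TopNCollector(int n) {
            this.n = n;
        }

        /** Called once per matching document, in the style of collect(doc, score). */
        void collect(int doc, float score) {
            totalHits++;
            ScoredDoc candidate = new ScoredDoc(doc, score);
            if (heap.size() < n) {
                heap.add(candidate);
            } else if (n > 0 && WORST_FIRST.compare(candidate, heap.peek()) > 0) {
                heap.poll();          // evict the current worst hit
                heap.add(candidate);
            }
        }

        int totalHits() {
            return totalHits;
        }

        /** The collected hits, best first. */
        List<ScoredDoc> topDocs() {
            List<ScoredDoc> result = new ArrayList<>(heap);
            result.sort(WORST_FIRST.reversed());
            return result;
        }
    }

    /** One pass that collects exactly the hits needed for a 0-based page. */
    static List<ScoredDoc> page(float[] scores, int page, int pageSize) {
        TopNCollector collector = new TopNCollector((page + 1) * pageSize);
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] > 0) {    // a score of 0 stands for "no match"
                collector.collect(doc, scores[doc]);
            }
        }
        List<ScoredDoc> top = collector.topDocs();
        int from = Math.min(page * pageSize, top.size());
        return top.subList(from, top.size());
    }

    /** Searches run when fetching initial, 2*initial, ... hits (initial > 0). */
    static int searchesUnderDoubling(int needed, int initial) {
        int searches = 1;
        for (int fetched = initial; fetched < needed; fetched *= 2) {
            searches++;
        }
        return searches;
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 0f, 0.9f, 0.1f, 0.9f, 0.3f, 0f, 0.7f};
        for (ScoredDoc d : page(scores, 1, 2)) {   // second page, two hits per page
            System.out.println(d.doc + " " + d.score);
        }
        System.out.println(searchesUnderDoubling(200, 50));
    }
}
```

Running main prints the two hits on the second page (doc 7, then doc 0) and then 3, the number of searches the doubling model needs before 200 hits are available, versus the single pass above. With a real Lucene 2.x Searcher, the analogous approach is to pass the needed count as the n argument of search(Query, Filter, int), which returns TopDocs, and slice its scoreDocs array.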
Re: Storing GData Feeds -- need your help!
On 11/25/06, Vic Bancroft [EMAIL PROTECTED] wrote: Simon Willnauer wrote: I'm actually looking for alternatives and suggestions about this topic. I know you guys have your own schedule, but I would really appreciate some help with that. Perhaps the derby would be appropriate, I was looking at Derby already, but storing XML in an RDBMS is quite tricky if you want reasonable performance, as an RDBMS stores data in tables, not in a tree like XML does. Fetching data should be really fast, and storing the whole XML as a BLOB would require parsing the XML after the SQL query returns. Parsing is expected to be a bottleneck in this scenario. I will give Otis's suggestion another go. Berkeley DB could offer quite good performance for that case. cheers simon http://db.apache.org/derby/ It is fairly fast, somewhat scalable and looks like a relational store . . . If any of you have any idea about the license stuff (http://www.gossamer-threads.com/lists/lucene/java-dev/42253), whom I have to contact, and whether it is OK for the PMC, a short answer would be great. Derby is available under the Apache License, Version 2.0 http://www.apache.org/licenses/ . . . more, l8r, v -- The future is here. It's just not evenly distributed yet. -- William Gibson, quoted by Whitfield Diffie
[jira] Commented: (LUCENE-669) finalize()-methods of FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput try to close already closed file
[ http://issues.apache.org/jira/browse/LUCENE-669?page=comments#action_12453434 ] Michael McCandless commented on LUCENE-669: --- Hmmm. Michael, how does the exception in this unit test tie into this issue? I.e., I thought this issue was that only finalize would be doing a double-close? I'm confused about how the two are connected (it's awesome that your patch fixes this, but I'd like to understand why!). finalize()-methods of FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput try to close already closed file - Key: LUCENE-669 URL: http://issues.apache.org/jira/browse/LUCENE-669 Project: Lucene - Java Issue Type: Bug Components: Store Reporter: Michael Busch Assigned To: Michael Busch Priority: Trivial Attachments: FSDirectory_close_file2.patch Hi all, I found a small problem in FSDirectory: the finalize() methods of FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput try to close the underlying file. This is not a problem unless the file has already been closed by calling the close() method. If it has, the finalize method throws an IOException saying that the file is already closed. Usually this IOException would go unnoticed, because the garbage collector, which calls finalize(), just eats it. However, if I use the Eclipse debugger, the execution of my code will always be suspended when this exception is thrown. Even though this exception probably won't cause problems during normal execution of Lucene, the code becomes cleaner if we apply this small patch. Might this IOException also have a performance impact if it is thrown very frequently? I attached the patch, which applies cleanly to the current svn HEAD. All test cases pass, and I verified with the Eclipse debugger that the IOException is no longer thrown. -- This message is automatically generated by JIRA. 
- If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
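A common way to avoid the double close described in this issue is to keep an "is open" flag, so that close() is idempotent and finalize() only closes a file the caller forgot to close. The Java sketch below illustrates that general pattern; it is not the attached FSDirectory_close_file2.patch. `GuardedInput` and `FakeFile` are hypothetical names, and `FakeFile` simulates a file handle that throws an IOException on a second close, mirroring the behavior the report describes.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch of an idempotent close; not the LUCENE-669 patch.
class GuardedInput implements Closeable {

    /** Hypothetical stand-in for a file handle that throws on a second close. */
    static final class FakeFile {
        int closeCalls = 0;

        void close() throws IOException {
            closeCalls++;
            if (closeCalls > 1) {
                throw new IOException("file already closed");
            }
        }
    }

    final FakeFile file = new FakeFile();
    private boolean isOpen = true;

    /** Closes the file at most once; later calls are no-ops. */
    @Override
    public synchronized void close() throws IOException {
        if (isOpen) {
            isOpen = false;
            file.close();
        }
    }

    /** Safety net: only closes the file if the caller never called close(). */
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();
        }
    }

    public static void main(String[] args) throws IOException {
        GuardedInput in = new GuardedInput();
        in.close();
        in.close(); // no-op: the flag prevents closing the file a second time
        System.out.println("underlying close() calls: " + in.file.closeCalls);
    }
}
```

Running main prints `underlying close() calls: 1`. close() is synchronized because finalize() runs on the JVM's finalizer thread rather than the caller's. Note that finalize() is deprecated in recent Java releases; the override is kept here only to mirror the code path discussed in the issue.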
Re: Storing GData Feeds -- need your help!
On 22 Nov 2006, at 15:17, Simon Willnauer wrote: Is that what you were pointing to?! Indeed. I must have misunderstood your question. There is still the problem with saving the data. Any further ideas?! Could you just use the plain old file system?
[jira] Created: (LUCENE-728) Remove or deprecate contrib/similarity
Remove or deprecate contrib/similarity -- Key: LUCENE-728 URL: http://issues.apache.org/jira/browse/LUCENE-728 Project: Lucene - Java Issue Type: Task Components: Search Reporter: Otis Gospodnetic Assigned To: Otis Gospodnetic Priority: Minor Fix For: 2.0.1 Classes under contrib/similarity seem to be duplicates of classes under contrib/queries. I'd like to remove *.java from contrib/similarity without bothering with deprecation, since the same functionality exists in contrib/queries. Does anyone mind?
Re: Duplicate MoreLikeThis.java
I'll remove the stuff in contrib/similarity later this week, to give people time to object, should they feel like it. It was Thanksgiving in the U.S., so a lot of people are out chasing turkeys and not staring at the black box. Otis - Original Message From: markharw00d [EMAIL PROTECTED] To: java-dev@lucene.apache.org Sent: Friday, November 24, 2006 7:36:28 PM Subject: Re: Duplicate MoreLikeThis.java I believe they are the same, but the one to keep is in contrib/queries. The queries directory was suggested as a better location for organising contrib code; see here: http://www.gossamer-threads.com/lists/lucene/java-dev/32872#32872 I chose to copy MoreLikeThis to contrib/queries and not remove contrib/similarity at the time, to avoid breaking any dependencies, but I suspect the time is right to remove contrib/similarity now. Any objections?
[jira] Assigned: (LUCENE-721) Code coverage reports
[ http://issues.apache.org/jira/browse/LUCENE-721?page=all ] Grant Ingersoll reassigned LUCENE-721: -- Assignee: Grant Ingersoll (was: Michael Busch) Code coverage reports - Key: LUCENE-721 URL: http://issues.apache.org/jira/browse/LUCENE-721 Project: Lucene - Java Issue Type: New Feature Components: Other Reporter: Michael Busch Assigned To: Grant Ingersoll Priority: Minor Attachments: clover.patch, code_coverage.patch, emma_report.zip Hi all, We should be able to measure the code coverage of our unit test cases. I believe it would be very helpful for the committers if they could verify, before committing a patch, that it does not reduce the coverage. Furthermore, people could look at the code coverage reports to figure out where work needs to be done, i.e., where additional test cases are necessary. It would be nice if we could add a page to the Lucene website showing the report generated by the nightly build. Maybe you could add that to your preview page (LUCENE-707), Grant? I attach a patch here that uses the tool EMMA to generate the code coverage reports. EMMA is a very nice open-source tool released under the CPL (the same license as JUnit). The patch adds three targets to common-build.xml:
- emma-check: verifies that both emma.jar and emma_ant.jar are in the Ant classpath
- emma-instrument: instruments the compiled code
- generate-emma-report: generates an HTML code coverage report
The following steps are necessary in order to generate a code coverage report:
- add emma.jar and emma_ant.jar to your Ant classpath (download EMMA from http://emma.sourceforge.net/)
- execute the Ant target 'emma-instrument' (it depends on compile-test, so it will compile all core and test classes)
- execute the Ant target 'test' to run the unit tests
- execute the Ant target 'generate-emma-report'
To view the EMMA report, open build/test/emma/index.html
[jira] Commented: (LUCENE-707) Lucene Java Site docs
[ http://issues.apache.org/jira/browse/LUCENE-707?page=comments#action_12453461 ] Grant Ingersoll commented on LUCENE-707: OK, this has been committed and people.apache.org has been updated (but give it 30-60 minutes to update). I will leave this issue open for a few days so people can post any problems w/ the new site w/o creating a new issue. I deprecated ant docs in the build.xml (the target still exists, but it now does nothing, so if anyone has automated dependencies on this, they will want to update them). I removed lucene/site in favor of keeping the docs directory b/c I didn't know how people.apache.org would slurp up the website from a different directory and I didn't want to confuse people updating the website. The instructions at http://wiki.apache.org/jakarta-lucene/HowToUpdateTheWebsite have been modified accordingly. We now have a favicon (yippee) that is checked in under the site src (lucene/src/site/src/documentation/content/xdocs/images), which is just the L part of the Lucene image file. I have not hooked in code coverage (Issue 721) or nightly builds (Issue 708) yet, but hope to get to that soon. I have not reverted the API docs to the last release, but will try to get to that soon as well. As always, let me know of any issues. Future changes should be in the form of patches. Thanks, Grant Lucene Java Site docs - Key: LUCENE-707 URL: http://issues.apache.org/jira/browse/LUCENE-707 Project: Lucene - Java Issue Type: Improvement Components: Website Environment: N/A Reporter: Grant Ingersoll Assigned To: Grant Ingersoll Priority: Minor It would be really nice if the Java site docs were consistent with the rest of the Lucene family (namely, with navigation tabs, etc.) so that one can easily go between Nutch, Hadoop, etc.
[jira] Closed: (LUCENE-726) Remove usage of deprecated method Document.fields()
[ http://issues.apache.org/jira/browse/LUCENE-726?page=all ] Michael Busch closed LUCENE-726. Fix Version/s: 2.1 Resolution: Fixed Thanks Otis for committing this! Remove usage of deprecated method Document.fields() --- Key: LUCENE-726 URL: http://issues.apache.org/jira/browse/LUCENE-726 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assigned To: Michael Busch Priority: Trivial Fix For: 2.1 Attachments: deprecation.patch The classes DocumentWriter, FieldsWriter, and ParallelReader use the deprecated method Document.fields(). This simple patch changes these three classes to use Document.getFields() instead. All unit tests pass.
Re: [jira] Updated: (LUCENE-721) Code coverage reports
: Here it is, Grant. This new patch uses Clover to generate code coverage : reports. Simply add clover.jar to the ant classpath, do a clean and : run the target test. During compiling Clover will automatically : instrument all classes under src/java. I haven't had a chance to look at the patch, but I have two questions about this: 1) is there any way to explicitly disable the instrumentation (e.g., with a system property set in build.properties, or on the command line) in case people get into a situation where they are suspicious of the instrumentation and want to run the tests without it? 2) what is the behavior of the report generation after a test failure? Does Clover know about Ant failures? Would the report reflect the fact that the tests failed in its summary info? -Hoss