[jira] Updated: (LUCENE-559) Turkish Analyzer for Lucene
[ http://issues.apache.org/jira/browse/LUCENE-559?page=all ]

Emre Bayram updated LUCENE-559:
Attachment: IndexFiles.java

Turkish Analyzer for Lucene
Key: LUCENE-559
URL: http://issues.apache.org/jira/browse/LUCENE-559
Project: Lucene - Java
Type: Improvement
Components: Analysis
Reporter: Emre Bayram
Attachments: IndexFiles.java, SearchFiles.java, TurkishAnalyzer.java, TurkishAnalyzer.java, TurkishStemFilter.java, TurkishStemFilter.java, TurkishStemmer.java, TurkishStemmer.java

I have developed an Analyzer for Turkish, thanks to the German and Brazilian language analyzers. This Turkish Analyzer supports the ISO-8859-9 (Turkish) character set and has a good set of stop words. I hope it helps Turkish developers who use Lucene (I searched for many hours for a Turkish analyzer for Lucene but couldn't find one, so I wrote this one and am sending it here).

-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Java 1.5 was [jira] Updated: (LUCENE-600) ParallelWriter companion to ParallelReader
+1

Do you want to post it on the user list? It might also be good to put it up on the main website.

Otis Gospodnetic wrote:

Grant: how to poll users? How about this: http://www.quimble.com/poll/view/2156 ? If you think that's ok, we can send that to java-user tomorrow and see. Hey, how about some bets? I'll put a $10 for a beer on 1.5.

Wow, $10 for a beer? That must be some pretty good beer. Either that or you live in New York City and that is a cheap beer! Anyway, I am betting it is 1.5 as well. Maybe we can get together at ApacheCon or something for one...

Otis

- Original Message
From: Grant Ingersoll [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Tuesday, June 13, 2006 5:01:30 PM
Subject: Re: Java 1.5 was [jira] Updated: (LUCENE-600) ParallelWriter companion to ParallelReader

In addition to performance, productivity and functionality benefits, my main argument for 1.5 is that it is used by the vast majority of Lucene community members.

I am not so sure about this. Perhaps we should take a poll on the user list? Not even sure how that would be managed or counted, but...

Everything I write is in 1.5 and I don't have time to backport. I have a significant body of code from which to extract and contribute patches that others would likely find useful. How many others are in a similar position?

I definitely would prefer to make future contributions in 1.5 (even the patch we just contributed (issue 545) could have been better given 1.5, but it is fine with 1.4 as well). I tend to think that if people don't want the new functionality, or if it breaks their application, then they need not upgrade, or they can contribute patches against the branches for prior releases and we can support that as needed. To me, this is what major releases are about. I know that when a major release comes out I should expect library changes that may break my code. If I don't want that pain, then I don't upgrade.

On the side, not leaving valued community members behind is important.
I think the pmc / committers just need to make a decision which will impact one group or the other.

Chuck

Grant Ingersoll wrote on 06/13/2006 03:35 AM:

Well, we have our first Java 1.5 patch... Now that we have had a week or two to digest the comments, do we want to reopen the discussion?

Chuck Williams (JIRA) wrote:

[ http://issues.apache.org/jira/browse/LUCENE-600?page=all ]

Chuck Williams updated LUCENE-600:
Attachment: ParallelWriter.patch

Patch to create and integrate ParallelWriter, Writable and TestParallelWriter -- also modifies build to use java 1.5.

ParallelWriter companion to ParallelReader
Key: LUCENE-600
URL: http://issues.apache.org/jira/browse/LUCENE-600
Project: Lucene - Java
Type: Improvement
Components: Index
Versions: 2.1
Reporter: Chuck Williams
Attachments: ParallelWriter.patch

A new class ParallelWriter is provided that serves as a companion to ParallelReader. ParallelWriter meets all of the doc-id synchronization requirements of ParallelReader, subject to:

1. ParallelWriter.addDocument() is synchronized, which might have an adverse effect on performance. The writes to the sub-indexes are, however, done in parallel.
2. The application must ensure that the ParallelReader is never reopened inside ParallelWriter.addDocument(), else it might find the sub-indexes out of sync.
3. The application must deal with recovery from ParallelWriter.addDocument() exceptions. Recovery must restore the synchronization of doc-ids, e.g. by deleting any trailing document(s) in one sub-index that were not successfully added to all sub-indexes, and then optimizing all sub-indexes.

A new interface, Writable, is provided to abstract IndexWriter and ParallelWriter. This is in the same spirit as the existing Searchable and Fieldable classes.

This implementation uses java 1.5. The patch applies against today's svn head. All tests pass, including the new TestParallelWriter.
--
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
School of Information Studies
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org
Voice: 315-443-5484
Fax: 315-443-6886
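The recovery requirement in point 3 of the LUCENE-600 description can be illustrated with a rough sketch. Everything below is hypothetical: `ParallelAddSketch`, `simulateFailureOn` and `recover` are made-up names, plain lists stand in for the sub-indexes, and the writes are sequential rather than parallel for brevity. It is not the code from ParallelWriter.patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parallel writer; each sub-index is a plain list. */
class ParallelAddSketch {
    private final List<List<String>> subIndexes = new ArrayList<List<String>>();
    private int failOn = -1; // sub-index that simulates a write failure, or -1 for none

    ParallelAddSketch(int subIndexCount) {
        for (int i = 0; i < subIndexCount; i++) {
            subIndexes.add(new ArrayList<String>());
        }
    }

    /** Test hook (assumption): makes the next addDocument fail at the given sub-index. */
    void simulateFailureOn(int subIndex) {
        failOn = subIndex;
    }

    /** Adds one part per sub-index; the doc id is the position in every list. */
    synchronized void addDocument(String[] parts) {
        for (int i = 0; i < subIndexes.size(); i++) {
            if (i == failOn) {
                failOn = -1;
                throw new IllegalStateException("simulated write failure in sub-index " + i);
            }
            subIndexes.get(i).add(parts[i]);
        }
    }

    /** Application-side recovery: drop trailing docs that not every sub-index received. */
    synchronized void recover() {
        int shortest = Integer.MAX_VALUE;
        for (List<String> index : subIndexes) {
            shortest = Math.min(shortest, index.size());
        }
        for (List<String> index : subIndexes) {
            while (index.size() > shortest) {
                index.remove(index.size() - 1);
            }
        }
    }

    int docCount(int subIndex) {
        return subIndexes.get(subIndex).size();
    }

    public static void main(String[] args) {
        ParallelAddSketch writer = new ParallelAddSketch(2);
        writer.addDocument(new String[] {"title-0", "body-0"});
        writer.simulateFailureOn(1);
        try {
            writer.addDocument(new String[] {"title-1", "body-1"});
        } catch (IllegalStateException e) {
            writer.recover(); // restore doc-id alignment before any further adds
        }
        System.out.println(writer.docCount(0) + " / " + writer.docCount(1));
    }
}
```

With real indexes the truncation step would correspond to deleting the trailing documents and optimizing, as the description says; the list version only shows why the doc counts must be brought back into agreement.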
Re: Java 1.5 was [jira] Updated: (LUCENE-600) ParallelWriter companion to ParallelReader
I'll just send it to java-user in a bit in order to get the answers only from Lucene users (and not peeps just passing by lucene.apache.org).

Otis
GData - Milestone 2
Hello everyone,

it was quiet last week; I had a bad cold, so Milestone 2 starts a bit late...

Milestone 2 is about client authentication. GData client auth is also defined (well, kind of) in the GData protocol reference on code.google.com. The client is supposed to support either cookie-based auth or just an auth token sent back in a POST response. The client authenticates itself via a POST request to the server's auth interface, sending the following parameters:

[EMAIL PROTECTED]&Passwd=north23AZ&service=servicename&source=Gulp-CalGulp-1.05

The email represents the account name, which is associated with a service provided by the server. Each server can provide m services with n feeds. Each feed belongs to one account. As it is quite hard to figure out whether a client supports plain token or cookie auth, I will send both back to the client. After the client has received the auth token or cookie, it will call some restricted resource on the server, sending either the cookie or the auth token. The cookie contains only the auth token.

So these are the facts. I will generate an MD5 key as the auth token using the email, password and a timestamp or something similar, and save it on the server in a kind of session storage. The session storage will hold the sessions for a certain time and will invalidate them when they time out. Additionally, I will save the client IP (at least the first 32 bits) within the session and check it on subsequent requests.

This is fine as long as the server is a standalone server. What happens if there is a load balancer and a server farm with more than one GData server instance?! I could define all GData servers in the cluster / farm in each config file, and if a session is created or modified, the current server sends a notice to all other servers to replicate the session. (Session here is not the HttpSession.) But it could be quite a lot of work to synchronize all hosts and register / unregister them if they crash...

I guess this should be done at a later stage of development; I just have 2 months left... So this might be a task for development after the SoC program has finished. Any ideas about that?

yours simon
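The token scheme described in this message (an MD5 key over email, password and a timestamp, kept in a server-side store that times sessions out and remembers the client IP) could look roughly like the sketch below. The class name `AuthTokenStore`, its methods, and the `:` separator are assumptions made for illustration; none of them come from the GData server code, and the store compares the full IP string rather than a prefix.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory session storage for auth tokens. */
class AuthTokenStore {
    /** One stored session: the client IP seen at login and the expiry time. */
    private static final class Entry {
        final String clientIp;
        final long expiresAtMillis;

        Entry(String clientIp, long expiresAtMillis) {
            this.clientIp = clientIp;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> sessions = new ConcurrentHashMap<String, Entry>();
    private final long ttlMillis;

    AuthTokenStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Creates the token as hex(MD5(email:password:timestamp)) and stores it. */
    String issue(String email, String password, String clientIp, long nowMillis) {
        String token = md5Hex(email + ":" + password + ":" + nowMillis);
        sessions.put(token, new Entry(clientIp, nowMillis + ttlMillis));
        return token;
    }

    /** True if the token is known, has not timed out, and comes from the same IP. */
    boolean isValid(String token, String clientIp, long nowMillis) {
        Entry entry = sessions.get(token);
        if (entry == null) {
            return false;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            sessions.remove(token); // invalidate the timed-out session
            return false;
        }
        return entry.clientIp.equals(clientIp);
    }

    /** Lower-case hex encoding of the MD5 digest of the UTF-8 bytes of the input. */
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        AuthTokenStore store = new AuthTokenStore(30 * 60 * 1000L);
        long now = System.currentTimeMillis();
        String token = store.issue("user@example.com", "secret", "10.0.0.1", now);
        System.out.println(token + " valid: " + store.isValid(token, "10.0.0.1", now + 1));
    }
}
```

Passing the clock in as a parameter keeps the timeout logic easy to exercise; a clustered setup would still need the replication discussed above, since this map lives in a single JVM.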
RE: Test failure question
Hi Simon and all,

It's not clear to me when setUp()/tearDown() is called. Are they called before/after each call to testBarelyCloseEnough(), testExact(), testMulipleTerms(), etc.? If so, then NUnit is not doing this. I tested by writing to stdout. I don't have JUnit set up to see what it does, so if someone who has it set up can test and post here, I would really appreciate it.

Regards,
-- George

-Original Message-
From: Simon Willnauer [mailto:[EMAIL PROTECTED]
Sent: Friday, June 16, 2006 12:39 PM
To: java-dev@lucene.apache.org
Subject: Re: Test failure question

On 6/16/06, George Aroush [EMAIL PROTECTED] wrote:

Hi folks,

I realize this question is not directly related to Lucene, but I believe it's worth asking. With Lucene.Net (for those who don't know, it is a port of Jakarta Lucene from Java to C#) I use NUnit to run the same test code (ported to C#) that JUnit runs. When I run the NUnit tests, there are 3 separate test cases where a test fails if the tests are run as a group but passes if each of those tests is run individually.

For example, take the tests in TestPhraseQuery, which has testBarelyCloseEnough(), testExact(), testMulipleTerms(), etc. When I run all of the test cases by selecting the TestPhraseQuery node, the tests run from top to bottom and testMulipleTerms() fails. But if I run testMulipleTerms() by itself, it passes. The failure point is the first assert line in testMulipleTerms(), which is (in the NUnit world):

    Assert.AreEqual(1, hits.Length(), "two total moves");

My question to you is this: does anyone know whether JUnit calls setUp() and tearDown() before and after each test method call, or are setUp()/tearDown() only called once, at test startup and shutdown? The failure is that I am getting back a 0, where the expected value is 1.

setUp and tearDown will be called before and after each test runs! http://www.junit.org/junit/javadoc/3.8.1/junit/framework/TestCase.html#setUp() and I bet it is the same in NUnit

simon

This is rather a JUnit question

Knowing this will help me diagnose the problem.

Regards,
-- George
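The per-test lifecycle Simon describes can be mimicked in a few lines without depending on JUnit itself. This is only an illustrative sketch: `LifecycleSketch` and `Fixture` are made-up names, and the fixture records calls instead of building an index. It follows the order JUnit 3.8 uses for each test method: a fresh test case instance, then setUp(), the test, and tearDown() in a finally block.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleSketch {
    /** Hypothetical fixture; it logs each call so the order can be inspected. */
    static class Fixture {
        private final List<String> log;

        Fixture(List<String> log) {
            this.log = log;
        }

        void setUp() {
            log.add("setUp");
        }

        void tearDown() {
            log.add("tearDown");
        }

        void runTest(String name) {
            log.add(name);
        }
    }

    /** Runs every named test with its own fixture instance and setUp/tearDown pair. */
    static List<String> runAll(String... testNames) {
        List<String> log = new ArrayList<String>();
        for (String name : testNames) {
            Fixture fixture = new Fixture(log); // fresh instance for every test method
            fixture.setUp();
            try {
                fixture.runTest(name);
            } finally {
                fixture.tearDown(); // runs even if the test throws
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [setUp, testExact, tearDown, setUp, testMulipleTerms, tearDown]
        System.out.println(runAll("testExact", "testMulipleTerms"));
    }
}
```

If a ported test only passes when run alone, comparing its call log against this order is a quick way to check whether state from an earlier test is leaking into it.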
RE: Test failure question
Hi,

testBarelyCloseEnough(), testExact(), testMulipleTerms(), etc? If so, then the NUnit is not doing this. I tested by outputing to stdout.

NUnit calls setUp before each test and calls tearDown after each test. Add Console.WriteLine and see the result. Let me show:

--
    [TestFixture]
    public class TestPhraseQuery
    {
        [SetUp]
        protected void SetUp()
        {
            directory = new RAMDirectory();
            IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
            ...
            Console.WriteLine("set up");
        }

        [TearDown]
        protected void TearDown()
        {
            searcher.Close();
            directory.Close();
            Console.WriteLine("tear down");
        }

        [Test]
        public void TestNotCloseEnough()
        {
            query.SetSlop(2);
            ...
            MockAssert.AreEqual(0, hits.Length());
            Console.WriteLine("not close");
        }
        ...
    }
--

The output:
---
set up
barely
tear down
set up
tear down
...

Pasha Bizhan
[jira] Created: (LUCENE-604) do we need a flag to check open status for IndexWriter and IndexSearcher
do we need a flag to check open status for IndexWriter and IndexSearcher
Key: LUCENE-604
URL: http://issues.apache.org/jira/browse/LUCENE-604
Project: Lucene - Java
Type: Wish
Versions: 2.0.0
Reporter: Dedian Guo

Since it is recommended to use IndexWriter and IndexSearcher once, I am not sure if we need a function such as boolean IsOpen() to check the open status of the Writer and Searcher.
Re: [jira] Created: (LUCENE-603) index optimize problem
Hi Dedian,

Can you write a self-contained test case that reproduces the problem?

Thanks,
Grant

Dedian Guo (JIRA) wrote:

index optimize problem
Key: LUCENE-603
URL: http://issues.apache.org/jira/browse/LUCENE-603
Project: Lucene - Java
Type: Bug
Components: Index
Versions: 1.9
Environment: CentOS 4.0, Lucene 1.9, Eclipse 3.1
Reporter: Dedian Guo

I have a function which loops to index batches of documents; after each batch is indexed, IndexWriter.optimize is applied. After several iterations (not sure how many, but it should be many), the following exception was thrown:

Exception in thread "Thread-0" java.lang.IllegalStateException: docs out of order
    at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:335)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:298)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:272)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:236)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:89)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
RE: Test failure question
Hi Pasha,

That is definitely not happening in my case. Here is the output:

Setup()
TestBarelyCloseEnough()
TestExact()
TestMulipleTerms()
TestNotCloseEnough()
TestOrderDoesntMatter()
TestPhraseQueryInConjunctionScorer()
TestPhraseQueryWithStopAnalyzer()
TestSlop1()
TestSlopScoring()
TestWrappedPhrase()
TearDown()

Which version of NUnit are you using? I am using 2.2.8.

Regards,
-- George
Re: [jira] Commented: (LUCENE-604) do we need a flag to check open status for IndexWriter and IndexSearcher
If you are looking for a nice way to do that, have a look at the Solr source: http://svn.apache.org/viewvc/incubator/solr/trunk/src/java/org/apache/solr/util/RefCounted.java?view=markup

This is 1.5 source, but you can implement the same thing with 1.4 as well ;)

simon

On 6/16/06, Otis Gospodnetic (JIRA) [EMAIL PROTECTED] wrote:

[ http://issues.apache.org/jira/browse/LUCENE-604?page=comments#action_12416572 ]

Otis Gospodnetic commented on LUCENE-604:

IW and IS will only get closed if you call close() on them, so you should be able to track their status in your application, no?
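Tracking the open status in the application, as suggested in the comment on LUCENE-604, can be as small as a wrapper that remembers whether close() has been called. The sketch below is not Solr's RefCounted and not a Lucene class: `TrackedResource` is a made-up name, and it assumes the wrapped object implements java.io.Closeable (a class without that interface would need a small adapter).

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical wrapper that records whether its delegate has been closed. */
final class TrackedResource<T extends Closeable> implements Closeable {
    private final T delegate;
    private final AtomicBoolean open = new AtomicBoolean(true);

    TrackedResource(T delegate) {
        this.delegate = delegate;
    }

    /** The open-status check the issue asks for, kept on the application side. */
    boolean isOpen() {
        return open.get();
    }

    /** Returns the wrapped object, refusing access once it has been closed. */
    T get() {
        if (!open.get()) {
            throw new IllegalStateException("resource already closed");
        }
        return delegate;
    }

    /** Closes the delegate at most once; later calls do nothing. */
    @Override
    public void close() {
        if (open.compareAndSet(true, false)) {
            try {
                delegate.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        TrackedResource<java.io.StringReader> tracked =
                new TrackedResource<java.io.StringReader>(new java.io.StringReader("demo"));
        System.out.println(tracked.isOpen());
        tracked.close();
        System.out.println(tracked.isOpen());
    }
}
```

A reference-counting variant in the spirit of RefCounted would replace the boolean with a counter and close the delegate when the count reaches zero.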
[jira] Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index
[ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12416583 ]

Karl Wettin commented on LUCENE-550:

There is a bug with phrase queries. Possibly in the term positions. Low priority for me.

InstanciatedIndex - faster but memory consuming index
Key: LUCENE-550
URL: http://issues.apache.org/jira/browse/LUCENE-550
Project: Lucene - Java
Type: New Feature
Components: Store
Versions: 1.9
Reporter: Karl Wettin
Attachments: InstanciatedIndexTermEnum.java, class_diagram.png, class_diagram.png, instanciated_20060527.tar, lucene.1.9-karl1.jpg

After fixing the bugs, it's now 4.5 - 5 times the speed. This is true at both index and query time. Sorry if I got your hopes up too much. There are still things to be done though. I might not have time to do anything with this until next month, so here is the code if anyone wants a peek. Not good enough for Jira yet, but if someone wants to fool around with it, here it is. The implementation passes a TermEnum - TermDocs - Fields - TermVector comparison against the same data in a Directory.

When it comes to features, offsets don't exist, and positions are stored in an ugly way and have bugs. You might notice that norms are float[] and not byte[]. That is me refactoring it to see if it would do any good. Bit shifting doesn't take many ticks, so I might just revert that.

I believe the code is quite self-explanatory.

    InstanciatedIndex ii = ..
    ii.new InstanciatedIndexReader();
    ii.addDocument(s)..

The last call replaces IndexWriter for now.
Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)
It looks like I would have won a beer had anyone wagered me. 1.5 IS the Java version that the majority of Lucene users use, not 1.4! Does this mean we can now start accepting 1.5 code?

Otis

- Original Message
From: Otis Gospodnetic [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Friday, June 16, 2006 11:48:15 AM
Subject: Survey: Lucene and Java 1.4 vs. 1.5

Hello everyone,

If you have 15 seconds to spare, please let us (Lucene developers) know which version of Java you are using with Lucene: 1.4 or 1.5. All it takes is 1 click on one of the two choices: http://www.quimble.com/poll/view/2156

No cheating, please. Thanks!
Otis
Re: GData - Milestone 2
On 17/06/2006, at 6:36 AM, Otis Gospodnetic wrote:

Hi Simon,

- The GData overview page describes the auth as: send a cookie/token, save it server-side, and then expect it from the client on subsequent requests (paraphrased). That sounds fine to me. I don't think you need to worry about the client IP, as long as your cookie/token is long and random enough (please correct me if I'm wrong about this), although you might want to add the IP to the string you base your MD5 checksum on. If you store the token in the session, it will automatically get the TTL of the HttpSession.

If you are going to use the IP, and you only use the first 3 octets (i.e. 218.214.209 instead of 218.214.209.232), be aware that there are several proxy servers out there which load-balance HTTP requests through different IPs.

regards
Ian
Re: GData - Milestone 2
On 6/16/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:

Hi Simon,

I have a bit of experience with REST and authentication from my work on http://simpy.com . If you look at http://groups.yahoo.com/group/simpy-dev/messages you will see several recent messages about different authentication options that may give you some food for thought.

-- good stuff, thanks for the link!

As for GData auth:

- The GData overview page describes the auth as: send a cookie/token, save it server-side, and then expect it from the client on subsequent requests (paraphrased). That sounds fine to me. I don't think you need to worry about the client IP, as long as your cookie/token is long and random enough (please correct me if I'm wrong about this), although you might want to add the IP to the string you base your MD5 checksum on. If you store the token in the session, it will automatically get the TTL of the HttpSession.

I already tried the HttpSession approach. Using the HTTP session would solve all my problems. The session can be replicated, as most containers support session replication. But how do I get the session id from the client? The client sends a request parameter (name: Auth, value: sessionid), but the container does not recognize the session in this case. As far as I know, the session parameter name has to be jsessionid, and I only get the session via the HttpServletRequest. Any idea about this?

simon

- Running the GData server in a cluster might require session replication. It sounds like a big bite for SoC, but ... I never used WADI, but I _think_ that might be the easiest way to get session replication going: http://incubator.apache.org/projects/wadi.html

On the other hand, WADI might be overkill if all you want is to share this token. If that's all you need, perhaps JavaSpaces would do (e.g. http://www.dancres.org/blitz/ ).

Otis
Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)
go tiger go! Everybody not using 1.5 should visit java.sun.com and download the 1.5 VM!! On 6/16/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: It looks like I would have won a beer had anyone wagered me. 1.5 IS the Java version that the majority of Lucene users use, not 1.4! Does this mean we can now start accepting 1.5 code? gdata already does ;) Otis - Original Message From: Otis Gospodnetic [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent: Friday, June 16, 2006 11:48:15 AM Subject: Survey: Lucene and Java 1.4 vs. 1.5 Hello everyone, If you have 15 seconds to spare, please let us (Lucene developers) know which version of Java you are using with Lucene: 1.4 or 1.5 All it takes is 1 click on one of the two choices: http://www.quimble.com/poll/view/2156 No cheating, please. Thanks! Otis
Re: GData - Milestone 2
Simon, I don't fully understand your question, but if sessions are replicated, then the GData cluster doesn't care which GData server the client contacts, as they will all already have the token that was given to the client. On subsequent requests, the client will have to send the token. I am not sure if the GData protocol specifies how that should be sent - via a query string param or perhaps even an HTTP request header (e.g. X-gdata-auth: SomeTokenHere). The jsessionid carries the HttpSession ID if the client doesn't support, and thus doesn't send back, cookies. If it does support cookies, the server sets them via the Set-Cookie response header and the client sends them back via the Cookie request header. I think what you need to do is:
- client makes a request
- server says you are not authenticated, here is a 401
- client provides credentials
- server checks credentials, creates token, saves it to session, and says to client: OK, eat this token. The client saves it
- client makes a new request and sends the token (via HTTP request header or via query string param)
- server takes the token and compares it to the one stored in the current session [1]
- if the tokens match, the server responds with the data, else goto line with 401 above
[1] In order for your server (Jetty or Tomcat or whatever) to be able to associate a client with a session, the client must send back the session ID from the first request. This is normal Java webapp behaviour. The client will send it either as a cookie via HTTP headers, or via jsessionid (aka URL rewriting... not to be mixed up with mod_rewrite). Regardless of the method, the server (Jetty/Tomcat) will know how to associate the request with an existing instance of HttpSession, and that's what you'll get from request.getSession().
Otis - Original Message From: Simon Willnauer [EMAIL PROTECTED] To: java-dev@lucene.apache.org; Otis Gospodnetic [EMAIL PROTECTED] Sent: Friday, June 16, 2006 4:53:21 PM Subject: Re: GData - Milestone 2 On 6/16/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi Simon, I have a bit of experience with REST and authentication from my work on http://simpy.com . If you look at http://groups.yahoo.com/group/simpy-dev/messages you will see several recent messages about different authentication options that may give you some food for thought. -- good stuff, thanks for the link! As for GData auth: - The GData overview page describes the auth as: send a cookie/token, save it server-side, and then expect it from the client on subsequent requests (paraphrased). That sounds fine to me. I don't think you need to worry about the client IP, as long as your cookie/token is long and random enough (please correct me if I'm wrong about this), although you might want to add the IP to the string you base your MD5 checksum on. If you store the token in the session, it will automatically get the TTL of the HttpSession. I already tried the HttpSession approach. Using the HTTP session would solve all my problems. The session can be replicated, as most containers support session replication. But how do I get the session id from the client? The client sends a request parameter name: Auth value: sessionid but the container does not recognize the session in this case. As far as I know, the session parameter name has to be jsessionid and I only get the session via the HttpServletRequest. Any idea about this? simon - Running the GData server in a cluster might require session replication. It sounds like a big bite for SoC, but ... I never used WADI, but I _think_ that might be the easiest way to get session replication going: http://incubator.apache.org/projects/wadi.html On the other hand, WADI might be overkill if all you want is to share this token. If that's all you need, perhaps JavaSpaces would do (e.g.
http://www.dancres.org/blitz/ ). Otis
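The compare-and-respond step from the list of steps in this reply (answer 401 when there is no token or the tokens differ, otherwise serve the data) might look roughly like this. It is a minimal sketch, not the GData server's code: the class TokenCheck and the method statusFor are hypothetical names. In a servlet, the stored token would presumably be read from the HttpSession returned by request.getSession(), under an attribute name of your choosing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical helper illustrating the 401 / token-check decision. */
final class TokenCheck {
    static final int OK = 200;
    static final int UNAUTHORIZED = 401;

    /**
     * Decides the HTTP status for a request.
     *
     * @param sessionToken   token stored in the server-side session, or null if none
     * @param presentedToken token sent by the client (header or query param), or null
     */
    static int statusFor(String sessionToken, String presentedToken) {
        // No session token yet, or the client sent nothing: ask for credentials.
        if (sessionToken == null || presentedToken == null) {
            return UNAUTHORIZED;
        }
        byte[] expected = sessionToken.getBytes(StandardCharsets.UTF_8);
        byte[] actual = presentedToken.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual compares the two byte arrays for equality.
        return MessageDigest.isEqual(expected, actual) ? OK : UNAUTHORIZED;
    }
}
```

Keeping the decision in a pure function like this makes it easy to test apart from the container; the servlet itself would only map the returned status onto the response.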
Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)
1.5 IS the Java version that the majority of Lucene users use, not 1.4! Does this mean we can now start accepting 1.5 code? This isn't simply about "which JVM gets used the most wins". This is about how many Lucene users we will inconvenience or lose by moving to 1.5. Right now the survey sample tells me roughly a third, which doesn't seem like a good thing. Maybe the more useful question is: who can't/won't move to 1.5 in the immediate future? I believe we shouldn't select the minimum platform based on the coding convenience it may offer us, which seems to be the major objective behind 1.5 adoption. When developing a library that is deployed in many applications/environments over which you have no control, and where careful consideration of runtime performance, not coding convenience/speed of development, is the primary concern, my preference would be to choose 1.4. Not all deployment environments can be upgraded easily. Take my current application at work. It's applet-based and rolled out to hundreds of corporate desktops which are stuck on 1.4 (this won't change anytime soon). Lucene isn't on the client, but all client and server code in the app has been written in 1.4 to avoid any 1.5 code leaking onto the 1.4 client. All of the many 3rd party libraries in use (Spring, database drivers etc) are 1.4 compatible in their latest versions. I'd like to stick with the latest Lucene codebase, but mandating 1.5 for Lucene would introduce a code management headache to this app with the mixed JVMs. Unless there are *really* good runtime benefits that are solely based on 1.5 libraries or source code, I would prefer to see Lucene stick with 1.4 as a base rather than limit Lucene's deployment options simply because of the code-time benefits the new 1.5 syntax offers. I see that the Spring framework recognise this dilemma and still seek to support as far back as 1.3 (see http://www.springframework.org/node/220). Simon said everyone should download 1.5.
It's nice to think you can accelerate the global adoption of 1.5 by changing projects like Lucene, but the reality is corporates do not change platforms overnight because of such a change. That's a long-winded way of saying -1 unless I hear of any arguments which are based on something much more substantial than "1.5 makes coding easier". Cheers, Mark
[jira] Created: (LUCENE-605) Make Explanation include information about match/non-match
Make Explanation include information about match/non-match -- Key: LUCENE-605 URL: http://issues.apache.org/jira/browse/LUCENE-605 Project: Lucene - Java Type: Improvement Components: Search Reporter: Hoss Man Assigned to: Hoss Man As discussed, I'm looking into the possibility of improving the Explanation class to include some basic info about the match status of the Explanation -- independent of the value... http://www.nabble.com/BooleanWeight.normalize%28float%29-doesn%27t-normalize-prohibited-clauses--t1596471.html#a4347644 This is necessary to deal with things like LUCENE-451
[jira] Updated: (LUCENE-605) Make Explanation include information about match/non-match
[ http://issues.apache.org/jira/browse/LUCENE-605?page=all ] Hoss Man updated LUCENE-605: Attachment: demo-fix.patch Demo of the basic direction I'm going. This patch includes some changes to the Explanation class to include the new information, as well as some tweaks to TermQuery and BooleanQuery to take advantage of it. NOTE: the BooleanQuery changes in this patch overlap with the patches in LUCENE-557 Make Explanation include information about match/non-match -- Key: LUCENE-605 URL: http://issues.apache.org/jira/browse/LUCENE-605 Project: Lucene - Java Type: Improvement Components: Search Reporter: Hoss Man Assignee: Hoss Man Attachments: demo-fix.patch As discussed, I'm looking into the possibility of improving the Explanation class to include some basic info about the match status of the Explanation -- independent of the value... http://www.nabble.com/BooleanWeight.normalize%28float%29-doesn%27t-normalize-prohibited-clauses--t1596471.html#a4347644 This is necessary to deal with things like LUCENE-451
[jira] Assigned: (LUCENE-451) BooleanQuery explain with boost==0
[ http://issues.apache.org/jira/browse/LUCENE-451?page=all ] Hoss Man reassigned LUCENE-451: --- Assign To: Hoss Man BooleanQuery explain with boost==0 -- Key: LUCENE-451 URL: http://issues.apache.org/jira/browse/LUCENE-451 Project: Lucene - Java Type: Bug Components: Search Versions: CVS Nightly - Specify date in submission Reporter: Yonik Seeley Assignee: Hoss Man Priority: Minor BooleanWeight.explain() uses the returned score of subweights to determine if a clause matched. If any required clause has boost==0, the returned score will be zero and the explain for the entire BooleanWeight will be simply Explanation(0.0f, "match required"). I'm not sure what the correct fix is here. I don't think it can be done based on score alone, since that isn't how scorers work. Perhaps we need a new method boolean Explain.matched() that returns true on a match, regardless of what the score may be? Related to the problem above, even if no boosts are zero, it is sometimes nice to know *why* a particular query failed to match. It would mean a longer explanation, but maybe we should include non-matching explains too?
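To make the boost==0 problem concrete, here is a minimal sketch of an explanation node that carries its match status separately from its score, along the lines of the matched() idea in the issue. MatchExplanation and allRequired are hypothetical names invented for this illustration; this is not Lucene's Explanation class and not the contents of demo-fix.patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an explanation node; not Lucene's Explanation class. */
final class MatchExplanation {
    private final boolean matched;
    private final float value;
    private final String description;
    private final List<MatchExplanation> details = new ArrayList<>();

    MatchExplanation(boolean matched, float value, String description) {
        this.matched = matched;
        this.value = value;
        this.description = description;
    }

    /** Match status, independent of the score value. */
    boolean isMatch() { return matched; }

    float getValue() { return value; }

    String getDescription() { return description; }

    List<MatchExplanation> getDetails() { return details; }

    void addDetail(MatchExplanation detail) { details.add(detail); }

    /**
     * Combines required clauses: the result matches only if every clause
     * matched, regardless of whether an individual clause scored 0.
     */
    static MatchExplanation allRequired(List<MatchExplanation> clauses) {
        boolean all = true;
        float sum = 0f;
        for (MatchExplanation clause : clauses) {
            all &= clause.isMatch();
            sum += clause.getValue();
        }
        MatchExplanation result = all
                ? new MatchExplanation(true, sum, "sum of:")
                : new MatchExplanation(false, 0f, "no match: a required clause failed");
        // Keep every clause, including non-matching ones, so the reader can see why.
        for (MatchExplanation clause : clauses) {
            result.addDetail(clause);
        }
        return result;
    }
}
```

With this shape, a required clause that matched with boost 0 contributes a score of 0 but no longer makes the whole combination look like a non-match.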