date:20060616

[jira] Updated: (LUCENE-559) Turkish Analyzer for Lucene

2006-06-16 Thread Emre Bayram (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-559?page=all ]

Emre Bayram updated LUCENE-559:
---

Attachment: IndexFiles.java

 Turkish Analyzer for Lucene
 ---

  Key: LUCENE-559
  URL: http://issues.apache.org/jira/browse/LUCENE-559
  Project: Lucene - Java
 Type: Improvement

   Components: Analysis
 Reporter: Emre Bayram
  Attachments: IndexFiles.java, SearchFiles.java, TurkishAnalyzer.java, 
 TurkishAnalyzer.java, TurkishStemFilter.java, TurkishStemFilter.java, 
 TurkishStemmer.java, TurkishStemmer.java

 I have developed an Analyzer for Turkish, building on the German and 
 Brazilian language analyzers.
 This Turkish Analyzer supports the ISO-8859-9 (Turkish) character set and has a 
 good stop-word set. I hope it helps Turkish developers who use 
 Lucene (I searched for many hours for a Turkish analyzer for Lucene but couldn't 
 find one, so I wrote it and am sending it here).
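 The two Turkish-specific concerns mentioned above (ISO-8859-9 input and Turkish casing rules) can be sketched without Lucene. This is a hypothetical illustration, not the attached TurkishAnalyzer.java: the class name is invented, and the stop words are a small sample rather than the patch's stop set. The key point is that lowercasing must use the Turkish locale, so that 'I' becomes dotless 'ı' and 'İ' becomes 'i'.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of Turkish-aware token normalization; not the attached TurkishAnalyzer. */
final class TurkishTokenSketch {
    // Turkish casing: 'I' lowercases to dotless 'ı' (U+0131) and 'İ' (U+0130) to plain 'i'.
    private static final Locale TURKISH = Locale.forLanguageTag("tr");

    // Illustrative sample only; the stop set in the attached TurkishAnalyzer.java is not reproduced here.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("ve", "bir", "bu", "ile", "da", "de"));

    /** Decodes ISO-8859-9 bytes, splits on non-letters, lowercases with Turkish rules, drops stop words. */
    static List<String> tokenize(byte[] iso88599Bytes) {
        String text = new String(iso88599Bytes, Charset.forName("ISO-8859-9"));
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields a leading empty string if the text starts with a separator
            }
            String term = raw.toLowerCase(TURKISH);
            if (!STOP_WORDS.contains(term)) {
                tokens.add(term);
            }
        }
        return tokens;
    }
}
```

 For the ISO-8859-9 bytes of "ISTANBUL ve İzmir" (0xDD encodes 'İ'), the sketch drops "ve" and yields "ıstanbul" and "izmir"; a locale-insensitive toLowerCase() would instead produce "istanbul" and a dotted "i̇zmir".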

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Java 1.5 was [jira] Updated: (LUCENE-600) ParallelWriter companion to ParallelReader

2006-06-16 Thread Grant Ingersoll


+1

Do you want to post it on the user list?  It might also be good to put 
it up on the main website.


Otis Gospodnetic wrote:

Grant: how to poll users?  How about this: 
http://www.quimble.com/poll/view/2156 ?  If you think that's ok, we can send 
that to java-user tomorrow and see.  Hey, how about some bets?  I'll put $10 
for a beer on 1.5.

  
Wow, $10 for a beer?  That must be some pretty good beer.  Either that 
or you live in New York City and that is a cheap beer!  Anyway, I am 
betting it is 1.5 as well.  Maybe we can get together at ApacheCon or 
something for one...





Otis

- Original Message 
From: Grant Ingersoll [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Tuesday, June 13, 2006 5:01:30 PM
Subject: Re: Java 1.5 was [jira] Updated: (LUCENE-600) ParallelWriter companion 
to ParallelReader


  

In addition to performance, productivity and functionality benefits, my
main argument for 1.5 is that it is used by the vast majority of lucene
community members.  



I am not so sure about this. Perhaps we should take a poll on the user 
list?  Not even sure how that would be managed or counted, but...


  

Everything I write is in 1.5 and I don't have time
to backport.  I have a significant body of code from which to extract
and contribute patches that others would likely find useful.  How many
others are in a similar position?
  

I definitely would prefer to make future contributions in 1.5 (even the 
patch we just contributed (issue 545) could have been better given 1.5, 
but it is fine with 1.4 as well).  I tend to think if people don't want 
the new functionality or if it breaks their app. then they need not 
upgrade, or they can contribute patches against the branches for prior 
releases and we can support that as needed.   To me, this is what major 
releases are about.  I know that when a major release comes out that I 
should expect library changes that may break my code.  If I don't want 
that pain, then I don't upgrade.
  

On the other hand, not leaving valued community members behind is important.

I think the PMC / committers just need to make a decision, which will
impact one group or the other.

Chuck


Grant Ingersoll wrote on 06/13/2006 03:35 AM:
  


Well, we have our first Java 1.5 patch...  Now that we have had a week
or two to digest the comments, do we want to reopen the discussion?

Chuck Williams (JIRA) wrote:

  

 [ http://issues.apache.org/jira/browse/LUCENE-600?page=all ]

Chuck Williams updated LUCENE-600:
--

Attachment: ParallelWriter.patch

Patch to create and integrate ParallelWriter, Writable and
TestParallelWriter -- also modifies build to use Java 1.5.


 
  


ParallelWriter companion to ParallelReader
--

 Key: LUCENE-600
 URL: http://issues.apache.org/jira/browse/LUCENE-600
 Project: Lucene - Java
Type: Improvement


  
 
  


  Components: Index
Versions: 2.1
Reporter: Chuck Williams
 Attachments: ParallelWriter.patch

A new class ParallelWriter is provided that serves as a companion to
ParallelReader.  ParallelWriter meets all of the doc-id
synchronization requirements of ParallelReader, subject to:
1.  ParallelWriter.addDocument() is synchronized, which might
have an adverse effect on performance.  The writes to the
sub-indexes are, however, done in parallel.
2.  The application must ensure that the ParallelReader is never
reopened inside ParallelWriter.addDocument(), else it might find the
sub-indexes out of sync.
3.  The application must deal with recovery from
ParallelWriter.addDocument() exceptions.  Recovery must restore the
synchronization of doc-ids, e.g. by deleting any trailing
document(s) in one sub-index that were not successfully added to all
sub-indexes, and then optimizing all sub-indexes.
A new interface, Writable, is provided to abstract IndexWriter and
ParallelWriter.  This is in the same spirit as the existing
Searchable and Fieldable classes.
This implementation uses Java 1.5.  The patch applies against
today's svn head.  All tests pass, including the new
TestParallelWriter.
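
The core of points 1 and 3 can be sketched without Lucene. The sketch below is hypothetical and is not the code in ParallelWriter.patch: SubIndexWriter stands in for an IndexWriter behind a Writable-like interface, and documents are simple field maps. addDocument() is synchronized so every sub-index receives documents in the same order (keeping doc ids aligned), the per-sub-index writes run in parallel on a thread pool, and any failure is rethrown so the application can perform the recovery described in point 3.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for one sub-index writer (e.g. an IndexWriter behind a Writable-like interface). */
interface SubIndexWriter {
    void addDocument(Map<String, String> fields) throws Exception;
}

/** Sketch of the ParallelWriter idea: each sub-index stores a disjoint subset of every document's fields. */
final class ParallelWriterSketch {
    private final List<SubIndexWriter> writers;
    private final List<Set<String>> fieldsPerWriter; // fieldsPerWriter.get(i) is written to writers.get(i)
    private final ExecutorService pool;

    ParallelWriterSketch(List<SubIndexWriter> writers, List<Set<String>> fieldsPerWriter) {
        this.writers = writers;
        this.fieldsPerWriter = fieldsPerWriter;
        this.pool = Executors.newFixedThreadPool(writers.size());
    }

    /** Synchronized so documents reach every sub-index in the same order; writes within a call run in parallel. */
    synchronized void addDocument(Map<String, String> doc) throws Exception {
        List<Future<Object>> pending = new ArrayList<>();
        for (int i = 0; i < writers.size(); i++) {
            SubIndexWriter writer = writers.get(i);
            Map<String, String> part = new HashMap<>();
            for (String field : fieldsPerWriter.get(i)) {
                if (doc.containsKey(field)) {
                    part.put(field, doc.get(field));
                }
            }
            pending.add(pool.submit(() -> {
                writer.addDocument(part);
                return null;
            }));
        }
        // Wait for every write before returning or failing, so the next call cannot overlap this one.
        Throwable failure = null;
        for (Future<Object> f : pending) {
            try {
                f.get();
            } catch (ExecutionException e) {
                if (failure == null) {
                    failure = e.getCause();
                }
            }
        }
        // Restoring doc-id alignment after a failure (point 3) is left to the caller.
        if (failure instanceof Exception) {
            throw (Exception) failure;
        }
        if (failure instanceof Error) {
            throw (Error) failure;
        }
    }

    void close() {
        pool.shutdown();
    }
}
```

The synchronized method trades throughput for ordering, which is exactly the performance caveat in point 1.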


  
  
  




  



  


--

Grant Ingersoll 
Sr. Software Engineer 
Center for Natural Language Processing 
Syracuse University 
School of Information Studies 
335 Hinds Hall 
Syracuse, NY 13244 

http://www.cnlp.org 
Voice:  315-443-5484 
Fax: 315-443-6886 




Re: Java 1.5 was [jira] Updated: (LUCENE-600) ParallelWriter companion to ParallelReader

2006-06-16 Thread Otis Gospodnetic

I'll just send it to java-user in a bit in order to get the answers only from 
Lucene users (and not peeps just passing by lucene.apache.org).

Otis

- Original Message 
From: Grant Ingersoll [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Friday, June 16, 2006 6:53:57 AM
Subject: Re: Java 1.5 was [jira] Updated: (LUCENE-600) ParallelWriter companion 
to ParallelReader


GData - Milestone 2

2006-06-16 Thread Simon Willnauer


Hello everyone,

It was quiet last week; I had a bad cold, so Milestone 2
starts a bit late...
Milestone 2 is about client authentication. GData client auth is also
defined (well, kind of) in the GData protocol reference on
code.google.com. The client is supposed to support either cookie-based
auth or just an auth token sent back in a POST response. The
client authenticates itself via a POST request to the server's auth
interface, sending the following parameters:

Email=[EMAIL PROTECTED]&Passwd=north23AZ&service=servicename&source=Gulp-CalGulp-1.05
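
The parameter string above is an ordinary application/x-www-form-urlencoded POST body. A minimal sketch of splitting it into name/value pairs follows; the class name is hypothetical, the email in the usage below is a made-up placeholder, and inside a servlet container HttpServletRequest.getParameter() already performs this decoding. The Charset overload of URLDecoder.decode needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for an application/x-www-form-urlencoded auth request body. */
final class AuthRequestParser {
    static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // URL-decode both halves: '+' becomes a space, %40 becomes '@', and so on.
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }
}
```

For example, parsing "Email=jdoe%40example.com&Passwd=north23AZ&service=servicename&source=Gulp-CalGulp-1.05" gives four entries, with Email decoded to "jdoe@example.com".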

The email represents the account name, which is associated with a
service provided by the server. Each server can provide m services
with n feeds. Each feed belongs to one account.
As it is quite hard to figure out whether a client supports plain
token or cookie auth, I will send both back to the client. After the
client has received the auth token or cookie, it will call some
restricted resource on the server, sending either the cookie or the
auth token. The cookie contains only the auth token.
Given these facts, I will generate an MD5 key as the auth token using
the email, password and a timestamp or something similar, and save it
on the server in a kind of session storage. The session storage will
hold the sessions for a certain time and will invalidate them once they
time out. Additionally, I will save the client IP (at least the first
32 bits) within the session and check it on subsequent requests. This
is fine as long as the server is a standalone server. But what
happens if there is a load balancer and a server farm with more than
one GData server instance?!
I could define all GData servers in the cluster / farm in each config
file, and if a session is created or modified, the current server sends
a notice to all other servers to replicate the session. (This session is
not the HttpSession.) But it could be quite a lot of work to
synchronize all hosts and register / unregister them if they crash...
I guess this should be done at a later stage of development; I only
have 2 months left... So this might be a task for development after the
SoC program has finished.
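
The single-server part of this plan (MD5 token, server-side session store with a timeout, client-IP check) can be sketched as follows. All names are hypothetical. The random nonce is an added assumption, in line with the "long and random enough" advice later in this thread; the clock is passed in explicitly only to keep the sketch testable; the IP check compares the whole address; and replication across a server farm is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single-server sketch of the auth-token "session storage" (not the HttpSession). */
final class AuthTokenStore {
    private static final class Session {
        final String account;
        final String clientIp;
        final long expiresAtMillis;

        Session(String account, String clientIp, long expiresAtMillis) {
            this.account = account;
            this.clientIp = clientIp;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final SecureRandom random = new SecureRandom();
    private final long ttlMillis;

    AuthTokenStore(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Issues a hex MD5 token over email, password, timestamp and a random nonce, and stores the session. */
    String createToken(String email, String password, String clientIp, long nowMillis) {
        byte[] nonce = new byte[16];
        random.nextBytes(nonce); // assumption: extra randomness so identical inputs never give identical tokens
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // not expected on a standard JDK
        }
        md5.update(email.getBytes(StandardCharsets.UTF_8));
        md5.update(password.getBytes(StandardCharsets.UTF_8));
        md5.update(Long.toString(nowMillis).getBytes(StandardCharsets.UTF_8));
        md5.update(nonce);
        StringBuilder token = new StringBuilder();
        for (byte b : md5.digest()) {
            token.append(String.format("%02x", b));
        }
        sessions.put(token.toString(), new Session(email, clientIp, nowMillis + ttlMillis));
        return token.toString();
    }

    /** Returns the account for a live token presented from the same client IP, or null otherwise. */
    String validate(String token, String clientIp, long nowMillis) {
        Session s = sessions.get(token);
        if (s == null) {
            return null;
        }
        if (nowMillis >= s.expiresAtMillis) {
            sessions.remove(token); // invalidate timed-out sessions lazily
            return null;
        }
        return s.clientIp.equals(clientIp) ? s.account : null;
    }
}
```

Because the map lives in one JVM, a second server behind a load balancer would not see these tokens, which is exactly the cluster question raised above.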

Any ideas about that?

yours simon


RE: Test failure question

2006-06-16 Thread George Aroush

Hi Simon and all,

It's not clear to me when setUp()/tearDown() are called.  Are they called
before/after each call to testBarelyCloseEnough(), testExact(),
testMulipleTerms(), etc.?  If so, then NUnit is not doing this.  I tested
by outputting to stdout.

I don't have JUnit set up to see what it does, so if someone who has it set up
can test and post here, I would really appreciate it.

Regards,

-- George

-Original Message-
From: Simon Willnauer [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 16, 2006 12:39 PM
To: java-dev@lucene.apache.org
Subject: Re: Test failure question

On 6/16/06, George Aroush [EMAIL PROTECTED] wrote:
 Hi folks,

 I realize this question is not directly related to Lucene, but I 
 believe it's worth asking.

 With Lucene.Net (for those who don't know, it is a port of Jakarta Lucene 
 from Java to C#) I use NUnit to run the same test code (ported to C#) 
 that JUnit runs.  When I run the NUnit tests, there are 3 separate test 
 cases that fail when the tests are run as a group but pass when each 
 of them is run individually.

 For example, take the tests in TestPhraseQuery, which has 
 testBarelyCloseEnough(), testExact(), testMulipleTerms(), etc.  When I 
 run the entire test case by selecting the TestPhraseQuery node, the tests 
 run from top to bottom and testMulipleTerms() fails.  But if I run 
 testMulipleTerms() by itself, it passes.  The failure point is the 
 first assert line in testMulipleTerms(), which is (in the NUnit world):

 Assert.AreEqual(1, hits.Length(), "two total moves");

 My question to you is this: does anyone know whether JUnit calls 
 setUp() and tearDown() before and after each test method call, or 
 whether setUp()/tearDown() are only called once at test startup and shutdown?  
 The failure is that I am getting back 0, where the expected value is 1.

setUp and tearDown will be called before and after each test runs!
http://www.junit.org/junit/javadoc/3.8.1/junit/framework/TestCase.html#setUp()
and I bet it is the same in NUnit
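
The per-test lifecycle described in the JUnit 3.8.1 TestCase javadoc linked above can be illustrated with a toy runner. This is not JUnit: the class names are hypothetical and there is no JUnit dependency. It only mimics the documented behavior of creating a fresh fixture for each test* method and calling setUp() before and tearDown() after it, so shared state cannot leak from one test into the next.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy runner that mimics JUnit 3.8's per-test fixture lifecycle; it is not JUnit itself. */
final class LifecycleDemo {
    static final List<String> LOG = new ArrayList<>();

    /** Stand-in fixture with the same shape as a JUnit 3 TestCase. */
    public static class PhraseTests {
        public void setUp() { LOG.add("setUp"); }
        public void tearDown() { LOG.add("tearDown"); }
        public void testExact() { LOG.add("testExact"); }
        public void testMulipleTerms() { LOG.add("testMulipleTerms"); }
    }

    static void run(Class<?> testClass) throws Exception {
        Method[] methods = testClass.getMethods();
        // Sorted only to make the log deterministic; JUnit 3 does not promise an order.
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method test : methods) {
            if (!test.getName().startsWith("test")) {
                continue;
            }
            // Fresh instance per test method, then setUp -> test -> tearDown (tearDown even on failure).
            Object fixture = testClass.getDeclaredConstructor().newInstance();
            testClass.getMethod("setUp").invoke(fixture);
            try {
                test.invoke(fixture);
            } finally {
                testClass.getMethod("tearDown").invoke(fixture);
            }
        }
    }
}
```

Running it on PhraseTests logs setUp, testExact, tearDown, setUp, testMulipleTerms, tearDown. A runner that logged a single setUp() and a single tearDown() around all tests, as in the NUnit output later in this thread, would let state from one test affect the next.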

simon
This is more of a JUnit question.
 Knowing this will help me diagnose the problem.

 Regards,

 -- George



RE: Test failure question

2006-06-16 Thread Pasha Bizhan

Hi,

 testBarelyCloseEnough(), testExact(), testMulipleTerms(), 
 etc.?  If so, then NUnit is not doing this.  I tested by 
 outputting to stdout.

NUnit calls setUp before each test and calls tearDown after each test.
Add Console.WriteLine and see the result.

Let me show:
--
[TestFixture]
public class TestPhraseQuery {
    [SetUp]
    protected void SetUp() {
        directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
        // ...
        Console.WriteLine("set up");
    }

    [TearDown]
    protected void TearDown() {
        searcher.Close();
        directory.Close();
        Console.WriteLine("tear down");
    }

    [Test]
    public void TestNotCloseEnough() {
        query.SetSlop(2);
        // ...
        MockAssert.AreEqual(0, hits.Length());
        Console.WriteLine("not close");
    }

    // ... remaining tests
}
--
The output:
---
set up
barely
tear down

set up
tear down
...


Pasha Bizhan



[jira] Created: (LUCENE-604) do we need a flag to check open status for IndexWriter and IndexSearcher

2006-06-16 Thread Dedian Guo (JIRA)

do we need a flag to check open status for IndexWriter and IndexSearcher


 Key: LUCENE-604
 URL: http://issues.apache.org/jira/browse/LUCENE-604
 Project: Lucene - Java
Type: Wish

Versions: 2.0.0
Reporter: Dedian Guo


Since it is recommended to use a single IndexWriter and a single IndexSearcher 
instance, I wonder whether we need a method such as boolean IsOpen() to check 
whether the Writer and Searcher are open.


Re: [jira] Created: (LUCENE-603) index optimize problem

2006-06-16 Thread Grant Ingersoll


Hi Dedian,

Can you write a self-contained test case that reproduces the problem?

Thanks,
Grant

Dedian Guo (JIRA) wrote:

index optimize problem
--

 Key: LUCENE-603
 URL: http://issues.apache.org/jira/browse/LUCENE-603
 Project: Lucene - Java
Type: Bug

  Components: Index  
Versions: 1.9
 Environment: CentOS 4.0 , Lucene 1.9, Eclipse 3.1

Reporter: Dedian Guo


I have a function that loops over batches of documents; after each batch is 
indexed, IndexWriter.optimize() is called. After several iterations 
(not sure how many, but quite a few), the following exception was thrown:

Exception in thread "Thread-0" java.lang.IllegalStateException: docs out of order
    at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:335)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:298)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:272)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:236)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:89)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)

  



RE: Test failure question

2006-06-16 Thread George Aroush

Hi Pasha,

That is definitely not happening in my case.  Here is the output:

Setup()
TestBarelyCloseEnough()
TestExact()
TestMulipleTerms()
TestNotCloseEnough()
TestOrderDoesntMatter()
TestPhraseQueryInConjunctionScorer()
TestPhraseQueryWithStopAnalyzer()
TestSlop1()
TestSlopScoring()
TestWrappedPhrase()
TearDown()

Which version of NUnit are you using?  I am using 2.2.8.

Regards,

-- George

-Original Message-
From: Pasha Bizhan [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 16, 2006 2:07 PM
To: java-dev@lucene.apache.org
Subject: RE: Test failure question


Re: [jira] Commented: (LUCENE-604) do we need a flag to check open status for IndexWriter and IndexSearcher

2006-06-16 Thread Simon Willnauer


If you are looking for a nice way to do that, have a look at the Solr source:

http://svn.apache.org/viewvc/incubator/solr/trunk/src/java/org/apache/solr/util/RefCounted.java?view=markup

This is Java 1.5 source, but you can implement the same thing with 1.4 as well ;)
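
A minimal sketch of that reference-counting idea with an isOpen() check follows. It is written from the general technique, not copied from Solr's RefCounted.java, and all names are hypothetical. It assumes the wrapped resource implements java.io.Closeable; a Lucene searcher of this era would need a small adapter for that.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reference-counted holder in the spirit of Solr's RefCounted; not copied from Solr. */
final class RefCountedResource<T extends Closeable> {
    private final T resource;
    private final AtomicInteger refCount = new AtomicInteger(1); // the creator holds the first reference
    private volatile boolean open = true;

    RefCountedResource(T resource) {
        this.resource = resource;
    }

    /** Takes a reference, e.g. before running a search; fails once the resource has been closed. */
    T incref() {
        while (true) {
            int n = refCount.get();
            if (n <= 0) {
                throw new IllegalStateException("resource already closed");
            }
            if (refCount.compareAndSet(n, n + 1)) {
                return resource;
            }
        }
    }

    /** Releases a reference; the last release closes the underlying resource. */
    void decref() throws IOException {
        int n = refCount.decrementAndGet();
        if (n == 0) {
            open = false;
            resource.close();
        } else if (n < 0) {
            throw new IllegalStateException("decref() called more often than incref()");
        }
    }

    /** The open-status check asked for in LUCENE-604, answered by the holder instead of the searcher. */
    boolean isOpen() {
        return open;
    }
}
```

Each search would call incref() before using the resource and decref() in a finally block; whoever swaps in a new searcher drops the creator's reference, and the old one closes only after the last in-flight search finishes.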

simon

On 6/16/06, Otis Gospodnetic (JIRA) [EMAIL PROTECTED] wrote:

[ http://issues.apache.org/jira/browse/LUCENE-604?page=comments#action_12416572 ]

Otis Gospodnetic commented on LUCENE-604:
-

IW and IS will only get closed if you call close() on them, so you should be 
able to track their status in your application, no?


[jira] Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index

2006-06-16 Thread Karl Wettin (JIRA)

[ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12416583 ] 

Karl Wettin commented on LUCENE-550:


There is a bug with phrase queries, possibly in the term positions. Low priority 
for me.

 InstanciatedIndex - faster but memory consuming index
 -

  Key: LUCENE-550
  URL: http://issues.apache.org/jira/browse/LUCENE-550
  Project: Lucene - Java
 Type: New Feature

   Components: Store
 Versions: 1.9
 Reporter: Karl Wettin
  Attachments: InstanciatedIndexTermEnum.java, class_diagram.png, 
 class_diagram.png, instanciated_20060527.tar, lucene.1.9-karl1.jpg

 After fixing the bugs, it's now 4.5 - 5 times the speed. This is true both 
 at index and at query time. Sorry if I got your hopes up too much. There 
 are still things to be done, though. I might not have time to do anything with 
 this until next month, so here is the code if anyone wants a peek.
 Not good enough for Jira yet, but if someone wants to fool around with it, 
 here it is. The implementation passes a TermEnum - TermDocs - Fields - 
 TermVector comparison against the same data in a Directory.
 When it comes to features, offsets don't exist, and positions are stored in 
 an ugly way and have bugs.
 You might notice that norms are float[] and not byte[]. That was me 
 refactoring it to see if it would do any good. Bit shifting doesn't take many 
 ticks, so I might just revert that.
 I believe the code is quite self-explanatory.
 InstanciatedIndex ii = ..
 ii.new InstanciatedIndexReader();
 ii.addDocument(s) .. replaces IndexWriter for now.
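
The float[] versus byte[] trade-off for norms can be made concrete with a one-byte float encoding. The sketch below is hypothetical, in the spirit of (but not claimed to be bit-for-bit identical to) Lucene's byte-sized norms: it keeps 6 exponent bits and the top 2 explicit mantissa bits, truncating the rest, so decoding really is just a mask, a shift and an add.

```java
/** Hypothetical one-byte float encoding, in the spirit of (not identical to) Lucene's byte norms. */
final class ByteNormSketch {
    // Float exponent that maps to encoded exponent 0 (an arbitrary choice for this sketch).
    private static final int EXP_OFFSET = 96;
    // The same offset in units of (exponent << 2 | top two mantissa bits).
    private static final int FZERO = EXP_OFFSET << 2;

    /** Keeps the float's exponent and top two explicit mantissa bits; extra precision is truncated. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21; // sign, 8 exponent bits, top 2 mantissa bits
        if (small <= FZERO) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, tiny positive -> 1
        }
        if (small >= FZERO + 0x100) {
            return (byte) -1; // too large: clamp to the largest code (0xFF)
        }
        return (byte) (small - FZERO);
    }

    /** Decoding is a mask, a shift and an add. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = ((b & 0xff) << 21) + (EXP_OFFSET << 23);
        return Float.intBitsToFloat(bits);
    }
}
```

Powers of two such as 1.0f and 0.5f survive the round trip exactly, while 0.3f comes back as 0.25f: the byte form saves three quarters of the memory at the cost of precision, which is why a float[] can only be justified by speed.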


Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-16 Thread Otis Gospodnetic

It looks like I would have won a beer had anyone wagered me.

1.5 IS the Java version that the majority of Lucene users use, not 1.4!

Does this mean we can now start accepting 1.5 code?

Otis

- Original Message 
From: Otis Gospodnetic [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Friday, June 16, 2006 11:48:15 AM
Subject: Survey: Lucene and Java 1.4 vs. 1.5

Hello everyone,

If you have 15 seconds to spare, please let us (Lucene developers) know which 
version of Java you are using with Lucene: 1.4 or 1.5

All it takes is 1 click on one of the two choices:
  http://www.quimble.com/poll/view/2156

No cheating, please.  Thanks!
Otis





Re: GData - Milestone 2

2006-06-16 Thread Ian Holsman



On 17/06/2006, at 6:36 AM, Otis Gospodnetic wrote:


Hi Simon,

- The GData overview page describes the auth as: send a cookie/token,  
save it server-side, and then expect it from the client on  
subsequent requests (paraphrased).  That sounds fine to me.  I  
don't think you need to worry about the client IP, as long as your  
cookie/token is long and random enough (please correct me if I'm  
wrong about this), although you might want to add the IP to the  
string you base your MD5 checksum on.
If you store the token in the session, it will automatically get  
the TTL of the HttpSession.


If you are going to use the IP, note that even if you only use the first 3 octets  
(i.e. 218.214.209 instead of 218.214.209.232), there are several proxy  
servers out there which load-balance HTTP requests through different  
IPs.
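
A relaxed check that compares only the first three octets might look like the hypothetical helper below (dotted-quad IPv4 only, no range validation). As pointed out above, even this rejects a legitimate client whose proxy switches to an address in a different /24, so it reduces the problem rather than removing it.

```java
/** Hypothetical helper: do two dotted-quad IPv4 addresses share their first three octets? */
final class IpPrefix {
    static boolean sameFirstThreeOctets(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        if (x.length != 4 || y.length != 4) {
            throw new IllegalArgumentException("expected dotted-quad IPv4 addresses");
        }
        for (int i = 0; i < 3; i++) {
            // Compare octets numerically rather than as strings.
            if (Integer.parseInt(x[i]) != Integer.parseInt(y[i])) {
                return false;
            }
        }
        return true;
    }
}
```

With the example above, 218.214.209.232 and 218.214.209.7 match, while a request arriving from 218.214.210.232 would be refused.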


regards
Ian


Otis


- Original Message 
From: Simon Willnauer [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Friday, June 16, 2006 12:48:59 PM
Subject: GData - Milestone 2


Re: GData - Milestone 2

2006-06-16 Thread Simon Willnauer


On 6/16/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:

Hi Simon,

I have a bit of experience with REST and authentication from my work on 
http://simpy.com .
If you look at http://groups.yahoo.com/group/simpy-dev/messages you will see 
several recent messages about different authentication options that may give 
you some food for thought.


-- good stuff, thanks for the link!


As for GData auth:
- The GData overview page describes the auth as: send a cookie/token, save it 
server-side, and then expect it from the client on subsequent requests 
(paraphrased).  That sounds fine to me.  I don't think you need to worry about the client 
IP, as long as your cookie/token is long and random enough (please correct me if I'm 
wrong about this), although you might want to add the IP to the string you base your MD5 
checksum on.
If you store the token in the session, it will automatically get the TTL of the 
HttpSession.


I already tried the HttpSession approach. Using the HTTP session would
solve all my problems, since the session can be replicated: most
containers support session replication. But how do I get the session id from
the client? The client sends a request parameter named Auth whose value is the
session id, but the container does not recognize the session in this
case. As far as I know, the session parameter name has to be
jsessionid, and I can only get the session via the HttpServletRequest.
Any idea about this?

simon


- Running the GData server in a cluster might require session replication.  It 
sounds like a big bite for SoC, but ... I have never used WADI, but I _think_ that 
might be the easiest way to get session replication going: 
http://incubator.apache.org/projects/wadi.html
On the other hand, WADI might be overkill if all you want is to share this 
token.  If that's all you need, perhaps JavaSpaces will do (e.g. 
http://www.dancres.org/blitz/ ).

Otis


- Original Message 
From: Simon Willnauer [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Friday, June 16, 2006 12:48:59 PM
Subject: GData - Milestone 2

Hello everyone,

it was quiet the last week, well I had a bad cold so Milestone 2
starts a bit late...
Milestone 2 is about client authentication. GData client auth is also
defined (well kind of) in the gdata protocol reference on
code.google.com. The client is supposed to support either a cookie
base auth or just an auth token send back as an post response. The
client authenticates itself via a post request to the servers auth
interface sending following parameters:

[EMAIL PROTECTED]Passwd=north23AZservice=servicenamesource=Gulp-CalGulp-1.05

The email represents the account name, which is associated with a
service provided by the server. Each server can provide m services
with n feeds. Each feed belongs to one account.
As it is quite hard to figure out whether a client supports plain
token or cookie auth, I will send both back to the client. After the
client has received the auth token or cookie, it will call some
restricted resource on the server, sending either the cookie or the
auth token. The cookie contains only the auth token.
So, given these facts, I will generate an MD5 key as the auth token using
the email, password and a timestamp or something similar, and save it
on the server in a kind of session storage. The session storage will
hold the sessions for a certain time and will invalidate a session once it has
timed out. Additionally, I will save the client IP (at least the first
32 bits) within the session and check it on subsequent requests. So
this is fine as long as the server is a standalone server. What
happens if there is a load balancer and a server farm with more than
one GData server instance?!
I could define all GData servers in the cluster / farm in each config
file, and if a session is created or modified, the current server sends
a notice to all other servers to replicate the session. (Session is
not the HttpSession.) But this could be quite a lot of work: synchronizing
all hosts and registering / unregistering them if they crash...
I guess this should be done at a later stage of development; I just
have 2 months left... So this might be a task for development after the
SoC program has finished.
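The single-server part of this plan (MD5 token, session storage with a timeout, client IP check) can be sketched in memory as below. This is a minimal illustration, assuming a hex-encoded MD5 over `email:password:timestamp`, an idle timeout, and an exact comparison of the stored client IP (the "first 32 bits" prefix rule is left out). `AuthSessionStore` and its methods are hypothetical names; the current time is passed in explicitly so the expiry logic is easy to test. The map lives in one JVM, so it does not address the cluster question.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class AuthSessionStore {

    /** Server-side auth session (deliberately not an HttpSession). */
    private static final class Entry {
        final String clientIp;
        long lastAccess;

        Entry(String clientIp, long lastAccess) {
            this.clientIp = clientIp;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<String, Entry> sessions = new HashMap<String, Entry>();
    private final long timeoutMillis;

    public AuthSessionStore(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Hex-encoded MD5 of account, password and timestamp. */
    static String md5Token(String email, String password, long timestamp) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(
                    (email + ":" + password + ":" + timestamp).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Creates a session; credentials are assumed to have been checked already. */
    public synchronized String login(String email, String password, String clientIp, long now) {
        String token = md5Token(email, password, now);
        sessions.put(token, new Entry(clientIp, now));
        return token;
    }

    /** True if the token is known, not timed out, and used from the same IP. */
    public synchronized boolean isValid(String token, String clientIp, long now) {
        Entry e = sessions.get(token);
        if (e == null) {
            return false;
        }
        if (now - e.lastAccess > timeoutMillis) {
            sessions.remove(token); // invalidate the expired session
            return false;
        }
        if (!e.clientIp.equals(clientIp)) {
            return false;
        }
        e.lastAccess = now; // sliding expiry on each successful request
        return true;
    }

    public static void main(String[] args) {
        AuthSessionStore store = new AuthSessionStore(30 * 60 * 1000L); // 30 minutes
        String token = store.login("jdoe@example.org", "north23AZ", "10.0.0.7", 0L);
        System.out.println(store.isValid(token, "10.0.0.7", 1000L));           // true
        System.out.println(store.isValid(token, "10.0.0.8", 2000L));           // false (other IP)
        System.out.println(store.isValid(token, "10.0.0.7", 40 * 60 * 1000L)); // false (timed out)
    }
}
```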

Any ideas about that?

yours simon


Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-16 Thread Simon Willnauer


go tiger go!

everybody not using 1.5 should visit java.sun.com and download the 1.5 VM!!


On 6/16/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:

It looks like I would have won a beer had anyone wagered me.

1.5 IS the Java version that the majority of Lucene users use, not 1.4!

Does this mean we can now start accepting 1.5 code?


gdata already does ;)


Otis

- Original Message 
From: Otis Gospodnetic [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Friday, June 16, 2006 11:48:15 AM
Subject: Survey: Lucene and Java 1.4 vs. 1.5

Hello everyone,

If you have 15 seconds to spare, please let us (Lucene developers) know which 
version of Java you are using with Lucene: 1.4 or 1.5.

All it takes is 1 click on one of the two choices:
  http://www.quimble.com/poll/view/2156

No cheating, please.  Thanks!
Otis





Re: GData - Milestone 2

2006-06-16 Thread Otis Gospodnetic

Simon,

I don't fully understand your question, but if sessions are replicated, then 
the GData cluster doesn't care which GData server the client contacts, as they 
will all already have the token that was given to the client.  On subsequent 
requests, the client will have to send the token.  I am not sure if the GData
protocol specifies how that should be sent - via a query string param or
perhaps even an HTTP request header (e.g. X-gdata-auth: SomeTokenHere).  The
jsessionid parameter carries the HttpSession ID if the client doesn't support, and thus
doesn't send back, cookies.  If it does support cookies, they will be sent back via
the Cookie HTTP request header (the server sets them with Set-Cookie).

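Both ways of carrying the token can be combined: look for a request header first and fall back to a query-string parameter. The sketch below assumes the header name `X-gdata-auth` from the example above and the parameter name `Auth` that the client in this thread sends; neither name comes from the GData specification. Headers are passed as a plain `Map` so the code runs without a servlet container, and `TokenExtractor` is a hypothetical name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

public class TokenExtractor {

    static final String TOKEN_HEADER = "X-gdata-auth"; // header name from the mail's example
    static final String TOKEN_PARAM = "Auth";          // query parameter used by the client

    /** Returns the token from the header, else from the query string, else null. */
    public static String extractToken(Map<String, String> headers, String queryString) {
        for (Map.Entry<String, String> h : headers.entrySet()) {
            if (h.getKey().equalsIgnoreCase(TOKEN_HEADER)) { // header names are case-insensitive
                return h.getValue().trim();
            }
        }
        if (queryString != null) {
            for (String pair : queryString.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0 && pair.substring(0, eq).equals(TOKEN_PARAM)) {
                    return decode(pair.substring(eq + 1));
                }
            }
        }
        return null;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<String, String>();
        headers.put("x-gdata-auth", "abc123");
        System.out.println(extractToken(headers, "Auth=ignored"));                               // abc123
        System.out.println(extractToken(new HashMap<String, String>(), "alt=atom&Auth=xyz%3D")); // xyz=
        System.out.println(extractToken(new HashMap<String, String>(), "alt=atom"));             // null
    }
}
```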
I think what you need to do is:
- client makes a request
- server says you are not authenticated, here is a 401
- client provides credentials
- server checks credentials, creates token, saves it to session, and says to
client: OK, eat this token.  The client saves it.
- client makes a new request and sends the token (via HTTP request header or 
via query string param)
- server takes the token and compares it to the one stored in the current 
session [1]
- if the tokens match, the server responds with the data, else goto line with 
401 above
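The steps in that list can be simulated in plain Java to check the control flow. In this sketch (all names hypothetical) a hard-coded account map stands in for the real credential check, a `Set` stands in for the session that stores issued tokens, and `handle` returns the HTTP status code the server would send. `UUID.randomUUID()` is used for the token because its Javadoc describes it as generated by a cryptographically strong pseudo-random number generator.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public class AuthFlowSketch {

    // hypothetical credential store standing in for the real account lookup
    private final Map<String, String> accounts = new HashMap<String, String>();
    // issued tokens; stands in for the server-side session
    private final Set<String> issuedTokens = new HashSet<String>();

    public AuthFlowSketch() {
        accounts.put("jdoe@example.org", "north23AZ");
    }

    /** Steps 3-4: check credentials and hand out a token, or null (the client gets a 401 again). */
    public String login(String email, String password) {
        String expected = accounts.get(email);
        if (expected == null || !expected.equals(password)) {
            return null;
        }
        String token = UUID.randomUUID().toString();
        issuedTokens.add(token);
        return token;
    }

    /** Steps 1-2 and 5-7: serve the data only when the request carries a known token. */
    public int handle(String token) {
        return (token != null && issuedTokens.contains(token)) ? 200 : 401;
    }

    public static void main(String[] args) {
        AuthFlowSketch server = new AuthFlowSketch();
        System.out.println(server.handle(null));                       // 401
        String token = server.login("jdoe@example.org", "north23AZ");
        System.out.println(server.handle(token));                      // 200
        System.out.println(server.login("jdoe@example.org", "wrong")); // null
        System.out.println(server.handle("made-up-token"));            // 401
    }
}
```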


[1] In order for your server (Jetty or Tomcat or whatever) to be able to 
associate a client with a session, the client must send back the session ID
from the first request.  This is normal Java webapp behaviour.  The client will
send it either as a cookie via HTTP headers, or via jsessionid (aka URL
rewriting... not to be confused with mod_rewrite).  Regardless of the method, the
server (Jetty/Tomcat) will know how to associate the request with an existing
instance of HttpSession, and that's what you'll get from request.getSession().

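With URL rewriting (what `HttpServletResponse.encodeURL` produces when cookies are unavailable), the session ID is appended to the path as a `;jsessionid=` path parameter, e.g. `/feeds/default;jsessionid=A1B2C3?alt=atom`. By default the container looks only for that name or the session cookie, so a parameter called `Auth` is not treated as a session ID. Containers parse this themselves; the sketch below only illustrates the format, and `JsessionidParser` and the sample URLs are hypothetical.

```java
public class JsessionidParser {

    private static final String MARKER = ";jsessionid=";

    /** Returns the session ID encoded in a rewritten URL, or null if there is none. */
    public static String sessionIdFromUrl(String url) {
        int start = url.indexOf(MARKER);
        if (start < 0) {
            return null;
        }
        start += MARKER.length();
        int end = url.length();
        // the ID ends at the next path parameter, path segment, query string or fragment
        for (char c : new char[] {';', '/', '?', '#'}) {
            int i = url.indexOf(c, start);
            if (i >= 0 && i < end) {
                end = i;
            }
        }
        return url.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(sessionIdFromUrl(
                "http://localhost:8080/gdata/feeds/default;jsessionid=A1B2C3?alt=atom")); // A1B2C3
        System.out.println(sessionIdFromUrl(
                "http://localhost:8080/gdata/feeds/default?Auth=A1B2C3"));                // null
    }
}
```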
Otis

- Original Message 
From: Simon Willnauer [EMAIL PROTECTED]
To: java-dev@lucene.apache.org; Otis Gospodnetic [EMAIL PROTECTED]
Sent: Friday, June 16, 2006 4:53:21 PM
Subject: Re: GData - Milestone 2

On 6/16/06, Otis Gospodnetic [EMAIL PROTECTED] wrote:
 Hi Simon,

 I have a bit of experience with REST and authentication from my work on 
 http://simpy.com .
 If you look at http://groups.yahoo.com/group/simpy-dev/messages you will see 
 several recent messages about different authentication options that may give 
 you some food for thought.

-- good stuff, thanks for the link!

 As for GData auth:
 - The GData overview page describes the auth as: send a cookie/token, save it on the
 server side, and then expect it from the client on subsequent requests
 (paraphrased).  That sounds fine to me.  I don't think you need to worry 
 about the client IP, as long as your cookie/token is long and random enough 
 (please correct me if I'm wrong about this), although you might want to add 
 the IP to the string you base your MD5 checksum on.
 If you store the token in the session, it will automatically get the TTL of 
 the HttpSession.
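The point that a long, random token makes the IP check less important can be illustrated with `java.security.SecureRandom`. The sketch below (class name hypothetical) draws 16 random bytes, i.e. 128 bits, and hex-encodes them; unlike an MD5 over email, password and timestamp, such a token is not derived from the account data at all.

```java
import java.security.SecureRandom;

public class RandomToken {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns numBytes of cryptographically strong randomness, hex-encoded. */
    public static String newToken(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RANDOM.nextBytes(bytes);
        StringBuilder hex = new StringBuilder(numBytes * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String t = newToken(16); // 128 bits -> 32 hex characters
        System.out.println(t + " (" + t.length() + " chars)");
    }
}
```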

I already tried the HttpSession approach. Using the HTTP session would
solve all my problems. The session can be replicated, as most
containers support session replication. But how do I get the session id from
the client? The client sends a request parameter (name: Auth, value:
session id), but the container does not recognize the session in this
case. As far as I know, the session parameter name has to be
jsessionid, and I only get the session via the HttpServletRequest.
Any idea about this?

simon

 - Running GData server in a cluster might require session replication.  It
 sounds like a big bite for SoC, but... I have never used WADI, but I _think_ that
 might be the easiest way to get session replication going:
 http://incubator.apache.org/projects/wadi.html
 On the other hand, WADI might be overkill if all you want is to share this
 token.  If that's all you need, perhaps JavaSpaces is enough (e.g.
 http://www.dancres.org/blitz/ ).

 Otis


 - Original Message 
 From: Simon Willnauer [EMAIL PROTECTED]
 To: java-dev@lucene.apache.org
 Sent: Friday, June 16, 2006 12:48:59 PM
 Subject: GData - Milestone 2

 Hello everyone,

 it was quiet last week; I had a bad cold, so Milestone 2
 starts a bit late...
 Milestone 2 is about client authentication. GData client auth is also
 defined (well, kind of) in the GData protocol reference on
 code.google.com. The client is supposed to support either cookie-based
 auth or just an auth token sent back in a POST response. The
 client authenticates itself via a POST request to the server's auth
 interface, sending the following parameters:

 Email=[EMAIL PROTECTED]&Passwd=north23AZ&service=servicename&source=Gulp-CalGulp-1.05

 The email represents the account name, which is associated with a
 service provided by the server. Each server can provide m services
 with n feeds. Each feed belongs to one account.
 As it is quite hard to figure out whether a client supports plain
 token or cookie auth, I will send both back to the client. After the
 client has received the auth token or cookie, it will call some
Re: Results (Re: Survey: Lucene and Java 1.4 vs. 1.5)

2006-06-16 Thread markharw00d

1.5 IS the Java version that the majority of Lucene users use, not 1.4!
Does this mean we can now start accepting 1.5 code?

This isn't simply a case of "whichever JVM gets used the most wins".
This is about how many Lucene users we will inconvenience or lose by
moving to 1.5.

Right now the survey sample tells me that's roughly a third, which doesn't seem
like a good thing. Maybe the more useful question is: who can't/won't
move to 1.5 in the immediate future?

I believe we shouldn't select the minimum platform based on the coding
convenience it may offer us, which seems to be the major objective behind
1.5 adoption. When developing a library deployed in many
applications/environments over which you have no control, and where
careful consideration of runtime performance (not coding
convenience/speed of development) is the primary concern, my preference
would be to choose 1.4.

Not all deployment environments can be upgraded easily. Take my current
application at work. It's applet-based and rolled out to hundreds of
corporate desktops which are stuck on 1.4 (this won't change anytime
soon). Lucene isn't on the client, but all client and server code in the
app has been written in 1.4 to avoid any risk of 1.5 code leaking
onto the 1.4 client. All of the many 3rd-party libraries in use (Spring,
database drivers, etc.) are 1.4-compatible in their latest versions. I'd
like to stick with the latest Lucene codebase, but mandating 1.5 for
Lucene would introduce a code management headache to this app with its
mixed JVMs.

Unless there are *really* good runtime benefits that are solely based on
1.5 libraries or source code, I would prefer to see Lucene stick with 1.4
as a base rather than limit Lucene's deployment options simply because
of the code-time benefits the new 1.5 syntax offers.
I see that the Spring framework recognises this dilemma and still seeks to
support versions as far back as 1.3 (see http://www.springframework.org/node/220).

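For readers weighing the "coding convenience" argument, here is the kind of difference at stake: the same loop written in 1.4 style and in 1.5 style. This is an illustrative sketch, not code from Lucene, and `SyntaxComparison` is a hypothetical name. Generics are erased by the compiler and unboxing calls `intValue()` just as the manual version does, so both methods do essentially the same work at runtime, which is the code-time versus runtime distinction drawn above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SyntaxComparison {

    /** Java 1.4 style: raw collection, explicit Iterator, manual cast and unboxing. */
    static int sum14(List numbers) {
        int total = 0;
        for (Iterator it = numbers.iterator(); it.hasNext();) {
            total += ((Integer) it.next()).intValue();
        }
        return total;
    }

    /** Java 1.5 style: generics, enhanced for loop, auto-unboxing. */
    static int sum15(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(3); // autoboxing, also 1.5-only
        numbers.add(4);
        System.out.println(sum14(numbers) + " " + sum15(numbers)); // 7 7
    }
}
```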
Simon said everyone should download 1.5. It's nice to think you can
accelerate the global adoption of 1.5 by changing projects like Lucene,
but the reality is that corporates do not change platforms overnight because
of such a change.

That's a long-winded way of saying -1, unless I hear arguments
based on something much more substantial than "1.5 makes
coding easier".

Cheers,
Mark


[jira] Created: (LUCENE-605) Make Explanation include information about match/non-match

2006-06-16 Thread Hoss Man (JIRA)

Make Explanation include information about match/non-match
--

 Key: LUCENE-605
 URL: http://issues.apache.org/jira/browse/LUCENE-605
 Project: Lucene - Java
Type: Improvement

  Components: Search  
Reporter: Hoss Man
 Assigned to: Hoss Man 


As discussed, I'm looking into the possibility of improving the Explanation 
class to include some basic info about the match status of the Explanation -- 
independent of the value...

http://www.nabble.com/BooleanWeight.normalize%28float%29-doesn%27t-normalize-prohibited-clauses--t1596471.html#a4347644

This is necessary to deal with things like LUCENE-451

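The idea in this issue, recording whether a (sub)query matched separately from its score value, can be sketched with a small standalone class. `MatchAwareExplanation` is hypothetical and is not Lucene's `Explanation` API; it only shows that a node can report a match even when its value is 0, and that non-matching nodes can be labelled explicitly in the printed tree.

```java
import java.util.ArrayList;
import java.util.List;

public class MatchAwareExplanation {

    private final boolean match;
    private final float value;
    private final String description;
    private final List<MatchAwareExplanation> details = new ArrayList<MatchAwareExplanation>();

    public MatchAwareExplanation(boolean match, float value, String description) {
        this.match = match;
        this.value = value;
        this.description = description;
    }

    /** Whether the document matched, independent of getValue(). */
    public boolean isMatch() { return match; }

    public float getValue() { return value; }

    public void addDetail(MatchAwareExplanation detail) { details.add(detail); }

    /** Indented tree; non-matching nodes are marked explicitly. */
    public String toString(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(match ? "" : "(NON-MATCH) ").append(value)
          .append(" = ").append(description).append('\n');
        for (MatchAwareExplanation d : details) {
            sb.append(d.toString(depth + 1));
        }
        return sb.toString();
    }

    @Override
    public String toString() { return toString(0); }

    public static void main(String[] args) {
        MatchAwareExplanation root = new MatchAwareExplanation(true, 0.0f, "sum of:");
        root.addDetail(new MatchAwareExplanation(true, 0.0f, "title:lucene (boost 0)"));
        root.addDetail(new MatchAwareExplanation(false, 0.0f, "no match on body:java"));
        System.out.print(root);
    }
}
```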

[jira] Updated: (LUCENE-605) Make Explanation include information about match/non-match

2006-06-16 Thread Hoss Man (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-605?page=all ]

Hoss Man updated LUCENE-605:


Attachment: demo-fix.patch

Demo of the basic direction I'm going.  This patch includes some changes to the
Explanation class to include the new information, as well as some tweaks to
TermQuery and BooleanQuery to take advantage of it.

NOTE: the BooleanQuery changes in this patch overlap with the patches in
LUCENE-557

 Make Explanation include information about match/non-match
 --

  Key: LUCENE-605
  URL: http://issues.apache.org/jira/browse/LUCENE-605
  Project: Lucene - Java
 Type: Improvement

   Components: Search
 Reporter: Hoss Man
 Assignee: Hoss Man
  Attachments: demo-fix.patch

 As discussed, I'm looking into the possibility of improving the Explanation 
 class to include some basic info about the match status of the Explanation 
 -- independent of the value...
 http://www.nabble.com/BooleanWeight.normalize%28float%29-doesn%27t-normalize-prohibited-clauses--t1596471.html#a4347644
 This is necessary to deal with things like LUCENE-451


[jira] Assigned: (LUCENE-451) BooleanQuery explain with boost==0

2006-06-16 Thread Hoss Man (JIRA)

 [ http://issues.apache.org/jira/browse/LUCENE-451?page=all ]

Hoss Man reassigned LUCENE-451:
---

Assign To: Hoss Man

 BooleanQuery explain with boost==0
 --

  Key: LUCENE-451
  URL: http://issues.apache.org/jira/browse/LUCENE-451
  Project: Lucene - Java
 Type: Bug

   Components: Search
 Versions: CVS Nightly - Specify date in submission
 Reporter: Yonik Seeley
 Assignee: Hoss Man
 Priority: Minor


 BooleanWeight.explain() uses the returned score of subweights to determine if 
 a clause matched.
 If any required clause has boost==0, the returned score will be zero and the 
 explain for the entire BooleanWeight will be simply Explanation(0.0f, match
 required).
 I'm not sure what the correct fix is here.  I don't think it can be done 
 based on score alone, since that isn't how scorers work.   Perhaps we need a 
 new method boolean Explain.matched() that returns true on a match, 
 regardless of what the score may be? 
 Related to the problem above, even if no boosts are zero, it is sometimes
 nice to know *why* a particular query failed to match.  It would mean a 
 longer explanation, but maybe we should include non matching explains too?
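The failure mode described here can be reproduced with a simplified model: a required clause with boost 0 matches but contributes a score of 0, so a score-based check reports a non-match while an explicit match flag does not. The sketch is hypothetical and deliberately small; it only looks at required clauses and ignores prohibited clauses and the rules for optional clauses.

```java
import java.util.Arrays;
import java.util.List;

public class BooleanExplainSketch {

    /** Hypothetical per-clause result: score plus an explicit match flag. */
    static final class ClauseResult {
        final boolean required;
        final boolean matched;
        final float score;

        ClauseResult(boolean required, boolean matched, float score) {
            this.required = required;
            this.matched = matched;
            this.score = score;
        }
    }

    /** Score-based check: a required clause counts as matched only if its score is > 0. */
    static boolean matchesByScore(List<ClauseResult> clauses) {
        for (ClauseResult c : clauses) {
            if (c.required && c.score <= 0f) {
                return false;
            }
        }
        return true;
    }

    /** Flag-based check: use the match status, independent of the score. */
    static boolean matchesByFlag(List<ClauseResult> clauses) {
        for (ClauseResult c : clauses) {
            if (c.required && !c.matched) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // a required clause with boost == 0 matches but scores 0
        List<ClauseResult> clauses = Arrays.asList(
                new ClauseResult(true, true, 0.0f),
                new ClauseResult(false, true, 0.7f));
        System.out.println("by score: " + matchesByScore(clauses)); // by score: false
        System.out.println("by flag:  " + matchesByFlag(clauses));  // by flag:  true
    }
}
```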

