Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread Doug Cutting

Andi Vajda wrote:
I'd be interested in doing this but what is it that we're after in 
'supporting gcj' actually ?


I think it would be sufficient to:

1. Compile only .jar and .class with gcj (not .java).
2. Pass all unit tests on a single platform.

This would provide an existence proof that Lucene can run under GCJ, and 
doesn't require solving GCJ's porting issues.


Even when only compiling .jar -> .so with gcj, a number of patches 
still need to be applied:

http://svn.osafoundation.org/pylucene/trunk/patches.lucene


The patches to JavaCC-generated code should probably really become 
JavaCC patches.  Have you looked into that?  Most of the rest look like 
reasonable changes to Lucene, except perhaps the native matches, which 
look a bit fishy for Lucene's trunk.


Doug

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread Andi Vajda


On Tue, 11 Jul 2006, Doug Cutting wrote:


Andi Vajda wrote:
I'd be interested in doing this but what is it that we're after in 
'supporting gcj' actually ?


I think it would be sufficient to:

1. Compile only .jar and .class with gcj (not .java).
2. Pass all unit tests on a single platform.


Just last week, a PyLucene user got it to work on Solaris. I have no access to 
a Solaris machine to validate this. If I had my choice of platform, I'd pick 
one of (in order of preference):

  - Mac OS X (Intel or PPC)
  - a recent Red Hat Linux since this is the one most gcj developers use
  - Ubuntu 6.06

As for the version of gcj I'd suggest using:
  - Mac OS X Intel : gcj 4.0.2 (heavily patched)
  - Mac OS X PPC : gcj 3.4.6
  - Red Hat Linux : I'd try 4.2.0 downgrading until I find one that works,
probably 4.1.1
  - Ubuntu 6.06: gcj 3.4.6

Unless junit can be made to run compiled under gcj, I see some more work on 
the unit tests side. This could be interesting too...


Even when only compiling .jar -> .so with gcj, a number of patches still 
need to be applied:

http://svn.osafoundation.org/pylucene/trunk/patches.lucene


The patches to JavaCC-generated code should probably really become JavaCC 
patches.  Have you looked into that?


Yes, I filed bug 53 almost two years ago, it's not gone very far :(
https://javacc.dev.java.net/issues/show_bug.cgi?id=53

Most of the rest look like reasonable 
changes to Lucene, except perhaps the native matches, which look a bit 
fishy for Lucene's trunk.


The native match patches are required because the libgcj that comes with gcj 
3.4.x doesn't provide a regular expressions implementation. This is solved in 
PyLucene by using python's. I think gcj 4 comes with regex support but gcj 4 
is not yet well supported on most platforms.
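
As an illustration of the kind of workaround described here (this is not the actual PyLucene patch, which hands matching over to Python's regular expression engine), the sketch below shows matching code that has no dependency on java.util.regex at all. The names SimpleMatcher and WildcardMatcher are hypothetical, and the glob-style syntax ('*' and '?') is an assumption chosen only to keep the example short.

```java
/** Hypothetical abstraction: callers depend on this, not on java.util.regex. */
interface SimpleMatcher {
    boolean matches(String input);
}

/**
 * Hypothetical fallback matcher that uses only java.lang.String.
 * '*' matches any run of characters, '?' matches exactly one character.
 */
class WildcardMatcher implements SimpleMatcher {
    private final String pattern;

    WildcardMatcher(String pattern) {
        this.pattern = pattern;
    }

    public boolean matches(String input) {
        int p = 0;      // position in pattern
        int i = 0;      // position in input
        int star = -1;  // index of the most recent '*' in the pattern
        int mark = 0;   // input position where that '*' started matching
        while (i < input.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;          // remember the star, try matching zero chars
                mark = i;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == input.charAt(i))) {
                p++;                 // single-character match
                i++;
            } else if (star >= 0) {
                p = star + 1;        // backtrack: let the last '*' absorb one more char
                i = ++mark;
            } else {
                return false;        // mismatch and no '*' to fall back on
            }
        }
        // Any trailing '*' in the pattern can match the empty string.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

Code written against an interface like this could be given a java.util.regex-backed implementation on runtimes that provide it and the fallback elsewhere, which is one way to keep such a workaround out of the main code path.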


For the gcj platform story, see this pylucene-dev post I sent recently:
  http://lists.osafoundation.org/pipermail/pylucene-dev/2006-June/001106.html

Andi..




Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread Doug Cutting

Andi Vajda wrote:
Just last week, a PyLucene user got it to work on Solaris. I have no 
access to a Solaris machine to validate this. If I had my choice of 
platform, I'd pick one of (in order of preference):

  - Mac OS X (Intel or PPC)
  - a recent Red Hat Linux since this is the one most gcj developers use
  - Ubuntu 6.06


The Apache machine where we run nightly builds runs Solaris.

My first platform of choice would be Ubuntu.

Unless junit can be made to run compiled under gcj, I see some more work 
on the unit tests side. This could be interesting too...


A search for "gcj junit" finds:

http://www.mail-archive.com/user@ant.apache.org/msg19104.html


Yes, I filed bug 53 almost two years ago, it's not gone very far :(
https://javacc.dev.java.net/issues/show_bug.cgi?id=53


Probably this would get fixed more quickly if someone contributed a 
patch to JavaCC.  Even if it were not committed, we could build our own 
version of JavaCC.  Any intrepid volunteers?


Doug




Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread DM Smith


On Jul 11, 2006, at 12:17 AM, Daniel John Debrunner wrote:


Doug Cutting wrote:

Since GCJ is effectively available on all platforms, we could say that
we will start accepting 1.5 features when a GCJ release supports those
features.  Does that seem reasonable?


Seems potentially a little strange to me. Does this mean Lucene would be
limited to the set of 1.5 features actually implemented by GCJ? So if
there is a 1.5 feature that is not supported by GCJ (while others are)
it cannot be used?

Seems more natural to support the complete 1.5 as defined by Sun/Java,
not the subset implemented by one open source compiler.



Eclipse has a built in compiler called ecj and it can compile Java  
1.6 code today. However, unless classes are provided at runtime for  
linking, one will get build errors.


The same is true of gcj. It still does not fully support the Java 1.4
classes (it is almost there), though it supports all of the language
features. However, on Fedora, Eclipse is built with ecj and to me this
demonstrates that it is close enough for most use cases.


Gcj will have support for the language features before it supports  
all the new classes.


In terms of Lucene, I believe that the most important classes that  
are wanted are the concurrency ones. (At least that is how I have  
read the posts here.)


I think the measure of readiness is not that it compiles today with  
gcj, but that the Java 1.5 classes and features that are likely to be  
used by lucene are implemented and pass all lucene tests.
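
One rough way to picture the "classes are implemented" half of this readiness measure is a probe that asks the running runtime whether specific classes can be loaded. The sketch below is only an illustration: RuntimeClassProbe is a hypothetical helper, and the list of java.util.concurrent classes in main is an assumption about what Lucene might want, not something stated in this thread. The probe itself uses generics and varargs, so it already presupposes a compiler with the 1.5 language features, which matches the expectation above that language support arrives before the class library does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which class names the running runtime can load. */
class RuntimeClassProbe {

    /** Returns true if the named class can be loaded by the current runtime. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only check that the class is present,
            // without running its static initializers.
            Class.forName(className, false, RuntimeClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // Present but unusable (for example, a missing dependency).
            return false;
        }
    }

    /** Returns the subset of the given class names that cannot be loaded. */
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<String>();
        for (String name : classNames) {
            if (!isAvailable(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed wish list; adjust to whatever classes are actually needed.
        List<String> gaps = missing(
                "java.util.concurrent.ConcurrentHashMap",
                "java.util.concurrent.locks.ReentrantLock",
                "java.util.concurrent.atomic.AtomicLong");
        System.out.println(gaps.isEmpty() ? "all present" : "missing: " + gaps);
    }
}
```

A check like this says nothing about whether the classes behave correctly, which is why passing the full Lucene test suite remains the stronger criterion.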







Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread DM Smith


On Jul 11, 2006, at 3:51 AM, Doug Cutting wrote:


Andi Vajda wrote:
I'd be interested in doing this but what is it that we're after in  
'supporting gcj' actually ?


I think it would be sufficient to:

1. Compile only .jar and .class with gcj (not .java).
2. Pass all unit tests on a single platform.

This would provide an existence proof that Lucene can run under  
GCJ, and doesn't require solving GCJ's porting issues.




For me the platform of choice would be MacOS X, since 10.3 will never  
have Java 5. (IIRC, 10.4 has only been out for about a year.)

Most of the other platforms will.




Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread Andi Vajda


On Tue, 11 Jul 2006, Doug Cutting wrote:

Probably this would get fixed more quickly if someone contributed a patch to 
JavaCC.  Even if it were not committed, we could build our own version of 
JavaCC.  Any intrepid volunteers?


For patches that seem too kludgy to make it into Lucene's sources (for 
example, to work around the lack of proper exception support under Windows 
gcj in the query parser) a compromise could be to keep these patches in a 
separate file and apply them to the Lucene sources before building them with 
gcj. This is how PyLucene is built today.


Some patches have already been incorporated into the Lucene sources (for 
example, in Searcher.java, to work around gcj bug 15411).


Of course, the long term goal should be to no longer have any patches at all.
I've been working on PyLucene about two and a half years now and the number of 
patches has remained fairly stable.


A nice side effect of trying to support gcj with Java Lucene by including it 
into the Lucene test framework could be that the gcj developers might be more 
inclined to take a look at gcj-related issues that are thus made much easier 
to reproduce.


Andi..




Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread Andi Vajda


On Tue, 11 Jul 2006, DM Smith wrote:

Eclipse has a built in compiler called ecj and it can compile Java 1.6 code 
today. However, unless classes are provided at runtime for linking, one will 
get build errors.


It looks like ecj is going to replace the gcj java front-end compiler thereby 
making the 1.5 language features available to gcj. In the meantime, the 
classpath project is working towards adding support for all JRE classes. I'm 
quite optimistic that we should see a 1.5 capable gcj this year. This isn't 
saying much, however, about which platforms, besides Red Hat Linux, this gcj 
would be producing stable executables for. For example, gcj on Windows is very 
far behind and is getting very little development time these days.


Andi..





Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-11 Thread Andi Vajda


On Tue, 11 Jul 2006, robert engels wrote:

It's been years and GCJ still doesn't have anywhere near full 1.4 classpath 
libraries.


So now if we want to write code for Lucene we have to know what libraries are 
available for GCJ?


GCJ is a joke.


It looks like classpath is quite close to 100% 1.4 JRE support.

http://www.kaffe.org/~stuart/japi/htmlout/h-jdk14-classpath.html

Of course, earlier gcj versions, such as 3.4.x, come with a libgcj based on 
an earlier version of classpath with bigger holes (regex support, for 
example).


Things are moving in the right direction, however...

Andi..




Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Doug Cutting

Andi Vajda wrote:

On Sat, 8 Jul 2006, Doug Cutting wrote:
Since GCJ is effectively available on all platforms, we could say that 
we will start accepting 1.5 features when a GCJ release supports those 
features. Does that seem reasonable?


+1


If we use this criterion, then we should probably officially support GCJ. 
Ideally we should run nightly unit tests with GCJ.  Andi, would you be 
interested in helping to set this up?


Our unit test scripts are at:

https://svn.apache.org/repos/asf/lucene/java/nightly/

These are run on lucene.zones.apache.org, a Solaris box.  If you (or 
someone else) is willing, then I can make you an account on this machine 
and you can alter the nightly build process to include testing against 
the most recent GCJ release.


Doug




Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Vic Bancroft

Andi Vajda wrote:


On Mon, 10 Jul 2006, Doug Cutting wrote:


Andi Vajda wrote:


On Sat, 8 Jul 2006, Doug Cutting wrote:

Since GCJ is effectively available on all platforms, we could say 
that we will start accepting 1.5 features when a GCJ release 
supports those features. Does that seem reasonable?


+1


If we use this criterion, then we should probably officially support 
GCJ. Ideally we should run nightly unit tests with GCJ. Andi, would 
you be interested in helping to set this up?


This is interesting to me, is the nightly build environment difficult to 
replicate ?


I'd be interested in doing this but what is it that we're after in 
'supporting gcj' actually ?


There is some advantage in using gcj as a measure of usability in the 
context of a free (as in beer) java, such that for a given target 
platform, one can deliver executables and shared libraries without 
requiring virtual machine runtimes. The second advantage is to give a 
simple method to nightly test contributions using new features. The 
third advantage seems to be a reduction in computational load on servers 
running native code.



- running a fully compiled program linked against a lucene.so ?
if so, which platforms ? the gcj story is very different on each and every
platform, including different linuxes and gcj is not well supported on
some platforms at all.


This seems to be the case, since on an updated fedora core 5 with gcj 
(GCC) 4.1.1 20060525 (Red Hat 4.1.1-1), the Makefile modifications 
required are trivial.



- running java bytecode with the gcj VM (gij, I believe) ?
if the .java code needs to be compiled with gcj then a number of patches
still need to be applied against the Java lucene sources.
PyLucene is built by compiling .java -> .jar using a regular JDK (Apple's
or Blackdown) and using gcj to compile from .jar -> .so thereby working
around all the gcj java front-end bugs
Even when only compiling .jar -> .so with gcj, a number of patches still
need to be applied:
http://svn.osafoundation.org/pylucene/trunk/patches.lucene


The last time I checked for src/gcj/Makefile (revision 420696), all that 
was required was to fix the name of the lucene archive file to match 
what is actually generated, e.g., $(BUILD)/lucene-core-[0-9].*.jar and 
add the FieldCache* to the names to skip . . .


Not having contributed to lucene yet, is it required to generate a 
'patch' to add to jira, or is the following output from a simple `svn 
diff` sufficient for experimentation ?


   Index: src/gcj/Makefile
   ===================================================================
   --- src/gcj/Makefile (revision 420696)
   +++ src/gcj/Makefile (working copy)
   @@ -8,7 +8,7 @@
    CORE=$(BUILD)/classes/java
    SRC=.

   -CORE_OBJ:=$(subst .jar,.a,$(wildcard $(BUILD)/lucene-[0-9]*.jar))
   +CORE_OBJ:=$(subst .jar,.a,$(wildcard $(BUILD)/lucene-core-[0-9]*.jar))
    CORE_JAVA:=$(shell find $(ROOT)/src/java -name '*.java')

    CORE_HEADERS=\
   @@ -55,7 +55,7 @@
    # yet accept from .class files.
    # NOTE: Change when http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15501 is fixed.
    $(CORE_OBJ) : $(CORE_JAVA)
   - $(GCJ) $(GCJFLAGS) -c -I $(CORE) -o $@ `find $(ROOT)/src/java -name '*.java' -not -name '*Sort*' -not -name 'Span*'` `find $(CORE) -name '*.class' -name '*Sort*' -or -name 'Span*'`
   + $(GCJ) $(GCJFLAGS) -c -I $(CORE) -o $@ `find $(ROOT)/src/java -name '*.java' -not -name '*Sort*' -not -name 'Span*' -not -name 'FieldCache*'` `find $(CORE) -name '*.class' -name '*Sort*' -or -name 'Span*' -or -name 'FieldCache*'`

    # generate object code from jar files using gcj
    %.a : %.jar

more,
l8r,
v

--
The future is here. It's just not evenly distributed yet.
-- William Gibson, quoted by Whitfield Diffie





Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Daniel John Debrunner
Doug Cutting wrote:

 Since GCJ is effectively available on all platforms, we could say that
 we will start accepting 1.5 features when a GCJ release supports those
 features.  Does that seem reasonable?

Seems potentially a little strange to me. Does this mean Lucene would be
limited to the set of 1.5 features actually implemented by GCJ? So if
there is a 1.5 feature that is not supported by GCJ (while others are)
it cannot be used?

Seems more natural to support the complete 1.5 as defined by Sun/Java,
not the subset implemented by one open source compiler.

Dan.






Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread robert engels
Agreed. I think those that are reliant on GCJ should plan on  
expending the effort to do whatever backporting is needed to make  
Lucene work on it. It should also be a GCJ branch or version. Seems  
silly to support 1.5 and not do it this way.



On Jul 10, 2006, at 11:17 PM, Daniel John Debrunner wrote:


Doug Cutting wrote:

Since GCJ is effectively available on all platforms, we could say that
we will start accepting 1.5 features when a GCJ release supports those
features.  Does that seem reasonable?


Seems potentially a little strange to me. Does this mean Lucene would be
limited to the set of 1.5 features actually implemented by GCJ? So if
there is a 1.5 feature that is not supported by GCJ (while others are)
it cannot be used?

Seems more natural to support the complete 1.5 as defined by Sun/Java,
not the subset implemented by one open source compiler.

Dan.










Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Vic Bancroft

robert engels wrote:


Seems  silly to support 1.5 and not do it this way.


Sometimes a little silliness is some serious fun!  Just give me a rubber 
nose, since I am just clowning around trying to build Andi's kewly 
contrib/db using gcj on the slightly stylish db-4.4.20 and je-3.0.12 . . .



On Jul 10, 2006, at 11:17 PM, Daniel John Debrunner wrote:


Doug Cutting wrote:


Since GCJ is effectively available on all platforms, we could say  that
we will start accepting 1.5 features when a GCJ release supports  those
features.  Does that seem reasonable?


Seems potentially a little strange to me. Does this mean Lucene would be
limited to the set of 1.5 features actually implemented by GCJ? So if
there is a 1.5 feature that is not supported by GCJ (while others are)
it cannot be used?

Seems more natural to support the complete 1.5 as defined by Sun/Java,
not the subset implemented by one open source compiler.



Do you have a different favorite open source java compiler for 1.5 ?

more,
l8r,
v

--
The future is here. It's just not evenly distributed yet.
-- William Gibson, quoted by Whitfield Diffie





Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-10 Thread Daniel John Debrunner
Vic Bancroft wrote:


 On Jul 10, 2006, at 11:17 PM, Daniel John Debrunner wrote:

 Doug Cutting wrote:

 Since GCJ is effectively available on all platforms, we could say  that
 we will start accepting 1.5 features when a GCJ release supports  those
 features.  Does that seem reasonable?


 Seems potentially a little strange to me. Does this mean Lucene would be
 limited to the set of 1.5 features actually implemented by GCJ? So if
 there is a 1.5 feature that is not supported by GCJ (while others are)
 it cannot be used?

 Seems more natural to support the complete 1.5 as defined by Sun/Java,
 not the subset implemented by one open source compiler.


 Do you have a different favorite open source java compiler for 1.5 ?

No, I just think the platform for Lucene (or any Java project) should be
defined by the spec (JDK 1.4, 1.5 or 1.6), not a single (possibly
partial) implementation of the spec.

Dan.





Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-08 Thread Doug Cutting

Chuck Williams wrote:

I doubt any single contribution will change anyone's mind.  I would like
to have clarity on the 1.5 decision before deciding whether or not to
contribute this and other things.  My ParallelWriter contribution, which
also requires 1.5, is already sitting in jira.


Sitting in Jira is better than not sitting in Jira, no?


I only work in 1.5 and use its features extensively.  I don't think
about 1.4 at all, and so have no idea how heavily dependent the code in
question is on 1.5.

Unfortunately, I won't be able to contribute anything substantial to
Lucene so long as it has a 1.4 requirement.


The 1.5 decision requires a consensus.  You're making ultimatums, which
does not help to build consensus.  By stating an inflexible position
you've become a fact that informs the process.

I think we should try to minimize the number of inconvenienced people.
Both developers and users are people.  Some developers are happy to
continue in 1.4, adding new features that users who are confined to 1.4
JVMs will be able to use.  Other developers will only contribute 1.5
code, perhaps (unless we find a technical workaround) excluding users
confined to 1.4 JVMs.  But it is difficult to compare the inconvenience
of a developer who refuses to code back-compatibly to that of a user who is
deprived of new features.


Since GCJ is effectively available on all platforms, we could say that 
we will start accepting 1.5 features when a GCJ release supports those 
features.  Does that seem reasonable?


Doug






Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-08 Thread Andi Vajda


On Sat, 8 Jul 2006, Doug Cutting wrote:

Since GCJ is effectively available on all platforms, we could say that we 
will start accepting 1.5 features when a GCJ release supports those features. 
Does that seem reasonable?


+1

Andi..




Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-08 Thread DM Smith


On Jul 8, 2006, at 12:41 PM, Doug Cutting wrote:



Since GCJ is effectively available on all platforms, we could say  
that we will start accepting 1.5 features when a GCJ release  
supports those features.  Does that seem reasonable?


I have been doing a bit of reading on GCJ compatibility. I think it  
is going to come in 2 parts:

1) It supports all the new language features of Java 1.5.
2) It has an implementation of all the new classes and methods that  
Lucene uses.


For me the test is that it is released for MacOSX.

With these three things, I'd be happy :)
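
For readers who want a concrete picture of item 1, the short sample below exercises several of the language features that arrived with Java 1.5: generics, the enhanced for loop, autoboxing, varargs and enums. The class name Java5FeatureSample and its methods are made up for illustration and have nothing to do with Lucene's code; a compiler without 1.5 language support would reject this file outright, independent of which library classes the runtime provides.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: uses several language features introduced in Java 1.5. */
class Java5FeatureSample {

    /** Enumerated type (new in 1.5). */
    enum Mode { CREATE, APPEND }

    /** Varargs parameter; each int is autoboxed into an Integer when added. */
    static List<Integer> listOf(int... values) {
        List<Integer> out = new ArrayList<Integer>(); // generics
        for (int v : values) {                        // enhanced for loop
            out.add(v);                               // autoboxing int -> Integer
        }
        return out;
    }

    /** Sums a generic list, relying on auto-unboxing Integer -> int. */
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(listOf(1, 2, 3)) + " " + Mode.valueOf("APPEND"));
    }
}
```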

DM Smith, stick in the mud :)




Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-08 Thread Chuck Williams
Doug Cutting wrote on 07/08/2006 09:41 AM:
 Chuck Williams wrote:
 I only work in 1.5 and use its features extensively.  I don't think
 about 1.4 at all, and so have no idea how heavily dependent the code in
 question is on 1.5.

 Unfortunately, I won't be able to contribute anything substantial to
 Lucene so long as it has a 1.4 requirement.

 The 1.5 decision requires a consensus.  You're making ultimatums, which
 does not help to build consensus.  By stating an inflexible position
 you've become a fact that informs the process.

My statement was not intended as an ultimatum at all.  Rather, it is
simply a fact.  I prefer to contribute to Lucene, but my workload simply
does not allow time to be spent on backporting.


 I think we should try to minimize the number of inconvenienced people.
 Both developers and users are people.  Some developers are happy to
 continue in 1.4, adding new features that users who are confined to 1.4
 JVMs will be able to use.  Other developers will only contribute 1.5
 code, perhaps (unless we find a technical workaround) excluding users
 confined to 1.4 JVMs.  But it is difficult to compare the inconvenience
 of a developer who refuses to code back-compatibly to that of a user who is
 deprived of new features.

Doug, respectfully, this issue is inflammatory in its nature.  I've
found a couple of your comments to be inflammatory, although I suspect
you did not intend them that way.  Specifically the term "refuses" above
and your prior comment about considering use of your veto power if the
committers were to vote to move to 1.5.

I'm not refusing to do anything.  I am overwhelmed in a crunch for the
next several months and simply informing the community that I have code
that others may find valuable that might be contributed, but that it
requires 1.5 and that I cannot backport it.  I cannot unilaterally
decide to contribute the code, needing the agreement of the company I'm
working for.  They are only interested in the contribution if there is
interest in having it in the core.  These are simply facts.  I suspect
I'm not the only person in this kind of situation.


 Since GCJ is effectively available on all platforms, we could say that
 we will start accepting 1.5 features when a GCJ release supports those
 features.  Does that seem reasonable?

Seems like a reasonable compromise to me.  If I had a vote on this it
would be +1.

Chuck





Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-08 Thread DM Smith


On Jul 8, 2006, at 12:56 PM, Chuck Williams wrote:



I prefer to contribute to Lucene, but my workload simply
does not allow time to be spent on backporting.


I'll stand by my offer to do the backporting when it is possible and  
does not do violence to the implementation.


I'd prefer to wait until the patch that is in Jira is ready to be  
applied. At that point post the request here and I'll see if it is  
doable.






Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-07 Thread Otis Gospodnetic
Hi Chuck,

I think bulk update would be good (although I'm not sure how it would be 
different from batching deletes and adds, but I'm sure there is a difference, 
or else you wouldn't have done it).
Java 1.5 - no conclusion, but personally I felt:
- no strong arguments for 1.4, only a few people argued for it
- very little interest from 1.4 adversaries in helping with backporting to 1.4 
or updating the build system to do the retro thing with 1.5 code

So I think you should contribute your code.  This will give us a real example 
of having something possibly valuable, and written with 1.5 features, so we can 
finalize 1.4 vs. 1.5 discussion, probably with a vote on lucene-dev.

Otis

----- Original Message -----
From: Chuck Williams [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Thursday, July 6, 2006 5:07:41 PM
Subject: Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in 
IndexWriter (Code and Performance Results Provided)

robert engels wrote on 07/06/2006 12:24 PM:
 I guess we just chose a much simpler way to do this...

 Even with your code changes, to see the modification made using the
 IndexWriter, it must be closed, and a new IndexReader opened.

 So a far simpler way is to get the collection of updates first, then

 using opened indexreader,
 for each doc in collection
   delete document using key
 endfor

 open indexwriter
 for each doc in collection
   add document
 endfor

 open indexreader


 I don't see how your way is any faster. You must always flush to disk
 and open the indexreader to see the changes.
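
The two-pass pattern in the pseudocode above (delete every affected document first, then add every new version) can be sketched with a toy in-memory model. This is not Lucene's API: BatchUpdateSketch and its methods are hypothetical, a plain map keyed by document key stands in for the index, and the reader and writer phases are modelled as two separate loops over the same batch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy in-memory stand-in for an index keyed by a unique document key. */
class BatchUpdateSketch {
    private final Map<String, String> docs = new LinkedHashMap<String, String>();

    /** Pass 1 (the "reader" phase): delete each document whose key is in the batch. */
    int deleteAll(Map<String, String> updates) {
        int deleted = 0;
        for (String key : updates.keySet()) {
            if (docs.remove(key) != null) {
                deleted++;
            }
        }
        return deleted;
    }

    /** Pass 2 (the "writer" phase): add the new version of each document in the batch. */
    void addAll(Map<String, String> updates) {
        docs.putAll(updates);
    }

    /** Applies a batch as all deletes followed by all adds; returns the delete count. */
    int applyBatch(Map<String, String> updates) {
        int deleted = deleteAll(updates);
        addAll(updates);
        return deleted;
    }

    String get(String key) {
        return docs.get(key);
    }

    int size() {
        return docs.size();
    }
}
```

The toy model leaves out everything that makes the real question interesting (flushing to disk, reopening a reader, segment merging), so it only illustrates the ordering of the two passes and says nothing about performance.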



Bulk updates however require yet another approach.  Sorry to change
topics here, but I'm wondering if there was a final decision on the
question of java 1.5 in the core.  If I submitted a bulk update
capability that required java 1.5, would it be eligible for inclusion in
the core or not?

Chuck










Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-07 Thread DM Smith

Otis,
	First let me say, I don't want to rehash the arguments for or  
against Java 1.5. We can all go back and read the last two major  
threads on the issue. I don't think there is anything new to say.


However, I think statements like:
	"no strong arguments" (I think the arguments were reasonable)
	"only a few people argued for it" (Only a few argued against it)
	"very little interest" (Very few votes are on any Jira issue, so what does that say?)
	"adversaries" (I am not an adversary, I am a very interested party with a personal interest in the outcome)

are inflammatory.

	I am willing to do the back port if it is possible and if it does  
not do violence to the implementation.


	There are a number of patches sitting in Jira and it is not clear to  
me which are even close to being applied. I am not interested in  
doing work on patches that are old or might sit around for a while  
until they are applied (and therefore become out of sync).


	If the patches are identified as being worthy of being applied and
are also identified as being Java 1.5, I will port them and their tests
if it makes sense.


	It has already been granted that contrib allow Java 1.5. So I  
presume that the build has been updated to allow for 1.5 in contrib  
and not in core. If this is not the case I think that the first  
committer (or submitter) of Java 1.5 code to contrib has the  
responsibility to change the build system (or at least ensure that it  
is done.)


	As to the build system, I am not the right person to see that it  
works. I am using Eclipse to do the builds. I maintain 2 workspaces,  
one with core only and that is Java 1.4.2 and the other is core and  
contrib and that is Java 1.5. I have done this so I can help back  
port to Java 1.4.


	However, I think you have identified that the core people need to  
make a decision and the rest of us need to go with it. So, I suggest  
that Doug convene such a meeting of the minds and communicate the  
decision to the rest of us.


DM



On Jul 7, 2006, at 1:17 PM, Otis Gospodnetic wrote:


Hi Chuck,

I think bulk update would be good (although I'm not sure how it  
would be different from batching deletes and adds, but I'm sure  
there is a difference, or else you wouldn't have done it).

Java 1.5 - no conclusion, but personally I felt:
- no strong arguments for 1.4, only a few people argued for it
- very little interest from 1.4 adversaries in helping with  
backporting to 1.4 or updating the build system to do the retro  
thing with 1.5 code


So I think you should contribute your code.  This will give us a  
real example of having something possibly valuable, and written  
with 1.5 features, so we can finalize 1.4 vs. 1.5 discussion,  
probably with a vote on lucene-dev.


Otis

----- Original Message -----
From: Chuck Williams [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Thursday, July 6, 2006 5:07:41 PM
Subject: Re: [jira] Commented: (LUCENE-565) Supporting  
deleteDocuments in IndexWriter (Code and Performance Results Provided)


robert engels wrote on 07/06/2006 12:24 PM:

I guess we just chose a much simpler way to do this...

Even with your code changes, to see the modification made using the
IndexWriter, it must be closed, and a new IndexReader opened.

So a far simpler way is to get the collection of updates first, then

using opened indexreader,
for each doc in collection
  delete document using key
endfor

open indexwriter
for each doc in collection
  add document
endfor

open indexreader


I don't see how your way is any faster. You must always flush to disk
and open the indexreader to see the changes.




Bulk updates, however, require yet another approach.  Sorry to change
topics here, but I'm wondering if there was a final decision on the
question of Java 1.5 in the core.  If I submitted a bulk update
capability that required Java 1.5, would it be eligible for inclusion
in the core or not?

Chuck


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]








Re: Java 1.5 (was Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided))

2006-07-07 Thread Chuck Williams

DM Smith wrote on 07/07/2006 07:07 PM:
 Otis,
 First let me say, I don't want to rehash the arguments for or
 against Java 1.5.

This is an emotional issue for people on both sides.

 However, I think you have identified that the core people need to
 make a decision and the rest of us need to go with it.

It would be most helpful to have clarity on this issue.

 On Jul 7, 2006, at 1:17 PM, Otis Gospodnetic wrote:

 Hi Chuck,

 I think bulk update would be good (although I'm not sure how it would
 be different from batching deletes and adds, but I'm sure there is a
 difference, or else you wouldn't have done it).

Bulk update works by rewriting all segments that contain a document to
be modified in a single linear pass.  This is orders of magnitude faster
than delete/add if the set of documents to be updated is large,
especially if only a few small fields are mutable on Documents that have
many possibly large immutable fields.  E.g., on a somewhat slow
development machine I updated several fields on 1,000,000 large
documents in 43 seconds.
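
As a rough illustration of the single-pass idea, here is a toy sketch. It
is not the contributed implementation and does not reflect Lucene's real
segment format: the class BulkUpdateSketch, the method bulkUpdate, the
helper doc, and the "id" key field are all hypothetical. Each segment is
scanned once, and only segments that contain a targeted document are
replaced by a rewritten copy, with every document keeping its position:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (assumption): a "segment" is an ordered list of documents, each a field map. */
class BulkUpdateSketch {

    /**
     * Scans every segment exactly once. A segment that contains at least one
     * document whose "id" is in idsToUpdate is replaced by a rewritten copy;
     * all other segments are left in place. Returns the number of segments rewritten.
     */
    static int bulkUpdate(List<List<Map<String, String>>> segments,
                          Set<String> idsToUpdate,
                          String field,
                          String newValue) {
        int rewritten = 0;
        for (int s = 0; s < segments.size(); s++) {
            List<Map<String, String>> copy = new ArrayList<Map<String, String>>();
            boolean changed = false;
            for (Map<String, String> doc : segments.get(s)) {
                if (idsToUpdate.contains(doc.get("id"))) {
                    // Copy the document and overwrite only the mutable field.
                    Map<String, String> updated = new HashMap<String, String>(doc);
                    updated.put(field, newValue);
                    copy.add(updated);
                    changed = true;
                } else {
                    // Unmodified documents are carried over at the same position.
                    copy.add(doc);
                }
            }
            if (changed) {
                segments.set(s, copy);
                rewritten++;
            }
        }
        return rewritten;
    }

    /** Hypothetical helper that builds a two-field document. */
    static Map<String, String> doc(String id, String price) {
        Map<String, String> d = new HashMap<String, String>();
        d.put("id", id);
        d.put("price", price);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> first = new ArrayList<Map<String, String>>();
        first.add(doc("a", "1"));
        first.add(doc("b", "2"));
        List<Map<String, String>> second = new ArrayList<Map<String, String>>();
        second.add(doc("c", "3"));

        List<List<Map<String, String>>> segments = new ArrayList<List<Map<String, String>>>();
        segments.add(first);
        segments.add(second);

        Set<String> ids = new HashSet<String>();
        ids.add("b");

        int rewritten = bulkUpdate(segments, ids, "price", "9");
        System.out.println("segments rewritten: " + rewritten);
        System.out.println("segments now: " + segments);
    }
}
```

In this toy model the cost is one sequential pass over the data, regardless
of how many documents are updated, whereas a delete/add cycle pays a
per-document cost for every update.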

There is an existing patch in Jira that takes this same approach
(LUCENE-382).  However, the limitations in that patch are substantial:
it works only on optimized indexes, stored fields are not updated,
updates are independent of the existing field value, etc.  These
limitations make that implementation unsuitable for many use cases.

My implementation eliminates all of those limitations, providing a fast,
flexible solution for applying an arbitrary value transformation to
selected documents and fields in the index (doc.field.new_value = f(doc,
field.old_value, doc.other_field_values) for arbitrary f).  It also
works with ParallelReader (and the ParallelWriter I've already
contributed).  This allows the mutable fields to be segregated into a
separate subindex.  Only that subindex need be updated.  This alone is
an enormous advantage over a large number of delete/adds, where the same
optimization is not possible due to the doc-id synchronization
requirements of ParallelReader.
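
The shape of such a transformation can be sketched as follows. This is a
toy model under stated assumptions, not the contributed code and not the
ParallelReader/ParallelWriter API: the class FieldTransformSketch, the
interface FieldTransform, and the method updateMutable are hypothetical
names, and the two lists merely stand in for two subindexes whose entries
are aligned by document number. Only the mutable side is rebuilt, and
positions are preserved so the two sides stay in sync:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FieldTransformSketch {

    /** Hypothetical callback: new_value = f(doc, field, old_value). */
    interface FieldTransform {
        String apply(Map<String, String> otherFields, String fieldName, String oldValue);
    }

    /**
     * Toy "parallel" layout (assumption): immutable.get(i) and mutable.get(i)
     * together make up document i. Returns a rebuilt mutable side in which
     * fieldName has been transformed wherever it is present. The immutable
     * side is only read, never rewritten, and document numbers do not shift.
     */
    static List<Map<String, String>> updateMutable(List<Map<String, String>> immutable,
                                                   List<Map<String, String>> mutable,
                                                   String fieldName,
                                                   FieldTransform f) {
        if (immutable.size() != mutable.size()) {
            throw new IllegalArgumentException("subindexes must hold the same number of documents");
        }
        List<Map<String, String>> rebuilt = new ArrayList<Map<String, String>>(mutable.size());
        for (int docId = 0; docId < mutable.size(); docId++) {
            Map<String, String> fields = new HashMap<String, String>(mutable.get(docId));
            if (fields.containsKey(fieldName)) {
                String oldValue = fields.get(fieldName);
                // The transform may consult the large immutable fields without copying them.
                fields.put(fieldName, f.apply(immutable.get(docId), fieldName, oldValue));
            }
            rebuilt.add(fields); // same position, so doc-id alignment is kept
        }
        return rebuilt;
    }
}
```

With delete/add, by contrast, an updated document is re-added at a new
document number, so both sides would have to be rewritten together to stay
aligned.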

There is a substantial amount of code required to do this, and it is
completely dependent on the index representation.  To simplify merge
issues with ongoing Lucene changes, I had to copy and edit certain
private methods out of the existing index code (and make extensive use
of the package-private APIs).  Beyond the normal benefits of open
sourcing code, my interest in contributing this is to see the index code
refactored to take bulk update into account.  That interest is
heightened by the current focus on a new flexible index representation.
I would like to
see bulk update as one of the operations supported in the new
representation.

 So I think you should contribute your code.  This will give us a real
 example of having something possibly valuable, and written with 1.5
 features, so we can finalize the 1.4 vs. 1.5 discussion, probably with
 a vote on lucene-dev.

I doubt any single contribution will change anyone's mind.  I would like
to have clarity on the 1.5 decision before deciding whether or not to
contribute this and other things.  My ParallelWriter contribution, which
also requires 1.5, is already sitting in jira.

I only work in 1.5 and use its features extensively.  I don't think
about 1.4 at all, and so have no idea how heavily dependent the code in
question is on 1.5.

Unfortunately, I won't be able to contribute anything substantial to
Lucene so long as it has a 1.4 requirement.

Chuck

