Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-11 Thread Endre Stølsvik

robert engels wrote:

> The 'foreach' should be faster in the general case for arrays as the
> bounds checking can be avoided.

Why is that? Where do you mean that the bounds-checking can be avoided?

> But, I doubt the speed difference is going to matter much either way,
> and eventually the JVM impl will converge to near equal performance.

This I actually agree on: if the foreach has some disadvantage compared to
explicit indexing through an array, it will at some point be fixed so
that it doesn't have this disadvantage anymore.


Endre.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-11 Thread robert engels
When iterating over an array using an indexed loop, you typically
need to access the element, as follows:

for (int i = 0; i < 100; i++) {
    String s = array[i];
    ...
}

Java performs bounds checking on the array[i] access to make sure i  
is within the limits of the array. Granted, there are optimizations  
the JVM can do in many cases using escape processing to know that i  
will always be in the range, but it is not always feasible.


When you use

for (String s : array) {
}

the JVM uses its own internal indexer that it knows cannot be outside  
the bounds, and thus the bounds checking can be avoided.


I would need to read the spec to know what happens if the array  
changes during the loop execution - my bet is that the loop maintains  
a reference to the original array, and thus it continues to work.
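
For what it's worth, JLS 14.14.2 (linked elsewhere in this thread) defines foreach over an array as evaluating the array expression once into a hidden local and then running an ordinary indexed loop over that local. A minimal sketch of what that implies when the variable is reassigned mid-loop; the class and method names are made up for illustration:

```java
class ForEachSnapshot {
    static String[] array;

    // Walks the array with foreach while the body points the field elsewhere.
    static String joinWhileReassigning() {
        array = new String[] {"a", "b", "c"};
        StringBuilder seen = new StringBuilder();
        for (String s : array) {
            // Only the variable changes; the loop keeps using the array
            // reference it evaluated before the first iteration.
            array = new String[] {"x"};
            seen.append(s);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWhileReassigning()); // abc
        System.out.println(array.length);           // 1
    }
}
```

Only the reference is captured: writes to elements of the original array would still be visible to later iterations.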


On Apr 11, 2008, at 2:28 AM, Endre Stølsvik wrote:


> robert engels wrote:
>
>> The 'foreach' should be faster in the general case for arrays as
>> the bounds checking can be avoided.
>
> Why is that? Where do you mean that the bounds-checking can be
> avoided?
>
>> But, I doubt the speed difference is going to matter much either
>> way, and eventually the JVM impl will converge to near equal
>> performance.
>
> This I actually agree on: if the foreach has some disadvantage of
> explicit indexing through an array, it will at some point be fixed
> so that it doesn't have this disadvantage anymore..
>
> Endre.




Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-11 Thread Chris Hostetter

Since i had to read robert's email about 3 times before i got what he was 
saying, i'll elaborate in case anyone else is scratching their head as 
much as i was...

because you could write code that looks like this...
   for (int i = 0; i < arr.length; i++) {
     i = getSomeNumberNotBetweenZeroAndArrLength();
     String s = arr[i];
   }
...the arr[i] lookup must do bounds checking and raise an exception if
needed.  This is not necessary in the foreach style construct where
there is no explicit loop counter.
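
A runnable version of that point, as a sketch only: the names are invented, and it shows what the language requires, not whether a particular JIT actually drops the check in either form.

```java
class BoundsCheckDemo {
    // Returns true if the runtime bounds check rejected the indexed access.
    static boolean indexedLoopCanGoOutOfBounds(String[] arr) {
        try {
            for (int i = 0; i < arr.length; i++) {
                i = arr.length;     // the body is free to move the counter anywhere
                String s = arr[i];  // so this access has to be checked
            }
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    // The foreach form exposes no counter that the body could modify.
    static int totalLength(String[] arr) {
        int len = 0;
        for (String s : arr) {
            len += s.length();
        }
        return len;
    }

    public static void main(String[] args) {
        String[] arr = {"a", "bb"};
        System.out.println(indexedLoopCanGoOutOfBounds(arr)); // true
        System.out.println(totalLength(arr));                 // 3
    }
}
```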

: When iterating over an array using an indexed loop, you typically need to
: access the element, as follows:
: 
: for (int i = 0; i < 100; i++) {
:   String s = array[i];
:   ...
: }
: 
: Java performs bounds checking on the array[i] access to make sure i is within
: the limits of the array. Granted, there are optimizations the JVM can do in
: many cases using escape processing to know that i will always be in the range,
: but it is not always feasible.
: 
: when you use
: 
: for(String s : array) {
: }
: 
: the JVM uses its own internal indexer that it knows cannot be outside the
: bounds, and thus the bounds checking can be avoided.


-Hoss





Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-11 Thread robert engels

Correct.

On Apr 11, 2008, at 7:43 PM, Chris Hostetter wrote:



> Since i had to read robert's email about 3 times before i got what he was
> saying, i'll elaborate in case anyone else is scratching their head as
> much as i was...
>
> because you could write code that looks like this...
>    for (int i = 0; i < arr.length; i++) {
>      i = getSomeNumberNotBetweenZeroAndArrLength();
>      String s = arr[i];
>    }
> ...the arr[i] lookup must do bounds checking and raise an exception if
> needed.  This is not necessary in the foreach style construct where
> there is no explicit loop counter.

> : When iterating over an array using an indexed loop, you typically need to
> : access the element, as follows:
> :
> : for (int i = 0; i < 100; i++) {
> :   String s = array[i];
> :   ...
> : }
> :
> : Java performs bounds checking on the array[i] access to make sure i is within
> : the limits of the array. Granted, there are optimizations the JVM can do in
> : many cases using escape processing to know that i will always be in the range,
> : but it is not always feasible.
> :
> : when you use
> :
> : for (String s : array) {
> : }
> :
> : the JVM uses its own internal indexer that it knows cannot be outside the
> : bounds, and thus the bounds checking can be avoided.
>
> -Hoss





Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-09 Thread Toke Eskildsen
On Tue, 2008-04-08 at 18:48 -0500, robert engels wrote:
> That is opposite of my testing:...
>
> The 'foreach' is consistently faster. The time difference is
> independent of the size of the array. What I know about JVM
> implementations, the foreach version SHOULD always be faster -
> because no bounds checking needs to be done on the element access...

That's interesting. Even if it doesn't show in a performance-test right
now, it might do so in later Java versions.

As for your test-code, then it does not measure performance in a fair
way, as the foreach runs after the old-style loop. I'm sure you'll see
different results if you switch the order of the two tests.
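
One way to take test order out of the picture is to alternate which variant runs first and compare across rounds. A rough sketch, not a rigorous benchmark; the class name, round count and iteration count are arbitrary assumptions:

```java
class AlternatingLoopTimer {
    static long sink; // keeps the summed lengths live so the loops are not dead code

    static long sumIndexed(String[] strings) {
        long len = 0;
        for (int j = 0; j < strings.length; j++) {
            len += strings[j].length();
        }
        return len;
    }

    static long sumForEach(String[] strings) {
        long len = 0;
        for (String s : strings) {
            len += s.length();
        }
        return len;
    }

    // Times one variant and returns the elapsed nanoseconds.
    static long time(boolean foreach, String[] strings, int iterations) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            total += foreach ? sumForEach(strings) : sumIndexed(strings);
        }
        long elapsed = System.nanoTime() - start;
        sink += total;
        return elapsed;
    }

    public static void main(String[] args) {
        String[] strings = new String[100];
        for (int i = 0; i < strings.length; i++) {
            strings[i] = Integer.toString(i);
        }
        for (int round = 0; round < 6; round++) {
            boolean foreachFirst = (round % 2 == 0); // swap the order every round
            long first = time(foreachFirst, strings, 100000);
            long second = time(!foreachFirst, strings, 100000);
            System.out.println("round " + round
                + " foreach ns = " + (foreachFirst ? first : second)
                + " indexed ns = " + (foreachFirst ? second : first));
        }
    }
}
```

If one variant only wins when it runs second, the difference is more likely an ordering artifact than a property of the loop.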

I'm a big fan of foreach, but I'll have to admit that Steven's
observations seem to be correct. I hope I'll find the time to take the
advice of Yonik and make my own test sometime soon.





RE: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-09 Thread Steven A Rowe
Hi Toke,

On 04/09/2008 at 2:43 AM, Toke Eskildsen wrote:
> On Tue, 2008-04-08 at 18:48 -0500, robert engels wrote:
>> That is opposite of my testing:...
>>
>> The 'foreach' is consistently faster. The time difference is
>> independent of the size of the array. What I know about JVM
>> implementations, the foreach version SHOULD always be faster -
>> because no bounds checking needs to be done on the
>> element access...
>
> As for your test-code, then it does not measure performance in a fair
> way, as the foreach runs after the old-style loop. I'm sure you'll see
> different results if you switch the order of the two tests.

My first try at a test looked like Robert's, and exactly as you say, Toke, when 
operating on the same array, the first loop is slower and the second one is 
faster.

Steve




RE: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-09 Thread melix

Hi,

I confirm your results. I didn't think there could be a difference using
foreach constructs...

Cedric


Steven A Rowe wrote:
 
 On 04/04/2008 at 4:40 AM, Toke Eskildsen wrote:
 On Wed, 2008-04-02 at 09:30 -0400, Mark Miller wrote:
   - replacement of indexed for loops with for each constructs
  
  Is this always the best idea? Doesn't the for loop construct make an
  iterator, which can be much slower than an indexed for loop?
 
 Only in the case of iterations over collections. For arrays, the foreach
 is syntactic sugar for indexed for-loop.
 http://java.sun.com/docs/books/jls/third_edition/html/statements.html#14.14.2
 
 I don't think this is actually true.  The text at the above-linked page
 simply says that for-each over an array means the same as an indexed
 loop over the same array.  Syntactic sugar, OTOH, implies that the
 resulting opcode is exactly the same.  When I look at the byte code (using
 javap) for the comparison test I include below, I can see that the indexed
 and for-each loops do not generate the same byte code.
 
 I constructed a simple program to compare the runtime length of the two
 loop control mechanisms, while varying the size of the array.  The test
 program takes command line parameters to control which loop control
 mechanism to use, the size of the array (#elems), and the number of times
 to execute the loop (#iters).  I used a Bash shell script to invoke the
 test program.
 
 Summary of the results: over int[] arrays, indexed loops are faster on
 arrays with fewer than about a million elements.  The fewer the elements,
 the faster indexed loops are relative to for-each loops.  This could be
 explained by a higher one-time setup cost for the for-each loop - above a
 certain array size, the for-each setup cost is lost in the noise.  It
 should be noted, however, that this one-time setup cost is quite small,
 and might be worth the increased code clarity.
 
 Here are the results for three different platforms:
 
   - Best of five iterations for each combination
   - All using the -server JVM option
   - Holding constant #iters * #elems = 10^10
   - Rounding the reported real time to the nearest tenth of a second
   - % Slower = 100 * ((For-each - Indexed) / Indexed)
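
The "% Slower" column can be recomputed from the two timing columns with the formula above. A small sketch, assuming the reported values were rounded to the nearest whole percent (the method name is made up):

```java
class PercentSlower {
    // 100 * ((For-each - Indexed) / Indexed), rounded to the nearest integer.
    static long percentSlower(double forEachSeconds, double indexedSeconds) {
        return Math.round(100.0 * (forEachSeconds - indexedSeconds) / indexedSeconds);
    }

    public static void main(String[] args) {
        // First row of Platform #1: 22.3s for-each vs 13.8s indexed.
        System.out.println(percentSlower(22.3, 13.8)); // 62
    }
}
```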
 
 Platform #1: Windows XP SP2; Intel Core 2 duo [EMAIL PROTECTED]; Java 1.5.0_13
 
 #iters  #elems  For-each  Indexed  % Slower
 ------  ------  --------  -------  --------
   10^9    10^1     22.3s    13.8s       62%
   10^8    10^2     16.0s    13.6s       18%
   10^6    10^4     14.8s    13.0s       14%
   10^4    10^6     12.9s    12.9s        0%
   10^3    10^7     13.4s    13.3s        1%
 
 Platform #2: Debian Linux, 2.6.21.7 kernel; Intel Xeon [EMAIL PROTECTED]; Java
 1.5.0_14
 
 #iters  #elems  For-each  Indexed  % Slower
 ------  ------  --------  -------  --------
   10^9    10^1     33.6s    14.2s      137%
   10^8    10^2     20.4s    13.9s       47%
   10^6    10^4     19.0s    12.7s       50%
   10^4    10^6     12.7s    12.8s       -1%
   10^3    10^7     13.2s    13.2s        0%
 
 Platform #3: Debian Linux, 2.6.21.7 kernel; Intel Xeon [EMAIL PROTECTED]; Java
 1.5.0_10
 
 #iters  #elems  For-each  Indexed  % Slower
 ------  ------  --------  -------  --------
   10^9    10^1    102.7s    73.6s       40%
   10^8    10^2    107.8s    60.0s       80%
   10^6    10^4    105.2s    58.6s       80%
   10^4    10^6     58.8s    53.0s       11%
   10^3    10^7     60.0s    54.1s       11%
 
 
 - ForEachTest.java follows -
 
 import java.util.Date;
 import java.util.Random;

 /**
  * This is meant to be called from a shell script that varies the loop style,
  * the number of iterations over the loop, and the number of elements in the
  * array over which the loop iterates, e.g.:
  *
  * cmd="java -server -cp . ForEachTest"
  * for elems in 10 100 10000 1000000 10000000 ; do
  *   iters=$((10000000000/${elems}))
  *   for run in 1 2 3 4 5 ; do
  *     time $cmd --indexed --arraysize $elems --iterations $iters
  *     time $cmd --foreach --arraysize $elems --iterations $iters
  *   done
  * done
  *
  */
 public class ForEachTest {
   static String NL = System.getProperty("line.separator");
   static String usage
     = "Usage: java -server -cp . ForEachTest [ --indexed | --foreach ]"
     + NL + "\t--iterations num-iterations  --arraysize array-size";

   public static void main(String[] args) {
     boolean useIndexedLoop = false;
     int size = 0;
     int iterations = 0;
     try {
       for (int argnum = 0 ; argnum < args.length ; ++argnum) {
         if (args[argnum].equals("--indexed")) {
           useIndexedLoop = true;
         } else if (args[argnum].equals("--foreach")) {
           useIndexedLoop = false;
         } else if (args[argnum].equals("--iterations")) {
           iterations = Integer.parseInt(args[++argnum]);
         } else if (args[argnum].equals("--arraysize")) {
           size = Integer.parseInt(args[++argnum]);
 

Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-09 Thread robert engels

I think it is going to be highly JVM dependent.

I reworked it to call each twice (and reordered the tests)... the
foreach is still faster. I also ran it on Windows (under Parallels)
and got similar results, but in some cases the indexed was faster.


Server times are tough to judge because normally the server JVM is not
going to compile a method until it has been hit 10k times, but this can be
configured...
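
A common workaround is to call the measured method well past that count before starting the clock (on HotSpot the count itself is set with -XX:CompileThreshold). A minimal sketch; the warm-up figure of 20000 is an assumption chosen only to sit above the 10k mentioned here, and the names are hypothetical:

```java
class WarmUpThenTime {
    static final int WARM_UP_CALLS = 20000; // assumption: comfortably above 10k
    static long sink;                        // keeps results live

    static long totalLength(String[] strings) {
        long len = 0;
        for (String s : strings) {
            len += s.length();
        }
        return len;
    }

    // Runs the method untimed first, then returns nanoseconds for the timed calls.
    static long timeAfterWarmUp(String[] strings, int timedCalls) {
        long total = 0;
        for (int i = 0; i < WARM_UP_CALLS; i++) {
            total += totalLength(strings);
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedCalls; i++) {
            total += totalLength(strings);
        }
        long elapsed = System.nanoTime() - start;
        sink += total;
        return elapsed;
    }

    public static void main(String[] args) {
        String[] strings = {"some string", "another string"};
        System.out.println("timed ns = " + timeAfterWarmUp(strings, 100000));
    }
}
```

Warm-up reduces the chance of timing interpreted code, but it does not guarantee when or how the JIT compiles the loop.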


I think this is a case where you need to make a judgement based on  
expected behavior as there are probably too many variables.


The 'foreach' should be faster in the general case for arrays as the  
bounds checking can be avoided.


But, I doubt the speed difference is going to matter much either way,  
and eventually the JVM impl will converge to near equal performance.






Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-09 Thread Yonik Seeley
Just for kicks, I tried it on a 64 bit Athlon, linux_x86_64, jvm=64
bit Sun 1.6 -server.
The explicit loop counter was 50% faster (for N=10... the inner loop)

-Yonik




Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-08 Thread robert engels

That is opposite of my testing:...

The 'foreach' is consistently faster. The time difference is
independent of the size of the array. From what I know about JVM
implementations, the foreach version SHOULD always be faster -
because no bounds checking needs to be done on the element access...


Times for the client JVM under 1.5_13

N = 10
indexed time = 14
foreach time = 8
N = 100
indexed time = 90
foreach time = 75
N = 1000
indexed time = 875
foreach time = 732
N = 10000
indexed time = 8801
foreach time = 7552
N = 100000
indexed time = 88566
foreach time = 75974

Times for the server JVM under 1.5_13

N = 10
indexed time = 21
foreach time = 21
N = 100
indexed time = 85
foreach time = 32
N = 1000
indexed time = 347
foreach time = 303
N = 10000
indexed time = 3472
foreach time = 3017
N = 100000
indexed time = 34158
foreach time = 30133

package test;

import junit.framework.TestCase;

public class LoopTest extends TestCase {
    public void testLoops() {

        int I = 10;
        int N = 10;

        for (int factor = 0; factor < 5; factor++) {
            String[] strings = new String[N];

            for (int i = 0; i < N; i++) {
                strings[i] = "some string";
            }

            System.out.println("N = " + N);

            long len = 0;
            long start = System.currentTimeMillis();

            for (int i = 0; i < I; i++) {
                for (int j = 0; j < N; j++) {
                    len += strings[j].length();
                }
            }

            System.out.println("indexed time = " + (System.currentTimeMillis() - start));

            len = 0;
            start = System.currentTimeMillis();
            for (int i = 0; i < I; i++) {
                for (String s : strings) {
                    len += s.length();
                }
            }
            System.out.println("foreach time = " + (System.currentTimeMillis() - start));

            N *= 10;
        }
    }

}



Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-08 Thread Yonik Seeley
On Tue, Apr 8, 2008 at 7:48 PM, robert engels [EMAIL PROTECTED] wrote:
> That is opposite of my testing:...
>
> The 'foreach' is consistently faster.

It's consistently slower for me (I tested java5 and java6 both with
-server on a P4).
I'm a big fan of testing different methods in different test runs
(because of hotspot, gc, etc).

Example results:
$ c:/opt/jdk16/bin/java -server t 1 10 foreach
N = 10
method=foreachlen=10 indexed time = 8734

[EMAIL PROTECTED] /cygdrive/h/tmp
$ c:/opt/jdk16/bin/java -server t 1 10 iter
N = 10
method=iterlen=10 indexed time = 7062


Here's my test code (a modified version of yours):

public class t {
  public static void main(String[] args) {
    int I = Integer.parseInt(args[0]); // 100
    int N = Integer.parseInt(args[1]); // 10
    String method = args[2].intern();  // foreach or iter

    String[] strings = new String[N];

    for (int i = 0; i < N; i++) {
      strings[i] = Integer.toString(i);
    }

    System.out.println("N = " + N);

    long len = 0;
    long start = System.currentTimeMillis();

    // == is safe here only because method was interned above
    if (method=="foreach")
      for (int i = 0; i < I; i++) {
        for (String s : strings) {
          len += s.length();
        }
      }
    else
      for (int i = 0; i < I; i++) {
        for (int j = 0; j < N; j++) {
          len += strings[j].length();
        }
      }

    System.out.println("method="+method + "len="+len+" indexed time = "
        + (System.currentTimeMillis() - start));
  }
}




Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-08 Thread Yonik Seeley
foreach vs explicit loop counter is pretty academic for Lucene anyway I think.
I can't think of any inner loops where it would really matter.

-Yonik




Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-03 Thread Mark Miller

> - replacement of indexed for loops with for each constructs

Is this always the best idea? Doesn't the for loop construct make an
iterator, which can be much slower than an indexed for loop?
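
As noted elsewhere in the thread, an iterator is only involved when the foreach target is a collection (any Iterable); over an array the loop is defined in terms of an index. A small sketch that makes the Iterable case visible by counting iterator() calls; CountingIterable is a hypothetical helper written for this illustration:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorCounting {
    // Hypothetical wrapper that counts how often foreach asks for an Iterator.
    static class CountingIterable implements Iterable<String> {
        final List<String> items;
        int iteratorCalls = 0;

        CountingIterable(List<String> items) {
            this.items = items;
        }

        public Iterator<String> iterator() {
            iteratorCalls++;
            return items.iterator();
        }
    }

    static int iteratorsCreatedByForEach(List<String> items) {
        CountingIterable counting = new CountingIterable(items);
        for (String s : counting) {
            // body intentionally empty; only the iteration mechanism matters
        }
        return counting.iteratorCalls;
    }

    public static void main(String[] args) {
        // One Iterator is requested for the whole loop, not one per element.
        System.out.println(iteratorsCreatedByForEach(Arrays.asList("a", "b", "c"))); // 1
    }
}
```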







[jira] Created: (LUCENE-1257) Port to Java5

2008-04-02 Thread JIRA
Port to Java5
-------------

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
QueryParser, Search, Store, Term Vectors
Affects Versions: 2.3.1
Reporter: Cédric Champeau
 Attachments: java5.patch

For my needs I've updated Lucene so that it uses Java 5 constructs. I know Java
5 migration had been planned for 2.1 at some point in the past, but I don't know
when it is planned now. This patch against the trunk includes:

- most obvious generics usage (there are tons of usages of sets, ... Those
which are commonly used have been generified)
- PriorityQueue generification
- replacement of indexed for loops with for each constructs
- removal of unnecessary unboxing

The code is in my opinion much more readable with those features (you actually
*know* what is stored in collections when reading the code, without the need to
look up field definitions every time) and it simplifies many algorithms.

Note that this patch also includes an interface for the Query class. This has
been done for my company's needs for building custom Query classes which add
some behaviour to the base Lucene queries. It prevents multiple unnecessary
casts. I know this introduction is not wanted by the team, but it really makes
our developments easier to maintain. If you don't want to use this, replace all
/Queriable/ calls with standard /Query/.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

