What is the best file system for Lucene?
Hi!

I'm testing Lucene 1.4.2 on two very different configs, but with the same index. I'm very surprised by the results: both systems search at about the same speed, but I'd expect (and I really need) Lucene to run a lot faster on my stronger config.

Config #1 (a notebook): WinXP Pro, NTFS, 1.8GHz Pentium-M, 768Megs memory, 7200RPM winchester
Config #2 (a desktop PC): SuSE 9.1 Pro, reiserfs, 3.0GHz P4 HT (virtually two 3.0GHz P4s), 3GByte RAM, 15000RPM U320 SCSI winchester

You can see that the hardware of #2 is at least twice as fast as #1. I'm looking for the reason, and for a way to take advantage of the better hardware. Currently #2 does not outperform the notebook (#1) by any significant margin.

The question is: what can be worse on #2 than on the poor notebook? I can imagine only software problems. Which software parts, then?

1. The OS: is SuSE 9.1 a LOT slower than WinXP Pro?
2. The file system: is reiserfs a LOT slower than NTFS?

Regards,
Sanyi

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: What is the best file system for Lucene?
> Interesting, what are your merge settings

Sorry, I didn't mention that I was talking about search performance. I'm using the same, fully optimized index on both systems. (I generated both indexes with the same code from the same database, each on the OS it is searched on.)

> which JDK are you using?

I'm using the same Sun JDK on both systems. So far I've tried j2sdk1.4.2_04, _05 and _06. I didn't notice speed differences between these update releases. Do you know of significant speed differences between them that I should have noticed?

> Have you tried with hyperthreading turned off on #2?

No, but I will try it if the problem isn't in the file system. I hope that the reason for the slowness is reiserfs, because it is the easiest to change.

What file systems are you people using Lucene on? And what are your experiences?

Regards,
Sanyi
Re: What is the best file system for Lucene?
> What file systems are you people using Lucene on? And what are your experiences?

http://www.apple.com/xsan/

It is actually a beta version and has some small issues, but it is very fast and easy to manage once you get it installed. The installation itself is tricky, since it depends heavily on your network setup and needs well-working DNS, routing, etc. However, it is fast as the wind. :-)

HTH
Stefan
Re: What is the best file system for Lucene?
Hi Sanyi

Could you try XP on your desktop? That would take some variables out. The problem is that you are comparing OSes, as well as filesystems, as well as different hardware configs.

Also, unless you take your hyperthreading off, with just one index you are searching with just one half of the CPU - so your desktop is actually using a 1.5GHz CPU for the search. So, taking account of this, it's not too surprising that they are searching at comparable speeds.

HTH
Pete

- Original Message -
From: Sanyi [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, November 30, 2004 11:28 AM
Subject: Re: What is the best file system for Lucene?

> Sorry, I didn't mention that I was talking about search performance.
> [...]
Re: What is the best file system for Lucene?
Sanyi wrote:

> I'm testing Lucene 1.4.2 on two very different configs, but with the same index.
> [...]

How large is the index? If it's less than a couple of GByte then it will be entirely in memory after you've done a few searches on the Linux box. You can force it into memory by cat'ing all the index files to /dev/null a couple of times (cat * > /dev/null). A 3GHz system should now perform dramatically faster than a 1.5GHz system no matter what the file system.

(And it's still 3GHz whether or not hyperthreading is turned on -- hyperthreading simply makes use of some under-used silicon to give you somewhere between 1 and 2 CPUs. In some pathological cases it can give you less than one CPU, but I don't think Lucene falls into that category. And it's going to be a helluva lot faster than any Pentium M because it has a nice healthy cache.)

However, I don't believe that the hardware, OS or file system have anything to do with it. Normally, if you're seeing similar performance on widely differing platforms, you're seeing latency somewhere else. For example (and this is only an example), looking up a hostname in the DNS will take about the same time on almost any machine you can get hold of.

You don't say how you're measuring search performance and you don't say what you're seeing. Also, what's the load on the system while you're running the tests? gkrellm on Linux is very useful as an overall view -- are you CPU bound, are you seeing lots of disk traffic? Is the system actually more-or-less idle?

jch
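
For anyone who wants the cat trick in a form that also runs on the Windows notebook, here is a minimal sketch in Java. It only reads every regular file in one directory and discards the bytes, which is all that cat to /dev/null does. The class name IndexCacheWarmer is made up for this example, the code assumes a current JDK (java.nio.file, so not the 1.4.2 JVM discussed in this thread), and whether the data actually stays in the OS cache depends on free memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: reads every regular file in a directory and discards the data. */
final class IndexCacheWarmer {

    /** Returns the number of bytes read from the regular files directly inside dir. */
    static long warm(Path dir) throws IOException {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                try (InputStream in = Files.newInputStream(file)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        total += n; // the bytes themselves are thrown away
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Pass the index directory as the first argument; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        long bytes = warm(dir);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("read " + bytes + " bytes in " + millis + " ms");
    }
}
```

Running it twice in a row is informative by itself: if the second pass is much faster than the first, the files were served from memory rather than from disk.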
Re: What is the best file system for Lucene?
On Tue, 30 Nov 2004 12:07:46 -, Pete Lewis [EMAIL PROTECTED] wrote:

> Also, unless you take your hyperthreading off, with just one index you are searching with just one half of the CPU - so your desktop is actually using a 1.5GHz CPU for the search. So, taking account of this its not too surprising that they are searching at comparable speeds.
>
> HTH
> Pete

Actually, that isn't how hyperthreading works. The second CPU in a hyperthreaded system should only run threads when the main CPU is waiting on another task, like a memory access. The second, or sub, CPU is only a virtual processor. There aren't really two chips on board. New multicore processors will actually have more than one processor in one chip.

Problems can arise when you are using an HT processor on an operating system that doesn't know about HT technology. The OS should only schedule jobs to run on the sub CPU under very specific circumstances. This is one of the major reasons for the scheduler overhaul in Linux 2.6. The default scheduler in 2.4 would assign threads to the sub CPU that shouldn't have been placed there, and those threads would suffer from resource starvation.
Re: What is the best file system for Lucene?
> Could you try XP on your desktop

Sure, but I'll only do that if I run out of ideas.

> so your desktop is actually using a 1.5GHz CPU for the search.

No, this is not true. It uses a 3.0GHz P4. (HT means that you have two 3.0GHz P4s.) So it is still surprising to me.

Regards,
Sanyi
Re: AW: What is the best file system for Lucene?
> The notebook is quite good, e.g. the Pentium-M might be faster than your Pentium 4. At least it has a similar speed, because of its better internal design. Never compare CPUs of different types by their frequency.

Ok, this might be true, but all of my other tests where the CPU is involved run a LOT faster on the desktop PC with the 3GHz P4. Even other Java parts run a LOT faster (nearly twice as fast). So we can't say that the Java VM takes no advantage of the 3GHz P4 compared to the 1.8GHz Pentium-M. Everything is a LOT faster, except searching with Lucene (which is also faster, but only slightly).

> Maybe your index is small enough to fit into the cache provided by the operating system. So you wouldn't notice any difference between your hard disks.

It is a 3GByte index and I always reboot between tests, so caching is not the explanation.

> I don't think so. I'm using Windows 2000 Pro and SuSE 9.0 and (from memory) Linux seems to be slightly faster, but I can't provide any benchmark now.

Are you using reiserfs with SuSE?

Regards,
Sanyi
Re: What is the best file system for Lucene?
As a generalisation, SuSE itself is not a lot slower than Windows XP. I also very much doubt that the filesystem is a factor. If you want to test without filesystem involvement, simply load your index into a RAMDirectory instead of using FSDirectory. That takes filesystem overhead out of the searches.

There are quite a number of factors involved that could be affecting performance.

First off, 1.8GHz Pentium-M machines are supposed to run at about the speed of a 2.4GHz machine. The clock speeds on the mobile chips are lower, but they tend to perform much better than rated. I recommend you take a general benchmark of both machines, testing both disk speed and CPU speed, to get a baseline performance comparison. I also suggest turning off HT for your benchmarks and performance testing.

Secondly, while the second machine appears to be twice as fast, the disk could actually perform slower on the Linux box, especially if the notebook drive has a big (8M) cache like most 7200RPM ATA disk drives do. I imagine that if you hit the index with lots of simultaneous searches, the Linux box would hold its own for much longer than the XP box, simply due to the random seek performance of the SCSI disk combined with SCSI command queueing.

RAM speed is a factor too. Is the P4 a Xeon processor? The older HT Xeons have a much slower bus than the newer P4-M processors. Memory speed will be affected accordingly.

I haven't heard of a hard disk referred to as a winchester disk in a very long time :)

Once you have an idea of how the two machines actually compare performance-wise, you can then judge how they perform index operations. Until then, all your measurements are subjective and you don't gain much by comparing the two indexing processes.

Justin

On Tue, 30 Nov 2004 02:04:46 -0800 (PST), Sanyi [EMAIL PROTECTED] wrote:

> I'm testing Lucene 1.4.2 on two very different configs, but with the same index.
> [...]
Re: What is the best file system for Lucene?
> How large is the index? If it's less than a couple of GByte then it will be entirely in memory

It is 3GBytes big and it will grow a lot. I have to search from the HDD, which is very fast compared to the notebook's HDD.

Average seek time: notebook 8-9ms, desktop 3.9ms
Data read: notebook max. ~20MBytes/sec, desktop 60-80MBytes/sec

So, if the bottleneck is the HDD, the desktop system has to be 2x-3x faster. Unless reiserfs is a lot slower than NTFS.

> For example (and this is only an example) looking up a hostname in the DNS will take about the same time on almost any machine you can get hold of.

Ok, but I have very simple and pure tests and everything is measured part by part... and every part speeds up a lot on the desktop system, except the Lucene search part.

> You don't say how you're measuring search performance and you don't say what you're seeing.

I call my Java program from the command line on both systems, like:

search hello

Then it searches for hello and records the elapsed milliseconds around every call it makes. Then it displays the results. It is very simple.

> Also, what's the load on the system while you're running the tests? gkrellm on Linux is very useful as an overall view -- are you CPU bound, are you seeing lots of disk traffic? Is the system actually more-or-less idle?

Thanks for the hint. Since my search only fetches 30 hits, it completes too quickly for me to monitor it in real time. Anyway, if reiserfs proves to be fast enough, I'll look for other reasons and run longer tests that I can monitor in real time.

Regards,
Sanyi
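
A single short run per JVM launch mixes JVM startup, class loading, cold disk reads and the search itself into one number. A minimal sketch of a repeat-and-compare timer in Java follows. SearchTimer and the placeholder workload are invented for this example and are not part of Lucene; the real search call would go where the placeholder is. It assumes a current JDK (lambdas), not the 1.4.2 JVM used in this thread.

```java
import java.util.Arrays;

/** Hypothetical timing helper: runs a task several times and keeps each duration. */
final class SearchTimer {

    /** Runs task the given number of times; returns elapsed nanoseconds per run. */
    static long[] timeRuns(Runnable task, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Median of the given values; the input array is left unmodified. */
    static long median(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace the body with the real search call.
        Runnable placeholderSearch = () -> {
            long sum = 0;
            for (int i = 0; i < 5_000_000; i++) {
                sum += i % 7;
            }
            if (sum == 42) {
                System.out.println(sum); // keeps the loop from being optimized away
            }
        };
        long[] runs = timeRuns(placeholderSearch, 11);
        long first = runs[0];
        long later = median(Arrays.copyOfRange(runs, 1, runs.length));
        System.out.printf("first run: %.2f ms, median of later runs: %.2f ms%n",
                first / 1e6, later / 1e6);
    }
}
```

If the first run is slow and the later ones are fast on both machines, the cold-start cost dominates the measurement and says little about steady-state search speed. Longer loops like this also give a monitoring tool enough time to show CPU and disk activity.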
Re: What is the best file system for Lucene?
> simply load your index into a RAMDirectory instead of using FSDirectory.

I have 3GByte RAM and my index is currently 3GByte big (it'll soon be about 4GByte). So I have to find this out another way.

> First off, 1.8GHz Pentium-M machines are supposed to run at about the speed of a 2.4GHz machine. The clock speeds on the mobile chips are lower, but they tend to perform much better than rated. I recommend you take a general benchmark of both machines testing both disk speed and cpu speed to get a baseline performance comparison.

I think it is a good general benchmark that almost everything runs at least twice as fast on the 3.0GHz P4, except Lucene search. One more interesting piece of info: I have a MySQL table with ~20 million records. When I run a DROP INDEX on that table, MySQL rebuilds the whole huge table into a temp file. It completes in 30 minutes on both systems. So once again it doesn't matter that the 15kRPM U320 HDD is 2x-3x as fast. Very surprising again. Hmm... reiserfs must be very, very slow, or I'm completely lost :)

> I also suggest turning off HT for your benchmarks and performance testing.

I'll try this later and I really hope it won't be the reason.

> Secondly, while the second machine appears to be twice as fast, the disk could actually perform slower on the Linux box, especially if the notebook drive has a big (8M) cache like most 7200RPM ata disk drives do.

Both drives have 8M cache.

> I imagine that if you hit the index with lots of simultaneous searches, that the Linux box would hold its own for much longer than the XP box simply due to the random seek performance of the scsi disk combined with scsi command queueing.

Are you saying that SCSI command queuing wastes more time than a 15kRPM 3.9ms HDD can gain over a 7.2kRPM 8-9ms HDD? It sounds terrible and I hope it isn't true.

> RAM speed is a factor too. Is the p4 a xeon processor? The older HT xeons have a much slower bus than the newer p4-m processors. Memory speed will be affected accordingly.

It is not a Xeon, just a P4 3.0GHz HT.

> I haven't heard of a hard disk referred to as a winchester disk in a very long time :)

;)

> Once you have an idea of how the two machines actually compare performance-wise, you can then judge how they perform index operations.

Lucene indexing completes in 13-15 hours on the desktop system, while it takes about 29-33 hours on the notebook. Now combine that with the DROP INDEX tests completing in the same amount of time on both, and figure out why the search is only slightly faster :)

> Until then, all your measurements are subjective and you don't gain much by comparing the two indexing processes.

I'm worried about searching. Indexing is a lot faster on the desktop config.

Regards,
Sanyi
RE: What is the best file system for Lucene?
You may want to give the IBM JVM a try - I've found it faster in some cases...

http://www-106.ibm.com/developerworks/java/jdk/linux140/

Dan
RE: What is the best file system for Lucene?
As I understand hyperthreading, this is not true:

> Also, unless you take your hyperthreading off, with just one index you are searching with just one half of the CPU - so your desktop is actually using a 1.5GHz CPU for the search.

You still have the full speed of the processor available - the processor itself just keeps switching between different threads of execution. Some people have noted that some (single-threaded) applications will run 5-10% slower when hyperthreading is turned on - but that depends on the app. It certainly won't be running at half speed.

Dan
Re: What is the best file system for Lucene?
Hello,

> Lucene indexing completes in 13-15 hours on the desktop system while it completes in about 29-33 hours on the notebook. Now, combine it with the DROP INDEX tests completing in the same amount of time on both and find out why is the search only slightly faster :)
>
>> Until then, all your measurements are subjective and you don't gain much by comparing the two indexing processes.
>
> I'm worried about searching. Indexing is a lot faster on the desktop config.

This tells you that your problem is not the disk itself, and not the filesystem. The bottleneck is elsewhere. Why not run your search under a profiler? That will tell you where the JVM is spending its time. It may even be in some weird InetAddress call, as another person already pointed out.

Otis
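
Before reaching for a full profiler, a coarse first check is to compare wall-clock time with the CPU time the thread actually consumed. A minimal sketch in Java follows; it is not a substitute for a profiler. CpuVersusWall is a made-up name, the two workloads are stand-ins, and java.lang.management requires Java 5 or later, so this does not run on the 1.4.2 JVM used in this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: compares wall-clock time with CPU time for one task. */
final class CpuVersusWall {

    /** Runs task on the calling thread; returns {wallNanos, cpuNanos}. */
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported on this JVM");
        }
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        return new long[] { wall, cpu };
    }

    static void report(String label, long[] result) {
        System.out.printf("%s: wall %.1f ms, cpu %.1f ms%n",
                label, result[0] / 1e6, result[1] / 1e6);
    }

    public static void main(String[] args) {
        // Stand-in for time spent waiting on disk, network or a name lookup.
        long[] sleeping = measure(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Stand-in for CPU-bound work.
        long[] spinning = measure(() -> {
            long x = 0;
            for (long i = 0; i < 200_000_000L; i++) {
                x += i ^ (x >>> 3);
            }
            if (x == 1) {
                System.out.println(x); // keeps the loop from being optimized away
            }
        });
        report("sleeping", sleeping);
        report("spinning", spinning);
    }
}
```

If a search shows CPU time far below wall time, the thread is mostly waiting (disk, network, locks), and a faster processor will not help much. If the two are close, the search is CPU bound and a profiler can then show which methods are responsible.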
Re: What is the best file system for Lucene?
Thanks to you all for the replies. I was looking for someone with the same experience as mine, but it seems that I'll have to test this myself. I'll try out my own ideas and the most interesting ideas from you guys.

Regards,
Sanyi