Re: What is the best file system for Lucene?
Thanks to all of you for the replies. I was looking for someone with the same experience as mine, but it seems that I'll have to test this myself. I'll try out my own ideas and the most interesting ideas from you guys.

Regards,
Sanyi

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: What is the best file system for Lucene?
Hello,

> Lucene indexing completes in 13-15 hours on the desktop system while
> it completes in about 29-33 hours on the notebook.
>
> Now, combine it with the DROP INDEX tests completing in the same
> amount of time on both and find out why the search is only slightly
> faster :)
>
> > Until then, all your measurements are subjective and you
> > don't gain much by comparing the two indexing processes.
>
> I'm worried about searching. Indexing is a lot faster on the desktop
> config.

This tells you that your problem is neither the disk itself nor the filesystem. The bottleneck is elsewhere. Why not run your search under a profiler? That will tell you where the JVM is spending its time. It may even be in some weird InetAddress call, as another person already pointed out.

Otis
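A quick way to test the suspicion about a stray InetAddress call is to time such a lookup on its own. The following is only a minimal sketch, assuming a current JDK rather than the 1.4.2 releases discussed in this thread. The names LookupTimer and timeLocalHostLookupNanos are made up for illustration, and nothing in the thread confirms that Lucene or Sanyi's program actually performs this lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LookupTimer {
    /** Times one local-host lookup; returns elapsed nanoseconds, or -1 if the lookup failed. */
    static long timeLocalHostLookupNanos() {
        long start = System.nanoTime();
        try {
            InetAddress.getLocalHost();
        } catch (UnknownHostException e) {
            return -1L;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Repeat a few times: the first call may be slower than later, cached ones.
        for (int i = 0; i < 3; i++) {
            long nanos = timeLocalHostLookupNanos();
            String shown = nanos < 0 ? "failed" : (nanos / 1_000_000.0) + " ms";
            System.out.println("lookup " + i + ": " + shown);
        }
    }
}
```

If the reported time is a large, roughly constant share of the total search time on both machines, that would point at a fixed latency rather than at the disk or the CPU.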
RE: What is the best file system for Lucene?
As I understand hyperthreading, this is not true:

> Also, unless you take your hyperthreading off, with just one index you are
> searching with just one half of the CPU - so your desktop is actually using
> a 1.5GHz CPU for the search.

You still have the full speed of the processor available; the processor itself just keeps switching between different threads of execution. Some people have noted that some single-threaded applications run 5-10% slower when hyperthreading is turned on, but that depends on the app. It certainly won't be running at half speed.

Dan
RE: What is the best file system for Lucene?
You may want to give the IBM JVM a try - I've found it faster in some cases...

http://www-106.ibm.com/developerworks/java/jdk/linux140/

Dan
Re: What is the best file system for Lucene?
> simply load your index into a
> RAMDirectory instead of using FSDirectory.

I have 3 GByte of RAM and my index is currently 3 GByte (it will soon be about 4 GByte), so I have to find this out another way.

> First off, 1.8GHz Pentium-M machines are supposed to run at about the
> speed of a 2.4GHz machine. The clock speeds on the mobile chips are
> lower, but they tend to perform much better than rated. I recommend
> you take a general benchmark of both machines testing both disk speed
> and cpu speed to get a baseline performance comparison.

I think it is a good general benchmark that almost everything runs at least twice as fast on the 3.0GHz P4, except Lucene search.

I can add one more interesting piece of information: I have a MySQL table with ~20 million records. When I run DROP INDEX on that table, MySQL rebuilds the whole huge table into a temp file. It completes in 30 minutes on both systems. Again, it doesn't seem to matter that the 15kRPM U320 HDD is 2x-3x as fast. Very surprising again. Hmm... reiserfs must be very, very slow, or I'm completely lost :)

> I also suggest turning off HT for your benchmarks and performance testing.

I'll try this later, and I really hope it won't be the reason.

> Secondly, while the second machine appears to be twice as fast, the
> disk could actually perform slower on the Linux box, especially if the
> notebook drive has a big (8M) cache like most 7200RPM ata disk drives
> do.

Both drives have an 8M cache.

> I imagine that if you hit the index with lots of simultaneous
> searches, that the Linux box would hold its own for much longer than
> the XP box simply due to the random seek performance of the scsi disk
> combined with scsi command queueing.

Are you saying that SCSI command queuing wastes more time than a 15kRPM 3.9ms HDD can gain over a 7.2kRPM 8-9ms HDD? That sounds terrible and I hope it isn't true.

> RAM speed is a factor too. Is the p4 a xeon processor? The older HT
> xeons have a much slower bus than the newer p4-m processors. Memory
> speed will be affected accordingly.

It is not a Xeon, just a P4 3.0GHz HT.

> I haven't heard of a hard disk referred to as a winchester disk in a
> very long time :)

;)

> Once you have an idea of how the two machines actually compare
> performance-wise, you can then judge how they perform index
> operations.

Lucene indexing completes in 13-15 hours on the desktop system while it completes in about 29-33 hours on the notebook.

Now, combine it with the DROP INDEX tests completing in the same amount of time on both and find out why the search is only slightly faster :)

> Until then, all your measurements are subjective and you
> don't gain much by comparing the two indexing processes.

I'm worried about searching. Indexing is a lot faster on the desktop config.

Regards,
Sanyi
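The disk figures quoted in this message can be turned into a rough expectation for uncached random reads. The sketch below assumes that the 3.9 ms and 8-9 ms values are average seek times and adds the average rotational latency (half a revolution) for each spindle speed. DiskEstimate and its methods are illustrative names, and the model deliberately ignores drive caches, transfer time and command queueing.

```java
final class DiskEstimate {
    /** Average rotational latency in milliseconds: half a revolution at the given spindle speed. */
    static double rotationalLatencyMs(double rpm) {
        return 60_000.0 / rpm / 2.0;
    }

    /** Rough cost of one uncached random read: average seek plus average rotational latency. */
    static double accessTimeMs(double avgSeekMs, double rpm) {
        return avgSeekMs + rotationalLatencyMs(rpm);
    }

    public static void main(String[] args) {
        double notebook = accessTimeMs(8.5, 7_200);  // 8.5 ms = midpoint of the quoted 8-9 ms
        double desktop = accessTimeMs(3.9, 15_000);
        System.out.printf("notebook %.2f ms, desktop %.2f ms, ratio %.2f%n",
                notebook, desktop, notebook / desktop);
    }
}
```

Under these assumptions the notebook needs about 12.7 ms per random read and the desktop about 5.9 ms, a ratio of roughly 2.1. That is in line with the 2x-3x advantage Sanyi expects if the disk really were the bottleneck.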
Re: What is the best file system for Lucene?
> How large is the index? If it's less than a couple of GByte then it
> will be entirely in memory

It is 3 GBytes and it will grow a lot. I have to search from the HDD, which is very fast compared to the notebook's HDD.

Average seek time:
  Notebook: 8-9ms
  Desktop: 3.9ms
Data read:
  Notebook: max. ~20MBytes/sec
  Desktop: 60-80MBytes/sec

So, if the bottleneck is the HDD, the desktop system has to be 2x-3x faster, unless reiserfs is a lot slower than NTFS.

> For example (and this is only an example) looking up a hostname in the
> DNS will take about the same time on almost any machine you can get hold of.

OK, but my tests are very simple and pure, and everything is measured part by part. Every part speeds up a lot on the desktop system, except the Lucene search part.

> You don't say how you're measuring search performance and you don't say
> what you're seeing.

I call my Java program from the command line on both systems, like:

search hello

Then it searches for bravo and collects the elapsed milliseconds between every call. Then it displays the results. It is very simple.

> Also, what's the load on the system while you're
> running the tests? gkrellm on Linux is very useful as an overall view
> -- are you CPU bound, are you seeing lots of disk traffic? Is the
> system actually more-or-less idle?

Thanks for the hint. Since my search returns only 30 hits, it completes too quickly for me to monitor it in real time. Anyway, if reiserfs proves to be fast enough, I'll look for other reasons and run longer tests that allow real-time monitoring.

Regards,
Sanyi
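A single search started from the command line may include JVM start-up, class loading and a cold cache in the measured time, so a repeated in-process measurement can be more telling. The following is a minimal sketch of such a harness, assuming a current JDK. SearchBench, measure and the placeholder workload are hypothetical and would have to be replaced by the real search call.

```java
import java.util.Arrays;

final class SearchBench {
    // Written by the placeholder workload so the JIT cannot discard the loop.
    static volatile int sink;

    /** Runs task unmeasured warmupRuns times, then returns the sorted times (ns) of measuredRuns runs. */
    static long[] measure(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long[] nanos = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Middle element of an already sorted, non-empty array. */
    static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real search call.
        Runnable fakeSearch = () -> {
            int h = 0;
            for (int i = 0; i < 1_000_000; i++) {
                h = 31 * h + i;
            }
            sink = h;
        };
        long[] times = measure(fakeSearch, 5, 21);
        System.out.printf("min %.3f ms, median %.3f ms, max %.3f ms%n",
                times[0] / 1e6, median(times) / 1e6, times[times.length - 1] / 1e6);
    }
}
```

Reporting the minimum, median and maximum instead of one number makes it easier to see whether both machines share a fixed cost or whether only the first, cold run is slow.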
Re: What is the best file system for Lucene?
As a generalisation, SuSE itself is not a lot slower than Windows XP. I also very much doubt that the filesystem is a factor. If you want to test without filesystem involvement, simply load your index into a RAMDirectory instead of using FSDirectory. That removes filesystem overhead from the searches.

There are quite a number of factors involved that could be affecting performance.

First off, 1.8GHz Pentium-M machines are supposed to run at about the speed of a 2.4GHz machine. The clock speeds on the mobile chips are lower, but they tend to perform much better than rated. I recommend you take a general benchmark of both machines testing both disk speed and cpu speed to get a baseline performance comparison. I also suggest turning off HT for your benchmarks and performance testing.

Secondly, while the second machine appears to be twice as fast, the disk could actually perform slower on the Linux box, especially if the notebook drive has a big (8M) cache like most 7200RPM ata disk drives do. I imagine that if you hit the index with lots of simultaneous searches, the Linux box would hold its own for much longer than the XP box simply due to the random seek performance of the scsi disk combined with scsi command queueing.

RAM speed is a factor too. Is the p4 a xeon processor? The older HT xeons have a much slower bus than the newer p4-m processors. Memory speed will be affected accordingly.

I haven't heard of a hard disk referred to as a winchester disk in a very long time :)

Once you have an idea of how the two machines actually compare performance-wise, you can then judge how they perform index operations. Until then, all your measurements are subjective and you don't gain much by comparing the two indexing processes.

Justin

On Tue, 30 Nov 2004 02:04:46 -0800 (PST), Sanyi <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I'm testing Lucene 1.4.2 on two very different configs, but with the same
> index.
> I'm very surprised by the results: Both systems are searching at about the
> same speed, but I'd expect (and I really need) to run Lucene a lot faster
> on my stronger config.
>
> Config #1 (a notebook):
> WinXP Pro, NTFS, 1.8GHz Pentium-M, 768Megs memory, 7200RPM winchester
>
> Config #2 (a desktop PC):
> SuSE 9.1 Pro, reiserfs, 3.0GHz P4 HT (virtually two 3.0GHz P4s), 3GByte RAM,
> 15000RPM U320 SCSI winchester
>
> You can see that the hardware of #2 is at least twice better/faster than #1.
> I'm searching the reason and the solution to take advantage of the better
> hardware compared to the poor notebook.
> Currently #2 can't amazingly outperform the notebook (#1).
>
> The question is: What can be worse in #2 than on the poor notebook?
>
> I can imagine only software problems.
> Which are the software parts then?
> 1. The OS
> Is SuSE 9.1 a LOT slower than WinXP pro?
> 2. The file system
> Is reiserfs a LOT slower than NTFS?
>
> Regards,
> Sanyi
Re: What is the best file system for Lucene?
> Could you try XP on your desktop

Sure, but I'll only do that if I run out of ideas.

> so your desktop is actually using
> a 1.5GHz CPU for the search.

No, this is not true. It uses a 3.0GHz P4. (HT means that you have two 3.0GHz P4s.) So, it is still surprising to me.

Regards,
Sanyi
Re: What is the best file system for Lucene?
On Tue, 30 Nov 2004 12:07:46 -, Pete Lewis <[EMAIL PROTECTED]> wrote:
> Also, unless you take your hyperthreading off, with just one index you are
> searching with just one half of the CPU - so your desktop is actually using
> a 1.5GHz CPU for the search. So, taking account of this it's not too
> surprising that they are searching at comparable speeds.
>
> HTH
> Pete

Actually, that isn't how hyperthreading works. The "second" CPU in a hyperthreaded system should only run threads when the "main" CPU is waiting on another task, like a memory access. The second, or sub, CPU is only a virtual processor. There aren't really two chips on board. New multicore processors will actually have more than one processor in one chip.

Problems can arise when you are using an HT processor on an operating system that doesn't know about HT technology. The OS should only schedule jobs to run on the sub CPU under very specific circumstances. This is one of the major reasons for the scheduler overhaul in Linux 2.6. The default scheduler in 2.4 would assign threads to the sub CPU that should not have been placed there, and those threads would suffer from resource starvation.
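To see how the JVM counts processors on a hyperthreaded machine, and to try the "lots of simultaneous searches" case mentioned elsewhere in the thread, something along these lines could be used. It is a sketch only, assuming a current JDK: ParallelSearches, runConcurrently and the placeholder task are hypothetical names, and the count the JVM reports depends on the operating system (it is typically the number of logical, not physical, processors).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSearches {
    /** Number of processors the JVM believes it can use. */
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Submits `copies` instances of the task to a fixed pool and returns how many completed. */
    static int runConcurrently(Callable<Integer> task, int copies) {
        ExecutorService pool = Executors.newFixedThreadPool(logicalProcessors());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < copies; i++) {
                futures.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<Integer> future : futures) {
                future.get(); // blocks until that copy has finished
                completed++;
            }
            return completed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("search task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("logical processors: " + logicalProcessors());
        // Hypothetical placeholder for a real search that returns a hit count.
        Callable<Integer> fakeSearch = () -> 30;
        System.out.println("completed: " + runConcurrently(fakeSearch, 8));
    }
}
```

A single search running on one thread can occupy only one logical processor at a time, so any benefit from the second logical processor would show up only when several searches run concurrently.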
Re: What is the best file system for Lucene?
Sanyi wrote:
> I'm testing Lucene 1.4.2 on two very different configs, but with the same
> index.
> I'm very surprised by the results: Both systems are searching at about the
> same speed, but I'd expect (and I really need) to run Lucene a lot faster
> on my stronger config.
>
> Config #1 (a notebook):
> WinXP Pro, NTFS, 1.8GHz Pentium-M, 768Megs memory, 7200RPM winchester
>
> Config #2 (a desktop PC):
> SuSE 9.1 Pro, reiserfs, 3.0GHz P4 HT (virtually two 3.0GHz P4s), 3GByte RAM,
> 15000RPM U320 SCSI winchester
>
> You can see that the hardware of #2 is at least twice better/faster than #1.
> I'm searching the reason and the solution to take advantage of the better
> hardware compared to the poor notebook.
> Currently #2 can't amazingly outperform the notebook (#1).

How large is the index? If it's less than a couple of GByte then it will be entirely in memory after you've done a few searches on the Linux box. You can force it into memory by cat'ing all the index files on to /dev/null a couple of times (cat * > /dev/null).

A 3GHz system should now perform dramatically faster than a 1.5GHz system no matter what the file system. (And it's still 3GHz whether or not hyperthreading is turned on -- hyperthreading simply makes use of some under-used silicon to give you somewhere between 1 and 2 CPUs. In some pathological cases it can give you less than one CPU, but I don't think lucene falls into that category. And it's going to be a helluva lot faster than any Pentium M because it has a nice healthy cache.)

However, I don't believe that the hardware, OS or file system have anything to do with it. Normally if you're seeing similar performance on widely differing platforms you're seeing latency somewhere else. For example (and this is only an example) looking up a hostname in the DNS will take about the same time on almost any machine you can get hold of.

You don't say how you're measuring search performance and you don't say what you're seeing. Also, what's the load on the system while you're running the tests? gkrellm on Linux is very useful as an overall view -- are you CPU bound, are you seeing lots of disk traffic? Is the system actually more-or-less idle?

jch
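The cat * > /dev/null trick can also be done from Java, which is handy on the Windows notebook where cat may not be available. The sketch below simply reads every regular file in a directory and discards the bytes. CacheWarmer and readAll are illustrative names, a current JDK is assumed, and whether the data then stays in the operating system's cache depends on how much free memory there is compared to the index size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class CacheWarmer {
    /** Reads every regular file directly inside dir, discards the data, returns total bytes read. */
    static long readAll(Path dir) {
        long total = 0;
        byte[] buffer = new byte[1 << 16];
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                try (InputStream in = Files.newInputStream(file)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        total += n;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // The index directory is passed as the first argument; "." is only a fallback.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("read " + readAll(dir) + " bytes from " + dir.toAbsolutePath());
    }
}
```

Running it once or twice before the timed searches, on both machines, would make the comparison closer to a CPU and memory test than a disk test.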
Re: What is the best file system for Lucene?
Hi Sanyi

Could you try XP on your desktop? That would take some variables out. The problem is that you are comparing operating systems as well as filesystems as well as different hardware configs.

Also, unless you take your hyperthreading off, with just one index you are searching with just one half of the CPU - so your desktop is actually using a 1.5GHz CPU for the search. So, taking account of this, it's not too surprising that they are searching at comparable speeds.

HTH
Pete

- Original Message -
From: "Sanyi" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Tuesday, November 30, 2004 11:28 AM
Subject: Re: What is the best file system for Lucene?

> > Interesting, what are your merge settings
>
> Sorry, I didn't mention that I was talking about search performance.
> I'm using the same, fully optimized index on both systems.
> (I've generated both indexes with the same code from the same database on the actual OS)
>
> > which JDK are you using?
>
> I'm using the same Sun JDK on both systems.
> I've tried so far:
> j2sdk1.4.2_04 _05 and _06.
> I didn't notice speed differences between these subversions.
> Do you know about significant speed differences between them I should notice?
>
> > Have you tried with hyperthreading turned off on #2?
>
> No, but I will try it if the problem isn't in the file system.
> I hope that the reason of slowness is reiserfs, because it is the easiest to change.
>
> What file systems are you people using Lucene on? And what are your experiences?
>
> Regards,
> Sanyi
Re: What is the best file system for Lucene?
> What file systems are you people using Lucene on? And what are your
> experiences?

http://www.apple.com/xsan/

It is actually a beta version and has some small issues, but it is very fast and easy to manage once you get it installed. The installation itself is tricky, since it depends heavily on your network setup and needs well-working DNS, routing, etc. However, it is fast as the wind. :-)

HTH
Stefan
Re: What is the best file system for Lucene?
> Interesting, what are your merge settings

Sorry, I didn't mention that I was talking about search performance. I'm using the same, fully optimized index on both systems. (I've generated both indexes with the same code from the same database on the respective OS.)

> which JDK are you using?

I'm using the same Sun JDK on both systems. So far I've tried j2sdk1.4.2_04, _05 and _06. I didn't notice speed differences between these subversions. Do you know of significant speed differences between them that I should notice?

> Have you tried with hyperthreading turned off on #2?

No, but I will try it if the problem isn't in the file system. I hope that the reason for the slowness is reiserfs, because it is the easiest to change.

What file systems are you people using Lucene on? And what are your experiences?

Regards,
Sanyi
Re: What is the best file system for Lucene?
Interesting, what are your merge settings, and which JDK are you using? (There are big differences between versions.) Have you tried with hyperthreading turned off on #2? If so, did it fare any differently?

Regards,
John

Sanyi wrote:
| Hi!
|
| I'm testing Lucene 1.4.2 on two very different configs, but with the same index.
| I'm very surprised by the results: Both systems are searching at about the same speed, but I'd
| expect (and I really need) to run Lucene a lot faster on my stronger config.
|
| Config #1 (a notebook):
| WinXP Pro, NTFS, 1.8GHz Pentium-M, 768Megs memory, 7200RPM winchester
|
| Config #2 (a desktop PC):
| SuSE 9.1 Pro, reiserfs, 3.0GHz P4 HT (virtually two 3.0GHz P4s), 3GByte RAM, 15000RPM U320 SCSI
| winchester
|
| You can see that the hardware of #2 is at least twice better/faster than #1.
| I'm searching the reason and the solution to take advantage of the better hardware compared to the
| poor notebook.
| Currently #2 can't amazingly outperform the notebook (#1).
|
| The question is: What can be worse in #2 than on the poor notebook?
|
| I can imagine only software problems.
| Which are the software parts then?
| 1. The OS
| Is SuSE 9.1 a LOT slower than WinXP pro?
| 2. The file system
| Is reiserfs a LOT slower than NTFS?
|
| Regards,
| Sanyi