Re: what's better for in memory searching?
I have read through the code of RAMDirectory. It uses a list of 1024-byte arrays and carries a lot of overhead. But as far as I know, with MMapDirectory I can't prevent page faults: the OS will swap out less frequently used pages. Even if I allocate enough memory for the JVM, I can't guarantee that all the files in the directory are in memory. Am I understanding this right? If so, some less frequent queries will be slow. How can I keep the files always in memory?

On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog goks...@gmail.com wrote:
> Yes, use MMapDirectory. It is faster and uses memory more efficiently than RAMDirectory. This sounds wrong, but it is true. With RAMDirectory, Java has to work harder doing garbage collection.
>
> On Fri, Jun 8, 2012 at 1:30 AM, Li Li fancye...@gmail.com wrote:
>> hi all
>> I want to use Lucene 3.6 to provide a search service. My data is not very large: the raw data is less than 1GB, and I want to load all indexes into memory. I also need to save all indexes to disk persistently. I originally wanted to use RAMDirectory, but then I read its javadoc:
>>
>> Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte[1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments. It is recommended to materialize large indexes on disk and use MMapDirectory, which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to Java heap space is not useful.
>>
>> Should I use MMapDirectory? There also seems to be another option, the instantiated contrib module. Has anyone tested it against RAMDirectory?
>
> --
> Lance Norskog
> goks...@gmail.com
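The question of keeping mapped files resident can be explored with the JDK alone. The following is a minimal sketch, not Lucene code: it maps a file read-only with FileChannel.map and calls MappedByteBuffer.load(), which asks the OS to page the content in on a best-effort basis. The class name MmapPreload and the method names are hypothetical, and the sketch assumes a file smaller than 2 GB. It does not pin pages, so the OS may still evict them later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class MmapPreload {

    /** Maps a file read-only and asks the OS to page it in (best effort, no pinning). */
    static MappedByteBuffer mapAndLoad(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.load(); // touches the pages; residency is not guaranteed afterwards
            return buf; // the mapping stays valid after the channel is closed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes four bytes to a temporary file, maps it, and returns the byte at index 2. */
    static int demoOnTempFile() {
        try {
            Path tmp = Files.createTempFile("mmap-demo", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {10, 20, 30, 40});
            return mapAndLoad(tmp).get(2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoOnTempFile());
    }
}
```

Because load() is only a hint, this kind of pre-touching reduces first-access page faults but does not answer the "always in memory" requirement by itself.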
Re: what's better for in memory searching?
Set the swappiness to 0 to avoid memory pages being swapped to disk too early.

http://en.wikipedia.org/wiki/Swappiness

-Kuli
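On Linux the current setting is exposed as a small text file, /proc/sys/vm/swappiness, and changing it (for example with sysctl vm.swappiness=0) needs root. As a hedged sketch, a search service could read the value at startup and log it. The class name Swappiness is hypothetical, and the code assumes a Linux-style /proc layout; on other systems it simply reports that no value is available.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

// Hypothetical helper, for illustration only.
final class Swappiness {

    // Assumption: Linux exposes the global setting at this path.
    static final Path SWAPPINESS_FILE = Paths.get("/proc/sys/vm/swappiness");

    /** Parses the file content (a single integer followed by a newline). */
    static OptionalInt parse(String text) {
        try {
            return OptionalInt.of(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    /** Returns the global swappiness, or empty if it cannot be read on this system. */
    static OptionalInt read() {
        if (!Files.isReadable(SWAPPINESS_FILE)) {
            return OptionalInt.empty();
        }
        try {
            byte[] raw = Files.readAllBytes(SWAPPINESS_FILE);
            return parse(new String(raw, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt value = read();
        System.out.println(value.isPresent()
                ? "vm.swappiness = " + value.getAsInt()
                : "vm.swappiness not available on this system");
    }
}
```

Note that swappiness is a tendency rather than a guarantee: a low value makes the kernel less eager to swap, but it does not lock pages in RAM.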
Re: what's better for in memory searching?
Li Li,

Have you considered allocating a RAM disk? It's not the most flexible thing, but its performance is certainly close to that of a RAMDirectory. MMapping on top of that is likely to be useless, but I doubt you can set it to zero. That would need an experiment.

Also, doesn't caching and auto-warming provide the lowest latency for all expected queries?

Paul
Re: what's better for in memory searching?
Do you mean a software RAM disk, using RAM to simulate a disk? How would that deal with persistence?

Maybe I can hack RAMDirectory by increasing RAMOutputStream.BUFFER_SIZE from 1024 to 1024*1024. That may waste some memory, but I can adjust my merge policy to avoid too many segments. I will have one big segment and one small segment, and every night I will merge them. Newly added documents will be flushed into a new segment, and I will merge the newly generated segment with the small one. Our update operations are not very frequent.
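The trade-off behind the buffer-size idea is plain arithmetic, sketched below. The class BufferCount and its methods are hypothetical and only count fixed-size buffers; they do not model RAMDirectory's real bookkeeping or object-header overhead. The 1 GiB index size is taken from the "less than 1GB" figure in the thread as an upper bound.

```java
// Hypothetical helper, for illustration only.
final class BufferCount {

    /** Number of fixed-size buffers needed to hold indexBytes (rounded up). */
    static long buffersNeeded(long indexBytes, int bufferSize) {
        return (indexBytes + bufferSize - 1) / bufferSize;
    }

    /** Bytes allocated but unused in the last, partially filled buffer of one file. */
    static long tailWaste(long fileBytes, int bufferSize) {
        return buffersNeeded(fileBytes, bufferSize) * bufferSize - fileBytes;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        System.out.println(buffersNeeded(oneGiB, 1024));        // 1048576 arrays to track and collect
        System.out.println(buffersNeeded(oneGiB, 1024 * 1024)); // 1024 arrays
        System.out.println(tailWaste(1500, 1024 * 1024));       // 1047076 bytes wasted on a 1500-byte file
    }
}
```

With 1 MiB buffers the number of arrays drops by a factor of 1024, but every small file wastes up to almost a full buffer. That waste is per file, which is why keeping the segment count low matters for this approach.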
Re: what's better for in memory searching?
I am sorry, I made a mistake. Even with RAMDirectory, I cannot guarantee that the pages are not swapped out.

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann k...@solarier.de wrote:
> Set the swappiness to 0 to avoid memory pages being swapped to disk too early.
Re: what's better for in memory searching?
You cannot guarantee this when you're running out of RAM. You'd have a problem then anyway.

Why do you care that much? Have you had performance issues yet? 1GB should load really fast, and both auto-warming and the OS cache should help a lot as well. With such an index, you usually don't need to fine-tune performance that much.

Have you thought about using an SSD? Since you want to persist your index, you'll need to live with disk I/O anyway.

Greetings,
Kuli
Re: what's better for in memory searching?
Yes, I need an average query time of less than 10 ms. The faster the better. I have enough memory for Lucene because I know there is not too much data. There are not many modifications either: every day there are hundreds of document updates. If the indexes are not in physical memory, I/O operations will cost a few ms. By the way, full GC may also add uncertainty, so I need to optimize as much as possible.
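A latency target like the 10 ms average mentioned here is easiest to discuss with measurements in hand. The sketch below is a generic timing harness, not part of Lucene: LatencyProbe and its methods are hypothetical names, and the Runnable stands in for whatever executes one search. It reports the mean alongside a nearest-rank percentile, since occasional page faults or full GCs tend to show up in the tail rather than in the average.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class LatencyProbe {

    /** Runs the task warmupRuns times untimed, then runs times timed; returns sorted times in ms. */
    static double[] measureMillis(Runnable task, int warmupRuns, int runs) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1e6;
        }
        Arrays.sort(millis);
        return millis;
    }

    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Nearest-rank percentile (p in 0..100) of samples sorted in ascending order. */
    static double percentile(double[] sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(sorted.length - 1, idx))];
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would run a representative query here.
        double[] ms = measureMillis(() -> Math.sqrt(System.nanoTime()), 1000, 10000);
        System.out.printf("mean=%.4f ms, p99=%.4f ms%n", mean(ms), percentile(ms, 99));
    }
}
```

Running this against a plain on-disk index first would show whether the average and the tail already meet the target before any RAM-based approach is attempted.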
Re: what's better for in memory searching?
I found this: http://unix.stackexchange.com/questions/10214/per-process-swapiness-for-linux

It can provide fine-grained control of swapping.

On Mon, Jun 11, 2012 at 4:45 PM, Michael Kuhlmann k...@solarier.de wrote:
> Set the swappiness to 0 to avoid memory pages being swapped to disk too early.
Re: what's better for in memory searching?
On 11 June 2012 at 11:16, Li Li wrote:
> do you mean software RAM disk?

Right. OS level.

> using RAM to simulate disk?

Yes. That generally makes a disk which is very fast in reading and writing.

> How to deal with Persistence?

Synchronization (slaving?).

paul
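One way to read the synchronization suggestion is to keep the authoritative index on a persistent disk and copy it to the RAM disk before opening it for search. The sketch below assumes that reading, which the thread does not spell out. IndexSync is a hypothetical name, the code assumes the index directory is flat (files only, no subdirectories), and it should run only while nothing is writing to the source directory. It does not delete files that have disappeared from the source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only.
final class IndexSync {

    /** Copies every regular file from src into dst, replacing stale copies; returns the file count. */
    static int copyFlat(Path src, Path dst) {
        try {
            Files.createDirectories(dst);
            int copied = 0;
            try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
                for (Path file : files) {
                    if (Files.isRegularFile(file)) {
                        Files.copy(file, dst.resolve(file.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING);
                        copied++;
                    }
                }
            }
            return copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds two temporary directories, copies two made-up files, and returns the count. */
    static int demo() {
        try {
            Path disk = Files.createTempDirectory("index-disk");
            Path ramDisk = Files.createTempDirectory("index-ram"); // stand-in for a RAM disk mount
            Files.write(disk.resolve("segments_1"), new byte[] {1});
            Files.write(disk.resolve("_0.cfs"), new byte[] {2, 3});
            return copyFlat(disk, ramDisk);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() + " files copied");
    }
}
```

In a real deployment the destination would be a path on the RAM disk, and the copy would be repeated after each nightly merge.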
Re: what's better for in memory searching?
On Mon, 2012-06-11 at 11:38 +0200, Li Li wrote:
> yes, I need average query time less than 10 ms. The faster the better. I have enough memory for lucene because I know there are not too much data. there are not many modifications. every day there are about hundreds of document update. if indexes are not in physical memory, then IO operations will cost a few ms.

I'm with Michael on this one: it seems that you're optimizing prematurely.

Guessing that your final index will be 5GB in size with 1 million documents (give or take 900.000 :-), relatively simple queries and so on, an average response time of 10 ms should be attainable even on spinning drives. One hundred document updates per day are not many, so again I would not expect problems.

As is often the case on this mailing list, the advice is "try it". Using a normal on-disk index and doing some warm-up is the easiest solution to implement, and nearly all of your work on it will be usable for a RAM-based solution if you are not satisfied with the speed.

Or you could buy a small, cheap SSD and have no more worries...

Regards,
Toke Eskildsen
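Warm-up in this thread most likely means running representative queries after opening the index. A cruder complement, sketched below, is to read every index file once so that the OS file system cache is populated before the first query arrives. This is a swapped-in technique, not something the posters describe, and PageCacheWarmer is a hypothetical name. It assumes a flat index directory and enough free RAM for the cache to hold the files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
final class PageCacheWarmer {

    /** Reads every regular file in dir from start to end; returns the total bytes read. */
    static long readAll(Path dir) {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                try (InputStream in = Files.newInputStream(file)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        total += n; // the data is discarded; reading is the point
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Creates a temporary directory with files of 3 and 5 bytes and reads them back. */
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("warm-demo");
            Files.write(dir.resolve("a.bin"), new byte[3]);
            Files.write(dir.resolve("b.bin"), new byte[5]);
            return readAll(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo() + " bytes read");
    }
}
```

Query-based warm-up is still worth doing afterwards, because it also fills Lucene's own caches, which a raw file read does not touch.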
Re: what's better for in memory searching?
The point about premature optimization makes sense to me. However, some time ago I bookmarked a potentially useful approach:

http://lucene.472066.n3.nabble.com/High-response-time-after-being-idle-tp3616599p3617604.html

--
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: what's better for in memory searching?
Is this method equivalent to setting vm.swappiness, which is global? Or can it set the swappiness for the JVM process only?
Re: what's better for in memory searching?
Yes, use MMapDirectory. It is faster and uses memory more efficiently than RAMDirectory. This sounds wrong, but it is true. With RAMDirectory, Java has to work harder doing garbage collection.

On Fri, Jun 8, 2012 at 1:30 AM, Li Li fancye...@gmail.com wrote:
> Should I use MMapDirectory? There also seems to be another option, the instantiated contrib module. Has anyone tested it against RAMDirectory?

--
Lance Norskog
goks...@gmail.com