Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I have skimmed the RAMDirectory code. It uses a list of 1024-byte
arrays and carries a lot of overhead.
But as far as I know, with MMapDirectory I can't prevent page
faults. The OS will swap less frequently used pages out. Even if I
allocate enough memory for the JVM, I can't guarantee that all the files
in the directory are in memory. Am I understanding this right? If so,
some less frequent queries will be slow. How can I keep them always in
memory?
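
A minimal sketch of what preloading a memory-mapped file looks like with
plain java.nio (this is not Lucene's own code). MappedByteBuffer.load()
asks the OS to bring the mapped pages into physical memory, and
isLoaded() is only a hint. Neither call pins the pages, so the OS may
still evict them later, which is exactly the concern raised above. The
class name MmapPreload and the temporary file are made up for
illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Lucene.
final class MmapPreload {

    // Maps the whole file read-only and touches its pages via load().
    // Files larger than Integer.MAX_VALUE bytes would need several mappings.
    static MappedByteBuffer mapAndLoad(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.load(); // best effort: pages can still be evicted afterwards
            return buf; // the mapping stays valid after the channel is closed
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        Files.write(tmp, new byte[4096]);
        MappedByteBuffer buf = mapAndLoad(tmp);
        System.out.println("mapped bytes: " + buf.capacity());
        System.out.println("isLoaded hint: " + buf.isLoaded());
    }
}
```

Truly locking pages in RAM is an OS-level feature (for example mlock on
Linux) that the standard Java API does not expose.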

On Fri, Jun 8, 2012 at 5:53 PM, Lance Norskog goks...@gmail.com wrote:
 Yes, use MMapDirectory. It is faster and uses memory more efficiently
 than RAMDirectory. This sounds wrong, but it is true. With
 RAMDirectory, Java has to work harder doing garbage collection.

 On Fri, Jun 8, 2012 at 1:30 AM, Li Li fancye...@gmail.com wrote:
 Hi all,
   I want to use Lucene 3.6 to provide a search service. My data is
 not very large: the raw data is less than 1GB, and I want to load all
 indexes into memory. I also need to save all indexes to disk
 persistently.
   I originally wanted to use RAMDirectory, but then I read its javadoc:

   Warning: This class is not intended to work with huge indexes.
   Everything beyond several hundred megabytes will waste resources
   (GC cycles), because it uses an internal buffer size of 1024 bytes,
   producing millions of byte[1024] arrays. This class is optimized
   for small memory-resident indexes. It also has bad concurrency on
   multithreaded environments.
   It is recommended to materialize large indexes on disk and use
   MMapDirectory, which is a high-performance directory implementation
   working directly on the file system cache of the operating system,
   so copying data to Java heap space is not useful.

   Should I use MMapDirectory? It seems to be another contrib
 implementation. Has anyone compared it with RAMDirectory?



 --
 Lance Norskog
 goks...@gmail.com


Re: what's better for in memory searching?

2012-06-11 Thread Michael Kuhlmann
Set the swappiness to 0 to avoid memory pages being swapped to disk too
early.


http://en.wikipedia.org/wiki/Swappiness

-Kuli
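
On Linux the current value is exposed in /proc/sys/vm/swappiness. It is
a global kernel setting, and changing it needs root (for example through
sysctl). A small sketch, assuming a service wants to log the value at
startup; the class name SwappinessCheck is made up, and on systems
without that file the result is simply empty.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

// Hypothetical helper that only reads the setting; it does not change it.
final class SwappinessCheck {

    static final Path PROC_FILE = Paths.get("/proc/sys/vm/swappiness");

    // The file holds a single integer followed by a newline.
    static int parse(String content) {
        return Integer.parseInt(content.trim());
    }

    // Returns the value, or empty if the file is missing or unreadable.
    static OptionalInt read(Path file) {
        try {
            byte[] raw = Files.readAllBytes(file);
            return OptionalInt.of(parse(new String(raw, StandardCharsets.US_ASCII)));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        OptionalInt value = read(PROC_FILE);
        System.out.println(value.isPresent()
                ? "vm.swappiness = " + value.getAsInt()
                : "vm.swappiness not available on this system");
    }
}
```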


Re: what's better for in memory searching?

2012-06-11 Thread Paul Libbrecht
Li Li,

Have you considered allocating a RAM disk?
It's not the most flexible thing, but it's certainly close in performance to
a RAMDirectory.
MMapping on top of that is likely to be useless, but I doubt you can set it
to zero. That would need an experiment.

Also, don't caching and auto-warming provide the lowest latency for all
expected queries?

Paul



Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Do you mean a software RAM disk, using RAM to simulate a disk? How would
I deal with persistence?

Maybe I can hack it by increasing RAMOutputStream.BUFFER_SIZE from 1024
to 1024*1024. That may waste some memory, but I can adjust my merge policy
to avoid too many segments. I will have one big segment and one small
segment, and every night I will merge them. Newly added documents will be
flushed into a new segment, and I will merge the newly generated segment
with the small one.
Our update operations are not very frequent.
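
The trade-off can be put into rough numbers. This is only a sketch,
assuming each file occupies whole fixed-size buffers so that its last
buffer is partly empty; the class name BufferMath and the 10,000-byte
file size are made up for illustration.

```java
// Back-of-the-envelope arithmetic; not Lucene code.
final class BufferMath {

    // Number of fixed-size buffers needed to hold `bytes` (ceiling division).
    static long buffersNeeded(long bytes, int bufferSize) {
        return (bytes + bufferSize - 1) / bufferSize;
    }

    // Unused bytes in the last, partially filled buffer of one file.
    static long slack(long bytes, int bufferSize) {
        return buffersNeeded(bytes, bufferSize) * bufferSize - bytes;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        System.out.println(buffersNeeded(oneGiB, 1024));        // 1048576
        System.out.println(buffersNeeded(oneGiB, 1024 * 1024)); // 1024
        // A hypothetical 10,000-byte index file:
        System.out.println(slack(10_000, 1024));                // 240
        System.out.println(slack(10_000, 1024 * 1024));         // 1038576
    }
}
```

Larger buffers cut the number of arrays by a factor of 1024, but every
file can waste up to almost one full buffer, so the waste grows with the
number of files, which is why keeping the segment count low matters.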


Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I am sorry, I made a mistake. Even with RAMDirectory, I cannot
guarantee that the pages are not swapped out.


Re: what's better for in memory searching?

2012-06-11 Thread Michael Kuhlmann
You cannot guarantee this when you're running out of RAM. You'd have a
problem then anyway.

Why do you care that much? Have you had performance issues yet? 1GB
should load really fast, and both auto-warming and the OS cache should
help a lot as well. With such an index, you usually don't need to
fine-tune performance that much.

Have you thought about using an SSD? Since you want to persist your
index, you'll need to live with disk I/O anyway.


Greetings,
Kuli


Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Yes, I need an average query time of less than 10 ms. The faster the
better. I have enough memory for Lucene because I know there is not too
much data. There are not many modifications: every day there are
hundreds of document updates. If the indexes are not in physical memory,
then I/O operations will cost a few ms.
By the way, a full GC may also add uncertainty, so I need to optimize it
as much as possible.
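
One way to check the 10 ms target is to measure it. The sketch below
times repeated calls to a Runnable that stands in for a real search call
(the actual query code is not shown in this thread). It reports the
maximum as well as the average, because GC pauses tend to show up in the
worst case rather than in the mean. LatencyProbe is a made-up name.

```java
// Hypothetical timing harness; the workload is a placeholder.
final class LatencyProbe {

    // Returns {averageMillis, maxMillis} over `runs` invocations.
    static double[] measure(Runnable query, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        long total = 0;
        long max = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            long elapsed = System.nanoTime() - start;
            total += elapsed;
            max = Math.max(max, elapsed);
        }
        return new double[] { total / 1e6 / runs, max / 1e6 };
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with a real search call.
        Runnable fakeQuery = () -> Math.sqrt(12345.678);
        double[] r = measure(fakeQuery, 1000);
        System.out.printf("avg %.4f ms, max %.4f ms%n", r[0], r[1]);
    }
}
```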

Re: what's better for in memory searching?

2012-06-11 Thread Li Li
I found this:
http://unix.stackexchange.com/questions/10214/per-process-swapiness-for-linux
It can provide fine-grained control of swapping.


Re: what's better for in memory searching?

2012-06-11 Thread Paul Libbrecht

On 11 June 2012 at 11:16, Li Li wrote:

 do you mean software RAM disk?

Right. OS level.

 using RAM to simulate disk?

Yes.
That generally gives you a disk that is very fast for reading and writing.

 How to deal with Persistence?

Synchronization (slaving?).

paul



Re: what's better for in memory searching?

2012-06-11 Thread Toke Eskildsen
On Mon, 2012-06-11 at 11:38 +0200, Li Li wrote:
 yes, I need average query time less than 10 ms. The faster the better.
 I have enough memory for lucene because I know there are not too much
 data. there are not many modifications. every day there are about
 hundreds of document update. if indexes are not in physical memory,
 then IO operations will cost a few ms.

I'm with Michael on this one: it seems that you're doing premature
optimization. Guessing that your final index will be < 5GB in size with
1 million documents (give or take 900.000 :-), relatively simple queries
and so on, an average response time of 10 ms should be attainable even
on spinning drives. One hundred document updates per day are not many,
so again I would not expect problems.

As is often the case on this mailing list, the advice is: try it. Using
a normal on-disk index and doing some warm-up is the easy solution to
implement, and nearly all of your work on this will be usable for a
RAM-based solution if you are not satisfied with the speed. Or you
could buy a small & cheap SSD and have no more worries...

Regards,
Toke Eskildsen
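
Warm-up usually means running typical queries before serving traffic. A
cruder, OS-level variant is to read every index file once so that it
lands in the page cache. The sketch below does that with plain java.nio;
it is not a Lucene API, PageCacheWarmer is a made-up name, and "index"
is a placeholder for the real index directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene.
final class PageCacheWarmer {

    // Reads every regular file directly under `dir` once; returns bytes read.
    static long warm(Path dir) throws IOException {
        byte[] chunk = new byte[1 << 16];
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (!Files.isRegularFile(f)) continue;
                try (InputStream in = Files.newInputStream(f)) {
                    int n;
                    while ((n = in.read(chunk)) != -1) total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "index");
        if (!Files.isDirectory(dir)) {
            System.out.println("not a directory: " + dir);
            return;
        }
        System.out.println("read " + warm(dir) + " bytes");
    }
}
```

This only makes it likely that the files are cached; the OS is still
free to evict them under memory pressure.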



Re: what's better for in memory searching?

2012-06-11 Thread Mikhail Khludnev
The point about premature optimization makes sense to me. However, some
time ago I bookmarked a potentially useful approach:
http://lucene.472066.n3.nabble.com/High-response-time-after-being-idle-tp3616599p3617604.html


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: what's better for in memory searching?

2012-06-11 Thread Li Li
Is this method equivalent to setting vm.swappiness, which is global?
Or can it set the swappiness for the JVM process only?


Re: what's better for in memory searching?

2012-06-08 Thread Lance Norskog
Yes, use MMapDirectory. It is faster and uses memory more efficiently
than RAMDirectory. This sounds wrong, but it is true. With
RAMDirectory, Java has to work harder doing garbage collection.


-- 
Lance Norskog
goks...@gmail.com