Lucene Performance and usage alternatives
I just wrote a program using the Java API of Lucene. It is working fine for my current index size, but I am worried about performance with a bigger index and simultaneous user access.
1) I am worried about having to write the program in Java. I searched for alternatives such as the C port, but I saw that the version it is based on is a little old and not many people seem to use it.
2) I am also thinking of compiling the code with gcj to generate native code and not use the JVM. Has anybody tried it? Could that bring the performance close to that of a C program?
3) I won't use an application server; I will call the program directly from a PHP page. Is there any suggested architecture model for doing that? I mean, anticipating many users accessing the program. Won't starting one instance and opening the index each time someone runs a query degrade performance?
-- View this message in context: http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18832162.html Sent from the Lucene - General mailing list archive at Nabble.com.
Re: Lucene Performance and usage alternatives
Before we go solving a problem that isn't necessarily there, can you share a bit about what sizes you are at currently? Num docs, index size, query rate? Have you looked at http://wiki.apache.org/lucene-java/BasicsOfPerformance ?
-Grant
On Aug 5, 2008, at 10:21 AM, ezer wrote: "The fact of initiating one instance each time someone does a query and opening the index should not degrade the performance?"
You shouldn't be instantiating a Reader/Searcher for each query. See the link above.
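The advice above (open the searcher once, reuse it for every query) can be sketched roughly as follows. This is a minimal illustration, not Lucene code: FakeSearcher is a hypothetical stand-in for whatever searcher class the program opens, SearcherHolder is an invented helper name, and the index path passed to it is just an example. Lucene's documentation describes its searcher as safe to share between threads, which is what makes this pattern applicable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive-to-open index searcher.
// In a real program this role would be played by the Lucene searcher.
final class FakeSearcher {
    // Counts how many times an "index" has been opened.
    static final AtomicInteger OPEN_COUNT = new AtomicInteger();

    FakeSearcher(String indexPath) {
        // Opening the index is the costly step we want to do only once.
        OPEN_COUNT.incrementAndGet();
    }

    String search(String query) {
        return "results for " + query;
    }
}

// Invented helper: opens the searcher lazily, once, and hands the same
// instance to every caller (double-checked locking on a volatile field).
final class SearcherHolder {
    private static volatile FakeSearcher shared;

    static FakeSearcher get(String indexPath) {
        FakeSearcher local = shared;
        if (local == null) {
            synchronized (SearcherHolder.class) {
                local = shared;
                if (local == null) {
                    local = new FakeSearcher(indexPath);
                    shared = local;
                }
            }
        }
        return local;
    }
}
```

Every query handler would call SearcherHolder.get(...) instead of constructing its own searcher, so the cost of opening the index is paid once per process rather than once per query. This only helps if the process itself stays alive between queries.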
Re: Lucene Performance and usage alternatives
Yes, I saw that. It talks about performance, but not about the variants I mentioned before. So far I have tested indexing a database of about 200,000 records. As I mentioned, it works fine, with response times of less than a second. But this database can grow to millions of records, and I am not sure I am choosing the best architecture for that step to allow simultaneous access. Thanks for the help.
Re: Lucene Performance and usage alternatives
Grant, what other information can I provide in order to clarify my questions?
Re: Lucene Performance and usage alternatives
An alternative is always to distribute the index across a set of servers. If you need to scale, I guess this is the only long-term option. You can build your own home-grown Lucene distribution or look into an existing one. I'm currently working on katta (http://katta.wiki.sourceforge.net/); there is no release yet, but we are in the QA and test cycles. There are others as well: Solr, for example, also provides distribution.
Stefan
~~~ 101tec Inc. Menlo Park, California, USA http://www.101tec.com
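To make the idea of a distributed index a little more concrete, here is a rough sketch of the usual scatter-gather step: send the query to every shard, collect each shard's best hits, and keep the best overall. It is not how katta or Solr implement it; Hit, Shard, and DistributedSearch are hypothetical names, the shards are queried one after another for simplicity, and the sketch assumes scores from different shards are directly comparable, which is not strictly true when each shard computes its own term statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A scored hit as one shard might return it (hypothetical type).
final class Hit {
    final String docId;
    final double score;

    Hit(String docId, double score) {
        this.docId = docId;
        this.score = score;
    }
}

// Hypothetical view of one index shard, wherever it lives.
interface Shard {
    List<Hit> search(String query, int topN);
}

final class DistributedSearch {
    // Ask every shard for its best topN hits, then keep the best topN overall.
    static List<Hit> search(List<Shard> shards, String query, int topN) {
        List<Hit> all = new ArrayList<>();
        for (Shard shard : shards) {
            all.addAll(shard.search(query, topN));
        }
        // Highest score first.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }
}
```

A real deployment would query the shards in parallel and handle slow or failed shards; those parts are left out here.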
Re: Lucene Performance and usage alternatives
My point is more that you don't necessarily need to go looking for variants. I've seen Lucene Java scale to millions of records with no problem. I talked with a guy using Solr this past week who had ~80 million records in a single 80 GB index on one machine. If I had a PHP front end, I would most likely start with Solr and its PHP client. No sense in reinventing the wheel, IMO.
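For the PHP question, the general shape of the Solr approach is a long-running search process that the PHP page calls over HTTP, instead of launching a new JVM per request. The sketch below shows that shape using only the JDK's built-in HTTP server. It is an illustration under assumptions, not Solr and not a production design: SearchDaemon, the /search path, the q parameter, and runQuery are all invented for the example, and runQuery is a placeholder for a real search against a searcher opened once at startup.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class SearchDaemon {
    // Placeholder for the real search call (hypothetical).
    static String runQuery(String q) {
        return "results for: " + q;
    }

    // Extracts the value of "q" from a raw query string such as "q=lucene+java".
    static String queryParam(String rawQuery) {
        if (rawQuery == null) {
            return "";
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.startsWith("q=")) {
                try {
                    return URLDecoder.decode(pair.substring(2), "UTF-8");
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return "";
    }

    // Starts a local HTTP endpoint; the process stays up between requests.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/search", exchange -> {
            String q = queryParam(exchange.getRequestURI().getRawQuery());
            byte[] body = runQuery(q).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

After SearchDaemon.start(8080) (the port is arbitrary), a PHP page could fetch http://127.0.0.1:8080/search?q=... with its usual HTTP functions. The server as written handles requests on a single thread; serving many simultaneous users would need an executor set on the server, or simply Solr itself.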