Re: Lucene Performance and usage alternatives
On Aug 5, 2008, at 2:29 PM, ezer wrote:

> Thanks Stefan and Grant. Yes, Solr seems very interesting; I tried it once, and I am now looking at the PHP client you mentioned.
> What happens if, rather than starting a server that opens a port to listen for requests, I call the program from PHP every time I need to search, using for example exec(theSearchingProgram, $arrayResult)?

That won't perform. The main cost of searching is loading up the index, and you would have to do that every time.

> For now that is the solution I am testing, but I am not sure it is an elegant way to do this. I would like to know the pros and cons of each solution; at first glance, I think opening a port raises a security issue.

What kind of environment are you in that you can't secure the port? I'm not a security expert, but starting points would be to allow connections only from a given IP, use SSL, put it behind a firewall, etc. Treat Solr just as you treat a database in the typical tiered architecture.

-Grant
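To make the trade-off concrete, here is a minimal sketch of the long-running alternative to exec(): a process that pays the index-loading cost once at startup and then answers requests over a port bound to the loopback address only. It uses the JDK's built-in com.sun.net.httpserver rather than Solr or Lucene; the class name LocalSearchServer, the /search path, the INDEX_LOADS counter, and the fake "results for ..." response are all assumptions made for illustration. Binding to 127.0.0.1 is only one way to limit who can connect, and it only fits the case where PHP runs on the same host; it is not a substitute for the firewall and SSL measures mentioned above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

final class LocalSearchServer {
    // Hypothetical counter: how many times the (pretend) index has been loaded.
    static final AtomicInteger INDEX_LOADS = new AtomicInteger();

    // Starts a server reachable only via 127.0.0.1; port 0 lets the OS pick a free port.
    static HttpServer start(int port) {
        INDEX_LOADS.incrementAndGet(); // stand-in for the expensive one-time index load
        try {
            InetAddress loopback = InetAddress.getByName("127.0.0.1");
            HttpServer server = HttpServer.create(new InetSocketAddress(loopback, port), 0);
            server.createContext("/search", exchange -> {
                // The raw query string, e.g. "q=lucene"; a real service would parse and run it.
                String query = exchange.getRequestURI().getQuery();
                byte[] body = ("results for " + query).getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Tiny client standing in for the PHP page that would call the service.
    static String get(HttpServer server, String pathAndQuery) {
        int port = server.getAddress().getPort();
        try (InputStream in = new URL("http://127.0.0.1:" + port + pathAndQuery).openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(0);
        System.out.println(get(server, "/search?q=lucene"));
        System.out.println(get(server, "/search?q=solr"));
        System.out.println("index loads: " + INDEX_LOADS.get());
        server.stop(0);
    }
}
```

However many requests arrive, the load counter only moves when start() is called, which is the difference from launching a fresh program per search.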
Re: Lucene Performance and usage alternatives
Thanks Stefan and Grant. Yes, Solr seems very interesting; I tried it once, and I am now looking at the PHP client you mentioned.

What happens if, rather than starting a server that opens a port to listen for requests, I call the program from PHP every time I need to search, using for example exec(theSearchingProgram, $arrayResult)? For now that is the solution I am testing, but I am not sure it is an elegant way to do this. I would like to know the pros and cons of each solution; at first glance, I think opening a port raises a security issue.

--
View this message in context: http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18837195.html
Sent from the Lucene - General mailing list archive at Nabble.com.
Re: Lucene Performance and usage alternatives
My point is more that you don't necessarily need to go looking for variants. I've seen Lucene Java scale to millions with no problem. I talked with a guy using Solr this past week who had ~80 million records in a single 80 GB index on one machine.

If I had a PHP front end, I would most likely start with Solr and its PHP client. No sense in reinventing the wheel, IMO.
Re: Lucene Performance and usage alternatives
An alternative is always to distribute the index across a set of servers. If you need to scale, I guess this is the only long-term option. You can build your own home-grown Lucene distribution or look into an existing one. I'm currently working on katta (http://katta.wiki.sourceforge.net/); there is no release yet, but we are in the QA and test cycles. There are others as well: Solr, for example, provides distribution too.

Stefan

~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com
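As a rough illustration of what a home-grown distribution involves, the sketch below sends one query to several index shards in parallel and merges their hits into a single top-k list. It is not how katta or Solr implement distribution; the Hit, Shard, and ShardedSearch names are hypothetical, each shard is a plain in-process object standing in for a remote server, and the merge assumes that scores from different shards are directly comparable, which a real system would have to ensure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// One scored hit; the fields are illustrative.
final class Hit {
    final String docId;
    final double score;

    Hit(String docId, double score) {
        this.docId = docId;
        this.score = score;
    }
}

// Hypothetical shard interface; in a real deployment each shard would be a separate server.
interface Shard {
    List<Hit> search(String query, int topK);
}

final class ShardedSearch {
    // Queries every shard concurrently, then keeps the topK best hits overall.
    static List<Hit> search(List<Shard> shards, String query, int topK) {
        List<CompletableFuture<List<Hit>>> pending = new ArrayList<>();
        for (Shard shard : shards) {
            pending.add(CompletableFuture.supplyAsync(() -> shard.search(query, topK)));
        }
        List<Hit> merged = new ArrayList<>();
        for (CompletableFuture<List<Hit>> future : pending) {
            merged.addAll(future.join()); // wait for this shard's answer
        }
        merged.sort((a, b) -> Double.compare(b.score, a.score)); // highest score first
        return new ArrayList<>(merged.subList(0, Math.min(topK, merged.size())));
    }

    public static void main(String[] args) {
        Shard first = (q, k) -> Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.2));
        Shard second = (q, k) -> Arrays.asList(new Hit("b1", 0.5));
        for (Hit hit : search(Arrays.asList(first, second), "lucene", 2)) {
            System.out.println(hit.docId + " " + hit.score);
        }
    }
}
```

Each shard only needs to return its own top k, since no hit outside a shard's local top k can appear in the global top k.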
Re: Lucene Performance and usage alternatives
Grant, which other information can I provide in order to clarify my questions?

--
View this message in context: http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18834310.html
Re: Lucene Performance and usage alternatives
Yes, I saw that. It talks about performance, but not about the variants I mentioned before.

Actually, I tested indexing a database of about 200,000 records. As I mentioned, it works fine, with response times of less than a second. But this database can grow to millions of records, and I am not sure I am choosing the best architecture for that step, one that allows simultaneous access.

Thanks for the help

--
View this message in context: http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18833292.html
Re: Lucene Performance and usage alternatives
Before we go solving a problem that isn't necessarily there, can you share a bit about what sizes you are at currently? Num docs, index size, query rate?

Have you looked at http://wiki.apache.org/lucene-java/BasicsOfPerformance ?

-Grant

On Aug 5, 2008, at 10:21 AM, ezer wrote:

> 3) I won't use an application server; I will call the program directly from a PHP page. Is there a suggested architecture model for doing that, in anticipation of many users accessing the program? Won't starting one instance each time someone runs a query, and opening the index each time, degrade performance?

You shouldn't be instantiating a Reader/Searcher for each query. See the link above.
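The advice about not instantiating a Reader/Searcher per query can be sketched as a holder that opens the searcher once and hands the same instance to every query thread. The code below deliberately avoids the Lucene API: FakeSearcher is a hypothetical stand-in whose constructor represents the expensive index open, and SearcherHolder and SharedSearcherDemo are illustrative names, so treat this as a pattern under those assumptions rather than as Lucene code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an opened index; a real program would hold
// its Lucene reader/searcher objects here instead.
final class FakeSearcher {
    static final AtomicInteger OPEN_COUNT = new AtomicInteger();

    FakeSearcher() {
        OPEN_COUNT.incrementAndGet(); // pretend this is the expensive index open
    }

    String search(String query) {
        return "results for " + query;
    }
}

// Opens the searcher lazily, exactly once, and shares it for the life of the process.
final class SearcherHolder {
    private SearcherHolder() {}

    // Class initialization is thread-safe, so INSTANCE is created a single time.
    private static final class Lazy {
        static final FakeSearcher INSTANCE = new FakeSearcher();
    }

    static FakeSearcher get() {
        return Lazy.INSTANCE;
    }
}

final class SharedSearcherDemo {
    // Runs the given number of queries on a small thread pool and
    // reports how many times the index was opened.
    static int runQueries(int queries) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < queries; i++) {
            final int n = i;
            pool.execute(() -> SearcherHolder.get().search("q" + n));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return FakeSearcher.OPEN_COUNT.get();
    }

    public static void main(String[] args) {
        System.out.println("index opened " + runQueries(100) + " time(s)");
    }
}
```

This only works in a process that stays alive between queries, which is why a program launched fresh for every search cannot benefit from it.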
Lucene Performance and usage alternatives
I just made a program using the Java API of Lucene. It is working fine for my current index size, but I am worried about performance with a bigger index and simultaneous user access.

1) I am worried about having to write the program in Java. I searched for alternatives like the C port, but I saw that the Lucene version it is based on is a little old and not many people seem to use it.

2) I am also thinking of compiling the code with gcj to generate native code and not use the JVM. Has anybody tried it? Could that bring performance close to that of a C program?

3) I won't use an application server; I will call the program directly from a PHP page. Is there a suggested architecture model for doing that, in anticipation of many users accessing the program? Won't starting one instance each time someone runs a query, and opening the index each time, degrade performance?

--
View this message in context: http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18832162.html
[IMPORTANT] Fieldable and LUCENE-1349
Per https://issues.apache.org/jira/browse/LUCENE-1349, we have made an exception to Lucene's backward compatibility rules and marked Fieldable as "changeable". This means we will allow changes to the interface on a case-by-case basis, so anyone who implements their own Fieldable (which we suspect is very, very few people) may have to make code changes when upgrading within a minor version. More than likely, Fieldable will be deprecated and changed for 3.0 (when we get there).

This is noted prominently in CHANGES.txt and on the interface. Sorry for the inconvenience.

Thanks,
Grant