Re: Lucene Performance and usage alternatives

2008-08-05 Thread Grant Ingersoll


On Aug 5, 2008, at 2:29 PM, ezer wrote:

> Thanks Stefan and Grant.
> Yes, Solr seems very interesting; I tried it once, and I am now looking at
> the PHP client you mentioned.
> What happens if, rather than starting a server that opens a port to listen
> for requests, I call the search program from PHP every time I need to
> search, using for example exec(theSearchingProgram, $arrayResult)?

That won't perform.  The main cost of searching is loading up the
index and you would have to do that every time.
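
A minimal sketch of the open-once idea, assuming a long-lived Java process serves the queries. FakeSearcher, Holder, and query are hypothetical names and not Lucene classes; the counter only stands in for the cost of loading an index, so that the reuse is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SharedSearcherSketch {

    /** Hypothetical stand-in for an opened index; not a Lucene class. */
    static final class FakeSearcher {
        static final AtomicInteger OPEN_COUNT = new AtomicInteger();
        private final List<String> docs;

        FakeSearcher(List<String> docs) {
            // Models the expensive "load the index" step.
            OPEN_COUNT.incrementAndGet();
            this.docs = docs;
        }

        /** Returns every document that contains the term. */
        List<String> search(String term) {
            List<String> hits = new ArrayList<>();
            for (String doc : docs) {
                if (doc.contains(term)) {
                    hits.add(doc);
                }
            }
            return hits;
        }
    }

    /** Initialization-on-demand holder: created on first use, then reused. */
    static final class Holder {
        static final FakeSearcher INSTANCE = new FakeSearcher(
                List.of("lucene in action", "solr php client", "lucene java"));
    }

    /** Every query goes through the same searcher instance. */
    static List<String> query(String term) {
        return Holder.INSTANCE.search(term);
    }

    public static void main(String[] args) {
        System.out.println(query("lucene"));
        System.out.println(query("solr"));
        System.out.println("index opened " + FakeSearcher.OPEN_COUNT.get() + " time(s)");
    }
}
```

With exec() from PHP, each request would start a new process and pay the constructor cost again; in a server process the cost is paid once and shared by all requests.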



> For now that is the solution I am testing, but I am not sure it is an
> elegant way to do this. I would like to know the pros and cons of each
> solution; at first glance I think that opening a port raises a security
> issue.

What kind of environment are you in that you can't secure the port?
I'm not a security expert, but starting points would be to allow
connections only from a given IP, use SSL, put it behind a firewall, etc.
Treat Solr just as you would treat a database in a typical tiered
architecture.
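
As a rough illustration of the "allow only from a given IP" starting point, here is a sketch using only the Java standard library. It is not Solr configuration (Solr's own access rules would live in the servlet container or the firewall); the class name, the allow-list addresses, and the helper names are all assumptions made for the example.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Set;

class RestrictedListenerSketch {

    /** Hypothetical allow-list; in practice it would hold the web server's address. */
    static final Set<String> ALLOWED_CLIENTS = Set.of("127.0.0.1", "10.0.0.5");

    /** True only for client addresses on the allow-list. */
    static boolean isAllowed(String clientIp) {
        return ALLOWED_CLIENTS.contains(clientIp);
    }

    /** Opens a listener bound to the loopback interface only, on an ephemeral port. */
    static ServerSocket openLoopbackOnly() throws IOException {
        return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = openLoopbackOnly()) {
            System.out.println("bound to loopback: "
                    + server.getInetAddress().isLoopbackAddress());
            System.out.println("10.0.0.5 allowed: " + isAllowed("10.0.0.5"));
            System.out.println("203.0.113.9 allowed: " + isAllowed("203.0.113.9"));
        }
    }
}
```

Binding to loopback means the port is not reachable from other machines at all, which fits the case where PHP and the search server run on the same host; the allow-list check is the fallback when they do not.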


-Grant


Re: Lucene Performance and usage alternatives

2008-08-05 Thread ezer

Thanks Stefan and Grant.
Yes, Solr seems very interesting; I tried it once, and I am now looking at
the PHP client you mentioned.
What happens if, rather than starting a server that opens a port to listen
for requests, I call the search program from PHP every time I need to
search, using for example exec(theSearchingProgram, $arrayResult)? For now
that is the solution I am testing, but I am not sure it is an elegant way
to do this. I would like to know the pros and cons of each solution; at
first glance I think that opening a port raises a security issue.




-- 
View this message in context: 
http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18837195.html
Sent from the Lucene - General mailing list archive at Nabble.com.



Re: Lucene Performance and usage alternatives

2008-08-05 Thread Grant Ingersoll
My point is more that you don't necessarily need to go looking for
variants.  I've seen Lucene Java scale to millions of documents with no
problem.  I talked w/ a guy using Solr this past week who had ~80 million
records in a single 80 GB index on one machine.


If I had a PHP front end, I would most likely start with Solr and its
PHP client.  No sense in reinventing the wheel, IMO.












Re: Lucene Performance and usage alternatives

2008-08-05 Thread Stefan Groschupf
An alternative is always to distribute the index across a set of servers.
If you need to scale, I guess this is the only long-term option.
You can build your own home-grown Lucene distribution or look into an
existing one.
I'm currently working on katta (http://katta.wiki.sourceforge.net/);
there is no release yet, but we are in the QA and test cycles.
There are others as well; Solr, for example, also provides distribution.
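
For readers wondering what a home-grown distribution involves at its simplest, here is a sketch: ask each shard for its best hits and merge them by score. It is not how katta or Solr implement it; Shard, Hit, and search are hypothetical names, the shards are queried sequentially for brevity, and scores are assumed to be comparable across shards.

```java
import java.util.ArrayList;
import java.util.List;

class ShardMergeSketch {

    /** Hypothetical hit type: a document id plus its relevance score. */
    static final class Hit {
        final String docId;
        final double score;

        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return docId + "=" + score;
        }
    }

    /** Hypothetical shard: one server holding one slice of the index. */
    interface Shard {
        List<Hit> topHits(String query, int k);
    }

    /** Collects up to k hits from every shard, then keeps the k best overall. */
    static List<Hit> search(List<Shard> shards, String query, int k) {
        List<Hit> all = new ArrayList<>();
        for (Shard shard : shards) {
            all.addAll(shard.topHits(query, k));
        }
        // Highest score first.
        all.sort((x, y) -> Double.compare(y.score, x.score));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        Shard a = (q, k) -> List.of(new Hit("a1", 0.9), new Hit("a2", 0.4));
        Shard b = (q, k) -> List.of(new Hit("b1", 0.7), new Hit("b2", 0.1));
        System.out.println(search(List.of(a, b), "lucene", 3));
    }
}
```

A real setup would query the shards in parallel and deal with shard failures, which is most of the work that an existing distribution layer saves you.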


Stefan






~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com




Re: Lucene Performance and usage alternatives

2008-08-05 Thread ezer

Grant, which other information can I provide in order to clarify my questions?




-- 
View this message in context: 
http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18834310.html
Sent from the Lucene - General mailing list archive at Nabble.com.



Re: Lucene Performance and usage alternatives

2008-08-05 Thread ezer

Yes, I saw that. It talks about performance, but not about the variants I
mentioned before.
I actually tested indexing a database of about 200,000 records. As I
mentioned, it works fine, with response times of less than a second. But
this database can grow to millions of records, and I am not sure I am
choosing the best architecture for that stage to allow simultaneous access.

Thanks for the help



-- 
View this message in context: 
http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18833292.html
Sent from the Lucene - General mailing list archive at Nabble.com.



Re: Lucene Performance and usage alternatives

2008-08-05 Thread Grant Ingersoll
Before we go solving a problem that isn't necessarily there, can you  
share a bit about what sizes you are at currently?  Num docs, index  
size, query rate?


Have you looked at http://wiki.apache.org/lucene-java/BasicsOfPerformance ?


-Grant

On Aug 5, 2008, at 10:21 AM, ezer wrote:

> I just made a program using the Java API of Lucene. It is working fine for
> my current index size, but I am worried about performance with a bigger
> index and simultaneous user access.
>
> 1) I am worried about having to write the program in Java. I searched for
> alternatives like the C port, but I saw that the version it uses is a
> little old and not many people seem to use it.
>
> 2) I am also thinking of compiling the code with gcj to generate native
> code and not use the JVM. Has anybody tried it? Could that bring the
> performance close to that of a C program?
>
> 3) I won't use an application server; I will call the program directly
> from a PHP page. Is there any suggested architecture model for doing that,
> I mean for anticipating many users accessing the program? Won't starting
> one instance each time someone does a query, and opening the index each
> time, degrade performance?


You shouldn't be instantiating a Reader/Searcher for each query.  See  
the link above.










Lucene Performance and usage alternatives

2008-08-05 Thread ezer

I just made a program using the Java API of Lucene. It is working fine for
my current index size, but I am worried about performance with a bigger
index and simultaneous user access.

1) I am worried about having to write the program in Java. I searched for
alternatives like the C port, but I saw that the version it uses is a
little old and not many people seem to use it.

2) I am also thinking of compiling the code with gcj to generate native
code and not use the JVM. Has anybody tried it? Could that bring the
performance close to that of a C program?

3) I won't use an application server; I will call the program directly
from a PHP page. Is there any suggested architecture model for doing that,
I mean for anticipating many users accessing the program? Won't starting
one instance each time someone does a query, and opening the index each
time, degrade performance?
-- 
View this message in context: 
http://www.nabble.com/Lucene-Performance-and-usage-alternatives-tp18832162p18832162.html
Sent from the Lucene - General mailing list archive at Nabble.com.



[IMPORTANT] Fieldable and LUCENE-1349

2008-08-05 Thread Grant Ingersoll
Per https://issues.apache.org/jira/browse/LUCENE-1349, we have made an
exception to Lucene's backward compatibility rules and marked
Fieldable as "changeable".  This means we will allow, on a case-by-case
basis, changes to the interface, so anyone who implements
their own Fieldable (which we suspect is very, very few people) may
have to make code changes when upgrading within a minor version.  More
than likely, Fieldable will be deprecated and changed for 3.0 (when we
get there).


This is noted prominently in CHANGES.txt and on the interface.  Sorry  
for the inconvenience.


Thanks,
Grant