Re: No wildcards with solr.ASCIIFoldingFilterFactory?

2009-06-26 Thread vladimirneu

Thank you Mark!

Let me check whether I understand your idea correctly.

I would have to write a plugin like LuceneQParserPlugin that uses, instead
of SolrQueryParser, a MySolrQueryParser which is based on SolrQueryParser
and uses AnalyzingQueryParser methods.
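Background on the idea above: Lucene's contrib AnalyzingQueryParser handles a wildcard term roughly by running the literal pieces between * and ? through the analyzer and then re-attaching the wildcards, so that, assuming the index-time analyzer lowercases and folds accents, Médic* can match terms indexed as medic.... The sketch below shows only that string-level step in plain Java. It is not a Solr plugin: java.text.Normalizer stands in for an ASCIIFoldingFilter plus LowerCaseFilter chain (an approximation), and WildcardFolding is a hypothetical name, not Lucene or Solr API.

```java
import java.text.Normalizer;
import java.util.Locale;

public class WildcardFolding {

    // Stand-in for an ASCIIFoldingFilter + LowerCaseFilter chain: decompose
    // accented characters (NFD), drop the combining marks, then lowercase.
    // Unlike ASCIIFoldingFilter, this does NOT map characters such as "ß" or
    // "ø", which have no canonical decomposition.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Folds only the literal chunks between wildcard characters and keeps
    // '*' and '?' in place. A real analyzer would usually treat '*' as a
    // token delimiter, which is why the chunks are analyzed separately.
    static String foldWildcardTerm(String term) {
        StringBuilder result = new StringBuilder();
        StringBuilder chunk = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '*' || c == '?') {
                result.append(fold(chunk.toString()));
                chunk.setLength(0);
                result.append(c);
            } else {
                chunk.append(c);
            }
        }
        result.append(fold(chunk.toString()));
        return result.toString();
    }

    public static void main(String[] args) {
        // "M\u00e9dic*" is "Médic*"; this prints "medic*"
        System.out.println(foldWildcardTerm("M\u00e9dic*"));
    }
}
```

In a real plugin, the folded term would then be used to build the wildcard query, so the same folding has to be applied at index time for matches to line up.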

I think this is too difficult for me because I am not a programmer. Maybe I
could get the code together, but I have no experience debugging Java
applications.

Is there perhaps another way to solve this problem without coding? Or could
I ask the Solr community to help us write a plugin? Maybe there are other
people who are interested in this kind of analyzed wildcard search?

Vladimir


-- 
View this message in context: 
http://www.nabble.com/No-wildcards-with-solr.ASCIIFoldingFilterFactory--tp24162104p24216154.html
Sent from the Solr - User mailing list archive at Nabble.com.



facets: case and accent insensitive sort

2009-06-26 Thread Sébastien Lamy

Hi!

When I ask Solr for facets with the parameter facet.sort=index, it
gives me the facets sorted alphabetically, but case- and accent-sensitively.

I found no way to have the facets returned with their original case and
accents but sorted alphabetically, without sensitivity to case and accents.

Is there anything I can do to achieve this goal without having to
retrieve all facets and sort them myself? (We have fields with many, many
facets, and doing so impacts performance a lot.)


Sebastien.


Re: facets: case and accent insensitive sort

2009-06-26 Thread Shalin Shekhar Mangar
On Fri, Jun 26, 2009 at 4:06 PM, Sébastien Lamy lamys...@free.fr wrote:

 Hi!

 When I ask Solr for facets with the parameter facet.sort=index, it gives
 me the facets sorted alphabetically, but case- and accent-sensitively.

 I found no way to have the facets returned with their original case and
 accents but sorted alphabetically, without sensitivity to case and accents.

 Is there anything I can do to achieve this goal without having to retrieve
 all facets and sort them myself? (We have fields with many, many facets, and
 doing so impacts performance a lot.)


Faceting is done on indexed values, so if your indexed values keep the
original case and accents, they will be sorted accordingly. You could use a
copyField to copy these values into a string field and facet on that.

-- 
Regards,
Shalin Shekhar Mangar.


How much data can Solr handle?

2009-06-26 Thread Daniel Löfquist
We're looking to build a search solution that can contain as many as 10 million
different items, and I was wondering whether Solr can handle that amount of
data.

Has anybody done any testing or published any results for a Solr installation
working on huge amounts of data like this?

//Daniel

-- 
Daniel Löfquist
Software Engineer

CDON.COM
Bergsgatan 20, Box 385, SE 201 23 Malmö, Sweden

Office: +46 40 601 61 00
Direct: +46 40 601 61 16
Fax: +46 40 601 61 20
E-mail: daniel.lofqu...@it.cdon.com

CDON.COM http://www.cdon.com/

Confidentiality
Information contained in this e-mail is intended for the use of the
addressee only, and is confidential. Any dissemination, distribution,
copying or use of this communication without prior permission of
the addressee is strictly prohibited. If you are not the intended
addressee you must delete this e-mail and its attachments.


Re: How much data can Solr handle?

2009-06-26 Thread Mats Lindh
On Fri, Jun 26, 2009 at 1:27 PM, Daniel Löfquist daniel.lofqu...@it.cdon.com wrote:
 We're looking to build a search solution that can contain as many as 10
 million different items, and I was wondering whether Solr can handle that
 amount of data.

10M documents is quite a common load. We're currently running two
installations, with about 4M documents in one and 6M documents
(articles) in the other. Both run on single machines with sub-0.1 s
search times.

 Has anybody done any testing or published any results for a Solr
 installation working on huge amounts of data like this?

There's a page on the wiki dedicated to listing known companies and
installations based on Solr:

http://wiki.apache.org/solr/PublicServers

Hopefully that'll give you an idea. It shouldn't be too hard to just
try it out (I'm guessing you could do most of the setup in a day or
two).

Hope that helps!

--mats


Re: facets: case and accent insensitive sort

2009-06-26 Thread Sébastien Lamy

Shalin Shekhar Mangar wrote:

 Faceting is done on indexed values, so if your indexed values keep the
 original case and accents, they will be sorted accordingly. You could use a
 copyField to copy these values into a string field and facet on that.

If I use a copyField to copy into a string field and facet on that, my
problem remains: the facets are sorted case- and accent-sensitively, and I
want an *insensitive* sort.
If I use a copyField to copy into a type that strips accents and case (e.g.
alphaOnlySort), then Solr returns facet values without accents and case,
and I want the facet values returned by Solr to *keep their accents and
case*.




Re: facets: case and accent insensitive sort

2009-06-26 Thread Shalin Shekhar Mangar
On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy lamys...@free.fr wrote:



 If I use a copyField to copy into a string field and facet on that, my
 problem remains: the facets are sorted case- and accent-sensitively, and I
 want an *insensitive* sort.
 If I use a copyField to copy into a type that strips accents and case (e.g.
 alphaOnlySort), then Solr returns facet values without accents and case,
 and I want the facet values returned by Solr to *keep their accents and
 case*.


Ah, of course you are right. There is no way to do this right now except at
the client side.
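The client-side route can be sketched in Java with java.text.Collator: at PRIMARY strength a Collator ignores case and accent differences, so the facet values Solr returns can be re-ordered while keeping their original spelling. This is a sketch under assumptions: FacetSort and sortFacetValues are hypothetical names and the locale is a guess. Note that it only re-orders values already fetched; with facet.limit in effect, the first N values in index order are not necessarily the first N in this order, so a complete, correct result still means requesting all values (facet.limit=-1), which is the cost described in the question.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

public class FacetSort {

    // Returns the facet values (with their original case and accents)
    // ordered case- and accent-insensitively. PRIMARY strength compares only
    // base letters, so "école", "Ecole" and "ECOLE" compare as equal.
    static List<String> sortFacetValues(Collection<String> values, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        // Also handle values that arrive in decomposed (NFD) form.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(collator); // List.sort is stable: ties keep their input order
        return sorted;
    }

    public static void main(String[] args) {
        List<String> facets = List.of("z\u00e8bre", "\u00c9t\u00e9", "abricot", "\u00e9cole");
        // Orders them as abricot, école, Été, zèbre (a plain String sort
        // would put zèbre before Été and école).
        System.out.println(sortFacetValues(facets, Locale.FRENCH));
    }
}
```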

-- 
Regards,
Shalin Shekhar Mangar.


Re: facets: case and accent insensitive sort

2009-06-26 Thread Sébastien Lamy

Shalin Shekhar Mangar wrote:

 Ah, of course you are right. There is no way to do this right now except at
 the client side.


Thank you for your response.
Would it be easy to modify Solr to behave the way I want? Where should I
start investigating?


Upgrade to solr 1.4

2009-06-26 Thread David Baker

Hi,

I need to upgrade from solr 1.3 to solr 1.4.  I was wondering if there 
is a particular revision of 1.4 that I should use that is considered 
very stable for a production environment?


Re: Upgrade to solr 1.4

2009-06-26 Thread Julian Davchev
David Baker wrote:
 Hi,

 I need to upgrade from solr 1.3 to solr 1.4.  I was wondering if there
 is a particular revision of 1.4 that I should use that is considered
 very stable for a production environment?
Well, if it's not declared stable and offered on the download page, I don't
think you can rely on it being very stable in a production environment.


Re: Upgrade to solr 1.4

2009-06-26 Thread Eric Pugh
Solr in general is fairly stable in trunk.  That isn't to say that a  
critical error can't get through, because that does happen, but the  
test suite is pretty comprehensive.   With Solr 1.4 getting closer and  
closer, I think you'll see the pace of change dropping off.


I think it's one of those things that you have to judge for yourself.
Are the features/fixes/enhancements in 1.4 trunk worth the potential
risk? I assume that, as part of deployment into production, you have
some defined criteria that Solr must meet before it can be added, such as
testing of server capacity and performance? Those might tell you whether
there are any issues with Solr 1.4 trunk that would require delaying your
deployment.


Eric



-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal






Re: Query Filter fq with OR operator

2009-06-26 Thread Yao Ge

I would like to submit a JIRA issue for this. Can anyone tell me where to
go?
-Yao


Otis Gospodnetic wrote:
 
 
 Brian,
 
 Opening a JIRA issue if it doesn't already exist is the best way.  If you
 can provide a patch, even better!
 
  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: brian519 bpear...@desire2learn.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, June 16, 2009 1:32:41 PM
 Subject: Re: Query Filter fq with OR operator
 
 
 This feature is very important to me. Should I post something on the
 dev forum? I'm not sure what the proper protocol is for adding a feature
 to the roadmap.
 
 Thanks,
 Brian.
 
 
 




Re: Query Filter fq with OR operator

2009-06-26 Thread Otis Gospodnetic

Hello Yao,

A contribution would be great.  Here is information about how to contribute: 
http://wiki.apache.org/solr/HowToContribute


Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: Query Filter fq with OR operator

2009-06-26 Thread Shalin Shekhar Mangar
On Fri, Jun 26, 2009 at 8:50 PM, Yao Ge yao...@gmail.com wrote:


 I would like to submit a JIRA issue for this. Can anyone tell me where to
 go?


An issue has been opened already. You may want to add a vote to the
following issue.

https://issues.apache.org/jira/browse/SOLR-1223
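Depending on what is needed, one workaround that does not wait for SOLR-1223 is to put the alternatives into a single fq: Solr intersects separate fq parameters, but one fq parsed with the standard Lucene query syntax can contain OR clauses. A minimal Java sketch that builds such an fq and URL-encodes it; the class and helper names, the field name and the http://localhost:8983/solr base URL are illustrative assumptions, and URLEncoder.encode(String, Charset) needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class OrFilterQuery {

    // Builds ONE fq value that ORs several values of a field together.
    // Separate fq parameters are intersected (each one narrows the results),
    // so "any of these values" has to live inside a single filter query.
    static String orFilter(String field, List<String> values) {
        return values.stream()
                // quote each value: escape backslashes first, then embedded quotes
                .map(v -> field + ":\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(" OR "));
    }

    // Hypothetical helper: appends URL-encoded q and fq to a select URL.
    static String selectUrl(String solrBase, String q, String fq) {
        return solrBase + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fq = orFilter("category", List.of("books", "music"));
        System.out.println(fq); // category:"books" OR category:"music"
        System.out.println(selectUrl("http://localhost:8983/solr", "ipod", fq));
    }
}
```

Keeping the OR inside one fq also means Solr caches it as a single filter entry, which is worth keeping in mind if the combinations vary a lot between requests.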

-- 
Regards,
Shalin Shekhar Mangar.


Re: How much data can Solr handle?

2009-06-26 Thread Otis Gospodnetic

Hi Daniel,

How much Solr can handle really depends on the hardware you run it on, the type 
of document you index in it, and the query rate and type.

10M documents doesn't sound like a large number, even for an average server
today (e.g. 4 GB of RAM, 1-2 cores), with web-page-sized documents and a query
rate of a few dozen simple keyword, boolean, or phrase queries per second.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: Upgrade to solr 1.4

2009-06-26 Thread Walter Underwood
Netflix is running a nightly build from May in production. We did our
normal QA on it, then ran it on one of our five servers for two weeks.
No problems. It is handling about 10% more traffic with 10% less CPU.

We deployed 1.4 to all our servers yesterday.

wunder

On 6/26/09 7:58 AM, Julian Davchev j...@drun.net wrote:

 David Baker wrote:
 Hi,
 
 I need to upgrade from solr 1.3 to solr 1.4.  I was wondering if there
 is a particular revision of 1.4 that I should use that is considered
 very stable for a production environment?
 Well, if it's not declared stable and offered on the download page, I don't
 think you can rely on it being very stable in a production environment.



Re: Upgrade to solr 1.4

2009-06-26 Thread Shalin Shekhar Mangar
On Fri, Jun 26, 2009 at 9:11 PM, Walter Underwood wunderw...@netflix.com wrote:

 Netflix is running a nightly build from May in production. We did our
 normal QA on it, then ran it on one of our five servers for two weeks.
 No problems. It is handling about 10% more traffic with 10% less CPU.


Wow, that is good news! Are you also using the Java-based replication?



 We deployed 1.4 to all our servers yesterday.


Can you tell us which revision you used?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Upgrade to solr 1.4

2009-06-26 Thread Walter Underwood
We are using the script replication. I have no interest in spending time
configuring and QA'ing a different method when the scripts work fine.

We are running the nightly from 2009-05-11.

wunder

On 6/26/09 8:51 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

 On Fri, Jun 26, 2009 at 9:11 PM, Walter Underwood
 wunderw...@netflix.com wrote:
 
 Netflix is running a nightly build from May in production. We did our
 normal QA on it, then ran it on one of our five servers for two weeks.
 No problems. It is handling about 10% more traffic with 10% less CPU.
 
 Wow, that is good news! Are you also using the Java-based replication?
 
 We deployed 1.4 to all our servers yesterday.
 
 Can you tell us which revision you used?



Re: Upgrade to solr 1.4

2009-06-26 Thread Jeff Newburn
We are using a trunk build from approximately the same time, with little to
no issues, including with the new replication.
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562





Error while trying to index

2009-06-26 Thread David Baker
I am trying to index documents into a Solr server running a nightly build.
I get the following error in my catalina.out:


26-Jun-2009 5:52:06 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 4
26-Jun-2009 5:52:06 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchFieldError: log
    at com.pjaol.search.solr.update.LocalUpdaterProcessor.processAdd(LocalUpdateProcessorFactory.java:138)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1292)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
    at java.lang.Thread.run(Thread.java:619)
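As general Java background, not a confirmed diagnosis of this report: java.lang.NoSuchFieldError is thrown when code compiled against one version of a class refers to a field that the class loaded at runtime no longer has with the same name and type. That usually points to a plugin jar (here the com.pjaol update processor) built against a different Solr version than the one deployed. A small reflection sketch for checking whether a class on the runtime classpath still declares a field named log, and with what type; FieldCheck is a hypothetical name.

```java
import java.lang.reflect.Field;

public class FieldCheck {

    // Looks for a field by name in the class and its superclasses, including
    // non-public fields (getDeclaredField only searches a single class).
    static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException e) {
                // not declared on this class; keep walking up the hierarchy
            }
        }
        return null;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Usage: java FieldCheck <fully.qualified.ClassName> [fieldName]
        // Run it with the same jars the servlet container loads.
        String className = args.length > 0 ? args[0] : "java.util.ArrayList";
        String fieldName = args.length > 1 ? args[1] : "size";
        Field f = findField(Class.forName(className), fieldName);
        // NoSuchFieldError also fires when the name matches but the type
        // differs, so print the declaration to compare with what the plugin expects.
        System.out.println(f == null ? "no field '" + fieldName + "'" : f.toGenericString());
    }
}
```

Pointing it at the class the plugin extends (check the plugin source for which one), with the nightly build's jars on the classpath, shows whether a log field is still there; if it is missing or has a different type, rebuilding the plugin against the same Solr build is the usual remedy.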

Scaling out/up or a mix

2009-06-26 Thread Marcus Herou
Hi.

I currently have an index of 16 GB per machine (8 machines = 128 GB; the
data is stored externally, not in the index), and it is growing like crazy
(we are indexing blogs, which are crazy by nature). We have only allocated
2 GB per machine to the Solr app, since we are running some other stuff
there in parallel.

Each doc should be roughly the size of a blog post, no more than 20k.

We currently have about 90M documents and the number is increasing rapidly,
so getting into the billion-plus (G+) document range is not going to be too
far away.

Now, for search performance reasons, I think I need to move these instances
to dedicated index/search machines (or index on some machines and search on
others).
Anyway I would like to get some feedback about two things:

1. What is the most important hardware aspect when it comes to adding
documents to the index and optimizing it?
1.1 Is it disk I/O write throughput? (sequential or random I/O?)
1.2 Is it RAM?
1.3 Is it CPU?

My guess would be disk I/O; right or wrong?

2. What is the most important hardware aspect when it comes to searching
documents in my setup? (The result set is limited to the top 10 matches,
with paging.)
We facet and sort on the publishedDate of the entry (memory-intensive, I
presume).

2.1 Is it disk read throughput? (sequential or random I/O?)
2.2 Is it RAM?
2.3 Is it CPU?

I have no clue, since the data might not fit into memory. What is the most
important factor then? Read performance while scanning the index? CPU
while comparing fields and collecting results?

What I'm trying to find out is what I can do to get the most bang for the
buck with a limited (aren't we all limited?) budget.

Kindly

//Marcus





-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/





Re: How much data can Solr handle?

2009-06-26 Thread Lance Norskog
Total # of bytes for the input data is a more useful number than # of
documents.

400 million documents was our peak at my last job. Each document was maybe
300-500 bytes of text, taking about 1 KB of disk space. The index was thus
about 400 gigabytes. The problems were:

1) System administration: the logistics of the index were a nightmare.
Optimize took 14 hours, and a full copy to the query servers took half an
hour. Optimize needs room for twice the index size on the same partition.
2) Sorting creates an array with one element for every document. We needed
32 GB of RAM in a server to allow sorted results.
3) Faceting on some fields was likewise impossible, since faceting makes an
array of facet values. Faceting on timestamps was a no-no.
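The figures above can be turned into a back-of-the-envelope calculation. The Java sketch below is only arithmetic under stated assumptions: 1 KB is taken as 1,000 bytes, and a sort array is assumed to hold 4 bytes per document (an int ordinal, as for a string field) or 8 bytes (a long, e.g. a timestamp). Real sort memory is higher, since the unique values are cached too and a warming searcher can briefly hold a second copy. CapacityEstimate is a hypothetical name.

```java
public class CapacityEstimate {

    static final double GB = 1e9; // decimal gigabytes, matching the rough figures above

    // Index size on disk: document count times average on-disk bytes per document.
    static double indexGb(long docs, long bytesPerDoc) {
        return docs * bytesPerDoc / GB;
    }

    // Optimize writes a complete new copy before the old segments go away,
    // so the partition needs room for roughly twice the index.
    static double optimizeDiskGb(long docs, long bytesPerDoc) {
        return 2 * indexGb(docs, bytesPerDoc);
    }

    // Lower bound for a sort array: one entry per document, per sorted field.
    static double sortArrayGb(long docs, int bytesPerEntry) {
        return docs * (double) bytesPerEntry / GB;
    }

    public static void main(String[] args) {
        long docs = 400_000_000L;  // 400 million documents
        long bytesPerDoc = 1_000L; // ~1 KB of disk per document (assumed 1,000 bytes)
        System.out.printf("index:     %.0f GB%n", indexGb(docs, bytesPerDoc));
        System.out.printf("optimize:  %.0f GB%n", optimizeDiskGb(docs, bytesPerDoc));
        System.out.printf("int sort:  %.1f GB%n", sortArrayGb(docs, 4));
        System.out.printf("long sort: %.1f GB%n", sortArrayGb(docs, 8));
    }
}
```

With 400 million documents this gives about 400 GB of index, about 800 GB of partition space around an optimize, and 1.6 GB (int) or 3.2 GB (long) per sorted field before counting the cached values themselves.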

The servers were Dell 2950s with 2 or 4 processors, 32 GB of RAM, and six
300 GB high-speed SATA drives in RAID-5 for 1.2 terabytes of space.

Basic searching was a little slower than with the smaller index, but still
50 ms for pre-cached queries.

On Fri, Jun 26, 2009 at 8:28 AM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:


 Hi Daniel,

 How much Solr can handle really depends on the hardware you run it on, the
 type of document you index in it, and the query rate and type.

 10M documents doesn't sound like a large number, even for an average server
 today (e.g. 4 GB of RAM, 1-2 cores), with web-page-sized documents and a
 query rate of a few dozen simple keyword, boolean, or phrase queries per
 second.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch







-- 
Lance Norskog
goks...@gmail.com
650-922-8831 (US)