Re: Search on two core and two schema

2011-01-19 Thread Damien Fontaine
OK, but I need a relation between the two types of document for faceting 
on the label field.


Damien

On 18/01/2011 18:55, Geert-Jan Brits wrote:

The schemas are very different, I can't group them.

In contrast to what you're saying above, you may rethink the option of
combining both types of documents in a single core.
It's a perfectly valid approach to combine heterogeneous documents in a
single core in Solr (and use a specific field, say 'type', to distinguish
between them when needed).
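For illustration only (the field names here are made up, not Damien's actual
schemas), a single core could hold both kinds of documents:

  type=article  id=...        title=...  label=...  taxon_id=...
  type=taxon    taxon_id=...  label=...  hierarchy=...

and a query could then be restricted to one type while still faceting on the
shared label field:

  select?q=*:*&fq=type:article&facet=true&facet.field=label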

Geert-Jan

2011/1/18 Jonathan Rochkind


Solr can't do that. Two cores are two separate cores; you have to do two
separate queries and get two separate result sets.

Solr is not an RDBMS.


On 1/18/2011 12:24 PM, Damien Fontaine wrote:


I want to execute this query:

Schema 1:
[field definitions stripped by the mail archive]

Schema 2:
[field definitions stripped by the mail archive]

Query:

select?facet=true&fl=title&q=title:*&facet.field=UUID_location&rows=10&qt=standard

Result:

[response XML stripped by the mail archive; the surviving fragments show the
echoed params (facet=true, fl=title, q=title:*, facet.field=UUID_location,
qt=standard), two documents titled "titre 1" and "Titre 2", and two
UUID_location facet counts, 998 and 891]


On 18/01/2011 17:55, Stefan Matheis wrote:


Okay .. and .. now .. you're trying to do what? Perhaps you could give us an
example, with real data: sample queries and results.
Because actually I cannot imagine what you want to achieve, sorry.

On Tue, Jan 18, 2011 at 5:24 PM, Damien Fontaine
wrote:

  On my first schema, there is information about a document, like title,
lead, text, etc., and many UUIDs (each UUID is a taxon's ID).
My second schema contains my taxonomies, with auto-complete and facets.

On 18/01/2011 17:06, Stefan Matheis wrote:

   Search on two cores but combine the results afterwards to present them in
one group, or what exactly are you trying to do, Damien?

On Tue, Jan 18, 2011 at 5:04 PM, Damien Fontaine
wrote:


   Hi,


I would like to make a search on two cores with different schemas.

Sample :

Schema Core1
   - ID
   - Label
   - IDTaxon
...

Schema Core2
   - IDTaxon
   - Label
   - Hierarchy
...

The schemas are very different, so I can't group them. Do you have an idea how
to realize this search?

Thanks,

Damien








Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Péter Király
> [x] ASF Mirrors (linked in our release announcements or via the Lucene 
> website)
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
> [x] I/we build them from source via an SVN/Git checkout.
> I rarely build, only if I would like to try an interesting patch.
> [] Other (someone in your company mirrors them internally or via a downstream 
> project)

Király Péter
eXtensible Catalog
http://eXtensibleCatalog.org


Re: unix permission styles for access control

2011-01-19 Thread Toke Eskildsen
On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote:
> I was wondering if there are binary operation filters? I haven't seen any in 
> the book, nor was I able to find any using Google.
> 
> So if I had 0600(octal) in a permission field, and I wanted to return any 
> records that 'permission & 0400(octal)==TRUE', how would I filter that?

Don't you mean permission & 0400(octal) == 0400? Anyway, the
functionality can be accomplished by extending your index a bit.


You could split the permission into user, group and all parts, then use
an expanded query.

If the permission is 0755 it will be indexed as
user_p:7 group_p:5 all_p:5

If you're searching for something with at least 0650 your query should
be expanded to 
(user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5)


Alternatively you could represent the bits explicitly in the index:
user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:4

Then a search for 0650 would query with
user_p:2 AND user_p:4 AND group_p:1 AND group_p:4


Finally you could represent all valid permission values, still split
into parts with
user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7
group_p:1 group_p:2 group_p:3 group_p:4 group_p:5
all_p:1 all_p:2 all_p:3 all_p:4 all_p:5

The query would be simply
user_p:6 AND group_p:5
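Expressed as a Solr request, that last variant could be a pair of filter
queries (the URL and field names are only placeholders):

  select?q=*:*&fq=user_p:6&fq=group_p:5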




Solr with Unknown Lucene Index?

2011-01-19 Thread Lee Goddard
I have to use some Lucene indexes, and Solr looks like the perfect 
solution.


However, all I know about the Lucene indexes is what Luke tells me, and 
simply setting the schema to represent all fields as text does not seem 
to be working -- though as this is my first Solr, I am not sure if that 
is due to some other issue.


Is there some way to ascertain how the Solr schema should describe the 
Lucene fields?


Many thanks in anticipation
Lee


Re: Solr Out of Memory Error

2011-01-19 Thread Isan Fulia
Hi all,
By adding more servers, do you mean sharding the index? And after sharding, how
will my query performance be affected?
Will the query execution time increase?

Thanks,
Isan Fulia.

On 19 January 2011 12:52, Grijesh  wrote:

>
> Hi Isan,
>
> It seems your index size of 25GB is much more than your total RAM of 4GB.
> You have to do 2 things to avoid the Out Of Memory problem:
> 1 - Buy more RAM; add at least 12 GB more.
> 2 - Increase the memory allocated to Solr by setting the Xmx value; allocate
> at least 12 GB to Solr.
>
> But if your whole index fits into the cache memory it will give you a better
> result.
>
> Also add more servers to load balance, as your QPS is high.
> Your 7 lakh documents making a 25 GB index looks quite high. Try to lower
> the index size.
> What are you indexing in your 25GB of index?
>
> -
> Thanx:
> Grijesh
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Out-of-Memory-Error-tp2280037p2285779.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks & Regards,
Isan Fulia.


Re: Local param tag voodoo ?

2011-01-19 Thread Xavier SCHEPLER
You're right, the second query didn't result in an error, but it didn't give the
expected result either.
I'm going to have a look at the link you gave me.
Thanks !

> 
> From: Markus Jelsma 
> Sent: Tue Jan 18 21:31:52 CET 2011
> To: 
> Subject: Re: Local param tag voodoo ?
> 
> 
> Hi,
> 
> You get an error because LocalParams need to be in the beginning of a 
> parameter's value. So no parenthesis first. The second query should not give 
> an 
> error because it's a valid query.
> 
> Anyway, i assume you're looking for :
> http://wiki.apache.org/solr/SimpleFacetParameters#Multi-
> Select_Faceting_and_LocalParams
> 
> Cheers,
> 
> > Hey,
> > 
> > here are my needs :
> > 
> > - a query that has tagged and untagged contents
> > - facets that ignore the tagged contents
> > 
> > I tryed :
> > 
> > q=({!tag=toExclude} ignored)  taken into account
> > q={tag=toExclude v='ignored'} take into account
> > 
> > Both resulted in a error.
> > 
> > Is this possible or do I have to try another way ?
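For reference, the multi-select pattern from that wiki page looks roughly like
this (the "doctype" field is only a placeholder):

  q=mainquery&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype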


--
Tous les courriers électroniques émis depuis la messagerie
de Sciences Po doivent respecter des conditions d'usages.
Pour les consulter rendez-vous sur
http://www.ressources-numeriques.sciences-po.fr/confidentialite_courriel.htm


Re: Local param tag voodoo ?

2011-01-19 Thread Xavier SCHEPLER
OK, I was already at this point.
My faceting system uses exactly what is described on that page; I read it in
the Solr 1.4 book, otherwise I wouldn't ask.
The problem is that the filter queries don't affect the relevance score of
the results, so I want the terms in the main query.


> 
> From: Markus Jelsma 
> Sent: Tue Jan 18 21:31:52 CET 2011
> To: 
> Subject: Re: Local param tag voodoo ?
> 
> 
> Hi,
> 
> You get an error because LocalParams need to be in the beginning of a 
> parameter's value. So no parenthesis first. The second query should not give 
> an 
> error because it's a valid query.
> 
> Anyway, i assume you're looking for :
> http://wiki.apache.org/solr/SimpleFacetParameters#Multi-
> Select_Faceting_and_LocalParams
> 
> Cheers,
> 
> > Hey,
> > 
> > here are my needs :
> > 
> > - a query that has tagged and untagged contents
> > - facets that ignore the tagged contents
> > 
> > I tryed :
> > 
> > q=({!tag=toExclude} ignored)  taken into account
> > q={tag=toExclude v='ignored'} take into account
> > 
> > Both resulted in a error.
> > 
> > Is this possible or do I have to try another way ?


--
Tous les courriers électroniques émis depuis la messagerie
de Sciences Po doivent respecter des conditions d'usages.
Pour les consulter rendez-vous sur
http://www.ressources-numeriques.sciences-po.fr/confidentialite_courriel.htm


How to keep a maintained index with crawled data

2011-01-19 Thread Erlend Garåsen


We need a crawler for all web pages outside our CMS, but one crucial 
feature seems to be missing in many of them - a way to detect changes in 
these documents. Say that you have run a daily crawler job for two 
months looking for new web pages to crawl in order to keep the Solr 
index updated. But suddenly a lot of pages were either changed or 
deleted, and now you have an outdated Solr index.


In other words, we need to detect removed web pages and trigger a delete 
command to Solr. We also need to detect web pages which have been 
modified in order to update the Solr index.


For me it seems that the Aperture web crawler is the only one with such 
features. The crawler handler has methods for modified and removed documents:

http://sourceforge.net/apps/trac/aperture/wiki/Crawlers

Or is it possible to do similar things with the other crawlers such as 
Nutch?


Many thanks in advance for all kinds of suggestions!

Erlend

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050


Switching existing solr indexes from Segment to Compound Style index files

2011-01-19 Thread Nicholas W
Dear All,
 On a Linux system running a multi-core Solr server, we are
experiencing a problem of too many files open which is causing tomcat
to abort. Reading the documentation, one of the things it seems we can
do is to switch to using compound indexes. We can see that in the
solrconfig.xml there is an option:


   <useCompoundFile>true</useCompoundFile>

in the <indexDefaults> and <mainIndex> sections. We have set this to
true and restarted Tomcat.

I have then used the ./optimize script to get Solr to optimize
the index. The Lucene documentation suggests this is the way to
switch to a compound index. However, with Solr, while the index is
optimized, it's not converted to a compound file.

What are we doing wrong? What is the correct way to convert an index
to use a compound file?

Thanks a lot for your suggestions.

Regards,
Nicholas


lazy loading error?

2011-01-19 Thread Jörg Agatz
Hello, I have a problem with Solr and it looks like it is related to
RequestHandlers, but I don't know what I must do...


I have removed and reinstalled OpenJDK
and installed maven2 and Tika,

but nothing changed.

Does someone have an idea for me?







Command:

curl "
http://192.168.105.210:8080/solr/rechnungen/update/extract?literal.id=1234567&uprefix=attr_commit=true";
-F "myfile=@test.xls"


ERROR:

Apache Tomcat/6.0.24 - Error report
HTTP Status 500 - lazy loading error


org.apache.solr.common.SolrException: lazy loading error

at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)

at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)

at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)

at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)

at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)

at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)

at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)

at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:636)

Caused by: org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.extraction.ExtractingRequestHandler'

at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)

at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)

at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)

at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)

... 16 more

Caused by: java.lang.ClassNotFoundException:
org.apache.solr.handler.extraction.ExtractingRequestHandler

at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:205)

at java.lang.ClassLoader.loadClass(ClassLoader.java:321)

at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:615)

at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:264)

at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)

... 19 more

type: Status report
message: lazy loading error


org.apache.solr.common.SolrException: lazy loading error

at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)

at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)

at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)

at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)

at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)

at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)

at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)

at
org.apache.coyote.http11.Http11Processor.process(Htt

Highlighting default encoder

2011-01-19 Thread Darx Oman
In the Solr admin advanced search page, the highlighted text is not displayed
correctly for Arabic characters!

I'm using Solr trunk from 2011-01-10 ….

It used to work in Solr 1.4.1.

Does anybody know why?


How to find Master & Slave are in sync

2011-01-19 Thread Shanmugavel SRD

How to find Master & Slave are in sync?
Is there a way apart from checking the index versions of master and slave
using the two HTTP APIs below?

http://master_host:port/solr/replication?command=indexversion
http://slave_host:port/solr/replication?command=details
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-find-Master-Slave-are-in-sync-tp2287014p2287014.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: lazy loading error?

2011-01-19 Thread Juan Grande
In order to use the ExtractingRequestHandler, you have to first copy
apache-solr-cell-.jar and all the libraries from
contrib/extraction/lib to a "lib" folder next to the "conf" folder of your
instance.

Also, check the URL because there is an ampersand missing.
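A rough sketch of that setup, assuming a standard Solr 1.4 distribution layout
and a core instance dir of /varlib/tomcat6/solr/rechnungen (adjust both paths to
your installation):

  mkdir -p /varlib/tomcat6/solr/rechnungen/lib
  cp apache-solr-1.4.1/dist/apache-solr-cell-*.jar /varlib/tomcat6/solr/rechnungen/lib/
  cp apache-solr-1.4.1/contrib/extraction/lib/*.jar /varlib/tomcat6/solr/rechnungen/lib/

and the curl call with the missing ampersand restored:

  curl "http://192.168.105.210:8080/solr/rechnungen/update/extract?literal.id=1234567&uprefix=attr_&commit=true" -F "myfile=@test.xls"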

Regards,

*Juan Grande*

On Wed, Jan 19, 2011 at 7:43 AM, Jörg Agatz wrote:

> Hallo, i have a problem with Solr and it looks like RequestHandlers.. but i
> dont know what i must do...
>
>
> i have remove and reinstall Openjdk
> installt maven2 and tika,
>
> nothing Chane..
>
> someware in idea for me?
>
>
>
>
>
>
>
> Command:
>
> curl "
>
> http://192.168.105.210:8080/solr/rechnungen/update/extract?literal.id=1234567&uprefix=attr_commit=true
> "
> -F "myfile=@test.xls"
>
>
> EROOR:
>
> Apache Tomcat/6.0.24 - Error report
> HTTP Status 500 - lazy loading error
>
>
> org.apache.solr.common.SolrException: lazy loading error
>
> at
>
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
>
> at
>
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
>
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>
> at
>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>
> at
>
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>
> at
>
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>
> at
>
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>
> at
>
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>
> at
>
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>
> at
>
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>
> at
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
>
> at
>
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>
> at java.lang.Thread.run(Thread.java:636)
>
> Caused by: org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.handler.extraction.ExtractingRequestHandler'
>
> at
>
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
>
> at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
>
> at
>
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
>
> ... 16 more
>
> Caused by: java.lang.ClassNotFoundException:
> org.apache.solr.handler.extraction.ExtractingRequestHandler
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
>
> at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:615)
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
>
> at java.lang.Class.forName0(Native Method)
>
> at java.lang.Class.forName(Class.java:264)
>
> at
>
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
>
> ... 19 more
>
> type Status
> reportmessage lazy loading error
>
>
> org.apache.solr.common.SolrException: lazy loading error
>
> at
>
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
>
> at
>
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
>
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>
> at
>
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>
> at
>
> org.apac

Re: Return all contents from collection

2011-01-19 Thread Erick Erickson
Follow Ahmet's lead here. Selecting all documents and counting will
absolutely
not work for you once you get to any real-world corpus. You want
to turn on faceting I'm pretty sure. Here's a good resource...

http://wiki.apache.org/solr/SimpleFacetParameters
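A facet-only request along those lines might look like this ("category" is just
a placeholder field name):

  select?q=*:*&rows=0&facet=true&facet.field=category&facet.limit=-1&facet.mincount=0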

Best
Erick

P.S. Please don't "reply to all", that sends a copy to my personal
inbox. No big deal

On Tue, Jan 18, 2011 at 10:04 PM, Dan Baughman  wrote:

> I'm fairly new to solr, but I'm running into some behaviour I can't
> explain.
>
> I have about 30 documents in the index that are in one specific category,
> and no others.  If I run my query and facet query with *:* this category is
> not represented in the facet counts. If i search for a word in those
> documents, but leave the facet query to *:*, these documents are suddenly
> represented in the facet counts.
>
> For some reason every document in this category is being left out of the
> facet counts for *:*.. it makes me wonder if other documents are, as
> well.
>
> Is that expected behaviour?
>
> -Original message-
> From: Ahmet Arslan iori...@yahoo.com
> Date: Tue, 18 Jan 2011 20:43:40 -0700
> To: solr-user@lucene.apache.org,  Dan Baughman da...@hostworks.com
> Subject: Re: Return all contents from collection
>
> > > I am building a faceted search and
> > > want the default view to show all of the facet counts.
> > >
> > > When I try submitting just a wild card like that, I get an
> > > error.
> > >
> > >
> > > '*' or '?' not allowed as first character in WildcardQuery
> >
> > *:* should be just fine. It is a special match all docs query.
> >
> >
> >
>


Solr with many indexes

2011-01-19 Thread Joscha Feth
Hello Solrs,

I am looking into using Solr, but my intended usage would require having
many different indexes which are not connected (e.g some index-tenancy with
one or multiple indexes per user).
I understand that creating independent indexes in Solr happens by creating
Solr cores via CoreAdmin.
I came across this document: http://wiki.apache.org/solr/LotsOfCores which
basically tells me that having many indexes is not an intended use for Solr.
Is this also true for SolrCloud (http://wiki.apache.org/solr/SolrCloud)?
If yes, about what upper limit of indexes are we talking about here? Tens?
Hundreds? Thousands?
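For context, each such index would be created with a single CoreAdmin call,
along these lines (the names are only placeholders):

  http://localhost:8983/solr/admin/cores?action=CREATE&name=user42&instanceDir=user42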

Thank you very much!
Regards,
Joscha Feth


Re: How to find Master & Slave are in sync

2011-01-19 Thread Markus Jelsma
Notice the index version number? If it's equal, then they are in sync.

On Wednesday 19 January 2011 13:37:32 Shanmugavel SRD wrote:
> How to find Master & Slave are in sync?
> Is there a way apart from checking the index version of master and slave
> using below two HTTP APIs?
> 
> http://master_host:port/solr/replication?command=indexversion
> http://slave_host:port/solr/replication?command=details

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Paige Cook
>
>
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
>


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Matthew Hall

[X] ASF Mirrors (linked in our release announcements or via the Lucene website)

[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a downstream 
project)



--
Matthew Hall
Software Engineer
Mouse Genome Informatics
mh...@informatics.jax.org
(207) 288-6012




Re: Solr Out of Memory Error

2011-01-19 Thread Adam Estrada
Is anyone familiar with the environment variable, JAVA_OPTS? I set
mine to a much larger heap size and never had any of these issues
again.

JAVA_OPTS = -server -Xms4048m -Xmx4048m

Adam

On Wed, Jan 19, 2011 at 3:29 AM, Isan Fulia  wrote:
> Hi all,
> By adding more servers do u mean sharding of index.And after sharding , how
> my query performance will be affected .
> Will the query execution time increase.
>
> Thanks,
> Isan Fulia.
>
> On 19 January 2011 12:52, Grijesh  wrote:
>
>>
>> Hi Isan,
>>
>> It seems your index size 25GB si much more compared to you have total Ram
>> size is 4GB.
>> You have to do 2 things to avoid Out Of Memory Problem.
>> 1-Buy more Ram ,add at least 12 GB of more ram.
>> 2-Increase the Memory allocated to solr by setting XMX values.at least 12
>> GB
>> allocate to solr.
>>
>> But if your all index will fit into the Cache memory it will give you the
>> better result.
>>
>> Also add more servers to load balance as your QPS is high.
>> Your 7 Laks data makes 25 GB of index its looking quite high.Try to lower
>> the index size
>> What are you indexing in your 25GB of index?
>>
>> -
>> Thanx:
>> Grijesh
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-Out-of-Memory-Error-tp2280037p2285779.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>


Re: lazy loading error?

2011-01-19 Thread Jörg Agatz
OK, but I can't find the folders in the Tomcat folder /varlib/tomcat6/solr/ --
there is no existing contrib folder or lib folder?


And where is the ampersand missing???


curl "
http://192.168.105.210:8080/solr/rechnungen/update/extract?literal.id=1234567&uprefix=attr_commit=true";
-F "myfile=@test.xls"


>
King


Re: lazy loading error?

2011-01-19 Thread Gora Mohanty
On Wed, Jan 19, 2011 at 7:35 PM, Jörg Agatz  wrote:
> ok, but i cant find the folders in the Tomcat folder /varlib/tomcat6/solr/
> no existing contrib folder or lib folder?

The contrib/extraction/lib folder should be under the top-level
directory of your Solr source directory. The location of that might
depend on how you installed Solr, and on which operating system,
and distribution you are running.

> where will missing an ampersand missing???
[...]
> curl "
> http://192.168.105.210:8080/solr/rechnungen/update/extract?literal.id=1234567&uprefix=attr_commit=true";
[...]

An ampersand is missing between the uprefix=attr_ and the
commit=true, i.e., the URL should be
  
http://192.168.105.210:8080/solr/rechnungen/update/extract?literal.id=1234567&uprefix=attr_&commit=true

Regards,
Gora


Re: Re: lazy loading error?

2011-01-19 Thread ahopedog
Hello, Jörg Agatz!

Copy them to the Tomcat common lib folder.

=== On 2011-01-19 22:06:18, you wrote: ===

>ok, but i cant find the folders in the Tomcat folder /varlib/tomcat6/solr/
>no existing contrib folder or lib folder?
>
>
>where will missing an ampersand missing???
>
>
>curl "
>http://192.168.105.210:8080/solr/rechnungen/update/extract?literal.id=1234567&uprefix=attr_commit=true";
>-F "myfile=@test.xls"
>
>
>>
>King
>

= = = = = = = = = = = = = = = = = = = =


Regards,
 
 
ahopedog
ahope...@126.com
  2011-01-19


Re: unix permission styles for access control

2011-01-19 Thread Dennis Gearon
So fieldName.x is how to address bits?

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Toke Eskildsen 
To: "solr-user@lucene.apache.org" 
Sent: Wed, January 19, 2011 12:23:04 AM
Subject: Re: unix permission styles for access control

On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote:
> I was wondering if the are binary operation filters? Haven't seen any in the 
> book nor was able to find any using google.
> 
> So if I had 0600(octal) in a permission field, and I wanted to return any 
> records that 'permission & 0400(octal)==TRUE', how would I filter that?

Don't you mean permission & 0400(octal) == 0400? Anyway, the
functionality can be accomplished by extending your index a bit.


You could split the permission into user, group and all parts, then use
an expanded query.

If the permission is 0755 it will be indexed as
user_p:7 group_p:5 all_p:5

If you're searching for something with at least 0650 your query should
be expanded to 
(user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5)


Alternatively you could represent the bits explicitly in the index:
user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:5

Then a search for 0650 would query with
user_p:2 AND user_p:4 AND group_p:1 AND group_p:4


Finally you could represent all valid permission values, still split
into parts with
user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7
group_p:1 group_p:2 group_p:3 group_p:4 group_p:5
all_p:1 all_p:2 all_p:3 all_p:4 all_p:5

The query would be simply
user_p:6 AND group_p:5


Re: unix permission styles for access control

2011-01-19 Thread Dennis Gearon
Sorry for repeat, trying to make sure this gets on the newsgroup to 'all'.

So 'fieldName.x' is how to address bits?


 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Toke Eskildsen 
To: "solr-user@lucene.apache.org" 
Sent: Wed, January 19, 2011 12:23:04 AM
Subject: Re: unix permission styles for access control

On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote:
> I was wondering if the are binary operation filters? Haven't seen any in the 
> book nor was able to find any using google.
> 
> So if I had 0600(octal) in a permission field, and I wanted to return any 
> records that 'permission & 0400(octal)==TRUE', how would I filter that?

Don't you mean permission & 0400(octal) == 0400? Anyway, the
functionality can be accomplished by extending your index a bit.


You could split the permission into user, group and all parts, then use
an expanded query.

If the permission is 0755 it will be indexed as
user_p:7 group_p:5 all_p:5

If you're searching for something with at least 0650 your query should
be expanded to 
(user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5)


Alternatively you could represent the bits explicitly in the index:
user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:5

Then a search for 0650 would query with
user_p:2 AND user_p:4 AND group_p:1 AND group_p:4


Finally you could represent all valid permission values, still split
into parts with
user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7
group_p:1 group_p:2 group_p:3 group_p:4 group_p:5
all_p:1 all_p:2 all_p:3 all_p:4 all_p:5

The query would be simply
user_p:6 AND group_p:5


facet or filter based on user's history

2011-01-19 Thread Jon Brock
Hi,

I'm looking for ideas on how to make an efficient facet query on a
user's history with respect to the catalog of documents (something
like "Read document already: yes / no"). The catalog is around 100k
titles and there are several thousand users. Of course, each user has
a different history, many having read fewer than 500 titles, but some
heavy users having read perhaps 50k titles.

Performance is not terribly important right now so all I did was bump
up the boolean query limit and put together a big string of document
id's that the user has read. The first query is slow but once it's in
the query cache it's fine. I would like to find a better way of doing
it though.

What type of solr plugin would be best suited to helping in this
situation? I could make a function plugin that provides something like
hasHadBefore() - true/false, but would that be efficient for faceting
and filtering? Another idea is a QParserPlugin that looks for a field
like hasHadBefore:userid and somehow substitutes in the list of docs.
But I'm not sure how a new parser plugin would interact with the
existing parser. Can solr use a parser plugin to only handle one
field, and leave all the other fields to the default parser?

Thanks,
Jon


Re: unix permission styles for access control

2011-01-19 Thread Dennis Gearon
Did some more searching this morning. Perhaps being bleary eyed helpe :-) I 
found this JIRA which does bitwise boolean operator filtering:

 https://issues.apache.org/jira/browse/SOLR-1913

I'm not that sure how to interpret JIRA pages for features. It's 'OPEN', but the
comments all say it works.

So, what's the syntax for combining filters in queries? I am currently using 
the spatial filter. How would I write a query that combines:

http://localhost:8983/path/to/solr/select/?q={!bitwise  field=fieldname 
op=OPERATION_NAME source=sourcevalue  negate=boolean}remainder
   {!spatial lat=37.393026 long=-121.998304 radius=10 unit=km threadCount=3} 
ts_begin:[1 TO 2145916800] AND text:"find_this"
 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Toke Eskildsen 
To: "solr-user@lucene.apache.org" 
Sent: Wed, January 19, 2011 12:23:04 AM
Subject: Re: unix permission styles for access control

On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote:
> I was wondering if the are binary operation filters? Haven't seen any in the 
> book nor was able to find any using google.
> 
> So if I had 0600(octal) in a permission field, and I wanted to return any 
> records that 'permission & 0400(octal)==TRUE', how would I filter that?

Don't you mean permission & 0400(octal) == 0400? Anyway, the
functionality can be accomplished by extending your index a bit.


You could split the permission into user, group and all parts, then use
an expanded query.

If the permission is 0755 it will be indexed as
user_p:7 group_p:5 all_p:5

If you're searching for something with at least 0650 your query should
be expanded to 
(user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5)


Alternatively you could represent the bits explicitly in the index:
user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:5

Then a search for 0650 would query with
user_p:2 AND user_p:4 AND group_p:1 AND group_p:4


Finally you could represent all valid permission values, still split
into parts with
user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7
group_p:1 group_p:2 group_p:3 group_p:4 group_p:5
all_p:1 all_p:2 all_p:3 all_p:4 all_p:5

The query would be simply
user_p:6 AND group_p:5


Re: Replication: abort-fetch and restarting

2011-01-19 Thread Markus Jelsma
Issue created:
https://issues.apache.org/jira/browse/SOLR-2323

On Tuesday 04 January 2011 20:08:40 Markus Jelsma wrote:
> Hi,
> 
> It seems abort-fetch nicely removes the index directory which i'm
> replicating to which is fine. Restarting, however, does not trigger the
> the same feature as the abort-fetch command does. At least, that's what my
> tests seems to tell me.
> 
> Shouldn't a restart of Solr nicely clean up the mess before exiting? And,
> shouldn't starting Solr also look for mess left behind by a possible sudden
> shutdown of the server at which the mess obviously cannot get cleaned?
> 
> If i now stop, clean and start my slave it will attempt to download an
> existing index. If i abort-fetch it will clean up the mess and (due to low
> interval polling) make another attempt. If i, however, restart (instead of
> abort-fetch) the old temporary directory will stay and needs to be deleted
> manually.
> 
> Cheers,

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: How to keep a maintained index with crawled data

2011-01-19 Thread Jack Krupansky

Take a look at Apache ManifoldCF (incubating, close to 0.1 release):

http://incubator.apache.org/connectors/

In addition to a fairly sophisticated general web crawler which maintains 
the state of crawled web pages it has a file system crawler and crawlers for 
a variety of document repositories. It has an output connector that sends 
documents and delete requests to Solr Cell.


-- Jack Krupansky

-Original Message- 
From: Erlend Garåsen

Sent: Wednesday, January 19, 2011 4:29 AM
To: solr-user@lucene.apache.org
Subject: How to keep a maintained index with crawled data


We need a crawler for all web pages outside our CMS, but one crucial
feature seems to be missing in many of them - a way to detect changes in
these documents. Say that you have run a daily crawler job for two
months looking for new web pages to crawl in order to keep the Solr
index updated. But suddenly a lot of pages were either changed or
deleted, and now you have an outdated Solr index.

In other words, we need to detect removed web pages and trigger a delete
command to Solr. We also need to detect web pages which have been
modified in order to update the Solr index.

For me it seems that the Aperture web crawler is the only one with such
features. The crawler handler has methods for modified and removed documents:
http://sourceforge.net/apps/trac/aperture/wiki/Crawlers

Or is it possible to do similar things with the other crawlers such as
Nutch?

Many thanks in advance for all kinds of suggestions!

Erlend

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050 



Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Martijn v Groningen
[] ASF Mirrors (linked in our release announcements or via the Lucene
website)

[ X ] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[ X ] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a
downstream project)

On 19 January 2011 14:59, Matthew Hall  wrote:

> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> downstream project)
>
>
>
> --
> Matthew Hall
> Software Engineer
> Mouse Genome Informatics
> mh...@informatics.jax.org
> (207) 288-6012
>
>
>


-- 
Met vriendelijke groet,

Martijn van Groningen


Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-19 Thread Matthias Epheser



[] ASF Mirrors (linked in our release announcements or via the Lucene website)

[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)

[] I/we build them from source via an SVN/Git checkout.

[] Other (someone in your company mirrors them internally or via a downstream 
project)





Re: Return all contents from collection

2011-01-19 Thread Jonathan Rochkind
I know that this is often a performance problem -- but Erick, I am 
interested in the 'better solution' you hint at!


There are a variety of cases where you want to 'dump' all documents from 
a collection. One example might be in order to build a Google SiteMap 
for your app that's fronting your Solr. That's mine at the moment.   If 
anyone can think of a way to do this that doesn't have horrible 
performance (and bonus points if it doesn't completely mess up caches 
too by filling them with everything), that would be awesome.


Jonathan

On 1/18/2011 8:47 PM, Erick Erickson wrote:

This is usually a bad idea, but if you really must use
q=*:*&start=0&rows=1000000

Assuming that there are fewer than 1,000,000 documents in your index.

And if there are more, you won't like the performance anyway.

Why do you want to do this? There might be a better solution.

Best
Erick

On Tue, Jan 18, 2011 at 7:58 PM, Dan Baughman  wrote:


Is there a way I can simply tell the index to return its entire record set?

I tried starting and ending with just  a "*" but no dice.



Re: Local param tag voodoo ?

2011-01-19 Thread Jonathan Rochkind
What query are you actually trying to do?  There's probably a way to do 
it, possibly using nested queries -- but not using illegal syntax like 
some of your examples!  If you explain what you want to do, someone may 
be able to tell you how.  From the hints in your last message, I suspect 
nested queries _might_ be helpful to you.


On 1/19/2011 3:46 AM, Xavier SCHEPLER wrote:

OK, I was already at this point.
My faceting system uses exactly what is described on that page; I read it in
the Solr 1.4 book, otherwise I wouldn't ask.
The problem is that the filter queries don't affect the relevance score of
the results, so I want the terms in the main query.




From: Markus Jelsma
Sent: Tue Jan 18 21:31:52 CET 2011
To:
Subject: Re: Local param tag voodoo ?


Hi,

You get an error because LocalParams need to be in the beginning of a
parameter's value. So no parenthesis first. The second query should not give an
error because it's a valid query.

Anyway, i assume you're looking for :
http://wiki.apache.org/solr/SimpleFacetParameters#Multi-
Select_Faceting_and_LocalParams

Cheers,


Hey,

here are my needs :

- a query that has tagged and untagged contents
- facets that ignore the tagged contents

I tryed :

q=({!tag=toExclude} ignored)  taken into account
q={tag=toExclude v='ignored'} take into account

Both resulted in a error.

Is this possible or do I have to try another way ?


--
Tous les courriers électroniques émis depuis la messagerie
de Sciences Po doivent respecter des conditions d'usages.
Pour les consulter rendez-vous sur
http://www.ressources-numeriques.sciences-po.fr/confidentialite_courriel.htm


Re: unix permission styles for access control

2011-01-19 Thread Jonathan Rochkind
No. There is no built in way to address 'bits' in Solr that I am aware 
of.  Instead you can think about how to transform your data at indexing 
into individual tokens (rather than bits) in one or more field, such 
that they are capable of answering your query.  Solr works in tokens as 
the basic unit of operation (mostly, basically), not characters or bytes 
or bits.
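For example (the field and token names here are purely illustrative), a
document's permissions could be indexed as a multi-valued field of tokens
such as

  perms: u_r u_w g_r o_r

and then a filter like fq=perms:g_r would match documents whose group
permission includes read.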


On 1/19/2011 9:48 AM, Dennis Gearon wrote:

Sorry for repeat, trying to make sure this gets on the newsgroup to 'all'.

So 'fieldName.x' is how to address bits?


  Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Toke Eskildsen
To: "solr-user@lucene.apache.org"
Sent: Wed, January 19, 2011 12:23:04 AM
Subject: Re: unix permission styles for access control

On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote:

I was wondering if the are binary operation filters? Haven't seen any in the
book nor was able to find any using google.

So if I had 0600(octal) in a permission field, and I wanted to return any
records that 'permission&  0400(octal)==TRUE', how would I filter that?

Don't you mean permission&  0400(octal) == 0400? Anyway, the
functionality can be accomplished by extending your index a bit.


You could split the permission into user, group and all parts, then use
an expanded query.

If the permission is 0755 it will be indexed as
user_p:7 group_p:5 all_p:5

If you're searching for something with at least 0650 your query should
be expanded to
(user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5)


Alternatively you could represent the bits explicitly in the index:
user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:5

Then a search for 0650 would query with
user_p:2 AND user_p:4 AND group_p:1 AND group_p:4


Finally you could represent all valid permission values, still split
into parts with
user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7
group_p:1 group_p:2 group_p:3 group_p:4 group_p:5
all_p:1 all_p:2 all_p:3 all_p:4 all_p:5

The query would be simply
user_p:6 AND group_p:5


Re: HTTP Status 400 - org.apache.lucene.queryParser.ParseException

2011-01-19 Thread Erick Erickson
There's nothing that I know of that would accomplish this, sorry...

Best
Erick

On Tue, Jan 18, 2011 at 11:22 PM, kun xiong  wrote:

> Hi Erick,
>  Thanks for the fast reply. I kind of figured it was not supposed to be
> that way.
> But it would have some benefits when we need to migrate from Lucene to Solr;
> we wouldn't have to rewrite the query-building part. Is there any parser that
> can do that?
>
> 2011/1/18 Ahmet Arslan 
>
> > > what's the alternative?
> >
> > q=kfc+mdc&defType=dismax&mm=1&qf=I_NAME_ENUM
> >
> > See more: http://wiki.apache.org/solr/DisMaxQParserPlugin
> >
> >
> >
> >
>


Re: Mem allocation - SOLR vs OS

2011-01-19 Thread Erick Erickson
You're better off using two cores on the same Solr instance rather than two
instances of Tomcat, that way you avoid some overhead.

The usual advice is to monitor the Solr caches, particularly for evictions, and
size the Solr caches accordingly. You can see these from the admin/stats page
and also by mining the logs, looking particularly for cache evictions. Since
cache usage is so dependent on the particular installation and usage pattern
(particularly sorting and faceting), "general" advice is hard to give.
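If a cache does turn out to be undersized, it is adjusted in solrconfig.xml; a
sketch of the relevant entry (the numbers are only an example, not a
recommendation):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>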

Hope this helps
Erick

On Wed, Jan 19, 2011 at 2:25 AM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> In case it helps there are two SOLR indexes (160GB and 700GB) on the
> machine.
>
> Also these are separate indexes and not shards so would it help to put them
> on two separate Tomcat servers on same machine? This way I think one index
> won't be affecting others cache.
>
> On Wed, Jan 19, 2011 at 12:00 PM, Salman Akram <
> salman.ak...@northbaysolutions.net> wrote:
>
> > Hi,
> >
> > I know this is a subjective topic but from what I have read it seems more
> > RAM should be spared for OS caching and much less for SOLR/Tomcat even on
> a
> > dedicated SOLR server.
> >
> > Can someone give me an idea about the theoretically ideal proportion b/w
> > them for a dedicated Windows server with 32GB RAM? Also the index is
> updated
> > every hour.
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
> >
>
>
> --
> Regards,
>
> Salman Akram
>


Re: Search on two core and two schema

2011-01-19 Thread Erick Erickson
Then you probably want to consider simply flattening the data and storing the
relevant data with a single schema. If that doesn't work for you, there is a
limited join capability going into the trunk, see:
https://issues.apache.org/jira/browse/SOLR-2272
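Flattening here would mean denormalizing the taxonomy onto each document at
index time, for example (field names are only illustrative):

  id=...  title=...  taxon_id=...  taxon_label=...  taxon_hierarchy=...

so that faceting on the label needs no cross-core relation.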

Best
Erick

On Wed, Jan 19, 2011 at 3:17 AM, Damien Fontaine wrote:

> Ok, but i need a relation beetween the two type of document for faceting on
> label field.
>
> Damien
>
> On 18/01/2011 18:55, Geert-Jan Brits wrote:
>
>  Schemas are very differents, i can't group them.

>>> In contrast to what you're saying above, you may rethink the option of
>> combining both type of documents in a single core.
>> It's a perfectly valid approach to combine heteregenous documents in a
>> single core in Solr. (and use a specific field -say 'type'-  to
>> distinguish
>> between them when needed)
>>
>> Geert-Jan
>>
>> 2011/1/18 Jonathan Rochkind
>>
>>  Solr can't do that. Two cores are two seperate cores, you have to do two
>>> seperate queries, and get two seperate result sets.
>>>
>>> Solr is not an rdbms.
>>>
>>>
>>> On 1/18/2011 12:24 PM, Damien Fontaine wrote:
>>>
>>>  I want to execute this query:
>>>
>>>  Schema 1: [field definitions stripped by the mail archive]
>>>
>>>  Schema 2: [field definitions stripped by the mail archive]
>>>
>>>  Query:
>>>
>>>  select?facet=true&fl=title&q=title:*&facet.field=UUID_location&rows=10&qt=standard
>>>
>>>  Result: [response XML stripped by the mail archive; it showed two documents,
>>>  "titre 1" and "Titre 2", and two UUID_location facet counts, 998 and 891]
>>>

 On 18/01/2011 17:55, Stefan Matheis wrote:

  Okay .. and .. now .. you're trying to do what? perhaps you could give
> us
> an
> example, w/ real data .. sample queries&- results.
> because actually i cannot imagine what you want to achieve, sorry
>
> On Tue, Jan 18, 2011 at 5:24 PM, Damien Fontaine
>> wrote:
>>
>  On my first schema, there are informations about a document like
> title,
>
>> lead, text etc and many UUID(each UUID is a taxon's ID)
>> My second schema contains my taxonomies with auto-complete and facets.
>>
>> On 18/01/2011 17:06, Stefan Matheis wrote:
>>
>>   Search on two cores but combine the results afterwards to present
>> them
>> in
>>
>>  one group, or what exactly are you trying to do Damien?
>>>
>>> On Tue, Jan 18, 2011 at 5:04 PM, Damien Fontaine<
>>> dfonta...@rosebud.fr
>>>
>>>  wrote:

Hi,
>>>
>>>  I would like make a search on two core with differents schemas.

 Sample :

 Schema Core1
   - ID
   - Label
   - IDTaxon
 ...

 Schema Core2
   - IDTaxon
   - Label
   - Hierarchy
 ...

 Schemas are very differents, i can't group them. Have you an idea to
 realize this search ?

 Thanks,

 Damien





>


Re: Solr with Unknown Lucene Index?

2011-01-19 Thread Erick Erickson
I don't really think this is possible/reasonable. There's nothing fixed about
a Lucene index; you could index a field in different documents with any
number of analysis chains. The tricky part here will be, as you've discovered,
finding a way to match the Solr schema "closely enough" to get your desired
results.

Are you sure there's no way to re-index the data? Or find the original code
that indexed it?

Best
Erick

On Wed, Jan 19, 2011 at 3:22 AM, Lee Goddard  wrote:

> I have to use some Lucene indexes, and Solr looks like the perfect
> solution.
>
> However, all I know about the Lucene indexes are what Luke tells me, and
> simply setting the schema to represent all fields as text does not seem to
> be working -- though as this is my first Solr, I am not sure if that is due
> to some other issue.
>
> Is there some way to ascertain how the Solr schema should describe the
> Lucene fields?
>
> Many thanks in anticipation
> Lee
>


Highlighting approach.

2011-01-19 Thread Hasnain

Hi all,

  I'm looking into Solr's highlighting component. As far as I understood,
Solr's response.getHighlighting() gives back a formatted string along with an
id; then we have to loop through the searched documents, find the matching id,
and replace it with the formatted string. This approach will seriously slow
things down because of the looping. Is this the only way to use the
highlighting component, or is my understanding not correct?
I'm using Solr 1.4.

Thanks in advance

Hasnain. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-approach-tp2288552p2288552.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Switching existing solr indexes from Segment to Compound Style index files

2011-01-19 Thread Erick Erickson
Let's back up a ways here and figure out why you're getting so many
files open.

1> how many files are in your index?
2> are you committing very frequently?
3> or do you simply have a LOT of cores?
4> do you optimize your indexes? If so, how many files do you have in your
cores before/after optimizing?

You're perhaps exactly right in your approach, but with a bit more info
we may be able to suggest other alternatives.

Best
Erick

On Wed, Jan 19, 2011 at 4:48 AM, Nicholas W <4...@log1.net> wrote:

> Dear All,
>  On a Linux system running a multi-core linux server, we are
> experiencing a problem of too many files open which is causing tomcat
> to abort. Reading the documentation, one of the things it seems we can
> do is to switch to using compound indexes. We can see that in the
> solrconfig.xml there is an option:
>
>
>   <useCompoundFile>true</useCompoundFile>
>
> in the <indexDefaults> and <mainIndex> sections. We have set this to
> true and restarted Tomcat.
>
> I have then used the script ./optimize script to get Solr to optimize
> the index. In the lucene documentation it suggests this is the way to
> switch to a compound index. However, with SOLR while the index is
> optimized, its not converted to a compound file.
>
> What are we doing wrong? What is the correct way to convert an index
> to use a compound file?
>
> Thanks a lot for your suggestions.
>
> Regards,
> Nicholas
>


Re: Switching existing solr indexes from Segment to Compound Style index files

2011-01-19 Thread Markus Jelsma
Indeed, wouldn't reducing the number of segments be a better idea? Speeds up 
searching too! Do you happen to have a very high mergeFactor value for each 
core?
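Both settings live in the same sections of solrconfig.xml; a sketch (the values
are only an example):

  <mergeFactor>4</mergeFactor>
  <useCompoundFile>true</useCompoundFile>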

On Wednesday 19 January 2011 17:53:12 Erick Erickson wrote:
> You're perhaps exactly right in your approach, but with a bit more info
> we may be able to suggest other alternatives.

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr with many indexes

2011-01-19 Thread Erick Erickson
Solr will handle lots of cores, but that page is talking about lots.
Thousands.

But I question why you *require* many different indexes. It's perfectly
reasonable
to store different fields in different documents in the *same* index, unlike
a table in an RDBMS.

There are good reasons to have separate cores, including isolating one user's
data from all others, so I'm not saying that you should necessarily put
everything into a single core.

And even using lots of cores can be made to work if you don't pre-warm
newly-opened
cores, assuming that the response time when using "cold searchers" is
adequate.

Best
Erick

On Wed, Jan 19, 2011 at 7:41 AM, Joscha Feth  wrote:

> Hello Solrs,
>
> I am looking into using Solr, but my intended usage would require having
> many different indexes which are not connected (e.g some index-tenancy with
> one or multiple indexes per user).
> I understand that creating independent indexes in Solr happens by creating
> Solr cores via CoreAdmin.
> I came across this document: http://wiki.apache.org/solr/LotsOfCores which
> basically tells me that having many indexes is not an intended use for
> Solr.
> Is this also true for SolrCloud (http://wiki.apache.org/solr/SolrCloud)?
> If yes, about what upper limit of indexes are we talking about here? Tens?
> Hundreds? Thousands?
>
> Thank you very much!
> Regards,
> Joscha Feth
>


Re: unix permission styles for access control

2011-01-19 Thread Dennis Gearon
So, if I used something like r-u-d-o in a field (read, update, delete, others) I 
could get it tokenized to those four characters, and then search for those in 
that field. Is that what you're suggesting? (Thanks, by the way.)

An article I read created a 'hybrid' access control system (can't remember if it 
was ACL or RBAC). It used a primary system like the Unix file system's 9-bit 
permissions for the primary permissions normally needed on most objects of any 
kind, and then flagged if there were any other permissions and any other groups. 
It was very fast for the primary permissions, and fast for the secondary. 


 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Jonathan Rochkind 
To: "solr-user@lucene.apache.org" 
Sent: Wed, January 19, 2011 8:40:30 AM
Subject: Re: unix permission styles for access control

No. There is no built in way to address 'bits' in Solr that I am aware 
of.  Instead you can think about how to transform your data at indexing 
into individual tokens (rather than bits) in one or more field, such 
that they are capable of answering your query.  Solr works in tokens as 
the basic unit of operation (mostly, basically), not characters or bytes 
or bits.

On 1/19/2011 9:48 AM, Dennis Gearon wrote:
> Sorry for repeat, trying to make sure this gets on the newsgroup to 'all'.
>
> So 'fieldName.x' is how to address bits?
>
>
>   Dennis Gearon
>
>
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a 
>better
> idea to learn from others’ mistakes, so you do not have to make them yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
>
> - Original Message 
> From: Toke Eskildsen
> To: "solr-user@lucene.apache.org"
> Sent: Wed, January 19, 2011 12:23:04 AM
> Subject: Re: unix permission styles for access control
>
> On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote:
>> I was wondering if the are binary operation filters? Haven't seen any in the
>> book nor was able to find any using google.
>>
>> So if I had 0600(octal) in a permission field, and I wanted to return any
>> records that 'permission&  0400(octal)==TRUE', how would I filter that?
> Don't you mean permission&  0400(octal) == 0400? Anyway, the
> functionality can be accomplished by extending your index a bit.
>
>
> You could split the permission into user, group and all parts, then use
> an expanded query.
>
> If the permission is 0755 it will be indexed as
> user_p:7 group_p:5 all_p:5
>
> If you're searching for something with at least 0650 your query should
> be expanded to
> (user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5)
>
>
> Alternatively you could represent the bits explicitly in the index:
> user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:5
>
> Then a search for 0650 would query with
> user_p:2 AND user_p:4 AND group_p:1 AND group_p:4
>
>
> Finally you could represent all valid permission values, still split
> into parts with
> user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7
> group_p:1 group_p:2 group_p:3 group_p:4 group_p:5
> all_p:1 all_p:2 all_p:3 all_p:4 all_p:5
>
> The query would be simply
> user_p:6 AND group_p:5



Documentaion: For newbies and recent newbies

2011-01-19 Thread Dennis Gearon
If someone is looking for good documentation and getting started guides, I am 
putting this in the newsgroups to be searched upon. I recommend:

A/ The Wikis: (FREE)
   http://wiki.apache.org/solr/FrontPage

B/ The book and eBook: (COSTS $45.89)
  https://www.packtpub.com/solr-1-4-enterprise-search-server/book

C/ The (seemingly) total reference guide:(FREE, with registration)
   
http://www.lucidimagination.com/software_downloads/certified/cdrg/lucidworks-solr-refguide-1.4.pdf



D/ The webinar on optimizing the search engine to Do a GOOD search, 
 based on YOUR needs, not general ones: (FREE, with registration)
  
http://www.lucidimagination.com/Solutions/Webinars/Analyze-This-Tips-and-tricks-getting-LuceneSolr-Analyzer-index-and-search-your-content


Personally, I am working on being more than barely informed on items A & B :-)

Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



How to index my users info

2011-01-19 Thread Jonilson Pinheiro da Silva
I would like to index the information of my employees to be able to get
through some fields such as: e-mail, registration, ID, cell phone, name.

I am very new to SOLR and would like to know how to index these fields this
way and how to search filtering by some of these fields.

Thanks in advance

Jota.


Specifying an AnalyzerFactory in the schema

2011-01-19 Thread Renaud Delbru

Hi,

I notice that in the schema, it is only possible to specify an Analyzer 
class, but not a Factory class as for the other elements (Tokenizer, 
Filter, etc.).
This limits the use of this feature, as it is impossible to specify 
parameters for the Analyzer.
I have looked at the IndexSchema implementation, and I think this 
requires a simple fix. Shall I open an issue about it?


Regards,
--
Renaud Delbru


Re: unix permission styles for access control

2011-01-19 Thread Jonathan Rochkind
Yep, that's what I'm suggesting as one possible approach to consider, 
whether it will work or not depends on your specifics.


Character length in a token doesn't really matter for Solr performance.  
It might be less confusing to actually put "read update delete own" (or 
whatever 'o' stands for) in a field, such that it will be tokenized so 
each of those words is a separate token.  (Make sure you aren't stemming 
or using synonyms, heh!)


Or instead of separating a single string into tokens, use a multi-valued 
String field, and put "read", "delete", etc. in as separate values. That 
is actually more straightforward and less confusing than tokenizing.


Then you can just search for fq=permissions:read or whatever.
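
A rough sketch of what that could look like (the field name and values are
only illustrative, not taken from your schema). In schema.xml:

    <field name="permissions" type="string" indexed="true" stored="false"
           multiValued="true"/>

Index each document with the values it allows, e.g. "read", "update",
"delete", and filter at query time with:

    ...&fq=permissions:read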

Again, whether this will actually work for you depends on exactly what 
you're requirements are, but it's something to consider, before 
resorting to weird patches.  It will work in any Solr version.


The first approach to solving a problem in Solr should be trying to 
think "Can I solve this by setting up my index in such a way that I can 
ask the questions I want simply by asking if a certain token is in a 
certain field?"  Because that's what Solr does, basically, tell you if 
certain tokens are in certain fields. If you can reduce the problem to 
that, Solr will handle it easily, simply, and efficiently.  Otherwise, 
you might need weird patches. :)


On 1/19/2011 12:45 PM, Dennis Gearon wrote:

So, if I used something like r-u-d-o in a field (read,update,delete,others) I
could get it tokenized to those four characters,and then search for those in
that field. Is that what you're suggesting, (thanks by the way).

An article I read created a 'hybrid' access control system (can't remember if it
was ACL or RBAC). It used a primary system like Unix file system 9bit permission
for the primary permissions normally needed on most objects of any kind, and
then flagged if there were any other permissions and any other groups. It was
very fast for the primary permissons, and fast for the secondary.


  Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Jonathan Rochkind
To: "solr-user@lucene.apache.org"
Sent: Wed, January 19, 2011 8:40:30 AM
Subject: Re: unix permission styles for access control

No. There is no built in way to address 'bits' in Solr that I am aware
of.  Instead you can think about how to transform your data at indexing
into individual tokens (rather than bits) in one or more field, such
that they are capable of answering your query.  Solr works in tokens as
the basic unit of operation (mostly, basically), not characters or bytes
or bits.

On 1/19/2011 9:48 AM, Dennis Gearon wrote:

Sorry for repeat, trying to make sure this gets on the newsgroup to 'all'.

So 'fieldName.x' is how to address bits?


   Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a
better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Toke Eskildsen
To: "solr-user@lucene.apache.org"
Sent: Wed, January 19, 2011 12:23:04 AM
Subject: Re: unix permission styles for access control

On Wed, 2011-01-19 at 08:15 +0100, Dennis Gearon wrote:

I was wondering if the are binary operation filters? Haven't seen any in the
book nor was able to find any using google.

So if I had 0600(octal) in a permission field, and I wanted to return any
records that 'permission&   0400(octal)==TRUE', how would I filter that?

Don't you mean permission&   0400(octal) == 0400? Anyway, the
functionality can be accomplished by extending your index a bit.


You could split the permission into user, group and all parts, then use
an expanded query.

If the permission is 0755 it will be indexed as
user_p:7 group_p:5 all_p:5

If you're searching for something with at least 0650 your query should
be expanded to
(user_p:7 OR user_p:6) AND (group_p:7 OR group_p:5)


Alternatively you could represent the bits explicitly in the index:
user_p:1 user_p:2 user_p:4 group_p:1 group_p:4 all_p:1 all_p:5

Then a search for 0650 would query with
user_p:2 AND user_p:4 AND group_p:1 AND group_p:4


Finally you could represent all valid permission values, still split
into parts with
user_p:1 user_p:2 user_p:3 user_p:4 user_p:5 user_p:6 user_p:7
group_p:1 group_p:2 group_p:3 group_p:4 group_p:5
all_p:1 all_p:2 all_p:3 all_p:4 all_p:5

The query would be simply
user_p:6 AND group_p:5


Re: Documentaion: For newbies and recent newbies

2011-01-19 Thread Markus Jelsma
That someone should just visit the wiki:
http://wiki.apache.org/solr/SolrResources

> If someone is looking for good documentation and getting started guides, I
> am putting this in the newsgroups to be searched upon. I recommend:
> 
> A/ The Wikis: (FREE)
>http://wiki.apache.org/solr/FrontPage
> 
> B/ The book and eBook: (COSTS $45.89)
>   https://www.packtpub.com/solr-1-4-enterprise-search-server/book
> 
> C/ The (seemingly) total reference guide:(FREE, with registration)
> 
> http://www.lucidimagination.com/software_downloads/certified/cdrg/lucidwork
> s-solr-refguide-1.4.pdf
> 
> 
> 
> D/ The webinar on optimizing the search engine to Do a GOOD search,
>  based on YOUR needs, not general ones: (FREE, with registration)
> 
> http://www.lucidimagination.com/Solutions/Webinars/Analyze-This-Tips-and-tr
> icks-getting-LuceneSolr-Analyzer-index-and-search-your-content
> 
> 
> Personally, I am working on being more than barely informed on items A & B
> :-)
> 
> Dennis Gearon
> 
> 
> Signature Warning
> 
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make
> them yourself. from
> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> 
> 
> EARTH has a Right To Life,
> otherwise we all die.


Re: Mem allocation - SOLR vs OS

2011-01-19 Thread Salman Akram
Actually we don't have much load on the server (usage is currently quite low),
but user queries are very complex, e.g. long phrases/multiple
proximity/wildcard etc., so I know these values need to be tried out, but I
wanted to see what's the right 'start' so that I am not way off.

Also, regarding Solr cores, just to clarify: they are totally different indexes
(not 2 parts of one index), so the queries on them are separate. Do you still
think it's better to keep them on two cores?

Thanks a lot!

On Wed, Jan 19, 2011 at 9:43 PM, Erick Erickson wrote:

> You're better off using two cores on the same Solr instance rather than two
> instances of Tomcat, that way you avoid some overhead.
>
> The usual advice is to monitor the Solr caches, particularly for evictions
> and
> size the Solr caches accordingly. You can see these from the admin/stats
> page
> and also by mining the logs, looking particularly for cache evictions.
> Since
> cache
> usage is so dependent on the particular installation and usage pattern
> (particularly
> sorting and faceting), "general" advice is hard to give.
>
> Hope this helps
> Erick
>
> On Wed, Jan 19, 2011 at 2:25 AM, Salman Akram <
> salman.ak...@northbaysolutions.net> wrote:
>
> > In case it helps there are two SOLR indexes (160GB and 700GB) on the
> > machine.
> >
> > Also these are separate indexes and not shards so would it help to put
> them
> > on two separate Tomcat servers on same machine? This way I think one
> index
> > won't be affecting others cache.
> >
> > On Wed, Jan 19, 2011 at 12:00 PM, Salman Akram <
> > salman.ak...@northbaysolutions.net> wrote:
> >
> > > Hi,
> > >
> > > I know this is a subjective topic but from what I have read it seems
> more
> > > RAM should be spared for OS caching and much less for SOLR/Tomcat even
> on
> > a
> > > dedicated SOLR server.
> > >
> > > Can someone give me an idea about the theoretically ideal proportion
> b/w
> > > them for a dedicated Windows server with 32GB RAM? Also the index is
> > updated
> > > every hour.
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
> > >
> > >
> >
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
>



-- 
Regards,

Salman Akram


Re: unix permission styles for access control

2011-01-19 Thread Dennis Gearon
Three-dimensional multi value sounds good.  Tough choice on characters 
vs full-length words. Full length is easier & less confusing, but with 
hopefully millions of documents in the future, it increases index size.
Sent from Yahoo! Mail on Android


Re: How to index my users info

2011-01-19 Thread Markus Jelsma
http://lucene.apache.org/solr/#getstarted

> I would like to index the information of my employees to be able to get
> through some fields such as: e-mail, registration, ID, cell phone, name.
> 
> I am very new to SOLR and would like to know how to index these fields this
> way and how to search filtering by some of these fields.
> 
> Thanks in advance
> 
> Jota.


Re: Mem allocation - SOLR vs OS

2011-01-19 Thread Markus Jelsma
You only need so much for Solr so it can do its thing. Faceting can take quite 
some memory on a large index but sorting can be a really big RAM consumer.

As Erick pointed out, inspect and tune the cache settings and adjust RAM 
allocated to the JVM if required. Using tools like JConsole you can monitor 
various things via JMX including RAM consumption.
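
As a rough starting point only (the heap size and port below are purely
illustrative and depend on your Tomcat setup), you could cap the JVM heap and
enable remote JMX so JConsole can attach, leaving the rest of the 32GB to the
OS file cache:

    set CATALINA_OPTS=-Xms2g -Xmx8g -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false

Then grow or shrink -Xmx based on what the cache statistics and heap graphs
actually show.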

> Hi,
> 
> I know this is a subjective topic but from what I have read it seems more
> RAM should be spared for OS caching and much less for SOLR/Tomcat even on a
> dedicated SOLR server.
> 
> Can someone give me an idea about the theoretically ideal proportion b/w
> them for a dedicated Windows server with 32GB RAM? Also the index is
> updated every hour.


[Announce] Solr-RA, Solr with RankingAlgorithm

2011-01-19 Thread Nagendra Nagarajayya

Hi!

I would like to announce Solr-RA, Solr with RankingAlgorithm. Solr-RA 
uses the RankingAlgorithm, a new scoring and ranking algorithm instead 
of Lucene to rank the searches. Solr with RA seems to enable Solr 
searches to be comparable to Google site search results, and much better 
than Lucene (perl index). RankingAlgorithm still uses the Lucene index 
to read documents but ranks and scores on its own. There is no change to 
the existing Solr setup, so searches, faceting, highlighting, etc. 
should work as before.


You can get more information about Solr-RA from here: 
http://solr-ra.tgels.com 


You can find the comparison paper between Google site search and Lucene 
here: 
http://solr-ra.tgels.com/docs/TestWithPerlOrgComparisonWithGoogleAndLucene.pdf


You can try out a demo (search Wikipedia, Perl/Php/Python indexes) here: 
http://solr-ra.tgels.com/rankingsearch.jsp


Solr-RA has two modes, a Product mode and a Document mode for document 
searches. Document mode is for searching html, rich text (pdf/word, 
etc.), books, faq, forums, etc. while Product mode is for short text 
searches as in retail stores, ecommerce, etc.


You can download Solr-RA from here: http://solr-ra.tgels.com (free)


You can download RankingAlgorithm from here: 
http://rankingalgorithm.tgels.com (free)


I would like you to try it (it is free) and provide us your valuable 
feedback. You can contact me at solr-ra at tgels.com or at the above 
email address or on twitter @solr_ra.



Sincerely,


- Nagendra Nagarajayya
solr-ra.tgels.com
rankingalgorithm.tgels.com



Re: facet or filter based on user's history

2011-01-19 Thread Markus Jelsma
Hi,

I've never seen Solr's behaviour with a huge amount of values in a multi 
valued field, but I think it should work alright. Then you can store a list of 
user IDs along with each book document and use filter queries to include or 
exclude the book from the result set.
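
For example (assuming a multi-valued field called read_by that holds user IDs;
the name is only illustrative), the filtering and the "read already" facet
could look like:

    fq=read_by:12345                 (only titles this user has read)
    fq=-read_by:12345                (only titles this user has not read)
    facet=true&facet.query=read_by:12345

The "not read" count is then simply the rest of the result set.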

Cheers,

> Hi,
> 
> I'm looking for ideas on how to make an efficient facet query on a
> user's history with respect to the catalog of documents (something
> like "Read document already: yes / no"). The catalog is around 100k
> titles and there are several thousand users. Of course, each user has
> a different history, many having read fewer than 500 titles, but some
> heavy users having read perhaps 50k titles.
> 
> Performance is not terribly important right now so all I did was bump
> up the boolean query limit and put together a big string of document
> id's that the user has read. The first query is slow but once it's in
> the query cache it's fine. I would like to find a better way of doing
> it though.
> 
> What type of solr plugin would be best suited to helping in this
> situation? I could make a function plugin that provides something like
> hasHadBefore() - true/false, but would that be efficient for faceting
> and filtering? Another idea is a QParserPlugin that looks for a field
> like hasHadBefore:userid and somehow substitutes in the list of docs.
> But I'm not sure how a new parser plugin would interact with the
> existing parser. Can solr use a parser plugin to only handle one
> field, and leave all the other fields to the default parser?
> 
> Thanks,
> Jon


Re: Mem allocation - SOLR vs OS

2011-01-19 Thread Salman Akram
We do have sorting but not faceting. OK so I guess there is no 'hard and
fast rule' as such so I will play with it and see.

Thanks for the help

On Wed, Jan 19, 2011 at 11:48 PM, Markus Jelsma
wrote:

> You only need so much for Solr so it can do its thing. Faceting can take
> quite
> some memory on a large index but sorting can be a really big RAM consumer.
>
> As Erick pointed out, inspect and tune the cache settings and adjust RAM
> allocated to the JVM if required. Using tools like JConsole you can monitor
> various things via JMX including RAM consumption.
>
> > Hi,
> >
> > I know this is a subjective topic but from what I have read it seems more
> > RAM should be spared for OS caching and much less for SOLR/Tomcat even on
> a
> > dedicated SOLR server.
> >
> > Can someone give me an idea about the theoretically ideal proportion b/w
> > them for a dedicated Windows server with 32GB RAM? Also the index is
> > updated every hour.
>



-- 
Regards,

Salman Akram


Re: No system property or default value specified for...

2011-01-19 Thread Markus Jelsma
Hi,

I'm unsure if I completely understand, but you first had the error for 
local.code and then set the property in solr.xml? Then of course it will give 
an error for the next undefined property that has no default set.

If you use a property without a default it _must_ be defined in solr.xml or 
solrcore.properties. And since you don't use defaults in your dataconfig they 
all must be explicitly defined.

This is proper behaviour.
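
For example (names and values are purely illustrative), a per-core property
can be set either in solr.xml:

    <core name="items" instanceDir="items">
      <property name="local.code" value="somecode"/>
    </core>

or in items/conf/solrcore.properties:

    local.code=somecode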

Cheers,

> I'm trying to dynamically add a core to a multi core system using the
> following command:
> 
> http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceDir
> =items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=true
> 
> the data-config.xml looks like this:
> 
> <dataConfig>
>   <dataSource ...
>       url="jdbc:mysql://localhost/"
>       ...
>       name="server"/>
>   <document>
>     <entity ...
>         query="select code from master.locals"
>         rootEntity="false">
>       <entity ...
>           query="select '${local.code}' as localcode, items.*
>                  FROM ${local.code}_meta.item
>                  WHERE
>                    item.lastmodified > '${dataimporter.last_index_time}'
>                  OR
>                    '${dataimporter.request.clean}' != 'false'
>                  order by item.objid"
>       />
>     </entity>
>   </document>
> </dataConfig>
> 
> this same configuration works for a core that is already imported into the
> system, but when trying to add the core with the above command I get the
> following error:
> 
> No system property or default value specified for local.code
> 
> so I added a  tag in the solr.xml figuring that it needed some
> type of default value for this to work, then I restarted solr, but now when
> I try the import I get:
> 
> No system property or default value specified for
> dataimporter.last_index_time
> 
> Do I have to define a default value for every variable I will conceivably
> use for future cores? is there a way to bypass this error?
> 
> Thanks in advance


dataDir in solr.xml

2011-01-19 Thread Fred Gilmore
I've checked the archive, and plenty of people have suggested an 
arrangement where you can have two cores which share a configuration but 
maintain separate data paths.  But I can't seem to get solr to stop 
thinking solrconfig.xml is the first and last word for any value 
regarding data.  I am running 1.4


solr.xml:

persistent="true">








.
.
.

In all other respects, my multicore setup is working as it should.  So 
the setup is finding solr.xml at the value set for solr home as it 
should.  I can get into admin, etc.  However, if I comment out the 
<dataDir> stanza in cores/staff/conf/solrconfig.xml, and restart, I just 
get this:


WARNING: [staff] Solr index directory 
'/usr/local/solr/cores/staff/data/index' doesn't exist. Creating new 
index...


Ignoring the value set in solr.xml.

Is there some other override I'm ignoring?

thanks,

Fred




Re: Mem allocation - SOLR vs OS

2011-01-19 Thread Markus Jelsma
Sorting on field X will build an array of the size of maxDoc. The data type 
equals the one used by the field you're sorting on. Also, if you have a very 
high amount of deletes per update it might be a good idea to optimize as well 
since it reduces maxDoc to the number of documents that actually can be found.

> We do have sorting but not faceting. OK so I guess there is no 'hard and
> fast rule' as such so I will play with it and see.
> 
> Thanks for the help
> 
> On Wed, Jan 19, 2011 at 11:48 PM, Markus Jelsma
> 
> wrote:
> > You only need so much for Solr so it can do its thing. Faceting can take
> > quite
> > some memory on a large index but sorting can be a really big RAM
> > consumer.
> > 
> > As Erick pointed out, inspect and tune the cache settings and adjust RAM
> > allocated to the JVM if required. Using tools like JConsole you can
> > monitor various things via JMX including RAM consumption.
> > 
> > > Hi,
> > > 
> > > I know this is a subjective topic but from what I have read it seems
> > > more RAM should be spared for OS caching and much less for SOLR/Tomcat
> > > even on
> > 
> > a
> > 
> > > dedicated SOLR server.
> > > 
> > > Can someone give me an idea about the theoretically ideal proportion
> > > b/w them for a dedicated Windows server with 32GB RAM? Also the index
> > > is updated every hour.


Re: dataDir in solr.xml

2011-01-19 Thread Markus Jelsma
You have set the property already but I haven't seen you use that same 
property for the dataDir setting in solrconfig.xml.
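
Something along these lines (a sketch only; I'm assuming the per-core property
you set is called dataDir, so adjust names and paths to what you actually
have). In solr.xml:

    <core name="staff" instanceDir="shared">
      <property name="dataDir" value="/usr/local/solr/cores/staff/data"/>
    </core>

and in the shared solrconfig.xml, reference that property instead of a
hard-coded path:

    <dataDir>${dataDir}</dataDir>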

> I've checked the archive, and plenty of people have suggested an
> arrangement where you can have two cores which share a configuration but
> maintain separate data paths.  But I can't seem to get solr to stop
> thinking solrconfig.xml is the first and last word for any value
> regarding data.  I am running 1.4
> 
> solr.xml:
> 
>  persistent="true">
> 
> 
> 
> 
> 
> 
> 
> .
> .
> .
> 
> In all other respects, my multicore setup is working as it should.  So
> the setup is finding solr.xml at the value set for solr home as it
> should.  I can get into admin, etc.  However, if I comment out the
>  stanza in cores/staff/conf/solrconfig.xml, and restart, I just
> get this:
> 
> WARNING: [staff] Solr index directory
> '/usr/local/solr/cores/staff/data/index' doesn't exist. Creating new
> index...
> 
> Ignoring the value set in solr.xml.
> 
> Is there some other override I'm ignoring?
> 
> thanks,
> 
> Fred


Adding metadata to a Solr schema

2011-01-19 Thread David McLaughlin
Hi,

I need to add some metadata to a schema file in Solr - such as a version and
current transaction id. I need to be able to query Solr to get this
information. What would be the best way to do this?

Thanks,
David


Re: facet or filter based on user's history

2011-01-19 Thread Jonathan Rochkind
The problem is going to be 'near real time' indexing issues.  Solr 1.4 
at least does not do a very good job of handling very frequent commits. 
If you want to add to the user's history in the Solr index every time 
they click the button, and they click the button a lot, this naturally 
leads to very frequent commits to Solr (every minute, every second, 
multiple times a second), and you're going to have RAM and performance 
problems.


I believe there are some things in trunk that make handling this better, 
don't know the details but "near real time search" is what people talk 
about, to google or ask on this list.


Or, if it's acceptable for your requirements, you could record all the 
"I've read this" clicks in an external store, and only add them to the 
Solr index nightly, or even hourly.  If you batch em and add em as 
frequently as you can get away with (every hour sure, every 10 minutes 
pushing it, every minute, no), you can get around that issue. Or for 
that matter you could ADD em to Solr but only 'commit' every hour or 
whatever, but I don't like that strategy since if Solr crashes or 
otherwise restarts you pretty much lose those pending commits, better to 
queue em up in an external store.


On 1/19/2011 1:52 PM, Markus Jelsma wrote:

Hi,

I've never seen Solr's behaviour with a huge amount of values in a multi
valued but i think it should work alright. Then you can stored a list of user
ID's along with each book document and user filter queries to include or
exclude the book from the result set.

Cheers,


Hi,

I'm looking for ideas on how to make an efficient facet query on a
user's history with respect to the catalog of documents (something
like "Read document already: yes / no"). The catalog is around 100k
titles and there are several thousand users. Of course, each user has
a different history, many having read fewer than 500 titles, but some
heavy users having read perhaps 50k titles.

Performance is not terribly important right now so all I did was bump
up the boolean query limit and put together a big string of document
id's that the user has read. The first query is slow but once it's in
the query cache it's fine. I would like to find a better way of doing
it though.

What type of solr plugin would be best suited to helping in this
situation? I could make a function plugin that provides something like
hasHadBefore() - true/false, but would that be efficient for faceting
and filtering? Another idea is a QParserPlugin that looks for a field
like hasHadBefore:userid and somehow substitutes in the list of docs.
But I'm not sure how a new parser plugin would interact with the
existing parser. Can solr use a parser plugin to only handle one
field, and leave all the other fields to the default parser?

Thanks,
Jon


performance during index switch

2011-01-19 Thread Tri Nguyen
Hi,
 
Are there performance issues during the index switch?
 
As the size of index gets bigger, response time slows down?  Are there any 
studies on this?
 
Thanks,
 
Tri

Re: performance during index switch

2011-01-19 Thread Jonathan Rochkind

During commit?

A commit (and especially an optimize) can be expensive in terms of both 
CPU and RAM as your index grows larger, leaving less CPU for querying, 
and possibly less RAM which can cause Java GC slowdowns in some cases.


A common suggestion is to use Solr replication to seperate out a Solr 
index that you index to, and then replicate to a slave index that 
actually serves your queries. This should minimize any performance 
problems on your 'live' Solr while indexing, although there's still 
something that has to be done for the actual replication of course. 
Haven't tried it yet myself.  Plan to -- my plan is actually to put them 
both on the same server (I've only got one), but in seperate JVMs, and 
on a server with enough CPU cores that hopefully the indexing won't 
steal CPU the querying needs.


On 1/19/2011 2:23 PM, Tri Nguyen wrote:

Hi,
  
Are there performance issues during the index switch?
  
As the size of index gets bigger, response time slows down?  Are there any studies on this?
  
Thanks,
  
Tri


Re: No system property or default value specified for...

2011-01-19 Thread Tanner Postert
I even have to define default values for the dataimport.delta values? That
doesn't seem right.

On Wed, Jan 19, 2011 at 11:57 AM, Markus Jelsma
wrote:

> Hi,
>
> I'm unsure if i completely understand but you first had the error for
> local.code and then set the property in solr.xml? Then of course it will
> give
> an error for the next undefined property that has no default set.
>
> If you use a property without default it _must_ be defined in solr.xml or
> solrcore.properties. And since you don't use defaults in your dataconfig
> they
> all must be explicitely defined.
>
> This is proper behaviour.
>
> Cheers,
>
> > I'm trying to dynamically add a core to a multi core system using the
> > following command:
> >
> >
> http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceDir
> > =items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=true
> >
> > the data-config.xml looks like this:
> >
> > 
> >>url="jdbc:mysql://localhost/"
> >...
> >name="server"/>
> >   
> > >query="select code from master.locals"
> >rootEntity="false">
> >  > query="select '${local.code}' as localcode,
> > items.*
> > FROM ${local.code}_meta.item
> > WHERE
> >   item.lastmodified > '${dataimporter.last_index_time}'
> > OR
> >   '${dataimporter.request.clean}' != 'false'
> > order by item.objid"
> > />
> > 
> > 
> > 
> >
> > this same configuration works for a core that is already imported into
> the
> > system, but when trying to add the core with the above command I get the
> > following error:
> >
> > No system property or default value specified for local.code
> >
> > so I added a  tag in the solr.xml figuring that it needed some
> > type of default value for this to work, then I restarted solr, but now
> when
> > I try the import I get:
> >
> > No system property or default value specified for
> > dataimporter.last_index_time
> >
> > Do I have to define a default value for every variable I will conceivably
> > use for future cores? is there a way to bypass this error?
> >
> > Thanks in advance
>


Re: performance during index switch

2011-01-19 Thread Markus Jelsma
> Hi,
>  
> Are there performance issues during the index switch?

What do you mean by index switch?

>  
> As the size of index gets bigger, response time slows down?  Are there any
> studies on this? 

I haven't seen any studies as of yet but response time will slow down for some 
components. Sorting and faceting will tend to consume more RAM and CPU cycles 
with the increase of documents and unique values. It also becomes increasingly 
slow if you query for very high start values. And, of course, cache warming 
queries usually take some more time as well increasing latency between commit 
and availability.

> Thanks,
>  
> Tri


Re: No system property or default value specified for...

2011-01-19 Thread Markus Jelsma
No, you only need defaults if you use properties that are not defined in 
solr.xml or solrcore.properties.

What would the value for local.code be if you don't define it anywhere and you 
don't specify a default? Quite unpredictable, I guess =)

> i even have to define default values for the dataimport.delta values? that
> doesn't seem right
> 
> On Wed, Jan 19, 2011 at 11:57 AM, Markus Jelsma
> 
> wrote:
> > Hi,
> > 
> > I'm unsure if i completely understand but you first had the error for
> > local.code and then set the property in solr.xml? Then of course it will
> > give
> > an error for the next undefined property that has no default set.
> > 
> > If you use a property without default it _must_ be defined in solr.xml or
> > solrcore.properties. And since you don't use defaults in your dataconfig
> > they
> > all must be explicitely defined.
> > 
> > This is proper behaviour.
> > 
> > Cheers,
> > 
> > > I'm trying to dynamically add a core to a multi core system using the
> > 
> > > following command:
> > http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceD
> > ir
> > 
> > > =items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=tr
> > > ue
> > > 
> > > the data-config.xml looks like this:
> > > 
> > > 
> > > 
> > >> >   
> > >url="jdbc:mysql://localhost/"
> > >...
> > >name="server"/>
> > >   
> > >   
> > >   
> > > > >
> > >query="select code from master.locals"
> > >rootEntity="false">
> > > 
> > >  > > 
> > > query="select '${local.code}' as localcode,
> > > items.*
> > > 
> > > FROM ${local.code}_meta.item
> > > WHERE
> > > 
> > >   item.lastmodified > '${dataimporter.last_index_time}'
> > > 
> > > OR
> > > 
> > >   '${dataimporter.request.clean}' != 'false'
> > > 
> > > order by item.objid"
> > > />
> > > 
> > > 
> > > 
> > > 
> > > this same configuration works for a core that is already imported into
> > 
> > the
> > 
> > > system, but when trying to add the core with the above command I get
> > > the following error:
> > > 
> > > No system property or default value specified for local.code
> > > 
> > > so I added a  tag in the solr.xml figuring that it needed
> > > some type of default value for this to work, then I restarted solr,
> > > but now
> > 
> > when
> > 
> > > I try the import I get:
> > > 
> > > No system property or default value specified for
> > > dataimporter.last_index_time
> > > 
> > > Do I have to define a default value for every variable I will
> > > conceivably use for future cores? is there a way to bypass this error?
> > > 
> > > Thanks in advance


Re: Adding metadata to a Solr schema

2011-01-19 Thread Otis Gospodnetic
David,

I'm not sure if you are asking about adding this to the schema.xml file or to 
the Solr schema and therefore the Solr index?
If the former, you could put it in comments, then get the schema via HTTP (see 
the Admin UI for the URL), and "grep" for your line from there.
If the latter, this sounds like 2 fields.  Not sure if every document would 
have them, or you just need 1 doc with this data...
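
For instance (a sketch; the port and handler path below are the usual
defaults, so check your own Admin UI for the exact URL), a comment in
schema.xml like

    <!-- meta: schema-version=42, built-from-transaction=12345 -->

can be fetched over HTTP with something like

    http://localhost:8983/solr/admin/file/?file=schema.xml

and grepped for the "meta:" line.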

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: David McLaughlin 
> To: solr-user@lucene.apache.org
> Sent: Wed, January 19, 2011 2:13:30 PM
> Subject: Adding metadata to a Solr schema
> 
> Hi,
> 
> I need to add some meta data to a schema file in Solr - such a  version and
> current transaction id. I need to be able to query Solr to get  this
> information. What would be the best way to do  this?
> 
> Thanks,
> David
> 


Re: No system property or default value specified for...

2011-01-19 Thread Tanner Postert
The error I am getting is that I have no default value
for ${dataimporter.last_index_time}

Should I just define 0000-00-00 00:00:00 as the default for that field?

On Wed, Jan 19, 2011 at 12:45 PM, Markus Jelsma
wrote:

> No, you only need defaults if you use properties that are not defined in
> solr.xml or solrcore.properties.
>
> What would the value for local.core be if you don't define it anyway and
> you
> don't specify a default? Quite unpredictable i gues =)
>
> > i even have to define default values for the dataimport.delta values?
> that
> > doesn't seem right
> >
> > On Wed, Jan 19, 2011 at 11:57 AM, Markus Jelsma
> >
> > wrote:
> > > Hi,
> > >
> > > I'm unsure if i completely understand but you first had the error for
> > > local.code and then set the property in solr.xml? Then of course it
> will
> > > give
> > > an error for the next undefined property that has no default set.
> > >
> > > If you use a property without default it _must_ be defined in solr.xml
> or
> > > solrcore.properties. And since you don't use defaults in your
> dataconfig
> > > they
> > > all must be explicitely defined.
> > >
> > > This is proper behaviour.
> > >
> > > Cheers,
> > >
> > > > I'm trying to dynamically add a core to a multi core system using the
> > >
> > > > following command:
> > >
> http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceD
> > > ir
> > >
> > > >
> =items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=tr
> > > > ue
> > > >
> > > > the data-config.xml looks like this:
> > > >
> > > > 
> > > >
> > > >> > >
> > > >url="jdbc:mysql://localhost/"
> > > >...
> > > >name="server"/>
> > > >
> > > >   
> > > >
> > > > > > >
> > > >query="select code from master.locals"
> > > >rootEntity="false">
> > > >
> > > >  > > >
> > > > query="select '${local.code}' as localcode,
> > > > items.*
> > > >
> > > > FROM ${local.code}_meta.item
> > > > WHERE
> > > >
> > > >   item.lastmodified > '${dataimporter.last_index_time}'
> > > >
> > > > OR
> > > >
> > > >   '${dataimporter.request.clean}' != 'false'
> > > >
> > > > order by item.objid"
> > > > />
> > > > 
> > > > 
> > > > 
> > > >
> > > > this same configuration works for a core that is already imported
> into
> > >
> > > the
> > >
> > > > system, but when trying to add the core with the above command I get
> > > > the following error:
> > > >
> > > > No system property or default value specified for local.code
> > > >
> > > > so I added a  tag in the solr.xml figuring that it needed
> > > > some type of default value for this to work, then I restarted solr,
> > > > but now
> > >
> > > when
> > >
> > > > I try the import I get:
> > > >
> > > > No system property or default value specified for
> > > > dataimporter.last_index_time
> > > >
> > > > Do I have to define a default value for every variable I will
> > > > conceivably use for future cores? is there a way to bypass this
> error?
> > > >
> > > > Thanks in advance
>


Re: performance during index switch

2011-01-19 Thread Tri Nguyen
Yes, during a commit.
 
I'm planning to do as you suggested, having a master do the indexing and 
replicating the index to a slave, which leads to my next question.
 
While the slave replicates the index files from the master, how does it impact 
performance on the slave?
 
Tri


--- On Wed, 1/19/11, Jonathan Rochkind  wrote:


From: Jonathan Rochkind 
Subject: Re: performance during index switch
To: "solr-user@lucene.apache.org" 
Date: Wednesday, January 19, 2011, 11:30 AM


During commit?

A commit (and especially an optimize) can be expensive in terms of both CPU and 
RAM as your index grows larger, leaving less CPU for querying, and possibly 
less RAM which can cause Java GC slowdowns in some cases.

A common suggestion is to use Solr replication to seperate out a Solr index 
that you index to, and then replicate to a slave index that actually serves 
your queries. This should minimize any performance problems on your 'live' Solr 
while indexing, although there's still something that has to be done for the 
actual replication of course. Haven't tried it yet myself.  Plan to -- my plan 
is actually to put them both on the same server (I've only got one), but in 
seperate JVMs, and on a server with enough CPU cores that hopefully the 
indexing won't steal CPU the querying needs.

On 1/19/2011 2:23 PM, Tri Nguyen wrote:
> Hi,
>   Are there performance issues during the index switch?
>   As the size of index gets bigger, response time slows down?  Are there any 
>studies on this?
>   Thanks,
>   Tri


Re: No system property or default value specified for...

2011-01-19 Thread Markus Jelsma
Ok, have you defined dataimporter.last_index_time in solr.xml or 
solrcore.properties? If not, then you can either define the default value or 
set it in solrcore.properties or solr.xml.

Maybe a catch up on the wiki clears things up:
http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution
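
As a quick illustration of the two options (the names and values here are only
placeholders): either give the property an inline default right where it is
used, e.g.

    ${dataimporter.request.clean:false}

or define it per core in conf/solrcore.properties, e.g.

    dataimporter.last_index_time=2000-01-01 00:00:00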

> there error I am getting is that I have no default value
> for ${dataimporter.last_index_time}
> 
> should I just define -00-00 00:00:00 as the default for that field?
> 
> On Wed, Jan 19, 2011 at 12:45 PM, Markus Jelsma
> 
> wrote:
> > No, you only need defaults if you use properties that are not defined in
> > solr.xml or solrcore.properties.
> > 
> > What would the value for local.core be if you don't define it anyway and
> > you
> > don't specify a default? Quite unpredictable i gues =)
> > 
> > > i even have to define default values for the dataimport.delta values?
> > 
> > that
> > 
> > > doesn't seem right
> > > 
> > > On Wed, Jan 19, 2011 at 11:57 AM, Markus Jelsma
> > > 
> > > wrote:
> > > > Hi,
> > > > 
> > > > I'm unsure if i completely understand but you first had the error for
> > > > local.code and then set the property in solr.xml? Then of course it
> > 
> > will
> > 
> > > > give
> > > > an error for the next undefined property that has no default set.
> > > > 
> > > > If you use a property without default it _must_ be defined in
> > > > solr.xml
> > 
> > or
> > 
> > > > solrcore.properties. And since you don't use defaults in your
> > 
> > dataconfig
> > 
> > > > they
> > > > all must be explicitely defined.
> > > > 
> > > > This is proper behaviour.
> > > > 
> > > > Cheers,
> > > > 
> > > > > I'm trying to dynamically add a core to a multi core system using
> > > > > the
> > 
> > > > > following command:
> > http://localhost:8983/solr/admin/cores?action=CREATE&name=items&instanceD
> > 
> > > > ir
> > 
> > =items&config=data-config.xml&schema=schema.xml&dataDir=data&persist=tr
> > 
> > > > > ue
> > > > > 
> > > > > the data-config.xml looks like this:
> > > > > 
> > > > > 
> > > > > 
> > > > >> > > >   
> > > > >url="jdbc:mysql://localhost/"
> > > > >...
> > > > >name="server"/>
> > > > >   
> > > > >   
> > > > >   
> > > > > > > > >
> > > > >query="select code from master.locals"
> > > > >rootEntity="false">
> > > > > 
> > > > >  > > > > 
> > > > > query="select '${local.code}' as localcode,
> > > > > items.*
> > > > > 
> > > > > FROM ${local.code}_meta.item
> > > > > WHERE
> > > > > 
> > > > >   item.lastmodified > '${dataimporter.last_index_time}'
> > > > > 
> > > > > OR
> > > > > 
> > > > >   '${dataimporter.request.clean}' != 'false'
> > > > > 
> > > > > order by item.objid"
> > > > > />
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > this same configuration works for a core that is already imported
> > 
> > into
> > 
> > > > the
> > > > 
> > > > > system, but when trying to add the core with the above command I
> > > > > get the following error:
> > > > > 
> > > > > No system property or default value specified for local.code
> > > > > 
> > > > > so I added a  tag in the solr.xml figuring that it
> > > > > needed some type of default value for this to work, then I
> > > > > restarted solr, but now
> > > > 
> > > > when
> > > > 
> > > > > I try the import I get:
> > > > > 
> > > > > No system property or default value specified for
> > > > > dataimporter.last_index_time
> > > > > 
> > > > > Do I have to define a default value for every variable I will
> > > > > conceivably use for future cores? is there a way to bypass this
> > 
> > error?
> > 
> > > > > Thanks in advance


SolrCloud Feedback

2011-01-19 Thread Mark Miller
Hello Users,

About a little over a year ago, a few of us started working on what we called 
SolrCloud.

This initial bit of work was really a combination of laying some base work - 
figuring out how to integrate ZooKeeper with Solr in a limited way, dealing 
with some infrastructure - and picking off some low hanging search side fruit.

The next step is the indexing side. And we plan on starting to tackle that 
sometime soon.

But first - could you help with some feedback? Some people are using our 
SolrCloud start - I have seen evidence of it ;) Some, even in production.

I would love to have your help in targeting what we now try and improve. Any 
suggestions or feedback? If you have sent this before, I/others likely missed 
it - send it again!

I know anyone that has used SolrCloud has some feedback. I know it because I've 
used it too ;) It's too complicated to set up still. There are still plenty of 
pain points. We accepted some compromises trying to fit into what Solr was, and 
not wanting to dig in too far before feeling things out and letting users try 
things out a bit. Given that we might be able to adjust Solr to be more in 
favor of SolrCloud as we go, what is the ideal state of the work we have 
currently done?

If anyone using SolrCloud helps with the feedback, I'll help with the coding 
effort.

- Mark Miller
-- lucidimagination.com

Re: performance during index switch

2011-01-19 Thread Otis Gospodnetic
Tri,

During replication:
* extra disk IO on slaves during replication - worst if you are replicating an 
optimized index, which can hurt if your index is not RAM resident
* the above will consume some of your OS buffer cache, which can hurt
* increased network usage - never seen this becoming a real problem, but if you 
are replicating a large and always optimized index, it might cause problems

After replication:
* potentially high CPU usage during the warmup of the new IndexSearcher, 
depending on warmup queries used, cache warmup settings, etc.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Tri Nguyen 
> To: solr-user@lucene.apache.org
> Sent: Wed, January 19, 2011 2:56:58 PM
> Subject: Re: performance during index switch
> 
> Yes, during a commit.
>  
> I'm planning to do as you suggested, having a  master do the indexing and 
>replicating the index to a slave which leads to my  next questions.
>  
> During the slave replicates the index files from the  master, how does it 
>impact performance on the slave?
>  
> Tri
> 
> 
> ---  On Wed, 1/19/11, Jonathan Rochkind  wrote:
> 
> 
> From:  Jonathan Rochkind 
> Subject: Re:  performance during index switch
> To: "solr-user@lucene.apache.org"  
> Date:  Wednesday, January 19, 2011, 11:30 AM
> 
> 
> During commit?
> 
> A commit  (and especially an optimize) can be expensive in terms of both CPU 
>and RAM as  your index grows larger, leaving less CPU for querying, and 
>possibly 
>less RAM  which can cause Java GC slowdowns in some cases.
> 
> A common suggestion is  to use Solr replication to seperate out a Solr index 
>that you index to, and then  replicate to a slave index that actually serves 
>your queries. This should  minimize any performance problems on your 'live' 
>Solr 
>while indexing, although  there's still something that has to be done for the 
>actual replication of  course. Haven't tried it yet myself.  Plan to -- my 
>plan 
>is actually to put them  both on the same server (I've only got one), but in 
>seperate JVMs, and on a  server with enough CPU cores that hopefully the 
>indexing won't steal CPU the  querying needs.
> 
> On 1/19/2011 2:23 PM, Tri Nguyen wrote:
> >  Hi,
> >   Are there performance issues during the index switch?
> >   As  the size of index gets bigger, response time slows down?  Are there 
> > any 
>studies  on this?
> >   Thanks,
> >   Tri
> 


Re: performance during index switch

2011-01-19 Thread Jonathan Rochkind

On 1/19/2011 2:56 PM, Tri Nguyen wrote:

Yes, during a commit.
  
I'm planning to do as you suggested, having a master do the indexing and replicating the index to a slave which leads to my next questions.
  
During the slave replicates the index files from the master, how does it impact performance on the slave?


That I am not certain, because I haven't done it yet myself, but I am 
optimistic it will be tolerable.


As with any commit, when the slave replicates it will temporarily make a 
second copy of any changed index files (possibly the whole index), and 
it will then set up new searchers on the new copy of the index, and it 
will warm that new index, and then once warmed, it'll switch live 
searches over to the new index, and delete any old copies of indexes.


So you may still need a bunch of 'extra' RAM in the JVM to accommodate 
that overlap period.  You will need some extra disk space. As for actual 
CPU, it will take some CPU for the slave to run the new 
warmers, but it should be tolerable, not very noticeable... I'm hoping.


One main benefit of the replication setup is that you can _optimize_ on 
the master, which will be completely out of the way of the slave.


Even with the replication setup, you still can't commit (ie pull down 
changes from master) "near real time" in 1.4 though, you can't commit so 
often that a new index is not done warming when a new commit comes in, 
or your Solr will grind to a halt as it uses too much CPU and RAM. There 
are various ways people have suggested you can try to work around this, 
but I haven't been too happy with any of em, I think it's best just not 
to commit/pull down changes from master that often.  Unless you REALLY 
need to, and are prepared to get into details of Solr to figure out how 
to make it work as well as it can.


Re: Adding metadata to a Solr schema

2011-01-19 Thread David McLaughlin
Thanks Otis, yes it is the former and it definitely solves my problem for my
static metadata.

I realise now that for dynamic values like transaction id, I probably
need a different method for storing this metadata. Is there a
standard way of adding metadata to a Solr core and being able to set/get
this data at runtime?

Thanks,
David

On Wed, Jan 19, 2011 at 8:56 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> David,
>
> I'm not sure if you are asking about adding this to the schema.xml file or
> to
> the Solr schema and therefore the Solr index?
> If the former, you could put it in comments, then get the schema via HTTP
> (see
> Admin UI for the URL), and "grep" for your line from there.
> If the latter, this sounds like 2 fields.  Not sure if every document would
> have
> them, of you just need 1 doc with this data...
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: David McLaughlin 
> > To: solr-user@lucene.apache.org
> > Sent: Wed, January 19, 2011 2:13:30 PM
> > Subject: Adding metadata to a Solr schema
> >
> > Hi,
> >
> > I need to add some meta data to a schema file in Solr - such a  version
> and
> > current transaction id. I need to be able to query Solr to get  this
> > information. What would be the best way to do  this?
> >
> > Thanks,
> > David
> >
>


Re: Solr with many indexes

2011-01-19 Thread Joscha Feth
Hello Erick,

Thanks for your answer!

But I question why you *require* many different indexes. [...] including
> isolating one
> users'
> data from all others, [...]


Yes, that's exactly what I am after - I need to make sure that indexes don't
mix, as every user shall only be able to query his own data (index).

And even using lots of cores can be made to work if you don't pre-warm
> newly-opened
> cores, assuming that the response time when using "cold searchers" is
> adequate.
>

Could you explain that further or point me to some documentation? Are you
talking about http://wiki.apache.org/solr/CoreAdmin#UNLOAD? If yes, LOAD
does not seem to be implemented yet. Or does this have something to do with
http://wiki.apache.org/solr/SolrCaching#autowarmCount only? Roughly what
delay per X documents are we talking about if auto-warming is disabled?
Is there more documentation about this setting?

Kind regards,
Joscha


Which QueryParser to use

2011-01-19 Thread kun xiong
Hi all
We are planning to move our search core from Lucene library to Solr, and
we are new here.

 We have a question: which parser should we choose?

Our original query for Lucene is kind of complicated.
Ex: +((name1:A name2:B)^1000  (category1:C ^100 category:D ^10) ^100)
+(location1:E location2:F location3:G)~2

Can the dismax query parser handle this case? What's the alternative?

Or can we still use the lucene query parser, even though
setMinimumNumberShouldMatch is not available in the lucene query parser?

Thanks

Kun


Re: Which QueryParser to use

2011-01-19 Thread Ahmet Arslan
> Hi all
>     We are planning to move our search core from
> Lucene library to Solr, and
> we are new here.
> 
>  We have a question :which parser we should choose?
> 
> Our original query for Lucene is kinda of complicated
> Ex: *+((name1:A name2:B)^1000  (category1:C ^100
> category:D ^10) ^100)
> +(location1:E location2:F location3:G)~2*
> 
> Does the *dismax *query parser can handle this case, what's
> the alternative?
> 
> Or we can still use the *lucene *query parser without
> setMinimumNumberShouldMatch,
> which is not involved in lucene query parser.

As I understand you were constructing your queries programmatically, without 
using Lucene's QueryParser, right? If yes how were you handling analysis of 
query terms? Can you tell the types of these fields (location,name)?





Re: using dismax

2011-01-19 Thread Grijesh

Markus,

It's not wt, it's qt; wt is for the response type.
Also, qt is not for the query parser, it's for the request handler. In
solrconfig.xml many request handlers can be defined, using either the "dismax"
query parser or the "lucene" query parser.

If you want to change the query parser, it's the "defType" parameter that
selects the query parser.
And you are right: if defType=dismax, then the "qf" parameter must be given.
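
A minimal example request putting those parameters together (the host, port
and field names are placeholders only):

    http://localhost:8983/solr/select?defType=dismax&qf=title^2+body&q=ipod+nano&wt=json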

-
Thanx:
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/using-dismax-tp2280270p2292908.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Which QueryParser to use

2011-01-19 Thread kun xiong
We constructed our query with the Lucene API before, as BooleanQuery,
TermQuery, those kinds of things.

The string I provided is the value from the Query.toString() method. The types
are all String.

2011/1/20 Ahmet Arslan 

> > Hi all
> > We are planning to move our search core from
> > Lucene library to Solr, and
> > we are new here.
> >
> >  We have a question :which parser we should choose?
> >
> > Our original query for Lucene is kinda of complicated
> > Ex: *+((name1:A name2:B)^1000  (category1:C ^100
> > category:D ^10) ^100)
> > +(location1:E location2:F location3:G)~2*
> >
> > Does the *dismax *query parser can handle this case, what's
> > the alternative?
> >
> > Or we can still use the *lucene *query parser without
> > setMinimumNumberShouldMatch,
> > which is not involved in lucene query parser.
>
> As I understand you were constructing your queries programmatically,
> without using Lucene's QueryParser, right? If yes how were you handling
> analysis of query terms? Can you tell the types of these fields
> (location,name)?
>
>
>
>


Re: SolrCloud Feedback

2011-01-19 Thread Grijesh

Hi Mark,

I was just working on SolrCloud for my R&D and a question came to mind.
Since in SolrCloud the configuration files are shared across all cloud
instances, if I have different configuration files for different cores,
how can I manage that with my ZooKeeper-managed SolrCloud?

-
Thanx:
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Feedback-tp2290048p2292933.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Out of Memory Error

2011-01-19 Thread Grijesh

By adding more servers I mean adding more searchers (slaves) behind the load
balancer, not sharding.

Sharding is required when your index size grows beyond about 50GB.

-
Thanx:
Grijesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Out-of-Memory-Error-tp2280037p2292944.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Which QueryParser to use

2011-01-19 Thread Lalit Kumar 4
 
Sent on my BlackBerry® from Vodafone

-Original Message-
From: Ahmet Arslan 
Date: Thu, 20 Jan 2011 10:43:46 
To: solr-user@lucene.apache.org
Reply-To: "solr-user@lucene.apache.org" 
Subject: Re: Which QueryParser to use

> Hi all
>     We are planning to move our search core from
> Lucene library to Solr, and
> we are new here.
> 
>  We have a question :which parser we should choose?
> 
> Our original query for Lucene is kinda of complicated
> Ex: *+((name1:A name2:B)^1000  (category1:C ^100
> category:D ^10) ^100)
> +(location1:E location2:F location3:G)~2*
> 
> Does the *dismax *query parser can handle this case, what's
> the alternative?
> 
> Or we can still use the *lucene *query parser without
> setMinimumNumberShouldMatch,
> which is not involved in lucene query parser.

As I understand you were constructing your queries programmatically, without 
using Lucene's QueryParser, right? If yes how were you handling analysis of 
query terms? Can you tell the types of these fields (location,name)?


  


Re: Which QueryParser to use

2011-01-19 Thread kun xiong
That example string means our query is a BooleanQuery containing
BooleanQuerys.

I am wondering how to write a complicated BooleanQuery for dismax, like (A
or B or C) and (D or E).

Or do I have to use the Lucene query parser?
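
For what it's worth, with the standard (lucene) query parser that kind of
nesting can be written directly, e.g. (terms and fields are placeholders
only):

    q=(title:A OR title:B OR title:C) AND (location:D OR location:E)

Stock dismax does not let you spell out arbitrary nested boolean clauses like
this; its mm (minimum-should-match) only applies to the top-level terms of the
user query.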

2011/1/20 Lalit Kumar 4 

>
> Sent on my BlackBerry® from Vodafone
>
> -Original Message-
> From: Ahmet Arslan 
> Date: Thu, 20 Jan 2011 10:43:46
> To: solr-user@lucene.apache.org
> Reply-To: "solr-user@lucene.apache.org" 
> Subject: Re: Which QueryParser to use
>
> > Hi all
> > We are planning to move our search core from
> > Lucene library to Solr, and
> > we are new here.
> >
> >  We have a question :which parser we should choose?
> >
> > Our original query for Lucene is kinda of complicated
> > Ex: *+((name1:A name2:B)^1000  (category1:C ^100
> > category:D ^10) ^100)
> > +(location1:E location2:F location3:G)~2*
> >
> > Does the *dismax *query parser can handle this case, what's
> > the alternative?
> >
> > Or we can still use the *lucene *query parser without
> > setMinimumNumberShouldMatch,
> > which is not involved in lucene query parser.
>
> As I understand you were constructing your queries programmatically,
> without using Lucene's QueryParser, right? If yes how were you handling
> analysis of query terms? Can you tell the types of these fields
> (location,name)?
>
>
>
>