Solr terms search vs MySql FULLTEXT index and AGAINST
I am using Solr terms for auto suggest and I have 4 millions document in index and Its working fine. I want to know which will be more faster and efficient from 'MySql FULLTEXT index and AGAINST' and Solr terms search. Or Is there any other way in solr for auto suggest. I have separate application server and solr server. So I cannot cross domain request from browser at application server for JSON response. Ref. http://yuilibrary.com/forum/viewtopic.php?p=3203 -- View this message in context: http://www.nabble.com/Solr-terms-search-vs-MySql-FULLTEXT-index-and-AGAINST-tp25658300p25658300.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr and Garbage Collection
> Actually the CPU usage of the solr servers is almost insignificant (it was > like that before). >>The time spent on collecting memory dropped from 11% to 3.81% I even think that 3.81% from 5% is nothing (suspecting that SOLR uses 5% CPU, mostly loading large field values in memory) :))) (would be nice to load-stress-multithreaded except of waiting...) Most Expensive Query: faceting on all fields with generic query like *:*
Fwd: "Only one usage of each socket address" error
Seems like the post in the SolrNet group: http://groups.google.com/group/solrnet/browse_thread/thread/7e3034b626d3e82d?pli=1 helped me get trough. Thanks you solr-user's for helping out too! Steinar Videresendt melding: Fra: Steinar Asbjørnsen Dato: 28. september 2009 17.07.15 GMT+02.00 Til: solr-user@lucene.apache.org Emne: Re: "Only one usage of each socket address" error I'm using the add(MyObject) command form ()in a foreach loop to add my objects to the index. In the catalina-log i cannot see anything that helps me out. It stops at: 28.sep.2009 08:58:40 org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[12345]} 0 187 28.sep.2009 08:58:40 org.apache.solr.core.SolrCore execute INFO: [core2] webapp=/solr path=/update params={} status=0 QTime=187 Whitch indicates nothing wrong. Are there any other logs that should be checked? What it seems like to me at the moment is that the foreach is passing objects(documents) to solr faster then solr can add them to the index. As in I'm eventually running out of connections (to solr?) or something. I'm running another incremental update that with other objects where the foreachs isn't quite as fast. This job has added over 100k documents without failing, and still going. Whereas the problematic job fails after ~3k. What I've learned trough the day tho, is that the index where my feed is failing is actually redundant. I.e I'm off the hook for now. Still I'd like to figure out whats going wrong. Steinar There's nothing in that output that indicates something we can help with over in solr-user land. What is the call you're making to Solr? Did Solr log anything anomalous? Erik On Sep 28, 2009, at 4:41 AM, Steinar Asbjørnsen wrote: I just posted to the SolrNet-group since i have the exact same(?) problem. Hope I'm not beeing rude posting here as well (since the SolrNet- group doesn't seem as active as this mailinglist). The problem occurs when I'm running an incremental feed(self made) of a index. My post: [snip] Whats happening is that i get this error message (in VS): "A first chance exception of type 'SolrNet.Exceptions.SolrConnectionException' occurred in SolrNet.DLL" And the web browser (which i use to start the feed says: "System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." At the time of writing my index contains 15k docs, and "lacks" ~700k docs that the incremental feed should take care of adding to the index. The error message appears after 3k docs are added, and before 4k docs are added. I'm committing each 1%1000==0. In addittion autocommit is set to: 1 More info: From schema.xml: I'm fetching data from a (remote) Sql 2008 Server, using sqljdbc4.jar. And Solr is running on a local Tomcat-installation. SolrNet version: 0.2.3.0 Solr Specification Version: 1.3.0.2009.08.29.08.05.39 [/snip] Any suggestions on how to fix this would be much apreceiated. Regards, Steinar
Showing few results for each category (facet)
Hi, I am looking for a way to do the following in solr: When somebody does a search, I want to show results by category (facet) such that I display 5 results from each category (along with showing the total number of results in each category which I can always do using the facet search). This is kind of an overview of all the search results and user can click on the category to see all the results pertaining to that category (normal facet search with filter). One way that I can think of doing this is by making as many queries as there are categories and show these results under each category. But this will be very inefficient. Is there any way I can do this ? Thanks & Regards, Varun Gupta
Usage of Sort and fq
Hi, Can some one let me know how to use sort and fq parameters in Solr. Any examples woould be appreciated. Regards Bhaskar
Re: Usage of Sort and fq
/?q=*:*&fq:category:animal&sort=child_count%20asc Search for all documents (of animals), and filter the ones that belong to the category "animal" and sort ascending by a field called child_count that contains number of children for each animal. You can pass multiple fq's with more "&fq=..." parameters. Secondary, tertiary sorts can be specified using comma (",") as the separator. i.e. "sort=fieldA asc,fieldB desc, fieldC asc, ..." Cheers Avlesh On Tue, Sep 29, 2009 at 3:51 PM, bhaskar chandrasekar wrote: > Hi, > > Can some one let me know how to use sort and fq parameters in Solr. > Any examples woould be appreciated. > > Regards > Bhaskar > > >
Re: Showing few results for each category (facet)
On Tue, Sep 29, 2009 at 11:36 AM, Varun Gupta wrote: > ... > > One way that I can think of doing this is by making as many queries as there > are categories and show these results under each category. But this will be > very inefficient. Is there any way I can do this ? Hi Varun! I think that doing multiple queries doesn't have to be inefficient, since Solr caches subsequent queries for the same term and facets. Imagine this as your first query: - q: xyz - facets: myfacet and this as a second query: - q:xyz - fq: myfacet=a Compared to the first query, the second query will be very fast, since all the hard work ahs been done in query one and then cached. At least that's my understanding. Please correct me if I'm wrong. Marian
Problem getting Solr home from JNDI in Tomcat
Hi all, I'm having problems getting Solr to start on Tomcat 6. Tomcat is installed in /opt/apache-tomcat , solr is in /opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr . My config file is in /opt/solr/conf/solrconfig.xml . I have a Solr-specific context file in /opt/apache-tomcat/conf/Catalina/localhost/solr.xml which looks like this: But when I start Solr and browse to it, it tells me: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/ at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:194) at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:162) at org.apache.solr.core.Config.(Config.java:100) at org.apache.solr.core.SolrConfig.(SolrConfig.java:113) at org.apache.solr.core.SolrConfig.(SolrConfig.java:70) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) at org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356) at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244) at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604) at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129) at javax.servlet.http.HttpServlet.service(HttpServlet.java:690) at javax.servlet.http.HttpServlet.service(HttpServlet.java:803) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Weirdly, the exact same context file works fine on a different machine. I've tried giving Context a docBase element (both absolute, and relative paths) but it makes no difference -- Solr still isn't seeing the right home directory. 
I also tried setting debug="1" but didn't see any more useful info anywhere. Any ideas? This is a total show-stopper for me as this is our production server. (Otherwise I'd think about taking it down and hardwiring the Solr home path into the server's context...) Yours hopefully, Andrew. -- View this message in context: http://www.nabble.com/Problem-getting-Solr-home-from-JNDI-in-Tomcat-tp25662200p25662200.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem getting Solr home from JNDI in Tomcat
This might be a bit of a hack but i got this in the web.xml of my applicatin and it works great. solr/home /Solr/WebRoot/WEB-INF/solr java.lang.String On Tue, Sep 29, 2009 at 2:32 PM, Andrew Clegg wrote: > > Hi all, I'm having problems getting Solr to start on Tomcat 6. > > Tomcat is installed in /opt/apache-tomcat , solr is in > /opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr . > My config file is in /opt/solr/conf/solrconfig.xml . > > I have a Solr-specific context file in > /opt/apache-tomcat/conf/Catalina/localhost/solr.xml which looks like this: > > > value="/opt/solr" override="true" /> > allow="128\.40\.46\..*,127\.0\.0\.1" /> > > > But when I start Solr and browse to it, it tells me: > > java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in > classpath or 'solr/conf/', cwd=/ at > > org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:194) > at > > org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:162) > at org.apache.solr.core.Config.(Config.java:100) at > org.apache.solr.core.SolrConfig.(SolrConfig.java:113) at > org.apache.solr.core.SolrConfig.(SolrConfig.java:70) at > > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117) > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) > at > > org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) > at > > org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) > at > > org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108) > at > > org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709) > at > org.apache.catalina.core.StandardContext.start(StandardContext.java:4356) > at > org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244) > at > > org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604) > at > > org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:690) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:803) at > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) > at > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) > at > > org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525) > at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568) > at > > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) > at > > org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) > at > > org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) > at > > org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) > at > > org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) > at > > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) > at > 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) > at > > org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) > at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) > at java.lang.Thread.run(Thread.java:619) > > Weirdly, the exact same context file works fine on a different machine. > I've > tried giving Context a docBase element (both absolute, and relative paths) > but it makes no difference -- Solr still isn't seeing the right home > directory. I also tried setting debug="1" but didn't see any more useful > info anywhere. > > Any ideas? This is a total show-stopper for me as this is our production > server. (Otherwise I'd think about taking it down and hardwiring the Solr > home path into the server's context...) > > Yours hopefully, > > Andrew. > > -- > View this message in context: > http://www.nabble.com/Problem-getting-Solr-home-from-JNDI-in-Tomcat-tp25662200p25662200.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Problem with Wildcard...
Hi Users... i have a Problem I have a lot of fields, (type=text) for search in all fields i copy all fields in the default text field and use this for default search. Now i will search... This is into a Field "RI-MC500034-1" when i search "RI-MC500034-1" i found it... if i seacht "RI-MC5000*" i dosen´t when i search "500034" i found it... if i seacht "5000*" i dosen´t what can i do to use the Wildcards? KingArtus
Re: Measuring timing with debugQuery=true
Sorry for the delayed response ** *How big are your documents?* I have totally 1 million documents. I have totally 1950 fields in the index. Every document would probably have values for around 20 - 50 fields. *What is the total size of the index?* 1 GB *What's the amout of RAM on your box? How big is the JVM heap (and how much free memory is left on your system)?* I have 4 GB RAM. I am using Weblogic 10, 32 Bit. Since it is a windows box, I am able to allocate only 1 GB to the JVM. No other applications are running on the system. So the entire 4GB is at the disposal of the application. I am simulating load using a load tool (15 users) *Can you show what this slow query looks like (the whole request)?* q=*%3A*&rows=0&facet=true&facet.mincount=1&facet.limit=2&f.S9942.facet.limit=100&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true q=*%3A*&fq=S9942%3A%22TEXAS+INSTRUMENTS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true Other information Solr 1.3, JDK 1.5.0_14 regards Rahul On Mon, Sep 28, 2009 at 6:48 PM, Yonik Seeley wrote: > On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote: > > Yonik, > > I understand that the network can be a bottle-neck but I am pretty sure > that > > it is not. I am operating on a 100 MBPS intranet... How do I ensure > that > > stored fields are cached by the OS ? Only the Solr caches within the JVM > are > > under my control.. The result set has around 10K documents of which I > am > > retrieving only 10..I am displaying a max of only 3 fields per > document > > in my result set. Can the reading time for these stored fields be so long > ? > > It could be a seek per document if the index is too big to fit in the > OS cache - but that still wouldn't be as slow as you report. > Something is fishy here. > > How big are your documents? > What is the total size of the index? > What's the amout of RAM on your box? > How big is the JVM heap (and how much free memory is left on your system)? > Can you show what this slow query looks like (the whole request)? > > > I have totally around 1 million documents in my index Any > thoughts > > on why the FacetComponent does not take any time while the QueryComponent > > takes around 2.4s. > > It could be a field that has very few unique values and faceting just > completes quickly. > Make sure you're actually getting faceting data back (that it's > correctly turned on). > > -Yonik > http://www.lucidimagination.com > > > I am doing a faceted and keyword query ie I have both 'q' > > and 'fq' params in my query Thank you for your response. > > > > Regards > > Rahul > > > > On Mon, Sep 28, 2009 at 1:20 AM, Yonik Seeley < > yo...@lucidimagination.com> > > wrote: > >> > >> The response times in a Solr request don't include the time to read > >> stored fields (since the response is streamed) and doesn't include the > >> time to transfer/read the response (which can be increased by a > >> slow/congested network link, or a slow client that doesn't read the > >> response immediately). > >> > >> How many documents are you retrieving? Reading stored fields for > >> documents can be slow if they aren't cached by the OS since it's often > >> a disk seek per document read for a large index. 
> >> > >> -Yonik > >> http://www.lucidimagination.com > >> > >> > >> > >> On Sun, Sep 27, 2009 at 3:41 PM, Rahul R wrote: > >> > Hello, > >> > I am trying to measure why some of my queries take a long time. I am > >> > using > >> > EmbeddedSolrServer and with logging statements before and > >> > after the EmbeddedSolrServer.query(SolrQuery) function, I have found > the > >> > time to be around 16s. I added the debugQuery=true and the timing > >> > component > >> > for this reads as following: > >> > > >> > * > >> > > >> > > timing:{time=2438.0,prepare={time=0.0,org.apache.solr.handler.component.QueryComponent={time=0.0},org.apache.solr.handler.component.FacetComponent={time=0.0},org.apache.solr.handler.component.MoreLikeThisComponent={time=0.0},org.apache.solr.handler.component.HighlightComponent={time=0.0},org.apache.solr.handler.component.DebugComponent={time=0.0}},process={time=2438.0,org.apache.solr.handler.component.QueryComponent={time=2438.0},org.apache.solr.handler.component.FacetComponent={time=0.0},org.apache.solr.handler.component.MoreLikeThisComponent={time=0.0},org.apache.solr.handler.component.HighlightComponent={time=0.0},org.apache.solr.handler.component.DebugComponent={time=0.0}}} > >> > * > >> > > >> > As you can see, this shows only 2.4s being used by the query. I can't > >> > seem > >> > to figure out where the rest of the time is being spent. This is > within > >> > my > >> > office intranet and I don't think the request-res
Re: FileNotFoundException in Java replication handler backups
On Tue, Sep 29, 2009 at 3:19 AM, Mark Miller wrote: > Looks like a bug to me. I don't see the commit point being reserved in > the backup code - which means its likely be removed before its done > being copied. Gotto reserve it using the delete policy to keep around > for the full backup duration. I'd file a JIRA issue. > > Definitely a bug. Chris, please open an issue. I'll try to work up a patch. -- Regards, Shalin Shekhar Mangar.
Re: Measuring timing with debugQuery=true
I just want to clarify here that I understand my memory allocation might be less given the load on the system. The response times were only slightly better when we ran the test on a Solaris box with 12CPU, 24G RAM and with 3.2 GB allocated for the JVM. I know that I have a performance problem. My main concern is to identify the reasons for the inconsistency between the timing information shown between the debugQuery output (2.4s) and the entire time taken by the EmbeddedSolrServer.query(SolrQuery) function (16s). I feel that if I can find out where the remaining 13.6s gets used, then I can look to improve accordingly. Thank you. Regards Rahul On Tue, Sep 29, 2009 at 7:12 PM, Rahul R wrote: > Sorry for the delayed response > ** > *How big are your documents?* > I have totally 1 million documents. I have totally 1950 fields in the > index. Every document would probably have values for around 20 - 50 fields. > *What is the total size of the index?* > 1 GB > > *What's the amout of RAM on your box? How big is the JVM heap (and how > much free memory is left on your system)?* > I have 4 GB RAM. I am using Weblogic 10, 32 Bit. Since it is a windows box, > I am able to allocate only 1 GB to the JVM. No other applications are > running on the system. So the entire 4GB is at the disposal of the > application. I am simulating load using a load tool (15 users) > > *Can you show what this slow query looks like (the whole request)?* > > q=*%3A*&rows=0&facet=true&facet.mincount=1&facet.limit=2&f.S9942.facet.limit=100&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > > > q=*%3A*&fq=S9942%3A%22TEXAS+INSTRUMENTS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > > Other information > Solr 1.3, JDK 1.5.0_14 > > regards > Rahul > > On Mon, Sep 28, 2009 at 6:48 PM, Yonik Seeley < > yo...@lucidimagination.com> wrote: > >> On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote: >> > Yonik, >> > I understand that the network can be a bottle-neck but I am pretty sure >> that >> > it is not. I am operating on a 100 MBPS intranet... How do I ensure >> that >> > stored fields are cached by the OS ? Only the Solr caches within the JVM >> are >> > under my control.. The result set has around 10K documents of which >> I am >> > retrieving only 10..I am displaying a max of only 3 fields per >> document >> > in my result set. Can the reading time for these stored fields be so >> long ? >> >> It could be a seek per document if the index is too big to fit in the >> OS cache - but that still wouldn't be as slow as you report. >> Something is fishy here. >> >> How big are your documents? >> What is the total size of the index? >> What's the amout of RAM on your box? >> How big is the JVM heap (and how much free memory is left on your system)? >> Can you show what this slow query looks like (the whole request)? >> >> > I have totally around 1 million documents in my index Any >> thoughts >> > on why the FacetComponent does not take any time while the >> QueryComponent >> > takes around 2.4s. >> >> It could be a field that has very few unique values and faceting just >> completes quickly. >> Make sure you're actually getting faceting data back (that it's >> correctly turned on). 
>> >> -Yonik >> http://www.lucidimagination.com >> >> > I am doing a faceted and keyword query ie I have both 'q' >> > and 'fq' params in my query Thank you for your response. >> > >> > Regards >> > Rahul >> > >> > On Mon, Sep 28, 2009 at 1:20 AM, Yonik Seeley < >> yo...@lucidimagination.com> >> > wrote: >> >> >> >> The response times in a Solr request don't include the time to read >> >> stored fields (since the response is streamed) and doesn't include the >> >> time to transfer/read the response (which can be increased by a >> >> slow/congested network link, or a slow client that doesn't read the >> >> response immediately). >> >> >> >> How many documents are you retrieving? Reading stored fields for >> >> documents can be slow if they aren't cached by the OS since it's often >> >> a disk seek per document read for a large index. >> >> >> >> -Yonik >> >> http://www.lucidimagination.com >> >> >> >> >> >> >> >> On Sun, Sep 27, 2009 at 3:41 PM, Rahul R wrote: >> >> > Hello, >> >> > I am trying to measure why some of my queries take a long time. I am >> >> > using >> >> > EmbeddedSolrServer and with logging statements before and >> >> > after the EmbeddedSolrServer.query(SolrQuery) function, I have found >> the >> >> > time to be around 16s. I added the debugQuery=true and the timing >> >> > component >> >> > for this reads as following: >> >> > >> >> > * >> >> > >> >> > >> timing:{time=2438.0,prepare={time=0.0,org.apache.solr.handler.
Re: Problem getting Solr home from JNDI in Tomcat
Constantijn Visinescu wrote: > > This might be a bit of a hack but i got this in the web.xml of my > applicatin > and it works great. > > > >solr/home >/Solr/WebRoot/WEB-INF/solr >java.lang.String > > > That worked, thanks. You're right though, it is a bit of a hack -- I'd prefer to set the path from *outside* the app so it won't get overwritten when I upgrade. Now I've got a completely different error: "org.apache.lucene.index.CorruptIndexException: Unknown format version: -9". I think it might be time for a fresh install... Cheers, Andrew. -- View this message in context: http://www.nabble.com/Problem-getting-Solr-home-from-JNDI-in-Tomcat-tp25662200p25663931.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Measuring timing with debugQuery=true
It's harder debugging stuff like this with custom code (you say that you're using EmbeddedSolrServer) and different servlet containers. Perahps try putting your config files and index into the example jetty server, and then do a single request from curl or your web browser to see if the times are still long. -Yonik http://www.lucidimagination.com On Tue, Sep 29, 2009 at 9:42 AM, Rahul R wrote: > Sorry for the delayed response > > How big are your documents? > I have totally 1 million documents. I have totally 1950 fields in the index. > Every document would probably have values for around 20 - 50 fields. > What is the total size of the index? > 1 GB > What's the amout of RAM on your box? How big is the JVM heap (and how much > free memory is left on your system)? > I have 4 GB RAM. I am using Weblogic 10, 32 Bit. Since it is a windows box, > I am able to allocate only 1 GB to the JVM. No other applications are > running on the system. So the entire 4GB is at the disposal of the > application. I am simulating load using a load tool (15 users) > Can you show what this slow query looks like (the whole request)? > q=*%3A*&rows=0&facet=true&facet.mincount=1&facet.limit=2&f.S9942.facet.limit=100&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > q=*%3A*&fq=S9942%3A%22TEXAS+INSTRUMENTS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S9942&facet.field=S6878&facet.field=S9156&facet.field=S0369&facet.field=S9926&facet.field=S1421&facet.field=S8990&facet.field=S6881&facet.field=S3552&debugQuery=true > > Other information > Solr 1.3, JDK 1.5.0_14 > > regards > Rahul > > On Mon, Sep 28, 2009 at 6:48 PM, Yonik Seeley > wrote: >> >> On Mon, Sep 28, 2009 at 7:51 AM, Rahul R wrote: >> > Yonik, >> > I understand that the network can be a bottle-neck but I am pretty sure >> > that >> > it is not. I am operating on a 100 MBPS intranet... How do I ensure >> > that >> > stored fields are cached by the OS ? Only the Solr caches within the JVM >> > are >> > under my control.. The result set has around 10K documents of which >> > I am >> > retrieving only 10..I am displaying a max of only 3 fields per >> > document >> > in my result set. Can the reading time for these stored fields be so >> > long ? >> >> It could be a seek per document if the index is too big to fit in the >> OS cache - but that still wouldn't be as slow as you report. >> Something is fishy here. >> >> How big are your documents? >> What is the total size of the index? >> What's the amout of RAM on your box? >> How big is the JVM heap (and how much free memory is left on your system)? >> Can you show what this slow query looks like (the whole request)? >> >> > I have totally around 1 million documents in my index Any >> > thoughts >> > on why the FacetComponent does not take any time while the >> > QueryComponent >> > takes around 2.4s. >> >> It could be a field that has very few unique values and faceting just >> completes quickly. >> Make sure you're actually getting faceting data back (that it's >> correctly turned on). >> >> -Yonik >> http://www.lucidimagination.com >> >> > I am doing a faceted and keyword query ie I have both 'q' >> > and 'fq' params in my query Thank you for your response. 
>> > >> > Regards >> > Rahul >> > >> > On Mon, Sep 28, 2009 at 1:20 AM, Yonik Seeley >> > >> > wrote: >> >> >> >> The response times in a Solr request don't include the time to read >> >> stored fields (since the response is streamed) and doesn't include the >> >> time to transfer/read the response (which can be increased by a >> >> slow/congested network link, or a slow client that doesn't read the >> >> response immediately). >> >> >> >> How many documents are you retrieving? Reading stored fields for >> >> documents can be slow if they aren't cached by the OS since it's often >> >> a disk seek per document read for a large index. >> >> >> >> -Yonik >> >> http://www.lucidimagination.com >> >> >> >> >> >> >> >> On Sun, Sep 27, 2009 at 3:41 PM, Rahul R wrote: >> >> > Hello, >> >> > I am trying to measure why some of my queries take a long time. I am >> >> > using >> >> > EmbeddedSolrServer and with logging statements before and >> >> > after the EmbeddedSolrServer.query(SolrQuery) function, I have found >> >> > the >> >> > time to be around 16s. I added the debugQuery=true and the timing >> >> > component >> >> > for this reads as following: >> >> > >> >> > * >> >> > >> >> > >> >> > timing:{time=2438.0,prepare={time=0.0,org.apache.solr.handler.component.QueryComponent={time=0.0},org.apache.solr.handler.component.FacetComponent={time=0.0},org.apache.solr.handler.component.MoreLikeThisComponent={time=0.0},org.apache.solr.handler.component.HighlightComponent={time=0.0},org.apache.solr.handler.component.DebugComponent={time=0.0}},process={time=2438.0,org.apache.solr.handler.component.QueryComponent=
${dataimporter.last_index_time} as an argument to newerThan in FileListEntityProcessor?
Is this possible? I can't figure out a syntax that works, and all the examples show using last_index_time as an argument to an SQL query. -- Bill Dueber Library Systems Programmer University of Michigan Library
RE: Question on Access or viewing TermFrequency Vector via SOLR.
Grant, Thanks for the link. Based on the example, I think this is what I need. If effeciency is a problem, I will consider it. I see the note that tv.df can be expensive. I guess it all depends on how big the collection is. I'm a proponent of not reinvientin the wheel if it has already been invented And can be easily integrated into my task. I looked at the TermVecotrComponentExampleEnabled (Example output) and It looks like it is what I needed. -Peter > -Original Message- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: Monday, September 28, 2009 6:17 PM > To: solr-user@lucene.apache.org > Subject: Re: Question on Access or viewing TermFrequency > Vector via SOLR. > > > http://wiki.apache.org/solr/TermVectorComponent. You may > want to hack > in your own capabilities to implement your own TermVectorMapper for > efficiency reasons. > > On Sep 28, 2009, at 5:05 PM, Thung, Peter C CIV > SPAWARSYSCEN-PACIFIC, > 56340 wrote: > > > Mark, > > > > Thanks. I think this may be partially what I need. > > > > Basically, what I'm trying to figure out is the following > > If someone enters a keyword say > > Apple. > > I would like to find all the documents that have the word apple In > > them, and then for each document, the number of times it showed > > up in > > each > > Document. > > > > From the link you sent, (assuming I understand it > correctly), With the > > field name "name", it has the terms (values) within the field name > > "name" Of 1, 11, 120, 133, 184, etc.. With the respective counts of > > how many documents that match the term. (I have to wonder if it > > multiply counts documents if the term is in a document more > than once. > > > > It does not tell me which document matched a specific term, or the > > number of terms that are in a specific document, correct? > > > > > > -Peter > > > > > > > > ** > > Peter Thung > > Software Developer > > IBS Project Technical Lead -Web Developer > > > > Code 56340 - Net-centric ISR Development Branch > > Joint & National ISR Systems Division > > Inteligence, Surveillance and Reconnaissance Department > > US Navy Space & Naval Warfare Systems Center Pacific (SSC > PAC) Topside > > Campus, Bldg A33, room 0055 53560 Hull Street, San Diego, CA 92152 > > > > UNCLASS Email: peter.th...@navy.mil > > SIPRNET Email: thu...@spawar.navy.smil.mil > > COMM (Primary): (619) 553-6513 > > COMM (Secondary):(619) 553-0777 > > FAX: (619) 553-1586 > > ** > > > > > > > >> -Original Message- > >> From: Mark Miller [mailto:markrmil...@gmail.com] > >> Sent: Monday, September 28, 2009 1:50 PM > >> To: solr-user@lucene.apache.org > >> Subject: Re: Question on Access or viewing TermFrequency > Vector via > >> SOLR. > >> > >> > >> Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340 wrote: > >>> is there a SOLR query that can access or view the > >> TermFrequencies for > >>> the various documents discovered, Or is the only wya to > >>> programmatically access this information. If so could > someon share > >>> an example and maybe a link for > >> information on > >>> how to do this? > >>> Some sample queries? > >>> > >>> Thank you in advance. > >>> > >>> > >>> -Peter > >>> > >>> > >>> > >>> > >>> > >> Close I can think of is: http://wiki.apache.org/solr/TermsComponent > >> > >> -- > >> - Mark > >> > >> http://www.lucidimagination.com > >> > >> > >> > >> > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > >
Re: Create new core on the fly
Hi, We are also facing the same issue. Is the LOAD action implemented yet? If not then what should we do to achieve the same functionality? Thanks, djain ryantxu wrote: > > The LOAD method will load a core from a schema/config file -- it will > not need to be in multicore.xml (the persist=true option should > serialize this change into multicore.xml) > > Henri's latest patch implements LOAD, but it needs some clean up to > apply cleanly to the current trunk. > > ryan > > > Doug Steigerwald wrote: >> Is it going to be possible (soon) to register new Solr cores on the >> fly? I know the LOAD action is yet to be implemented, but will that let >> you create new cores that are not listed in the multicore.xml? We're >> occasionally going to have to create new cores and would like to not >> have to stop/start Solr do to do this. >> >> We want to be able to create the core structure on the filesystem and >> register that core, or make changes to the multicore.xml file and tell >> Solr to reload the cores and pick up the new ones. >> >> Thanks. >> Doug >> > > > -- View this message in context: http://www.nabble.com/Create-new-core-on-the-fly-tp14585788p25666388.html Sent from the Solr - User mailing list archive at Nabble.com.
[ANN] Carrot2 version 3.1.0 released
Dear All, [Apologies for cross-posting.] This is just to let you know that we've released version 3.1.0 of Carrot2 Search Results Clustering Engine. The 3.1.0 release comes with: * Experimental support for clustering Chinese Simplified content (based on Lucene's Smart Chinese Analyzer) * Document Clustering Workbench usability improvements * Suffix Tree Clustering algorithm rewritten for better performance and clustering quality * Apache Solr clustering plugin (to be available in Solr 1.4, Grant's blog post: http://www.lucidimagination.com/blog/2009/09/28/solrs-new-clustering-capabilities/ ) Release notes: http://project.carrot2.org/release-3.1.0-notes.html On-line demo: http://search.carrot2.org Download: http://download.carrot2.org Project website: http://project.carrot2.org Thanks, Staszek -- Stanislaw Osinski, http://carrot2.org
Index backup with new replication?
Hey, I noticed with new in-process replication, it is not as straightforward to have (production serving) solr index snapshots for backup (it used to be a natural byproduct of the snapshot taking process.) I understand there are some command-line utilities for this (abc..) Can someone please explain how to use these to take a snapshot of a solr index, assuming it is being used in production? what are some guidelines? should I stop other processes that might be issuing updates and/or comitts while taking it or is it atomic (e.g hard link )? would be nice to have this in wiki too i think for the benefit of other users, having regular backup snapshots seems critical.. Thanks, -Chak -- View this message in context: http://www.nabble.com/Index-backup-with-new-replication--tp25667145p25667145.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ${dataimporter.last_index_time} as an argument to newerThan in FileListEntityProcessor?
On Tue, Sep 29, 2009 at 8:14 PM, Bill Dueber wrote: > Is this possible? I can't figure out a syntax that works, and all the > examples show using last_index_time as an argument to an SQL query. > > It is possible but it doesn't work right now. I've created an issue and I will give a patch shortly. https://issues.apache.org/jira/browse/SOLR-1473 -- Regards, Shalin Shekhar Mangar.
Re: Create new core on the fly
On Tue, Sep 29, 2009 at 10:01 PM, djain101 wrote: > > Is the LOAD action implemented yet? > Yes, see http://wiki.apache.org/solr/CoreAdmin -- Regards, Shalin Shekhar Mangar.
Re: Create new core on the fly
Thanks Shalin for quick response. On the wiki link you mentioned, it is saying "not implemented yet!". Can you please confirm again? If yes, then in which release it is available? Appreciate your quick response. Regards, Dharmveer Shalin Shekhar Mangar wrote: > > On Tue, Sep 29, 2009 at 10:01 PM, djain101 > wrote: > >> >> Is the LOAD action implemented yet? >> > > Yes, see http://wiki.apache.org/solr/CoreAdmin > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Create-new-core-on-the-fly-tp14585788p25669128.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Usage of Sort and fq
A description and examples of both parameters can be found here: http://wiki.apache.org/solr/CommonQueryParameters Thanks, Matt Weber On Sep 29, 2009, at 4:10 AM, Avlesh Singh wrote: /?q=*:*&fq:category:animal&sort=child_count%20asc Search for all documents (of animals), and filter the ones that belong to the category "animal" and sort ascending by a field called child_count that contains number of children for each animal. You can pass multiple fq's with more "&fq=..." parameters. Secondary, tertiary sorts can be specified using comma (",") as the separator. i.e. "sort=fieldA asc,fieldB desc, fieldC asc, ..." Cheers Avlesh On Tue, Sep 29, 2009 at 3:51 PM, bhaskar chandrasekar wrote: Hi, Can some one let me know how to use sort and fq parameters in Solr. Any examples woould be appreciated. Regards Bhaskar
Re: Create new core on the fly
On Wed, Sep 30, 2009 at 12:42 AM, djain101 wrote: > > Thanks Shalin for quick response. On the wiki link you mentioned, it is > saying "not implemented yet!". Can you please confirm again? If yes, then > in > which release it is available? > Ah, I'm sorry. You are right. Load is not implemented yet. The other way to achieve this is with the "create" and "unload" commands but you do have to specify the instanceDir, config, schema and dataDir for "create". There is some work in progress towards this feature which is targeted for 1.5 - see http://wiki.apache.org/solr/LotsOfCores -- Regards, Shalin Shekhar Mangar.
Re: Showing few results for each category (facet)
So, you want to display 5 results from each category and still know how many results are in each category. This is a perfect situation for the field collapsing patch: https://issues.apache.org/jira/browse/SOLR-236 http://wiki.apache.org/solr/FieldCollapsing Here is how I would do it. Add a field to your schema called category or whatever. Then while indexing you populate that field with whatever category the document belongs in. While executing a search, collapse the results on that field with a max collapse of 5. This will give you at most 5 results per category. Now, at the same time enable faceting on that field and DO NOT use the collapsing parameter to recount the facet vales. This means that the facet counts will be reflect the non-collapsed results. This facet should only be used to get the count for each category, not displayed to the user. On your search results page that gets the collapsed results, you can put a link that says "Show all X results from this category" where X is the value you pull out of the facet. When a user clicks that link you basically do the same search with field collapsing disabled, and a filter query on the specific category they want to see, for example: &fq=category:people. Hope this helps. Thanks, Matt Weber On Sep 29, 2009, at 4:55 AM, Marian Steinbach wrote: On Tue, Sep 29, 2009 at 11:36 AM, Varun Gupta wrote: ... One way that I can think of doing this is by making as many queries as there are categories and show these results under each category. But this will be very inefficient. Is there any way I can do this ? Hi Varun! I think that doing multiple queries doesn't have to be inefficient, since Solr caches subsequent queries for the same term and facets. Imagine this as your first query: - q: xyz - facets: myfacet and this as a second query: - q:xyz - fq: myfacet=a Compared to the first query, the second query will be very fast, since all the hard work ahs been done in query one and then cached. At least that's my understanding. Please correct me if I'm wrong. Marian
Re: Index backup with new replication?
The documentation could maybe be improved, but the basics of backup snapshots with the in-process (Java-based) replication handler actually seem pretty straightforward to me, now that I understand it: 1. You can make a snapshot whenever you want by hitting http://master_host:port/solr/replication?command=backup 2. You can have automatically triggered snapshots at commit time or optimize time by putting a backupAfter tag in the replication handler section of your solrconfig.xml. (See http://wiki.apache.org/solr/SolrReplication) In neither case do you need to stop Solr or stop modifying your index while the backup is in progress. Does anything in particular seem not straightforward? I guess there's no built-in way to purge old indexes from disk; that's a little inconvenient. If you want to use the command-line tools, I think those should be totally compatible with the new (Java) replication tools. I don't know as much about them, though. 2009/9/29 KaktuChakarabati : > > Hey, > I noticed with new in-process replication, it is not as straightforward to > have > (production serving) solr index snapshots for backup (it used to be a > natural byproduct > of the snapshot taking process.) > I understand there are some command-line utilities for this (abc..) > Can someone please explain how to use these to take a snapshot > of a solr index, assuming it is being used in production? what are some > guidelines? should I stop > other processes that might be issuing updates and/or comitts while taking it > or is it atomic (e.g hard link )? > > would be nice to have this in wiki too i think for the benefit of other > users, > having regular backup snapshots seems critical.. > > Thanks, > -Chak > -- > View this message in context: > http://www.nabble.com/Index-backup-with-new-replication--tp25667145p25667145.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: Questions on RandomSortField
: The question was either non-trivial or heavily uninteresting! No replies yet it's pretty non-trivial, and pretty interesting, but i'm also pretty behind on my solr-user email. I don't think there's anyway to do what you wanted without a custom plugin, so your efforts weren't in vain ... if we add the abiliity to sort by a ValueSource (aka function ... there's a Jira issue for this somewhere) then you could also do witha combination of functions so that anything in your category gets flattened to an extremely high constant, and everything else has a real score -- then a secondary sort on a random field would effectively only randomize the things in your category ... but we're not there yet. : Hoss, I have a small question (RandomSortField bears your signature) - Any : reason as to why RandomSortField#hash() and RandomSortField#getSeed() : methods are private? Having them public would have saved myself from : "owning" a copy in my class as well. just a general principle of API future-proofing: keep internals private unless you explicitly think through how subclasses will use them. I haven't thought it through all the way, but do you really need to copy everything? couldn't you get the SortField/Comparator from super and only delegate to it if the categories both match your specific categoryId? -Hoss
Re: XSD for Solr Response Format Version 2.2
: I am working on an XSD document for all the types in the response xml : version 2.2 : : Do you think there is a need for this? we haven't had one yet, and it doesn't seem like it's really caused any problems for people (plus the lack of response to this question suggests no one is super excited about it) but that doesn't mean it wouldn't be useful if you want to submit it. -Hoss
Re: Create new core on the fly
Hi Shalin, Can you please elaborate, why we need to do unload after create? So, if we do a create, will it modify the solr.xml everytime? Can it be avoided in subsequent requests for create? Also, if we want to implement Load, can you please give some directions to implement load action? Thanks, Dharmveer Shalin Shekhar Mangar wrote: > > On Wed, Sep 30, 2009 at 12:42 AM, djain101 > wrote: > > Ah, I'm sorry. You are right. Load is not implemented yet. The other way > to > achieve this is with the "create" and "unload" commands but you do have to > specify the instanceDir, config, schema and dataDir for "create". > > There is some work in progress towards this feature which is targeted for > 1.5 - see http://wiki.apache.org/solr/LotsOfCores > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Create-new-core-on-the-fly-tp14585788p25671905.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query performance
: Does the following query has any performance impact over : the second query? : +title:lucene +(title:lucene -name:sid) : +(title:lucene -name:sid) the second should in theory be faster then the first just because of reduced number of comparisons needed -- but wether or not you would actually notice a difference is mainly going to depend on your data. -Hoss
Re: How to configure Solr 1.3 on Websphere 6.1
: I have been trying to deploy Solr on websphere but no luck yet. : I was trying to deploy the war file under "dist" folder, but I kept getting : errors. (recent one is that it couldn't find the configuration file). When I Did you start by going through the tutorial using the instance of jetty included in the release? your error was most likelye related to the solrconfig.xml and schema.xml files -- which are not included in the war. they are specific to *your* use cases, not part of the app. examples are provided, and the tutorial shows you where to find them and how to use them. try the tutorial, get up to speed using jettty, check out the wiki, and then you'll probably find it much easier to make sense of running solr in various servlet containers... http://lucene.apache.org/solr/tutorial.html http://wiki.apache.org/solr/SolrInstall -Hoss
Re: Multiple DisMax Queries spanning across multiple fields
: For a particular requirement we have - we need to do a query that is a : combination of multiple dismax queries behind the scenes. (Using solr 1.4 : nightly ). ... : Creating a custom QParser works right away as below. ... : Curious to see if we have an alternate method to implement the same / any : other alternate suggestions to the problem itself. if your sets of q params are coming from the same source as your sets of qf params (ie: some complicated client code) then i probably would have just written a parser that had special markup for indicating a DisMaxQuery and let the client pass a complex string (that way you won't have to worry about changing the logic in your QParser if you get a request to structure the DisMaxQeries in a diffenret super query, the client can just do the restrcuturing). But if you're still in the spirit of the dismaax handler (query strings come from clients, qf and pf come from index owner) then i think you made teh right call. -Hoss
Re: Get access to CoreContainer
Yah, I just found it, and was going to reply to my own message with that exactly! My next question is how to get the port the request was on? On Tue, Sep 29, 2009 at 4:01 PM, Mark Miller wrote: > Jason Rutherglen wrote: >> Howdy, >> >> I was wondering what the best way is to access the current >> instance of CoreContainer? It seems like the only way to do this >> is to extend CoreAdminHandler. I'd prefer a way via a way to >> access CoreContainer from SolrCore or RequestHandlerBase. >> >> The use case is, I want to implement a SearchHandler that by >> default, searches all of the local cores by automatically >> inserting a shards param of the form >> "localhost:8080/solr/core0,localhost:8080/solr/core1" into the >> request. I'll be dynamically creating and unloading cores and so >> do not want to edit solrconfig each time a core changes. >> >> Thanks! >> > SolrCore.getCoreDescriptor().getCoreContainer() > > -- > - Mark > > http://www.lucidimagination.com > > > >
Re: Get access to CoreContainer
Jason Rutherglen wrote: > Howdy, > > I was wondering what the best way is to access the current > instance of CoreContainer? It seems like the only way to do this > is to extend CoreAdminHandler. I'd prefer a way via a way to > access CoreContainer from SolrCore or RequestHandlerBase. > > The use case is, I want to implement a SearchHandler that by > default, searches all of the local cores by automatically > inserting a shards param of the form > "localhost:8080/solr/core0,localhost:8080/solr/core1" into the > request. I'll be dynamically creating and unloading cores and so > do not want to edit solrconfig each time a core changes. > > Thanks! > SolrCore.getCoreDescriptor().getCoreContainer() -- - Mark http://www.lucidimagination.com
Get access to CoreContainer
Howdy, I was wondering what the best way is to access the current instance of CoreContainer? It seems like the only way to do this is to extend CoreAdminHandler. I'd prefer a way via a way to access CoreContainer from SolrCore or RequestHandlerBase. The use case is, I want to implement a SearchHandler that by default, searches all of the local cores by automatically inserting a shards param of the form "localhost:8080/solr/core0,localhost:8080/solr/core1" into the request. I'll be dynamically creating and unloading cores and so do not want to edit solrconfig each time a core changes. Thanks!
Re: Sorting/paging problem
: 2009-09-23T19:25:03.400Z : : 2009-09-23T19:25:19.951 : : 2009-09-23T20:10:07.919Z is that a cut/paste error, or did you really get a date back from Solr w/o the trailing "Z" ?!?!?! ... : So, not only is the date sorting wrong, but the exact same document : shows up on the next page, also still out of date order. I've seen the : same document show up in 4-5 pages in some cases. It's always the last : record on the page, too. If I change the page size, the problem seems to that is really freaking weird. can you reproduce this in a simple example? maybe an index that's small enough (and doesn't contain confidential information) that you could zip up and post online? -Hoss
Re: Get access to CoreContainer
Unfortunately, because they don't want you counting on access to the servlet request due to embedded Solr and what not, to get that type of info you have to override and use your own SolrDispatchFilter: protected void execute( HttpServletRequest req, SolrRequestHandler handler, SolrQueryRequest sreq, SolrQueryResponse rsp) { // a custom filter could add more stuff to the request before passing it on. // for example: sreq.getContext().put( "HttpServletRequest", req ); Jason Rutherglen wrote: > Yah, I just found it, and was going to reply to my own message with > that exactly! > > My next question is how to get the port the request was on? > > On Tue, Sep 29, 2009 at 4:01 PM, Mark Miller wrote: > >> Jason Rutherglen wrote: >> >>> Howdy, >>> >>> I was wondering what the best way is to access the current >>> instance of CoreContainer? It seems like the only way to do this >>> is to extend CoreAdminHandler. I'd prefer a way via a way to >>> access CoreContainer from SolrCore or RequestHandlerBase. >>> >>> The use case is, I want to implement a SearchHandler that by >>> default, searches all of the local cores by automatically >>> inserting a shards param of the form >>> "localhost:8080/solr/core0,localhost:8080/solr/core1" into the >>> request. I'll be dynamically creating and unloading cores and so >>> do not want to edit solrconfig each time a core changes. >>> >>> Thanks! >>> >>> >> SolrCore.getCoreDescriptor().getCoreContainer() >> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> >> >> >> -- - Mark http://www.lucidimagination.com
Re: Get access to CoreContainer
I'll just allow the user to pass in the port via a param for now. Thx! On Tue, Sep 29, 2009 at 4:13 PM, Mark Miller wrote: > Unfortunately, because they don't want you counting on access to the > servlet request due to embedded Solr and what not, to get that type of > info you have to override and use your own SolrDispatchFilter: > > protected void execute( HttpServletRequest req, SolrRequestHandler > handler, SolrQueryRequest sreq, SolrQueryResponse rsp) { > > // a custom filter could add more stuff to the request before > passing it on. > // for example: sreq.getContext().put( "HttpServletRequest", req ); > > > Jason Rutherglen wrote: >> Yah, I just found it, and was going to reply to my own message with >> that exactly! >> >> My next question is how to get the port the request was on? >> >> On Tue, Sep 29, 2009 at 4:01 PM, Mark Miller wrote: >> >>> Jason Rutherglen wrote: >>> Howdy, I was wondering what the best way is to access the current instance of CoreContainer? It seems like the only way to do this is to extend CoreAdminHandler. I'd prefer a way via a way to access CoreContainer from SolrCore or RequestHandlerBase. The use case is, I want to implement a SearchHandler that by default, searches all of the local cores by automatically inserting a shards param of the form "localhost:8080/solr/core0,localhost:8080/solr/core1" into the request. I'll be dynamically creating and unloading cores and so do not want to edit solrconfig each time a core changes. Thanks! >>> SolrCore.getCoreDescriptor().getCoreContainer() >>> >>> -- >>> - Mark >>> >>> http://www.lucidimagination.com >>> >>> >>> >>> >>> > > > -- > - Mark > > http://www.lucidimagination.com > > > >
Re: q.alt matching no documents
: I've been using q.alt=-*:* because *:* is said to be the most efficient way of
: querying for every document. is -*:* the most efficient way of querying for
: no document?

I don't think so ... Solr internally rewrites pure negative queries so that they are combined with a positive MatchAllDocsQuery ... which means your final query would look like (*:* -*:*) ... there's no other query optimization that would happen, so that query would produce a DisjunctionScorer that can't ever "skipTo" past any docs. Once it gets cached it shouldn't matter -- but hey, you asked.

The most efficient way I can think of (besides a new Query class like John suggested) is along the lines of what Erik mentioned...

   q.alt=match_nothing:0

...it has to be a real field or the query parsing code will freak out, but making it indexed=false will ensure that there will *never* be any terms in that field, so the term "0" will never be there, so the entire query process will short-circuit out almost immediately (it won't even construct a Scorer, let alone iterate over any docs).

-Hoss
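For anyone wanting to wire this up, a sketch of the two pieces involved (the field name match_nothing and the handler name are illustrative, not mandated by anything):

  <!-- schema.xml: a real field that can never contain any terms -->
  <field name="match_nothing" type="string" indexed="false" stored="false"/>

  <!-- solrconfig.xml: match no documents when q is absent -->
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="q.alt">match_nothing:0</str>
    </lst>
  </requestHandler>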
Re: Writing optimized index to different storage?
: Is it possible to tell Solr or Lucene, when optimizing, to write the files
: that constitute the optimized index to somewhere other than
: SOLR_HOME/data/index or is there something about the optimize that requires
: the final segment to be created in SOLR_HOME/data/index?

For what purpose?

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all?

See Also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss
Re: Index backup with new replication?
Yep, super straightforward, thanks a bunch! Guess I missed this piece of the wiki; it looks like it's going through a lot of updates toward the Solr 1.4 release..

thanks,
-Chak

ryguasu wrote:
>
> The documentation could maybe be improved, but the basics of backup
> snapshots with the in-process (Java-based) replication handler
> actually seem pretty straightforward to me, now that I understand it:
>
> 1. You can make a snapshot whenever you want by hitting
> http://master_host:port/solr/replication?command=backup
>
> 2. You can have automatically triggered snapshots at commit time or
> optimize time by putting a backupAfter tag in the replication handler
> section of your solrconfig.xml.
>
> (See http://wiki.apache.org/solr/SolrReplication)
>
> In neither case do you need to stop Solr or stop modifying your index
> while the backup is in progress.
>
> Does anything in particular seem not straightforward? I guess there's
> no built-in way to purge old indexes from disk; that's a little
> inconvenient.
>
> If you want to use the command-line tools, I think those should be
> totally compatible with the new (Java) replication tools. I don't know
> as much about them, though.
>
> 2009/9/29 KaktuChakarabati :
>>
>> Hey,
>> I noticed that with the new in-process replication it is not as
>> straightforward to have (production-serving) solr index snapshots for
>> backup (they used to be a natural byproduct of the snapshot-taking
>> process).
>> I understand there are some command-line utilities for this (abc..)
>> Can someone please explain how to use these to take a snapshot of a
>> solr index, assuming it is being used in production? What are some
>> guidelines? Should I stop other processes that might be issuing
>> updates and/or commits while taking it, or is it atomic (e.g. a hard
>> link)?
>>
>> It would be nice to have this in the wiki too, I think, for the
>> benefit of other users; having regular backup snapshots seems
>> critical..
>>
>> Thanks,
>> -Chak

--
View this message in context: http://www.nabble.com/Index-backup-with-new-replication--tp25667145p25672927.html
Sent from the Solr - User mailing list archive at Nabble.com.
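Putting both options side by side, roughly (the trigger value below is illustrative; see the SolrReplication wiki page for the full syntax):

  <!-- solrconfig.xml on the master: snapshot automatically after each optimize -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="backupAfter">optimize</str>
    </lst>
  </requestHandler>

and for an on-demand snapshot, a plain HTTP request:

  curl 'http://master_host:8983/solr/replication?command=backup'

Either way the snapshot is written alongside the live index under the data directory, so, as noted above, there is no need to pause updates or commits while it runs.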
Re: Problem getting Solr home from JNDI in Tomcat
: Hi all, I'm having problems getting Solr to start on Tomcat 6.

Which version of Solr?

: Tomcat is installed in /opt/apache-tomcat , solr is in
: /opt/apache-tomcat/webapps/solr , and my Solr home directory is /opt/solr .

If "solr is in /opt/apache-tomcat/webapps/solr" means that you put the solr.war in /opt/apache-tomcat/webapps/ and Tomcat expanded it into /opt/apache-tomcat/webapps/solr, then that is your problem -- Tomcat isn't even looking at your context file (it only looks at the context files to resolve URLs that it can't resolve by looking in the webapps directory).

This is why the examples of using context files on the wiki talk about keeping the war *outside* of the webapps directory, and using docBase in your Context declaration...

http://wiki.apache.org/solr/SolrTomcat

-Hoss
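For reference, the wiki's recipe looks roughly like this, adapted to the paths from the question (treat it as a sketch and double-check against the SolrTomcat page):

  <!-- /opt/apache-tomcat/conf/Catalina/localhost/solr.xml -->
  <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/opt/solr" override="true"/>
  </Context>

The key point is that solr.war lives under /opt/solr (outside webapps/), so Tomcat has to consult the context file -- and with it the solr/home JNDI entry -- to resolve the /solr URL.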
Re: Problem getting Solr home from JNDI in Tomcat
: Now I've got a completely different error:
: "org.apache.lucene.index.CorruptIndexException: Unknown format version: -9".
: I think it might be time for a fresh install...

I've added a FAQ for this...

http://wiki.apache.org/solr/FAQ#What_does_.22CorruptIndexException:_Unknown_format_version.22_mean_.3F

What does "CorruptIndexException: Unknown format version" mean?

This happens when the Lucene code in Solr used to read the index files from disk encounters index files in a format it doesn't recognize. The most common cause is using a version of Solr+Lucene that is older than the version used to create that index.

-Hoss
Re: Questions on RandomSortField
Thanks Hoss! The approach that I explained in my subsequent email works like a charm.

Cheers
Avlesh

On Wed, Sep 30, 2009 at 3:45 AM, Chris Hostetter wrote:
>
> : The question was either non-trivial or heavily uninteresting! No replies yet
>
> It's pretty non-trivial, and pretty interesting, but I'm also pretty
> behind on my solr-user email.
>
> I don't think there's any way to do what you wanted without a custom
> plugin, so your efforts weren't in vain ... if we add the ability to sort
> by a ValueSource (aka function ... there's a Jira issue for this
> somewhere) then you could also do it with a combination of functions, so
> that anything in your category gets flattened to an extremely high
> constant, and everything else has a real score -- then a secondary sort
> on a random field would effectively only randomize the things in your
> category ... but we're not there yet.
>
> : Hoss, I have a small question (RandomSortField bears your signature) - Any
> : reason as to why RandomSortField#hash() and RandomSortField#getSeed()
> : methods are private? Having them public would have saved myself from
> : "owning" a copy in my class as well.
>
> Just a general principle of API future-proofing: keep internals private
> unless you explicitly think through how subclasses will use them.
>
> I haven't thought it through all the way, but do you really need to copy
> everything? Couldn't you get the SortField/Comparator from super and
> only delegate to it if the categories both match your specific categoryId?
>
> -Hoss
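For readers who only need plain (non-category-aware) random ordering, the stock example schema already shows the pattern; a sketch (the seed suffix below is arbitrary):

  <!-- schema.xml -->
  <fieldType name="random" class="solr.RandomSortField" indexed="true"/>
  <dynamicField name="random_*" type="random"/>

  <!-- query: sort by a random field -->
  ...&sort=random_1234 asc

Changing the suffix (random_1234 -> random_5678) changes the seed and therefore the ordering, which is handy for getting a different shuffle per user or per session.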
Number of terms in a SOLR field
Hi all,

I am attempting to test some changes I made to my DIH-based indexing process. The changes only affect the way I describe my fields in data-config.xml; there should be no changes to the way the data is indexed or stored. As a QA check I want to compare the results from indexing the same data before/after the change, so I was looking for a way of getting counts of terms in each field. I guess Luke etc. must allow this, but how?

Regards
Fergus.
Re: Number of terms in a SOLR field
Fergus McMenemie wrote:
> I am attempting to test some changes I made to my DIH-based indexing
> process. The changes only affect the way I describe my fields in
> data-config.xml; there should be no changes to the way the data is
> indexed or stored. As a QA check I want to compare the results from
> indexing the same data before/after the change, so I was looking for a
> way of getting counts of terms in each field. I guess Luke etc. must
> allow this, but how?

Luke uses a brute-force approach - it traverses all terms and counts terms per field. This is easy to implement yourself - just get the IndexReader.terms() enumeration and traverse it.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
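A minimal sketch of that approach against the Lucene 2.9-era API that ships with Solr 1.4 (the class name and index path are illustrative):

  import java.io.File;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.TermEnum;
  import org.apache.lucene.store.FSDirectory;

  public class FieldTermCounter {
    public static void main(String[] args) throws Exception {
      // Point this at the index directory, e.g. solr/data/index
      IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
      Map<String, Integer> counts = new HashMap<String, Integer>();
      TermEnum terms = reader.terms();
      while (terms.next()) {
        // Tally each distinct term under its field name
        String field = terms.term().field();
        Integer c = counts.get(field);
        counts.put(field, c == null ? 1 : c + 1);
      }
      terms.close();
      reader.close();
      for (Map.Entry<String, Integer> e : counts.entrySet()) {
        System.out.println(e.getKey() + ": " + e.getValue());
      }
    }
  }

Running it before and after the data-config.xml change and diffing the two outputs gives exactly the per-field QA comparison Fergus described. The Luke request handler (/solr/admin/luke) exposes similar per-field information over HTTP if you'd rather not write code.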