Re: lucene-java version mismatches
Could I suggest that the maven repositories are populated next time a release of the Solr-specific Lucene jars is made? But they are? They are inside the org.apache.solr group, since those Lucene jars are released by Solr -- http://repo2.maven.org/maven2/org/apache/solr/ Nope, http://repo1.maven.org/maven2/org/apache/solr/solr-lucene-core/1.3.0/ has no sources. Only the solr-specific ones have. paul
Status of an update request
Hello, When I send an update or a commit to solr via curl, the response I get is formatted in HTML; I can't find a way to have a machine-readable response file. Here is what is said on the subject in the solr config file: The response format differs from solr1.1 formatting and returns a standard error code. To enable solr1.1 behavior, remove the /update handler or change its path. What I want, however, is an accurate description of the error and not just a standard Apache error code. Is there a way to obtain an XML response file from solr? Thanks, Kind regards, P-YL
Re: lucene-java version mismatches
On Wed, Mar 25, 2009 at 12:30 PM, Paul Libbrecht p...@activemath.org wrote: Nope, http://repo1.maven.org/maven2/org/apache/solr/solr-lucene-core/1.3.0/ has no sources. Only the solr-specific ones have. Ah, I see. Solr's build uses the lucene binaries which are checked into SVN, so sources are a little more difficult to bundle. Either we'd need to check in the lucene source jars as well, or the ant build would need to check out the lucene code with the same revision number and make a source jar. Please open an issue in Jira. It might be difficult for me to find time for this right now, but we can decide on an acceptable approach. Also note that lucene's revision number is mentioned in the CHANGES.txt -- Regards, Shalin Shekhar Mangar.
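For context: the Solr-released Lucene binaries resolve with an ordinary pom dependency. A minimal sketch using the 1.3.0 coordinates visible in the repo1 URL above (only the binary jar is published there -- no sources classifier, which is exactly Paul's complaint):

    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-lucene-core</artifactId>
      <version>1.3.0</version>
    </dependency>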
Anyone use solr admin and Opera?
Hello, I'm a happy Solr user. Thanks for the excellent software!! Hopefully this is a good question; I have indeed looked around the FAQ and google and such first. I have just switched from Firefox to Opera for web browsing. (Another story) When I use the solr/admin the home page and stats work fine, but searches return unformatted results all run together. If I get source, I see it is XML, and in fact, the source is more readable than the page itself. Perhaps I need a stylesheet, or something. Are there any other Opera users that have gotten past this problem? Thanks gene
numeric range facets
Similar to getting range facets for dates, where we specify start, end and gap: can we do the same thing for numeric facets, where we specify start, end and gap?
Re: get all facets
On Wed, Mar 25, 2009 at 7:30 AM, Ashish P ashish.ping...@gmail.com wrote: Can I get all the facets in QueryResponse?? You can get all the facets that are returned by the server. Set facet.limit to the number of facets you want to retrieve. See http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/SolrQuery.html#setFacetLimit(int) -- Regards, Shalin Shekhar Mangar.
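A minimal SolrJ sketch of what Shalin describes (the field name "category" and the server variable are illustrative assumptions; facet.limit=-1 is the conventional "no limit" value):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrQuery query = new SolrQuery("*:*");
    query.setFacet(true);
    query.addFacetField("category");  // hypothetical facet field
    query.setFacetLimit(-1);          // -1 = unlimited: return every facet value
    QueryResponse rsp = server.query(query);  // server: an existing SolrServer instance
    for (FacetField.Count c : rsp.getFacetField("category").getValues()) {
        System.out.println(c.getName() + ": " + c.getCount());
    }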
Re: numeric range facets
On Wed, Mar 25, 2009 at 3:26 PM, Ashish P ashish.ping...@gmail.com wrote: Can we do the same thing for numeric facets where we specify start, end and gap? No. But you can do this with multiple queries by using facet.field with fq parameters. If you are using the trunk, then it should be possible to do this with one query using the new multi-select facet feature. See http://wiki.apache.org/solr/SimpleFacetParameters#head-f277d409b221b407d9c5430f552bf40ee6185c4c -- Regards, Shalin Shekhar Mangar.
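A related illustration (not from Shalin's reply, and the field name price and the bucket edges are made up): a set of facet.query parameters can emulate fixed numeric buckets in a single request, each one coming back as its own count under facet_queries in the response:

    q=*:*&facet=true
      &facet.query=price:[0 TO 99]
      &facet.query=price:[100 TO 199]
      &facet.query=price:[200 TO *]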
Re: Anyone use solr admin and Opera?
On Wed, Mar 25, 2009 at 1:33 PM, ristretto.rb ristretto...@gmail.com wrote: When I use the solr/admin the home page and stats work fine, but searches return unformatted results all run together. I'd be interested in this too. Safari/Chrome also have the same problem; they don't render raw xml nicely. -- Regards, Shalin Shekhar Mangar.
Re: Status of an update request
On Wed, Mar 25, 2009 at 12:42 PM, Pierre-Yves LANDRON pland...@hotmail.com wrote: Is there a way to obtain an XML response file from solr? If the update command executes successfully, then the response is XML. In case of error, the error page is generated by the servlet container, which is HTML I guess. Not sure what can be done about that. Perhaps Solr could have its own error pages which output XML with the stack trace information and the correct HTTP return codes? -- Regards, Shalin Shekhar Mangar.
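To make the success path concrete, this is the usual way to post a commit with curl against a 1.3 /update handler (host and port are assumptions):

    curl 'http://localhost:8983/solr/update' \
         -H 'Content-Type: text/xml' \
         --data-binary '<commit/>'

A successful POST here answers with the standard <response> XML; only failures fall through to the servlet container's HTML error page, which is the behavior Shalin describes.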
Deleting documents
I'm trying to delete documents based on the following type of update request:
<delete><query>topologyid:3140</query><query>topologyid:3142</query></delete>
This doesn't cause any changes in the index, and if I try to read the response, the following error occurs:
13:32:35,196 ERROR [STDERR] 25/Mar/2009 13:32:35 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 16
13:32:35,196 ERROR [STDERR] 25/Mar/2009 13:32:35 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: missing content stream
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:49)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:182)
at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:262)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:446)
at java.lang.Thread.run(Unknown Source)
13:32:35,196 ERROR [STDERR] 25/Mar/2009 13:32:35 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/apache-solr-nightly path=/update params={<delete><query>topologyid:3142</query></delete>=} status=400 QTime=16
Thanks in advance, Rui Pereira
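A hint sits in the last log line: the delete XML appears inside params={...=}, meaning it arrived as a URL parameter name rather than as the request body, which is precisely when ContentStreamHandlerBase raises "missing content stream". A hedged guess at the fix (host and port are assumptions) is to send the XML as the POST body with an explicit content type:

    curl 'http://localhost:8983/solr/update' \
         -H 'Content-Type: text/xml' \
         --data-binary '<delete><query>topologyid:3140</query><query>topologyid:3142</query></delete>'

followed by a separate <commit/> request so the deletes become visible to searchers.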
Copy solr indexes from 2 solr instance
Hi, Issue 1: I have 2 solr instances; I need to copy indexes from the solr1 instance to solr2 without restarting solr. Please suggest how this will work. Both solrs are on a multicore setup. Issue 2: I deleted all indexes from solr and reloaded my core; solr admin returns 0 results. Why does the index folder under the core's data directory still have a number of files? Issue 3: Can I copy/paste the data folder into a running core of solr? Thanks, Prerna
speeding up indexing with a LOT of indexed fields
hi, I'm having difficulty indexing a collection of documents in a reasonable time. It's now going at 20 docs/sec on a c1.xlarge instance of amazon ec2, which just isn't enough. This box has 8GB ram and the equivalent of 20 xeon processors. These documents have a couple of stored, indexed, multi- and single-valued fields, but the main problem lies in them having about 1500 indexed fields of type sint with range [0,1]. (Yes, I know this is a lot.) I'm looking for some guidance as to what strategies to try out to improve throughput in indexing. I could slam in some more servers (I will) but my feeling tells me I can get more out of this.
some additional info:
- I'm indexing to 10 cores in parallel. This is done because:
  - at query time, 1 particular index will always fulfill all requests, so we can prune the search space to 1/10th of its original size.
  - each document as represented in a core is actually 1/10th of a 'conceptual' document (which would contain up to 15,000 indexed fields if I indexed to 1 core). Indexing as 1 doc containing 15,000 indexed fields proved to give far worse results in searching and indexing than the solution I'm going with now.
  - the alternative of simply putting all docs with 1500 indexed fields each in the same core isn't really possible either, because this quickly results in OOM errors when sorting on a couple of fields. (Even though 9/10ths of all docs in this case would not have the field sorted on, they would still end up in a lucene fieldCache for this field.)
- to be clear: the 20 docs/second means 2 docs/second/core, or 2 'conceptual' docs/second overall.
- each core has maxBufferedDocs ~20 and mergeFactor ~10. (I actually set them differently for each partition so that merges of different partitions don't all happen together. This seemed to help a bit.)
- running jvm with -server -Xmx6000M -Xms6000M -XX:+UseParallelGC -XX:+CMSPermGenSweepingEnabled -XX:MaxPermSize=128M to leave room for disk caching.
- I'm spreading the 10 indices over 2 physical disks: 5 to /dev/sda1, 5 to /dev/sdb.
observations:
- within minutes after feeding, the server reaches its max ram.
- until then, the processors are running at ~70%.
- although I throw in a commit at random intervals (between 600 and 800 secs, again so as not to commit all partitions at the same time) the jvm just stays eating all the ram.
- not a lot seems to be happening on disk (using dstat) when the ram hasn't maxed out. Obviously, afterwards the disk is flooded with swapping.
questions:
- is there a good reason why all ram stays occupied even though I commit regularly? Perhaps fieldcaches get populated when indexing? I guess not, but I'm not sure what else could explain this.
- would splitting the 'conceptual docs' into even more partitions help at indexing time? From an application standpoint it's possible, it just requires some work, and it's hard to compare figures, so I'd like to know if it's worth it.
- how is a flush different from a commit, and would it help in getting the ram usage down?
- because all 15,000 indexed fields look very similar in structure (they are all sints [0,1] to start with), I was looking for more efficient ways to get them into an index using some low-level indexing operations. For example: for given documents X and Y, and indexed fields 1, 2, ..., i, ..., N: if X.a > Y.a then this ordering in a lot of cases holds for fields 2, ..., N. Because of these special properties I could possibly create a sorting algorithm that takes advantage of this and thus would make indexing faster. Would even considering this path be something that may be useful? Because obviously it would involve some work to make it work, and presumably a lot more work to get it to go faster than out of the box.
- lastly: should I be able to get more out of this box or am I just complaining ;-)
Thanks for making it to here, and hoping to receive some valuable info, Cheers, Britske
Re: speeding up indexing with a LOT of indexed fields
Britske, Here are a few quick ones:
- Does that machine really have 10 CPU cores? If it has significantly fewer, you may be beyond the indexing sweet spot in terms of indexer threads vs. CPU cores.
- Your maxBufferedDocs is super small. Comment that out anyway; use ramBufferSizeMB and set it as high as you can afford. No need to commit very often, and certainly no need to flush or optimize until the end.
There is a page about indexing performance on either the Solr or Lucene Wiki that will help. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
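For reference, the buffer setting Otis recommends lives in solrconfig.xml; a hedged sketch (256 is an arbitrary example, to be sized against the 6GB heap mentioned above):

    <indexDefaults>
      <ramBufferSizeMB>256</ramBufferSizeMB>
      <mergeFactor>10</mergeFactor>
    </indexDefaults>

With ramBufferSizeMB set and maxBufferedDocs commented out, Lucene flushes segments by accumulated RAM rather than by document count, which usually gives fewer, larger flushes.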
Re: Copy solr indexes from 2 solr instance
Prerna, You could create an index snapshot with the snapshooter script and then copy the index. You should do that while the source index is not getting modified. Re issue #2: run optimize. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Snapinstaller + Overlapping onDeckSearchers Problems
Hm, I can't quite tell from here, but that is just a warning, so it's not super problematic at this point. Could it be that one of your other caches (query cache) is large and lots of items are copied on searcher flip? Could it be that your JVM doesn't have a large or free enough heap? Can you tell if lots of GCing happens during the searcher flip? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Cloude Porteus clo...@instructables.com To: solr-user@lucene.apache.org Sent: Wednesday, March 25, 2009 1:06:51 AM Subject: Snapinstaller + Overlapping onDeckSearchers Problems
We have been running our solr slaves without autowarming our new searchers for a long time, but that was causing us 50-75 requests in the 20+ seconds timeframe after every update on the slaves. I have turned on autowarming and that has fixed our slow response times, but I'm running into occasional Overlapping onDeckSearchers. We have replication set up and are using the snapinstaller script every 10 minutes:
/home/solr/bin/snappuller -M util01 -P 18984 -D /home/solr/write/data -S /home/solr/logs -d /home/solr/read/data -u instruct; /home/solr/bin/snapinstaller -M util01 -S /home/solr/write/logs -d /home/solr/read/data -u instruct
Here's what a successful update/commit log looks like:
[14:13:02.510] start commit(optimize=false,waitFlush=false,waitSearcher=true)
[14:13:02.522] Opening Searcher@e9b4bb main
[14:13:02.524] end_commit_flush
[14:13:02.525] autowarming Searcher@e9b4bb main from Searcher@159e6e8 main
[14:13:02.525] filterCache{lookups=1809739,hits=1766607,hitratio=0.97,inserts=43211,evictions=0,size=43154,cumulative_lookups=1809739,cumulative_hits=1766607,cumulative_hitratio=0.97,cumulative_inserts=43211,cumulative_evictions=0}
--
[14:15:42.372] {commit=} 0 159964
[14:15:42.373] /update 0 159964
Here's what an unsuccessful update/commit log looks like, where the /update took too long and we started another commit:
[21:03:03.829] start commit(optimize=false,waitFlush=false,waitSearcher=true)
[21:03:03.836] Opening Searcher@b2f2d6 main
[21:03:03.836] end_commit_flush
[21:03:03.836] autowarming Searcher@b2f2d6 main from Searcher@103c520 main
[21:03:03.836] filterCache{lookups=1062196,hits=1062160,hitratio=0.99,inserts=49144,evictions=0,size=48353,cumulative_lookups=259485564,cumulative_hits=259426904,cumulative_hitratio=0.99,cumulative_inserts=68467,cumulative_evictions=0}
--
[21:23:04.794] start commit(optimize=false,waitFlush=false,waitSearcher=true)
[21:23:04.794] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
[21:23:04.802] Opening Searcher@f11bc main
[21:23:04.802] end_commit_flush
--
[21:24:55.987] {commit=} 0 1312158
[21:24:55.987] /update 0 1312158
I don't understand why this sometimes takes two minutes between the start commit and the /update, and sometimes takes 20 minutes. One of our caches has about ~40,000 items, but I can't imagine it taking 20 minutes to autowarm a searcher. It would be super handy if the snapinstaller script would wait until the previous one was done before starting a new one, but I'm not sure how to make that happen. Thanks for any help with this. best, cloude -- VP of Product Development Instructables.com http://www.instructables.com/member/lebowski
Re: Not able to configure multicore
Hm, where does that /solr2 come from? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: mitulpatel mitulpa...@greymatterindia.com To: solr-user@lucene.apache.org Sent: Wednesday, March 25, 2009 12:30:11 AM Subject: Re: Not able to configure multicore hossman wrote: : I am facing a problem related to multiple cores configuration. I have placed : a solr.xml file in the solr.home directory. Even so, when I am trying to : access http://localhost:8983/solr/admin/cores it gives me a tomcat error. : : Can anyone tell me what the possible issue with this can be?? not without knowing exactly what the tomcat error message is, what your solr.xml file looks like, what log messages you see on startup, etc... -Hoss Hello Hoss, Thanks for the reply. Here is the error message shown in the browser:
HTTP Status 404 - /solr2/admin/cores
type Status report
message /solr2/admin/cores
description The requested resource (/solr2/admin/cores) is not available.
and here is the solr.xml file.
Re: Hardware Questions...
Ah, it's hard to tell. I look at index size on disk, number of docs, query rate, types of queries, etc. Are you actually seeing problems with your existing servers? Or seeing specific performance movement in one of the aspects? (e.g. increasing latency, increased GC or memory usage, increased disk IO) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: solr s...@highbeam.com To: solr-user@lucene.apache.org Sent: Tuesday, March 24, 2009 4:51:50 PM Subject: Hardware Questions... We have three Solr servers (several two-processor Dell PowerEdge servers). I'd like to get three newer servers and I wanted to see what we should be getting. I'm thinking the following:
Dell PowerEdge 2950 III 2x2.33GHz/12M 1333MHz Quad Core
16GB RAM
6 x 146GB 15K RPM RAID-5 drives
How do people spec out servers, especially CPU, memory and disk? Is this all based on the number of docs, indexes, etc.? Also, what are people using for benchmarking and monitoring Solr? Thanks - Mike
Re: Snapinstaller + Overlapping onDeckSearchers Problems
I don't understand why this sometimes takes two minutes between the start commit /update and sometimes takes 20 minutes? One of our caches has about ~40,000 items, but I can't imagine it taking 20 minutes to autowarm a searcher. What do your cache configs look like? How big is the autowarm count? If you have:
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
that will run 32 queries when solr starts. Are you running 40K queries when it starts? ryan
Re: Snapinstaller + Overlapping onDeckSearchers Problems
Yes, I guess I'm running 40k queries when it starts :) I didn't know that each count was equal to a query. I thought it was just copying the cache entries from the previous searcher, but I guess that wouldn't include new entries. I set it to the size of our filterCache. What should I set the autowarmCount to if I want to try and fill up the caches?
lookups : 8720372
hits : 8676170
hitratio : 0.99
inserts : 44551
evictions : 0
size : 44417
cumulative_lookups : 8720372
cumulative_hits : 8676170
cumulative_hitratio : 0.99
cumulative_inserts : 44551
cumulative_evictions : 0
best, cloude -- VP of Product Development Instructables.com http://www.instructables.com/member/lebowski
Strange anomaly(?) with string matching in query
Hello, We've encountered a strange issue in our Solr install regarding a particular string that just doesn't seem to want to return results, despite the exact same string being in the index. What makes it even stranger is that we had the same data in a previous install of Solr, and it worked there, but doesn't here. The string that's been showing the trouble is "Abilene Christian College -- Students -- Yearbooks". The field, in this case, is of type text. Strangely enough, when we search for "Abilene Christian College -- Students --", the relevant documents are returned. It just fails when the full string is specified. At this point, I'm a little bit stymied. Any suggestions or ideas would be highly appreciated. In order to possibly help with diagnosis, I'm including links to, hopefully, relevant outputs and configurations. We're using Solr version 1.3. This is the output of a search for the string, with debugQuery turned on: http://pastebin.com/f72c017c1 This is the output of a document containing the string in question (the field is dc_subject): http://pastebin.com/f17a2e722 Here is our current schema: http://pastebin.com/f2768bece If there's any more information or diagnostics that I can post or run, please let me know. Thanks for your help and suggestions. -Kurt
Re: speeding up indexing with a LOT of indexed fields
Thanks for the quick reply. The box has 8 real cpus. Perhaps a good idea then to reduce the nr of cores to 8 as well. I'm testing out a different scenario with multiple boxes as well, where clients persist docs to multiple cores on multiple boxes (which is what multicore was invented for, after all). I set maxBufferedDocs this low (instead of ramBufferSizeMB) because I was worried about the impact on ram and wanted to get a grip on when docs were persisted to disk. I'm still not sure if it matters much to the big amounts of ram consumed. This can't all be coming from buffering docs, can it? On the other hand, maxBufferedDocs (20) is set for each core, so in total the nr of buffered docs is at max 200. Of course still on the low side, but I've got some draconian docs here.. ;-) I will try to use ramBufferSizeMB and set it higher, but I first have to get a grip on why ram usage is maxed all the time before this will make any difference, I guess. Thanks and please keep the suggestions coming. Britske.
Re: Strange anomaly(?) with string matching in query
Hi, Take the whole string to your Solr Admin - Analysis page and analyze it. Does it get analyzed the way you'd expect it to be analyzed? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
REST interface for Query
Greetings, I am a new subscriber. I'm Curtis Olson and I work for CACI under contract at the U.S. Department of State, where we deal with massive quantities of documents, so Solr is ideal for us. We have a good-sized index that we are starting to build up in development. Some of the filter constraints can get reasonably complex (based upon individual users' access), and I find myself creating long query strings for selection. I like the REST interfaces for adding to the index, and wish I could create an XML document for querying. I haven't found a request handler that can do this; does one exist? Cheers, Curtis Olson, S/ES-IRM, CACI Contractor
Re: Snapinstaller + Overlapping onDeckSearchers Problems
It looks like the cache is configured big enough, but the autowarm count is too big to have good performance. Try something smaller and see if that fixes both problems. I imagine even just warming the most recent 100 queries would precache the most important ones, but try some higher numbers and see if the performance is acceptable. For the filterCache and queryCache, autowarm queries the new index and caches the results.
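A hedged sketch of that smaller autowarm in solrconfig.xml (the size echoes the ~44K cache reported above; 100 is Ryan's suggested starting point, not a tested value):

    <filterCache class="solr.LRUCache" size="50000" initialSize="50000" autowarmCount="100"/>

Warmup time scales with autowarmCount, since each warmed entry is re-executed against the new searcher, so this single number largely decides how long the on-deck searcher stays open.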
How do I accomplish this (semi-)complicated setup?
Hi list, I've finally settled on Solr, seeing as it has almost everything I could want out of the box. My setup is a complicated one. It will serve as the search backend on Bitbucket.org, a mercurial hosting site. We have literally thousands of code repositories, as well as users and other data. All this needs to be indexed. The complication comes in when we have private repositories. Only select users have access to these, but we still need to index them. How would I go about accomplishing this? I can't think of a clean way to do it. Any pointers much appreciated. Jesper
Re: REST interface for Query
Curtis, Like this? https://issues.apache.org/jira/browse/SOLR-839 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: How do I accomplish this (semi-)complicated setup?
You could index the user name or ID, and then in your application add the username as a filter as you pass the query back to Solr. Maybe have an access_type that is Public or Private, and then for public searches only include the ones that have an access_type of Public. Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
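As an illustration of Eric's suggestion (the field names allowed_users and access_type are hypothetical, not something Solr ships with): give each repository document a multi-valued allowed_users field, then have the application append a filter query per request, e.g. for user u1

    fq=access_type:Public OR allowed_users:u1

and plain fq=access_type:Public for anonymous visitors. fq restricts the result set without affecting relevance scoring, which is why it suits access filtering.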
Re: How do I accomplish this (semi-)complicated setup?
On Wed, Mar 25, 2009 at 5:57 PM, Eric Pugh ep...@opensourceconnections.com wrote: You could index the user name or ID, and then in your application add the username as a filter as you pass the query back to Solr. That makes sense. Two questions on that: 1. More than one user can have access to a repository, so how would that work? Also, if a user is added/removed, what's the best way to keep that in sync? 2. In the event that a repository that is private is made public, how easy would it be to run an UPDATE, so to speak? Jesper
Re: How do I accomplish this (semi-)complicated setup?
You can even create separate indexes for private or public access if you need to (and place them on separate machines), but I think Eric's suggestion is the best and easiest.
Re: Strange anomaly(?) with string matching in query
Otis: Okay, I'm not sure whether I should be including the quotes in the query when using the analyzer, so I've run it both ways (no quotes on the index value). I'll try to approximate the final tables returned for each term. The field is dc_subject in both cases, being of type text.
***
Version 1 (With Quotes)
Index Value: Abilene Christian College -- Students -- Yearbooks
Query Value: "Abilene Christian College -- Students -- Yearbooks"
Index final table:
  1: abilene  2: christian  3: college  4: students  5: yearbooks
Query final table:
  1: abilene  2: christian  3: college  4: students  6: yearbooks
Version 2 (Without Quotes)
Index Value: Abilene Christian College -- Students -- Yearbooks
Query Value: Abilene Christian College -- Students -- Yearbooks
Index final table:
  1: abilene  2: christian  3: college  4: students  5: yearbooks
Query final table:
  1: abilene  2: christian  3: college  4: students  5: yearbooks
***
The main difference seems to be that there is no position 5 for yearbooks when I surround the string with quotes; instead it skips to 6. This happens at the WordDelimiterFilterFactory step. It seems to me like those tokens should be returning a match, but either way, apparently they're not? Any suggestions at this point?
Re: How do I accomplish this (semi-)complicated setup?
I can't see the problem with that. You can manage your users using a DB, keep the permissions they have there, and create or erase users without problems. You just have to maintain a working index field for each user with the repositories' ids he can access. Or you can create several indexes plus a users solr index with a multi-valued field listing the indexes the user can access. If you then want to turn a private repository into a public one, you just have to change the permissions field in your DB or users' index.
RE: REST interface for Query
Otis, that very much looks like what I'm after. Curtis -----Original Message----- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, March 25, 2009 12:53 PM To: solr-user@lucene.apache.org Subject: Re: REST interface for Query Curtis, Like this? https://issues.apache.org/jira/browse/SOLR-839 Otis
getting started
Hi, some of the Getting Started links don't work. Can you please fix them?
Re: How do I accomplish this (semi-)complicated setup?
Hm, I must be missing something, then. Consider this. There are three repositories: A, B and C. There are two users, U1 and U2. Repository A is public, while B and C are private. Only U1 can access B. No one can access C. I index this data, such that Is_Private is true for B. Now, when U2 searches, he will only see data for repo A. This is correct. When U1 searches, what happens? AFAIK, he will also only see data for A, unless we specify Is_Private:True, but then he will only see data for B (and C, which he doesn't have access to). Secondly, say we grant U2 access to B. How do we tell Solr that he can see it, then? Sorry if I'm not making much sense here, but I'm quite confused. Jesper
Re: How do I accomplish this (semi-)complicated setup?
OK, so you can create a table in a DB where you have a row for each user and a field with the reps he/she can access. Then you just have to take a look at the db and include the repository name in the index, so you just have to control (using query parameters) that the query is done against the right reps for that user. Is that good for you?
Re: Strange anomaly(?) with string matching in query
Otis, Absolutely. Here are the tokenizers and filters for the text fieldtype in the schema: http://pastebin.com/f2bb249f3 Thanks! Otis Gospodnetic wrote: That's what I suspected. Want to paste the relevant tokenizer+filters sections of your schema? The index-time and query-time analysis has to be the same or compatible enough, and that's not the case here. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: getting started
Which links? Please be as specific as possible. Erick On Wed, Mar 25, 2009 at 1:20 PM, nga pham nga.p...@gmail.com wrote: Hi Some of the getting started link dont work. Can you please enable it?
Re: getting started
Oops, my mistake. Sorry for the trouble. On Wed, Mar 25, 2009 at 10:42 AM, Erick Erickson erickerick...@gmail.com wrote: Which links? Please be as specific as possible. Erick
Can TermIndexInterval be set in Solr?
Hello all, We are experimenting with the ShingleFilter with a very large document set (1 million full-text books). Because the ShingleFilter indexes every word pair as a token, the number of unique terms increases tremendously. In our experiments so far the tii and tis files are getting very large and the tii file will eventually be too large to fit into memory. If we set the TermIndexInterval to a larger number than the default 128, the tii file size should go down. Is it possible to set this somehow through Solr configuration or do we need to modify the code somewhere and call IndexWriter.setTermIndexInterval? Tom Tom Burton-West Digital Library Production Services University of Michigan Library
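At the Lucene level, the knob Tom mentions is set per writer; a hedged sketch against the Lucene 2.4 API (the index path and the 1024 value are arbitrary examples, and wiring this into Solr's own writer creation is the part that would need the code change he anticipates):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    IndexWriter writer = new IndexWriter(FSDirectory.getDirectory("/path/to/index"),
            new StandardAnalyzer(), IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setTermIndexInterval(1024); // default is 128; larger = smaller .tii in RAM, slower term lookups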
Re: getting started
On the page http://lucene.apache.org/solr/tutorial.html#Getting+Started, the "lucene QueryParser syntax" link is not working.
Re: Realtime Searching..
Hi Jon: We are running various LinkedIn search systems on Zoie in production. -John On Thu, Feb 19, 2009 at 9:11 AM, Jon Baer jonb...@gmail.com wrote: This part: The part of Zoie that enables real-time searchability is the fact that ZoieSystem contains three IndexDataLoader objects:
- a RAMLuceneIndexDataLoader, which is a simple wrapper around a RAMDirectory,
- a DiskLuceneIndexDataLoader, which can index directly to the FSDirectory (followed by an optimize() call if a specified optimizeDuration has been exceeded) in batches via an intermediary
- BatchedIndexDataLoader, whose primary job is to queue up and batch DataEvents that need to be flushed to disk
Sounds like it might be / can be layered into Solr somehow; has anyone been using this project or testing it? - Jon On Feb 19, 2009, at 9:44 AM, Genta Kaneyama wrote: Michael, I think you might be interested in zoie. zoie: real-time search and indexing system built on Apache Lucene http://code.google.com/p/zoie/ Zoie is a realtime search project for lucene by LinkedIn. Basically, I think it is a similar technique to Otis's trick: In the mean time you can use the trick of one large and less frequently updated core and one small and more frequently updated core + distributed search across them. Otis Genta On Sat, Feb 7, 2009 at 3:02 AM, Michael Austin mausti...@gmail.com wrote: I need to find a solution for our current social application. It's low traffic now because we are early on. However I'm expecting, and want to be prepared, to grow. We have messages of different types that are aggregated into one stream. Each of these message types has much different data, so our main queries have a few unions and many joins. I know that Solr would work great for searching, but we need a realtime system (twitter-like) to view user updates. I'm not interested in a few minutes' delay; I need something that will be fast updating and searchable and have n columns per record/document. Can solr do this? What is Ocean? Thanks
Re: getting started
OK, now I'll turn it over to the folks who actually maintain that site <G>. Meanwhile, here's the link to the 2.4.1 query syntax: http://lucene.apache.org/java/2_4_1/queryparsersyntax.html Best Erick On Wed, Mar 25, 2009 at 2:00 PM, nga pham nga.p...@gmail.com wrote: On http://lucene.apache.org/solr/tutorial.html#Getting+Started, the 'lucene QueryParser syntax' link is not working. On Wed, Mar 25, 2009 at 10:48 AM, nga pham nga.p...@gmail.com wrote: Oops, my mistake. Sorry for the trouble. On Wed, Mar 25, 2009 at 10:42 AM, Erick Erickson erickerick...@gmail.com wrote: Which links? Please be as specific as possible. Erick On Wed, Mar 25, 2009 at 1:20 PM, nga pham nga.p...@gmail.com wrote: Hi Some of the getting started links don't work. Can you please fix them?
Solr OpenBitSet OutofMemory Error
Hello, After running a nightly release of Solr from around January for about 4 weeks without any problems, I'm starting to see OutOfMemory errors: Mar 24, 2009 1:35:36 AM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.OpenBitSet.clone(OpenBitSet.java:640) Is this a common error to see? I'm running a lot of faceted queries on an index with about 7.5 million documents. I'm giving Solr about 8 GB of memory. While I do update the index frequently, I also optimize frequently - it's a little strange to me that this problem is showing up now after four weeks of zero problems. Any suggestions/ideas would be very much appreciated! Thanks, Harish -- View this message in context: http://www.nabble.com/Solr-OpenBitSet-OutofMemory-Error-tp22707576p22707576.html Sent from the Solr - User mailing list archive at Nabble.com.
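As a rough back-of-the-envelope on where the memory goes (assuming each cached filter is a full OpenBitSet over the index, the worst case):

    public class FilterMemory {
        public static void main(String[] args) {
            long maxDoc = 7500000L;                    // documents in the index
            long bytesPerFilter = maxDoc / 8;          // one bit per doc: ~937,500 bytes (~0.9 MB)
            long heapBytes = 8L * 1024 * 1024 * 1024;  // 8 GB heap
            System.out.println("bytes per cached filter: " + bytesPerFilter);
            System.out.println("filters that fit in heap: " + heapBytes / bytesPerFilter); // ~9,100
        }
    }

So a large filterCache, plus overlapping searchers each holding (and, during warming, cloning) their own bitsets, can plausibly consume the whole heap even when a single searcher fits comfortably.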
Re: How do I accomplish this (semi-)complicated setup?
OK, we're getting closer. I just have two final questions regarding this then: 1. This would also include all the public repositories, right? If so, how would such a query look? Some kind of is_public:true AND ...? 2. When a repository is made public, the is_public property in the Solr index needs to reflect this. How can such an update be made without having to purge and re-index? Jesper On Wed, Mar 25, 2009 at 6:29 PM, Alejandro Gonzalez alejandrogonzalezd...@gmail.com wrote: OK, so you can create a table in a DB where you have a row for each user and a field with the reps he/she can access. Then you just have to take a look at the DB and include the repository name in the index. So you just have to control (using query parameters) that the query is done for the right reps for that user. Is it good for you? On Wed, Mar 25, 2009 at 6:20 PM, Jesper Nøhr jes...@noehr.org wrote: Hm, I must be missing something, then. Consider this. There are three repositories: A, B and C. There are two users, U1 and U2. Repository A is public, while B and C are private. Only U1 can access B. No one can access C. I index this data, such that Is_Private is true for B. Now, when U2 searches, he will only see data for repo A. This is correct. When U1 searches, what happens? AFAIK, he will also only see data for A, unless we specify Is_Private:True, but then he will only see data for B (and C, which he doesn't have access to). Secondly, say we grant U2 access to B. How do we tell Solr that he can see it, then? Sorry if I'm not making much sense here, but I'm quite confused. Jesper On Wed, Mar 25, 2009 at 6:13 PM, Alejandro Gonzalez alejandrogonzalezd...@gmail.com wrote: I can't see the problem with that. You can manage your users using a DB and keep there the permissions they have, and create or erase users without problems. You just have to maintain a working index field for each user with the ids of the repositories he can access. Or you can create several indexes and a users Solr index with a multi-valued field with the indexes the user can access. If you then want to turn a private repository into a public one, you just have to change the permissions field in your DB or users' index. On Wed, Mar 25, 2009 at 6:02 PM, Jesper Nøhr jes...@noehr.org wrote: On Wed, Mar 25, 2009 at 5:57 PM, Eric Pugh ep...@opensourceconnections.com wrote: You could index the user name or ID, and then in your application add as a filter the username as you pass the query back to Solr. Maybe have an access_type that is Public or Private, and then for public searches only include the ones that meet the access_type of Public. That makes sense. Two questions on that: 1. More than one user can have access to a repository, so how would that work? Also, if a user is added/removed, what's the best way to keep that in sync? 2. In the event that a private repository is made public, how easy would it be to run an UPDATE, so to speak? Jesper On Mar 25, 2009, at 12:52 PM, Jesper Nøhr wrote: Hi list, I've finally settled on Solr, seeing as it has almost everything I could want out of the box. My setup is a complicated one. It will serve as the search backend on Bitbucket.org, a mercurial hosting site. We have literally thousands of code repositories, as well as users and other data. All this needs to be indexed. The complication comes in when we have private repositories. Only select users have access to these, but we still need to index them. How would I go about accomplishing this? I can't think of a clean way to do it. Any pointers much appreciated. 
Jesper - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
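One way to combine Eric's access_type idea with per-user access in a single query (a sketch; is_public is from the thread above, while the multi-valued allowed_users field name is hypothetical): index both fields on every document and filter each search by the requesting user:

    q=<search terms>&fq=is_public:true OR allowed_users:U1

With that, U1 sees A (public) plus B (he is listed in allowed_users), U2 sees only A, and granting U2 access to B means re-indexing just B's documents with allowed_users set to U1 and U2 rather than purging the whole index.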
Re: Can TermIndexInterval be set in Solr?
I think it's the latter. I don't think the term interval is exposed anywhere. If you expose it through the config and provide a patch, I think we can add this to the core quickly. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Burton-West, Tom tburt...@umich.edu To: solr-user@lucene.apache.org solr-user@lucene.apache.org Cc: Farber, Phillip pfar...@umich.edu; Dueber, William dueb...@umich.edu Sent: Wednesday, March 25, 2009 1:50:17 PM Subject: Can TermIndexInterval be set in Solr? Hello all, We are experimenting with the ShingleFilter with a very large document set (1 million full-text books). Because the ShingleFilter indexes every word pair as a token, the number of unique terms increases tremendously. In our experiments so far the tii and tis files are getting very large and the tii file will eventually be too large to fit into memory. If we set the TermIndexInterval to a larger number than the default 128, the tii file size should go down. Is it possible to set this somehow through Solr configuration or do we need to modify the code somewhere and call IndexWriter.setTermIndexInterval? Tom Tom Burton-West Digital Library Production Services University of Michigan Library
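As a sketch of what exposing it could look like (this element does not exist as of this thread, so the name and placement are hypothetical), in the indexDefaults section of solrconfig.xml:

    <indexDefaults>
      <!-- hypothetical: index every 256th term instead of every 128th -->
      <termIndexInterval>256</termIndexInterval>
    </indexDefaults>

Solr would then just pass the value through to IndexWriter.setTermIndexInterval() when it opens a writer.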
Re: Realtime Searching..
Would it not make more sense to wait for Lucene's IW+IR marriage and other things happening in core Lucene that will make near-real-time search possible? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: John Wang john.w...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, March 25, 2009 2:34:04 PM Subject: Re: Realtime Searching.. Hi Jon: We are running various LinkedIn search systems on Zoie in production. -John On Thu, Feb 19, 2009 at 9:11 AM, Jon Baer wrote: This part: The part of Zoie that enables real-time searchability is the fact that ZoieSystem contains three IndexDataLoader objects: * a RAMLuceneIndexDataLoader, which is a simple wrapper around a RAMDirectory, * a DiskLuceneIndexDataLoader, which can index directly to the FSDirectory (followed by an optimize() call if a specified optimizeDuration has been exceeded) in batches via an intermediary * BatchedIndexDataLoader, whose primary job is to queue up and batch DataEvents that need to be flushed to disk Sounds like it might/can be layered into Solr somehow; has anyone been using this project or testing it? - Jon On Feb 19, 2009, at 9:44 AM, Genta Kaneyama wrote: Michael, I think you might be interested in Zoie. zoie: real-time search and indexing system built on Apache Lucene http://code.google.com/p/zoie/ Zoie is a realtime search project for Lucene by LinkedIn. Basically, I think it is a similar technique to Otis's trick: In the meantime you can use the trick of one large and less frequently updated core and one small and more frequently updated core + distributed search across them. Otis Genta On Sat, Feb 7, 2009 at 3:02 AM, Michael Austin wrote: I need to find a solution for our current social application. It's low traffic now because we are early on.. However I'm expecting and want to be prepared to grow. We have messages of different types that are aggregated into one stream. Each of these message types has much different data, so that our main queries have a few unions and many joins. I know that Solr would work great for searching, but we need a realtime system (twitter-like) to view user updates. I'm not interested in a few minutes' delay; I need something that will be fast updating and searchable and have n columns per record/document. Can Solr do this? What is Ocean? Thanks
SRW/U and OAI-PMH servers over solr
Hello there, I'm looking for a way to implement SRW/U and OAI-PMH servers over Solr, similar to what I found here: http://marc.info/?l=solr-dev&m=116405019011211&w=2 . Well, actually, decoupled (not a plugin) would be OK, if not better =). I wanted to know if anyone knows of something available out there that accomplishes this. From what I have found so far, OCLC has both server implementations available. I haven't looked too deeply into the SRW/U one, but the OAI-PMH one can be configured to work with Solr (by implementing a class that does the actual calls to the data provider). Any information you guys can provide is welcome =). -- All the best, Miguel Coxo.
Partition index by time using Solr
Hi, I've used Lucene before, but I'm new to Solr. I've gone through the mailing list, but unable to find any clear idea on how to partition Solr indexes. Here is what we want: 1) Be able to partition indexes by timestamp - basically one partition per day (create a new index directory every day) 2) Be able to search partitions based on timestamp. All our queries are time based, so instead of looking into all the partitions I want to go directly to the partitions where the data might be. 3) Be able to purge any data older than 6 months without bringing down the application. Since partitions would be marked by timestamp, we would just have to delete the old partitions. This is going to be a distributed system with 2 boxes, each running an instance of Solr. I don't want to replicate data, but each box may have the same timestamp partition with different data. We would be indexing on average 20 million documents a day (each document = 500 bytes) with an estimated 10 GB in index size - evenly distributed across machines (each machine would get roughly 5 GB of index every day). My questions: 1) Is this all possible using Solr? If not, should I just do this using Lucene, or is there any other out-of-the-box alternative? 2) If it's possible in Solr, how do we do this - configuration, setup etc.? 3) How would I optimize the partitions - would it be required when using Solr? Thanks, -vivek
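A sketch of how the per-day partitions could work with multicore plus distributed search (core names, hosts and paths are hypothetical; check which CoreAdmin actions your Solr version supports):

    # create a new partition core each day
    curl 'http://host1:8983/solr/admin/cores?action=CREATE&name=day-20090325&instanceDir=day-20090325'

    # query only the partitions covering the time range, across both boxes
    http://host1:8983/solr/day-20090325/select?q=...&shards=host1:8983/solr/day-20090325,host2:8983/solr/day-20090325,host1:8983/solr/day-20090324,host2:8983/solr/day-20090324

Purging data older than 6 months then becomes dropping the old cores and deleting their index directories rather than deleting documents, with no downtime for the remaining partitions.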
Re: How do I accomplish this (semi-)complicated setup?
Try using the DB for permission management: when you want to make a rep public, you just have to add its id or name to every user's permissions field. I think you don't need to add any is_public field to the index, just an id or name field saying which rep the indexed doc is in. So you can pre-filter the reps by querying the DB to obtain the reps the user has permissions for, and adding these restrictions to the Solr query. This way you can change reps' permissions without re-indexing. So the query for Solr, if the current user is allowed to search in reps 1 and 2, should be something like ...rep_id:(1 OR 2)... Alex On Wed, Mar 25, 2009 at 8:06 PM, Jesper Nøhr jes...@noehr.org wrote: OK, we're getting closer. I just have two final questions regarding this then: 1. This would also include all the public repositories, right? If so, how would such a query look? Some kind of is_public:true AND ...? 2. When a repository is made public, the is_public property in the Solr index needs to reflect this. How can such an update be made without having to purge and re-index? Jesper On Wed, Mar 25, 2009 at 6:29 PM, Alejandro Gonzalez alejandrogonzalezd...@gmail.com wrote: OK, so you can create a table in a DB where you have a row for each user and a field with the reps he/she can access. Then you just have to take a look at the DB and include the repository name in the index. So you just have to control (using query parameters) that the query is done for the right reps for that user. Is it good for you? On Wed, Mar 25, 2009 at 6:20 PM, Jesper Nøhr jes...@noehr.org wrote: Hm, I must be missing something, then. Consider this. There are three repositories: A, B and C. There are two users, U1 and U2. Repository A is public, while B and C are private. Only U1 can access B. No one can access C. I index this data, such that Is_Private is true for B. Now, when U2 searches, he will only see data for repo A. This is correct. When U1 searches, what happens? AFAIK, he will also only see data for A, unless we specify Is_Private:True, but then he will only see data for B (and C, which he doesn't have access to). Secondly, say we grant U2 access to B. How do we tell Solr that he can see it, then? Sorry if I'm not making much sense here, but I'm quite confused. Jesper On Wed, Mar 25, 2009 at 6:13 PM, Alejandro Gonzalez alejandrogonzalezd...@gmail.com wrote: I can't see the problem with that. You can manage your users using a DB and keep there the permissions they have, and create or erase users without problems. You just have to maintain a working index field for each user with the ids of the repositories he can access. Or you can create several indexes and a users Solr index with a multi-valued field with the indexes the user can access. If you then want to turn a private repository into a public one, you just have to change the permissions field in your DB or users' index. On Wed, Mar 25, 2009 at 6:02 PM, Jesper Nøhr jes...@noehr.org wrote: On Wed, Mar 25, 2009 at 5:57 PM, Eric Pugh ep...@opensourceconnections.com wrote: You could index the user name or ID, and then in your application add as a filter the username as you pass the query back to Solr. Maybe have an access_type that is Public or Private, and then for public searches only include the ones that meet the access_type of Public. That makes sense. Two questions on that: 1. More than one user can have access to a repository, so how would that work? Also, if a user is added/removed, what's the best way to keep that in sync? 2. 
In the event that a private repository is made public, how easy would it be to run an UPDATE, so to speak? Jesper On Mar 25, 2009, at 12:52 PM, Jesper Nøhr wrote: Hi list, I've finally settled on Solr, seeing as it has almost everything I could want out of the box. My setup is a complicated one. It will serve as the search backend on Bitbucket.org, a mercurial hosting site. We have literally thousands of code repositories, as well as users and other data. All this needs to be indexed. The complication comes in when we have private repositories. Only select users have access to these, but we still need to index them. How would I go about accomplishing this? I can't think of a clean way to do it. Any pointers much appreciated. Jesper - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
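A SolrJ sketch of what Alejandro describes (the field name rep_id matches his example; the rep ids are assumed to have been fetched from the permissions table in the DB beforehand):

    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;

    public class AclFilter {
        // Restrict a user's search to the reps the DB says he may access.
        static SolrQuery restrict(String userInput, List<Integer> allowedReps) {
            StringBuilder fq = new StringBuilder("rep_id:(");
            for (int i = 0; i < allowedReps.size(); i++) {
                if (i > 0) fq.append(" OR ");
                fq.append(allowedReps.get(i));
            }
            fq.append(")");
            SolrQuery query = new SolrQuery(userInput);
            query.addFilterQuery(fq.toString()); // e.g. rep_id:(1 OR 2)
            return query;
        }
    }

Changing permissions is then purely a DB update; the index never has to know.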
Re: Delta import
Yes, my database is remote, MySQL 5, and I'm using Connector/J 5.1.7. My index has 2 documents. When I try to do, let's say, 14 updates, it takes about 18 sec total. Here's the resulting log of the operation: 2009-03-25 15:53:57 org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 411 2009-03-25 15:53:59 org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: profil rows obtained : 14 2009-03-25 15:53:59 org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: profil rows obtained : 0 2009-03-25 15:53:59 org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: profil 2009-03-25 15:54:00 org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/home/solr-tomcat/solr/data/index,segFN=segments_sb,version=1237322897338,generation=1019,filenames=[_uj.frq, _uj.fdx, _uj.tii, _uj.nrm, _uj.tis, _uj.fnm, _uj.prx, segments_sb, _uj.fdt] 2009-03-25 15:54:00 org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: last commit = 1237322897338 2009-03-25 15:54:13 org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Delta Import completed successfully BOTTLE NECK 2009-03-25 15:54:13 org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full Import completed successfully 2009-03-25 15:54:13 org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true) 2009-03-25 15:54:15 org.apache.solr.core.SolrDeletionPolicy onCommit INFO: SolrDeletionPolicy.onCommit: commits:num=2 commit{dir=/home/solr-tomcat/solr/data/index,segFN=segments_sb,version=1237322897338,generation=1019,filenames=[_uj.frq, _uj.fdx, _uj.tii, _uj.nrm, _uj.tis, _uj.fnm, _uj.prx, segments_sb, _uj.fdt] commit{dir=/home/solr-tomcat/solr/data/index,segFN=segments_sc,version=1237322897339,generation=1020,filenames=[_ul.prx, _ul.fnm, _ul.tii, _ul.fdt, _ul.nrm, _ul.fdx, _ul.tis, _ul.frq, segments_sc] 2009-03-25 15:54:15 org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: last commit = 1237322897339 2009-03-25 15:54:15 org.apache.solr.search.SolrIndexSearcher init INFO: Opening searc...@3da850 main When I do a full-import it is much faster. Takes about 1 min to index 2 documents. I tried to play a bit with the config but nothing seems to work for the moment. What I want to do is pretty interactive: my production DB has 1.2M documents and must be able to delta-import around 2k updates every 5 min. Is it possible to reach those kinds of numbers with the DataImportHandler? Shalin Shekhar Mangar wrote: On Wed, Mar 25, 2009 at 2:25 AM, AlexxelA alexandre.boudrea...@canoe.ca wrote: OK, I'm OK with the fact that Solr is going to do X requests to the database for X updates.. but when I try to run the delta-import command with 2 rows to update, is it normal that it's really slow, ~1 document fetched/sec? Not really, I've seen 1000x faster. Try firing a few of those queries on the database directly. Are they slow? Is the database remote? -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/Delta-import-tp22663196p22710222.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Snapinstaller + Overlapping onDeckSearchers Problems
I set the autowarm to 2000, which only takes about two minutes and resolves my issues. Thanks for your help! best, cloude On Wed, Mar 25, 2009 at 9:34 AM, Ryan McKinley ryan...@gmail.com wrote: It looks like the cache is configured big enough, but the autowarm count is too big to have good performance. Try something smaller and see if that fixes both problems. I imagine even just warming the most recent 100 queries would precache the most important ones, but try some higher numbers and see if the performance is acceptable. For the filterCache and queryCache, autowarm queries the new index and caches the results. On Mar 25, 2009, at 11:48 AM, Cloude Porteus wrote: Yes, I guess I'm running 40k queries when it starts :) I didn't know that each count was equal to a query. I thought it was just copying the cache entries from the previous searcher, but I guess that wouldn't include new entries. I set it to the size of our filterCache. What should I set the autowarmCount to if I want to try and fill up the caches? lookups : 8720372 hits : 8676170 hitratio : 0.99 inserts : 44551 evictions : 0 size : 44417 cumulative_lookups : 8720372 cumulative_hits : 8676170 cumulative_hitratio : 0.99 cumulative_inserts : 44551 cumulative_evictions : 0 best, cloude On Wed, Mar 25, 2009 at 8:38 AM, Ryan McKinley ryan...@gmail.com wrote: I don't understand why this sometimes takes two minutes between the start commit /update and sometimes takes 20 minutes? One of our caches has about ~40,000 items, but I can't imagine it taking 20 minutes to autowarm a searcher. What do your cache configs look like? How big is the autowarm count? If you have: <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/> that will run 32 queries when Solr starts. Are you running 40K queries when it starts? ryan -- VP of Product Development Instructables.com http://www.instructables.com/member/lebowski -- VP of Product Development Instructables.com http://www.instructables.com/member/lebowski
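For anyone tuning the same thing, the change boils down to one attribute in solrconfig.xml (the size values here are illustrative, not a recommendation):

    <filterCache class="solr.LRUCache" size="45000" initialSize="45000" autowarmCount="2000"/>

Since autowarming re-executes each warmed entry against the new searcher, autowarmCount is effectively 'how many queries run at every commit', so it trades commit/warm-up latency against cold-cache misses.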
Re: SRW/U and OAI-PMH servers over solr
I implemented OAI-PMH for Solr a few years back for the Massachusetts library system... it appears not to be running right now, but check... http://www.digitalcommonwealth.org/ It would be great to get that code revived and live as open source somewhere. As is, it uses a pre-1.3 release that was patched to support modifiable fields. (If I did it again, I would suggest keeping a parallel SQL database for some of this stuff.) ryan On Mar 25, 2009, at 3:30 PM, Miguel Coxo wrote: Hello there, I'm looking for a way to implement SRW/U and OAI-PMH servers over Solr, similar to what I found here: http://marc.info/?l=solr-dev&m=116405019011211&w=2 . Well, actually, decoupled (not a plugin) would be OK, if not better =). I wanted to know if anyone knows of something available out there that accomplishes this. From what I have found so far, OCLC has both server implementations available. I haven't looked too deeply into the SRW/U one, but the OAI-PMH one can be configured to work with Solr (by implementing a class that does the actual calls to the data provider). Any information you guys can provide is welcome =). -- All the best, Miguel Coxo.
large index vs multicore
Hi All, In my project I have one primary core containing all the basic information for a product. Now I need to add additional information which will be searched and displayed in conjunction with the product results. My question is: from a design and query speed point of view, should I add a new core to handle the additional data, or should I add the data to the existing core? The data size is not very large, around 150,000 - 200,000 documents. Any insights into this will be helpful. Thanks, Kalyan Manepalli
solr_hostname in scripts.conf
I've a question. Is it safe to use 'localhost' as solr_hostname in scripts.conf? -- -Tim
Re: get all facets
Actually what I meant was: if there are 100 indexed fields, then there are 100 facet fields, right? So whenever I create a SolrQuery, I have to call addFacetField(fieldName) for each one. Can I avoid this and just get all facet fields? Sorry for the confusion. Thanks again, Ashish Shalin Shekhar Mangar wrote: On Wed, Mar 25, 2009 at 7:30 AM, Ashish P ashish.ping...@gmail.com wrote: Can I get all the facets in QueryResponse?? You can get all the facets that are returned by the server. Set facet.limit to the number of facets you want to retrieve. See http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/SolrQuery.html#setFacetLimit(int) -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/get-all-facets-tp22693809p22714256.html Sent from the Solr - User mailing list archive at Nabble.com.
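One way to avoid hard-coding 100 addFacetField calls (a sketch with SolrJ; this assumes discovering the field names via a Luke request is acceptable, and that you really do want to facet on every field):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.request.LukeRequest;
    import org.apache.solr.client.solrj.response.LukeResponse;

    public class AllFacets {
        static SolrQuery facetOnEverything(SolrServer server)
                throws SolrServerException, java.io.IOException {
            // Ask the server which fields exist instead of listing them by hand.
            LukeResponse luke = new LukeRequest().process(server);
            SolrQuery q = new SolrQuery("*:*");
            q.setFacet(true);
            for (String field : luke.getFieldInfo().keySet()) {
                q.addFacetField(field);
            }
            return q;
        }
    }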
Re: large index vs multicore
My question is: from a design and query speed point of view, should I add a new core to handle the additional data or should I add the data to the existing core? Do you ever need to get results from both sets of data in the same query? If so, putting them in the same index will be faster. If every query is always limited to results within one set or the other -- and the doc count is not huge, then the choice of single core vs multi core is more about what you are more comfortable managing than it is about query speeds. Advantages of multicore: - the distinct data is in different indexes, so you can maintain them independently (perhaps one data set never changes and the other changes often) Advantages of single core (with multiple data sets): - everything is in one place - replicate / load balance a single index rather than multiple. ryan
Re: large index vs multicore
Hi, Without knowing the details, I'd say keep it in the same index if the additional information shares some/enough fields with the main product data, and keep it separate if it's sufficiently distinct (this also means 2 queries and manual merging/joining). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan kalyan.manepa...@orbitz.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Wednesday, March 25, 2009 5:46:40 PM Subject: large index vs multicore Hi All, In my project I have one primary core containing all the basic information for a product. Now I need to add additional information which will be searched and displayed in conjunction with the product results. My question is: from a design and query speed point of view, should I add a new core to handle the additional data, or should I add the data to the existing core? The data size is not very large, around 150,000 - 200,000 documents. Any insights into this will be helpful. Thanks, Kalyan Manepalli
Re: Solr OpenBitSet OutofMemory Error
Hi, I'm not sure if anyone will be able to help without more detail. First suggestion would be to look at Solr with a debugger/profiler to see where memory is used up. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: smock harish.agar...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, March 25, 2009 2:37:26 PM Subject: Solr OpenBitSet OutofMemory Error Hello, After running a nightly release of Solr from around January for about 4 weeks without any problems, I'm starting to see OutOfMemory errors: Mar 24, 2009 1:35:36 AM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.util.OpenBitSet.clone(OpenBitSet.java:640) Is this a common error to see? I'm running a lot of faceted queries on an index with about 7.5 million documents. I'm giving Solr about 8 GB of memory. While I do update the index frequently, I also optimize frequently - it's a little strange to me that this problem is showing up now after four weeks of zero problems. Any suggestions/ideas would be very much appreciated! Thanks, Harish -- View this message in context: http://www.nabble.com/Solr-OpenBitSet-OutofMemory-Error-tp22707576p22707576.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Delta import
Hi Alex, you may be able to use CachedSqlEntityProcessor. You can do a delta-import using full-import: http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta The inner entity can use a CachedSqlEntityProcessor. On Thu, Mar 26, 2009 at 1:45 AM, AlexxelA alexandre.boudrea...@canoe.ca wrote: Yes, my database is remote, MySQL 5, and I'm using Connector/J 5.1.7. My index has 2 documents. When I try to do, let's say, 14 updates, it takes about 18 sec total. Here's the resulting log of the operation: 2009-03-25 15:53:57 org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 411 2009-03-25 15:53:59 org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: profil rows obtained : 14 2009-03-25 15:53:59 org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: profil rows obtained : 0 2009-03-25 15:53:59 org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: profil 2009-03-25 15:54:00 org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/home/solr-tomcat/solr/data/index,segFN=segments_sb,version=1237322897338,generation=1019,filenames=[_uj.frq, _uj.fdx, _uj.tii, _uj.nrm, _uj.tis, _uj.fnm, _uj.prx, segments_sb, _uj.fdt] 2009-03-25 15:54:00 org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: last commit = 1237322897338 2009-03-25 15:54:13 org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Delta Import completed successfully BOTTLE NECK 2009-03-25 15:54:13 org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full Import completed successfully 2009-03-25 15:54:13 org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true) 2009-03-25 15:54:15 org.apache.solr.core.SolrDeletionPolicy onCommit INFO: SolrDeletionPolicy.onCommit: commits:num=2 commit{dir=/home/solr-tomcat/solr/data/index,segFN=segments_sb,version=1237322897338,generation=1019,filenames=[_uj.frq, _uj.fdx, _uj.tii, _uj.nrm, _uj.tis, _uj.fnm, _uj.prx, segments_sb, _uj.fdt] commit{dir=/home/solr-tomcat/solr/data/index,segFN=segments_sc,version=1237322897339,generation=1020,filenames=[_ul.prx, _ul.fnm, _ul.tii, _ul.fdt, _ul.nrm, _ul.fdx, _ul.tis, _ul.frq, segments_sc] 2009-03-25 15:54:15 org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: last commit = 1237322897339 2009-03-25 15:54:15 org.apache.solr.search.SolrIndexSearcher init INFO: Opening searc...@3da850 main When I do a full-import it is much faster. Takes about 1 min to index 2 documents. I tried to play a bit with the config but nothing seems to work for the moment. What I want to do is pretty interactive: my production DB has 1.2M documents and must be able to delta-import around 2k updates every 5 min. Is it possible to reach those kinds of numbers with the DataImportHandler? Shalin Shekhar Mangar wrote: On Wed, Mar 25, 2009 at 2:25 AM, AlexxelA alexandre.boudrea...@canoe.ca wrote: OK, I'm OK with the fact that Solr is going to do X requests to the database for X updates.. but when I try to run the delta-import command with 2 rows to update, is it normal that it's really slow, ~1 document fetched/sec? Not really, I've seen 1000x faster. Try firing a few of those queries on the database directly. Are they slow? Is the database remote? -- Regards, Shalin Shekhar Mangar. 
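A sketch of that FAQ trick (table, column and entity names are hypothetical): run command=full-import&clean=false, let the outer query select only the rows changed since the last run, and serve the inner entity out of an in-memory cache instead of one DB round-trip per row:

    <entity name="profil" pk="id"
            query="select id, name from profil
                   where last_modified &gt; '${dataimporter.last_index_time}'">
      <!-- loaded once, then joined in memory for each outer row -->
      <entity name="detail" processor="CachedSqlEntityProcessor"
              query="select profil_id, data from profil_detail"
              where="profil_id=profil.id"/>
    </entity>

Cutting out the per-row queries to the remote database is what usually removes the kind of stall marked BOTTLE NECK in the log above.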
-- View this message in context: http://www.nabble.com/Delta-import-tp22663196p22710222.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: Not able to configure multicore
Actually solr2 is an application other than the default one (example) on which I have configured my application. Let me explain things in more detail: my application path is http://localhost:8983/solr2/admin and I would like to configure it for multiple cores, so I have placed solr.xml in the config directory, containing the following: <solr persistent="true" sharedLib="lib"> <cores adminPath="/admin/cores"> <core name="core0" instanceDir="core0"/> <core name="core1" instanceDir="core1"/> </cores> </solr> But when I try to access the following: http://localhost:8983/solr2/admin/cores it gives me a Tomcat 404 error. Thanks, Mitul Patel. Otis Gospodnetic wrote: Hm, where does that /solr2 come from? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: mitulpatel mitulpa...@greymatterindia.com To: solr-user@lucene.apache.org Sent: Wednesday, March 25, 2009 12:30:11 AM Subject: Re: Not able to configure multicore hossman wrote: : I am facing a problem related to multiple cores configuration. I have placed : a solr.xml file in the solr.home directory. Even though, when I am trying to : access http://localhost:8983/solr/admin/cores it gives me a Tomcat error. : : Can anyone tell me what the possible issue with this can be?? not without knowing exactly what the tomcat error message is, what your solr.xml file looks like, what log messages you see on startup, etc... -Hoss Hello Hoss, Thanks for the reply. Here is the error message shown in the browser: HTTP Status 404 - /solr2/admin/cores type Status report message /solr2/admin/cores description The requested resource (/solr2/admin/cores) is not available. and here is the solr.xml file. -- View this message in context: http://www.nabble.com/Not-able-to-configure-multicore-tp22682691p22695098.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Not-able-to-configure-multicore-tp22682691p22715876.html Sent from the Solr - User mailing list archive at Nabble.com.
Scheduling DIH
Hello, Is there a best way to schedule the DataImportHandler? The idea being to schedule a delta-import every Sunday morning at 7am, or perhaps every hour, without human intervention. Writing a cron job to do this wouldn't be difficult; I'm just wondering: is this a built-in feature? Tricia
Re: Scheduling DIH
Right now a cron job is the only option. Building this into DIH has been a common request. What do others think about this? On Thu, Mar 26, 2009 at 10:11 AM, Tricia Williams williams.tri...@gmail.com wrote: Hello, Is there a best way to schedule the DataImportHandler? The idea being to schedule a delta-import every Sunday morning at 7am, or perhaps every hour, without human intervention. Writing a cron job to do this wouldn't be difficult; I'm just wondering: is this a built-in feature? Tricia -- --Noble Paul
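In the meantime, the cron job really is a one-liner (hostname, port and handler path depend on your setup; this assumes DIH is registered at /dataimport):

    # delta-import every Sunday morning at 7am
    0 7 * * 0 curl -s 'http://localhost:8983/solr/dataimport?command=delta-import' > /dev/null
    # or every hour, on the hour
    0 * * * * curl -s 'http://localhost:8983/solr/dataimport?command=delta-import' > /dev/null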