Re: How does one sort facet queries?

2010-02-19 Thread gwk

On 2/19/2010 2:15 AM, Kelly Taylor wrote:

All sorting of facets works great at the field level (count/index)...all good
there...but how is sorting accomplished with range queries? The solrj
response doesn't seem to maintain the order the queries are sent in, and the
order is not in index or count order. What's the trick?

http://localhost:8983/solr/select?q=someterm
   &rows=0
   &facet=true
   &facet.limit=-1
   &facet.query=price:[* TO 100]
   &facet.query=price:[100 TO 200]
   &facet.query=price:[200 TO 300]
   &facet.query=price:[300 TO 400]
   &facet.query=price:[400 TO 500]
   &facet.query=price:[500 TO 600]
   &facet.query=price:[600 TO 700]
   &facet.query=price:[700 TO *]
   &facet.mincount=1
   &collapse.field=dedupe_hash
   &collapse.threshold=1
   &collapse.type=normal
   &collapse.facet=before

   
The trick I use is to use LocalParams to give each facet query a well 
defined name. Afterwards you can loop through the names in whatever 
order you want.

so basically facet.query={!key=price_0}price:[* TO 100] etc.
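
With SolrJ that would look roughly like the sketch below (untested, against 
Solr/SolrJ 1.4; the price_0/price_1/... keys are simply the names you assign):

import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NamedFacetQueries {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("someterm");
    q.setRows(0);
    q.setFacet(true);
    // give each facet.query a well defined name via the key local param
    q.addFacetQuery("{!key=price_0}price:[* TO 100]");
    q.addFacetQuery("{!key=price_1}price:[100 TO 200]");
    q.addFacetQuery("{!key=price_2}price:[200 TO 300]");
    // ... and so on for the remaining ranges ...

    QueryResponse rsp = server.query(q);
    Map<String, Integer> counts = rsp.getFacetQuery();

    // loop through the names in whatever order you want
    for (String key : new String[] {"price_0", "price_1", "price_2"}) {
      System.out.println(key + " -> " + counts.get(key));
    }
  }
}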

N.B. the facet queries in your example will lead to some documents being 
counted twice (e.g. when the price is exactly 100, 200 or 300).


Regards,

gwk


Re: replications issue

2010-02-19 Thread giskard
Ciao,

Uhm, after some time a new index in data/index on the slave has been written
with roughly the size of the master index.

The configuration on both master and slave is the same as the one on the 
SolrReplication wiki page ("enable/disable master/slave in a node"):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://localhost:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

When the master is started, pass in -Denable.master=true and in the slave pass 
in -Denable.slave=true. Alternately , these values can be stored in a 
solrcore.properties file as follows

#solrcore.properties in master
enable.master=true
enable.slave=false

On Feb 19, 2010, at 03:43, Otis Gospodnetic wrote:

 giskard,
 
 Is this on the master or on the slave(s)?
 Maybe you can paste your replication handler config for the master and your 
 replication handler config for the slave.
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Hadoop ecosystem search :: http://search-hadoop.com/
 
 
 
 
 
 From: giskard gisk...@autistici.org
 To: solr-user@lucene.apache.org
 Sent: Thu, February 18, 2010 12:16:37 PM
 Subject: replications issue
 
 Hi all,
 
 I've setup solr replication as described in the wiki.
 
 when i start the replication a directory called index.$numbers is created; 
 after a while
 it disappears and a new index.$othernumbers is created
 
 index/ remains untouched with an empty index.
 
 any clue?
 
 thank you in advance,
 Riccardo
 
 --
 ciao,
 giskard

--
ciao,
giskard





Question regarding wildcards and dismax

2010-02-19 Thread Roland Villemoes
Hi all,

We have a web application built on top of Solr, and we are using a lot of 
facets - everything works just fine.
When the user first hits the search page, we would like to do a "get all" query 
to get a result, and thereby get all facets so we can build up the user 
interface from this result/facets.

So I would like to do a q=*:* on the search. But since I have switched to the 
dismax request handler this does not work anymore.

My request/url looks like this:


a)   /solr/da/mysearcher/?q=*:*   Does not work

b)  /solr/da/select?q=*:*  Does work


But I really need to use a) since I control boosting/ranking in the definition.
Furthermore, when the user drills down into the search result by selecting from 
the facets, I still need to get the full search result, like:

/solr/da/mysearcher/?q=*:*&fq=color:red  Does not work.

Can anyone help here? I think that the situation for my web application here is 
quite normal (get a full result set to build facets, then let the user do a 
drill down, etc.)


Thanks a lot in advance


med venlig hilsen/best regards

Roland Villemoes
Tel: (+45) 22 69 59 62
E-Mail: mailto:r...@alpha-solutions.dk

Alpha Solutions A/S
Borgergade 2, 3.sal, 1300 København K
Tel: (+45) 70 20 65 38
Web: http://www.alpha-solutions.dkhttp://www.alpha-solutions.dk/

** This message including any attachments may contain confidential and/or 
privileged information intended only for the person or entity to which it is 
addressed. If you are not the intended recipient you should delete this 
message. Any printing, copying, distribution or other use of this message is 
strictly prohibited. If you have received this message in error, please notify 
the sender immediately by telephone, or e-mail and delete all copies of this 
message and any attachments from your system. Thank you.



Re: Question regarding wildcards and dismax

2010-02-19 Thread gwk
Have a look at the q.alt parameter 
(http://wiki.apache.org/solr/DisMaxRequestHandler#q.alt) which is used 
for exactly this issue. Basically putting q.alt=*:* in your query means 
you can leave out the q parameter if you want all documents to be selected.
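
For example, from SolrJ that can be set per request (just a sketch; whether you 
address the handler via qt or via its path depends on your solrconfig):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery();      // note: no q parameter at all
q.set("qt", "mysearcher");          // your dismax request handler
q.set("q.alt", "*:*");              // dismax then matches all documents
q.addFilterQuery("color:red");      // facet drill-down still works as a filter

Alternatively you can put q.alt=*:* in the handler's defaults in solrconfig.xml 
so clients never have to send it.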


Regards,

gwk

On 2/19/2010 11:28 AM, Roland Villemoes wrote:

Hi all,

We have a web application build on top of Solr, and we are using a lot of 
facets - everything works just fine.
When the user first hits the searchpage - we would like to do a get all query 
to the a result, and thereby get all facets so we can build up the user interface from 
this result/facets.

So I would like to do a q=*:* on the search. But since I have switched to the 
dismax requesthandler this does not work anymore. ?

My request/url looks like this:


a)   /solr/da/mysearcher/?q=*:*   Does not work

b)  /solr/da/select?q=*:*  Does work


But I really need to use a) since I control boosting/ranking in the definition.
Furthermore when the user drill down the search result, by selecting from the 
facets, I still need to get the full searchresult, like:

/solr/da/mysearcher/?q=*:*&fq=color:red Does not work.
   





range of scores : queryNorm()

2010-02-19 Thread Smith G
Hello,
   I have observed that even if we change boosting
drastically, scores are being normalized at the end because of the
queryNorm value. Is there anything (regarding the queryNorm) that
we can rely on? Like, will the score always be under 10 or some fixed
value? The main objective is to provide scores in a fixed range to
the partner. So, have you experienced anything like this? Is it
possible to do so?
Have you experienced any strange situation where, for a
particular query, result scores were really high compared to routine?
If yes, I would like to know the factor that affected the scores
drastically, because it may help me to proceed or understand the
cases.

Thanks.


Re: Range Searches in Collections

2010-02-19 Thread cjkadakia

Unfortunately the number of fees is unknown so we couldn't add the fields
into the solr schema until runtime. The work-around we did was create an
additional column in the view I'm pulling from for the index to determine
each record's minimum fee and throw that into the column. A total hack,
but now I can simply sort on the minFee and problem (hackingly) solved! :)

Otis Gospodnetic wrote:
 
 Hm, yes, it sounds like your fees field has multiple values/tokens, one
 for each fee.  That's full-text search for you. :)
 How about having multiple fee fields, each with just one fee value?
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Hadoop ecosystem search :: http://search-hadoop.com/
 
 
 
 
 
 From: cjkadakia cjkada...@sonicbids.com
 To: solr-user@lucene.apache.org
 Sent: Thu, February 18, 2010 7:58:23 PM
 Subject: Range Searches in Collections
 
 
 Hi, I'm trying to do a search on a range of floats that are part of my
 solr
 schema. Basically we have a collection of fees that are associated with
 each document in our index.
 
 The query I tried was:
 
 q=fees:[3 TO 10]
 
 This should return me documents with Fee values between 3 and 10
 inclusively, which it does. However, I need it to check for ALL items in
 this collection, not just one that satisfies it. Currently, this is
 returning me documents with fee values above 10 and below 3 as long as it
 contains at least one other within.
 
 Any suggestions on how to accomplish this?
 -- 
 View this message in context:
 http://old.nabble.com/Range-Searches-in-Collections-tp27648470p27648470.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 

-- 
View this message in context: 
http://old.nabble.com/Range-Searches-in-Collections-tp27648470p27653341.html
Sent from the Solr - User mailing list archive at Nabble.com.



highlighting fragments EMPTY

2010-02-19 Thread adeelmahmood

hi
i am trying to get highlighting working and its turning out to be a pain.
here is my schema

<field name="id" type="string" indexed="true" stored="true" required="true" /> 
<field name="title" type="string" indexed="true" stored="true" /> 
<field name="pi" type="string" indexed="true" stored="true" /> 
<field name="status" type="string" indexed="true" stored="true" /> 

here is the catchall field (default field for search as well)
<field name="content" type="text" indexed="true" stored="false"
multiValued="true"/>

here is how I have setup the solrconfig file
<!-- example highlighter config, enable per-query with hl=true -->
 <str name="hl.fl">title pi status</str>
 <!-- for this field, we want no fragmenting, just highlighting -->
 <str name="f.name.hl.fragsize">0</str>
 <!-- instructs Solr to return the field itself if no query terms are
  found -->
 <str name="f.title.hl.alternateField">content</str>
 <str name="f.pi.hl.alternateField">content</str>
 <str name="f.status.hl.alternateField">content</str>

 <str name="f.title.hl.fragmenter">regex</str> <!-- defined below -->
 <str name="f.pi.hl.fragmenter">regex</str> <!-- defined below -->
 <str name="f.status.hl.fragmenter">regex</str> <!-- defined below -->

after this when I search for lets say
http://localhost:8983/solr/select?q=submit&hl=true
I get these results in highlight section
<lst name="highlighting">
  <lst name="FP1934" /> 
  <lst name="FP1934-PR02" /> 
  <lst name="FP1934-PR03" /> 
  <lst name="FP0526" /> 
  <lst name="FP0385" /> 
</lst>
with no reference to the actual string .. this number that's being returned
is the id of the records .. and is also the unique identifier .. why am I
not getting the string fragments with the search terms highlighted?

thanks for ur help
-- 
View this message in context: 
http://old.nabble.com/highlighting-fragments-EMPTY-tp27654005p27654005.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: highlighting fragments EMPTY

2010-02-19 Thread Jan
All of your fields seem to be of a string type, that's why the highlighting 
doesn't work. 

The highlighting fields must be tokenized before you can do the highlighting on 
them. 

Jan.


--- On Fri, 2/19/10, adeelmahmood adeelmahm...@gmail.com wrote:

From: adeelmahmood adeelmahm...@gmail.com
Subject: highlighting fragments EMPTY
To: solr-user@lucene.apache.org
Date: Friday, February 19, 2010, 4:46 PM


hi
i am trying to get highlighting working and its turning out to be a pain.
here is my schema

<field name="id" type="string" indexed="true" stored="true" required="true" /> 
<field name="title" type="string" indexed="true" stored="true" /> 
<field name="pi" type="string" indexed="true" stored="true" /> 
<field name="status" type="string" indexed="true" stored="true" /> 

here is the catchall field (default field for search as well)
<field name="content" type="text" indexed="true" stored="false"
multiValued="true"/>

here is how I have setup the solrconfig file
<!-- example highlighter config, enable per-query with hl=true -->
     <str name="hl.fl">title pi status</str>
     <!-- for this field, we want no fragmenting, just highlighting -->
     <str name="f.name.hl.fragsize">0</str>
     <!-- instructs Solr to return the field itself if no query terms are
          found -->
     <str name="f.title.hl.alternateField">content</str>
 <str name="f.pi.hl.alternateField">content</str>
 <str name="f.status.hl.alternateField">content</str>

 <str name="f.title.hl.fragmenter">regex</str> <!-- defined below -->
 <str name="f.pi.hl.fragmenter">regex</str> <!-- defined below -->
 <str name="f.status.hl.fragmenter">regex</str> <!-- defined below -->

after this when I search for lets say
http://localhost:8983/solr/select?q=submit&hl=true
I get these results in highlight section
<lst name="highlighting">
  <lst name="FP1934" /> 
  <lst name="FP1934-PR02" /> 
  <lst name="FP1934-PR03" /> 
  <lst name="FP0526" /> 
  <lst name="FP0385" /> 
</lst>
with no reference to the actual string .. this number thats being returned
is the id of the records .. and is also the unique identifier .. why am I
not getting the string fragments with search terms highlighted

thanks for ur help
-- 
View this message in context: 
http://old.nabble.com/highlighting-fragments-EMPTY-tp27654005p27654005.html
Sent from the Solr - User mailing list archive at Nabble.com.




  

Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-19 Thread Yonik Seeley
On Fri, Feb 19, 2010 at 5:03 AM, Glen Newton glen.new...@gmail.com wrote:
 You may consider using LuSql[1] to create the indexes, if your source
 content is in a JDBC accessible db. It is quite a bit faster than
 Solr, as it is a tool specifically created and tuned for Lucene
 indexing.

Any idea why it's faster?
AFAIK, the main purpose of DIH is indexing databases too.  If DIH is
much slower, we should speed it up!

-Yonik
http://www.lucidimagination.com


Re: long warmup duration

2010-02-19 Thread Stefan Neumann
Hey,

I am quite confused by your configuration. It seems to me that your
caches are extremely small for 30 million documents (128) and during
warmup you only put up to 20 docs in them. Please correct me if I
misunderstand anything.

In my opinion your warmup duration is not that impressive; since we
currently disabled warmup, the new searcher is registered in only a few
seconds.

Actually, I would not drop these cache numbers. With a cache of 30k
documents we had a hitratio of 60%; decreasing this size, the hitratio
decreased as well. With a hitratio of currently 30% it seems to be
better to disable caching anyway. Of course we would love to use caching
;-).

with best regards,

Stefan


Antonio Lobato wrote:
 Drop those cache numbers.  Way down.  I warm up 30 million documents in about 
 2 minutes with the following configuration:
 
   <documentCache
     class="solr.FastLRUCache"
     size="128"
     initialSize="10"
     cleanupThread="true" />
 
   <queryResultCache
     class="solr.FastLRUCache"
     size="128"
     initialSize="10"
     autowarmCount="20"
     cleanupThread="true" />
 
   <fieldValueCache
     class="solr.FastLRUCache"
     size="128"
     initialSize="10"
     autowarmCount="20"
     cleanupThread="true" />
 
   <filterCache
     class="solr.FastLRUCache"
     size="128"
     initialSize="10"
     autowarmCount="20"
     cleanupThread="true" />
 
 Mind you, I also use Solr 1.4.  Also, setup a decent warming query or two, as 
 so:
 <lst> <str name="q">date:[NOW-2DAYS TO NOW]</str> <str name="start">0</str> 
 <str name="rows">100</str> <str name="sort">date desc</str> </lst>
 
 Don't warm facets that have a large amount of terms or you will kill your 
 warm up time.
 
 Hope this helps!
 
 On Feb 17, 2010, at 8:55 AM, Stefan Neumann wrote:
 
 Hi all,

 we are facing extremely increasing warmup times over the last 15 days, which
 we are not able to explain, since the number of documents and their size
 is stable. Before the increase we could commit our changes in nearly 20
 minutes, now it is about 2 hours.

 We were able to identify the warmup of the caches (queryResultCache and
 filterCache) as the reason. We tried to decrease the number of warmup
 elements from 3 to 1 without any impact.

 What influences the runtime during the warmup? Is there any possibility
 to speed up the warmup?

 I attach some more information and statistics.

 Thanks a lot for your help.

 Stefan


 Solr:1.3
 Documents:   4.000.000
 -Xmx 12G
 index size/disc 4.7G

 config:

 <queryResultWindowSize>100</queryResultWindowSize>
 <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

 No queries configured for warming.

 CACHES:
 ===

 name:   queryResultCache
 class:  org.apache.solr.search.LRUCache
 version:1.0
 description:LRU Cache(maxSize=20,
  initialSize=3,
autowarmCount=1,
  regenerator=org.apache.solr.search.solrindexsearche...@36eb7331)
 stats:

 lookups:15958
 hits :  9589
 hitratio:   0.60
 inserts:16211
 evictions:  0
 size:   16169
 warmupTime :1960239
 cumulative_lookups: 436250
 cumulative_hits:260678
 cumulative_hitratio:0.59
 cumulative_inserts: 174066
 cumulative_evictions:   0


 name:filterCache
 class:   org.apache.solr.search.LRUCache
 version: 1.0
 description: LRU Cache(maxSize=20,
initialSize=3,
  autowarmCount=3,
  regenerator=org.apache.solr.search.solrindexsearche...@9818f80)
 stats:   
 lookups: 6313622
 hits:   6304004
 hitratio: 0.99
 inserts: 42266
 evictions: 0
 size: 40827
 warmupTime: 1268074
 cumulative_lookups: 118887830
 cumulative_hits: 118605224
 cumulative_hitratio: 0.99
 cumulative_inserts: 296134
 cumulative_evictions: 0



 
 

-- 

Stefan Neumann
Dipl.-Ing.

freiheit.com technologies gmbh
Straßenbahnring 22 / 20251 Hamburg, Germany
fon   +49 (0)40 / 890584-0
fax   +49 (0)40 / 890584-20
HRB Hamburg 70814

1CB2 BA3C 168F 0C2B 6005 FC5E 3EBA BCE2 1BF0 21D3
Geschäftsführer: Claudia Dietze, Stefan Richter, Jörg Kirchhof



Re: Run Solr within my war

2010-02-19 Thread Pulkit Singhal
Using EmbeddedSolrServer is a client-side way of communicating with
Solr via the file system. Solr still has to be up and running before
that. My question is more along the lines of how to take the server
jars that perform the core functionality and bundle them so they start up
within a war which is also the application war for the program that
will communicate as the client with the Solr server.

On Thu, Feb 18, 2010 at 5:49 PM, Richard Frovarp rfrov...@apache.org wrote:
 On 2/18/2010 4:22 PM, Pulkit Singhal wrote:

 Hello Everyone,

 I do NOT want to host Solr separately. I want to run it within my war
 with the Java Application which is using it. How easy/difficult is
 that to setup? Can anyone with past experience on this topic, please
 comment.

 thanks,
 - Pulkit



 So basically you're talking about running an embedded version of Solr like
 the EmbeddedSolrServer? I have no experience on this, but this should
 provide you the correct search term to find documentation on use. From what
 little code I've seen to run test cases against Solr, it looks relatively
 straight forward to get running. To use you would use the SolrJ library to
 communicate with the embedded solr server.

 Richard



Re: @Field annotation support

2010-02-19 Thread Pulkit Singhal
Ok then, is this the correct class to support the @Field annotation?
Because I have it on the path but it's not working.

org\apache\solr\solr-solrj\1.4.0\solr-solrj-1.4.0.jar/org\apache\solr\client\solrj\beans\Field.class
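
For reference, the annotation from that jar is meant to be used roughly like 
this (a sketch based on the Solrj wiki page; the Item bean and its fields are 
only illustrative):

import org.apache.solr.client.solrj.beans.Field;

public class Item {
  @Field
  String id;

  @Field("cat")          // maps this property to the "cat" field in the schema
  String[] categories;
}

With that compiling, documents are written with server.addBean(item) and read 
back with queryResponse.getBeans(Item.class).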

2010/2/18 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 solrj jar

 On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal
 pulkitsing...@gmail.com wrote:
 Hello All,

 When I use Maven or Eclipse to try and compile my bean which has the
 @Field annotation as specified in http://wiki.apache.org/solr/Solrj
 page ... the compiler doesn't find any class to support the
 annotation. What jar should we use to bring in this custom Solr
 annotation?




 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com



Re: Run Solr within my war

2010-02-19 Thread Richard Frovarp

Pulkit Singhal wrote:

Using EmbeddedSolrServer is a client side way of communicating with
Solr via the file system. Solr has to still be up and running before
that. My question is more along the lines of how to put the server
jars that perform the core functionality and bundle them to start up
within a war which is also the application war for the program that
will communicate as the client with the Solr server.
  
I could be way wrong, but my interpretation is that EmbeddedSolrServer 
provides a way to embed Solr into an application without requiring that 
anything else is running.


http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

If you are looking for a method of your application doing SolrJ calls to 
Solr, without having to install a separate Solr instance, 
EmbeddedSolrServer would meet your needs. You'd have to use a few other 
functions to load the core and register it, but it's doable without 
having anything else running.
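
For example, roughly along the lines of the snippet on that wiki page (the 
paths and the core name below are placeholders):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class InProcessSolr {
  public static void main(String[] args) throws Exception {
    // solr home bundled with (or unpacked by) your application
    File home = new File("/path/to/solr/home");
    File solrXml = new File(home, "solr.xml");

    CoreContainer container = new CoreContainer();
    container.load("/path/to/solr/home", solrXml);

    // behaves like any other SolrServer, but runs in the same JVM
    SolrServer server = new EmbeddedSolrServer(container, "core0");
    // server.add(...), server.query(...), etc.
  }
}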


Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-19 Thread Tom Burton-West

Hi Glen,

I'd love to use LuSql, but our data is not in a db.  It's 6-8TB of files
containing OCR (one file per page, for about 1.5 billion pages) gzipped on
disk, which are ungzipped, concatenated, and converted to Solr documents
on the fly. We have multiple instances of our Solr document producer script
running. At this point we can run enough producers that the rate at
which Solr can ingest and index documents is our current bottleneck, and so
far the bottleneck we see for indexing appears to be disk I/O for
Solr/Lucene during merges.

Is there any obvious relationship between the size of the ramBuffer and how
much heap you need to give the JVM, or is there some reasonable method of
finding this out by experimentation?
We would rather not find out by decreasing the amount of memory allocated to
the JVM until we get an OOM.

Tom



I've run Lucene with heap sizes as large as 28GB of RAM (on a 32GB
machine, 64bit, Linux) and a ramBufferSize of 3GB. While I haven't
noticed the GC issues mark mentioned in this configuration, I have
seen them in the ranges he discusses (on 1.6 update 18).

You may consider using LuSql[1] to create the indexes, if your source
content is in a JDBC accessible db. It is quite a bit faster than
Solr, as it is a tool specifically created and tuned for Lucene
indexing. But it is command-line, not RESTful like Solr. The released
version of LuSql only runs single machine (though designed for many
threads), the new release will allow distributing indexing across any
number of machines (with each machine building a shard). The new
release also has plugable sources, so it is not restricted to JDBC.

-Glen
[1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql


-- 
View this message in context: 
http://old.nabble.com/What-is-largest-reasonable-setting-for-ramBufferSizeMB--tp27631231p27658384.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multicore Example

2010-02-19 Thread Pascal Dimassimo

Are you sure that you don't have any java processes that are still running?

Did you change the port or are you still using 8983?


Lee Smith-6 wrote:
 
 Hey All
 
 Trying to dip my feet into multicore and hoping someone can advise why the
 example is not working.
 
 Basically I have been working with the example single core fine so I have
 stopped the server and restarted with the new command line for multicore
 
 ie, java -Dsolr.solr.home=multicore -jar start.jar
 
 When it launches I get this error:
 
 2010-02-19 11:13:39.740::WARN:  EXCEPTION
 java.net.BindException: Address already in use
   at java.net.PlainSocketImpl.socketBind(Native Method)
   at etc
 
 Any ideas what this can be because I have stopped the first one.
 
 Thank you if you can advise.
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Multicore-Example-tp27659052p27659102.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multicore Example

2010-02-19 Thread Dave Searle
Do you have something else using port 8983 or 8080?

Sent from my iPhone

On 19 Feb 2010, at 19:22, Lee Smith l...@weblee.co.uk wrote:

 Hey All

 Trying to dip my feet into multicore and hoping someone can advise  
 why the example is not working.

 Basically I have been working with the example single core fine so I  
 have stopped the server and restarted with the new command line for  
 multicore

 ie, java -Dsolr.solr.home=multicore -jar start.jar

 When it launches I get this error:

 2010-02-19 11:13:39.740::WARN:  EXCEPTION
 java.net.BindException: Address already in use
at java.net.PlainSocketImpl.socketBind(Native Method)
at etc

 Any ideas what this can be because I have stopped the first one.

 Thank you if you can advise.




Re: Seattle Hadoop/Lucene/NoSQL Meetup; Wed Feb 24th, Feat. MongoDB

2010-02-19 Thread Nick Dimiduk
Reminder: this month's Seattle Hadoop Meetup is this Wednesday. Don't forget
to RSVP!

On Tue, Feb 16, 2010 at 6:09 PM, Bradford Stephens 
bradfordsteph...@gmail.com wrote:

 Greetings,

 It's time for another awesome Seattle Hadoop/Lucene/Scalability/NoSQL
 Meetup!

 As always, it's at the University of Washington, Allen Computer
 Science building, Room 303 at 6:45pm. You can find a map here:
 http://www.washington.edu/home/maps/southcentral.html?cse

 Last month, we had a great talk from Steve McPherson of Razorfish on
 their usage of Hadoop. This month, we'll have Richard Kreuter from
 MongoDB talking about, well, MongoDB. As well as assorted discussion
 on the Hadoop ecosystem.

 If you can, please RSVP here (not required, but very nice):
 http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/

 My cell # is 904-415-3009 if you have questions/get lost.

 Cheers,
 Bradford

 --
 http://www.drawntoscalehq.com -- Big Data for all. The Big Data Platform.

 http://www.roadtofailure.com -- The Fringes of Scalability, Social
 Media, and Computer Science



Re: Multicore Example

2010-02-19 Thread Lee Smith
How can I find out ??


On 19 Feb 2010, at 19:26, Dave Searle wrote:

 Do you have something else using port 8983 or 8080?
 
 Sent from my iPhone
 
 On 19 Feb 2010, at 19:22, Lee Smith l...@weblee.co.uk wrote:
 
 Hey All
 
 Trying to dip my feet into multicore and hoping someone can advise  
 why the example is not working.
 
 Basically I have been working with the example single core fine so I  
 have stopped the server and restarted with the new command line for  
 multicore
 
 ie, java -Dsolr.solr.home=multicore -jar start.jar
 
 When it launches I get this error:
 
 2010-02-19 11:13:39.740::WARN:  EXCEPTION
 java.net.BindException: Address already in use
   at java.net.PlainSocketImpl.socketBind(Native Method)
   at etc
 
 Any ideas what this can be because I have stopped the first one.
 
 Thank you if you can advise.
 
 



Strange performance behaviour when concurrent requests are done

2010-02-19 Thread Marc Sturlese

Hey there,
I have been doing some stress testing with a 2 physical CPU (with 4 cores each)
server.
After some reading about GC performance tuning I have configured it this
way:

/usr/lib/jvm/java-6-sun/bin/java -server -Xms7000m -Xmx7000m
-XX:ReservedCodeCacheSize=10m -XX:NewSize=1000m -XX:MaxNewSize=1000m
-XX:SurvivorRatio=16 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled -XX:+CMSClassUnloadingEnabled -XX:PermSize=35m
-XX:MaxPermSize=35m
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Djava.endorsed.dirs=/opt/tomcat-shard-00/common/endorsed

My java version is:
java version 1.6.0_12
Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)

My index is optimized with compound file and in readOnly mode. I just have one
solr core in my Solr box.

I have launched different tests against the index and to my surprise the
results are:

test1:
number of concurrent threads: 2
throughput: 15
response ms: 130

test2:
number of concurrent threads: 3
throughput: 22.3
average response ms: 130

test3:
number of concurrent threads: 4
throughput: 28
average response ms: 140

test4:
number of concurrent threads: 5
throughput: 26.8
average response ms: 190

test5:
number of concurrent threads: 6
throughput: 22
average response ms: 270

All requests are launched to the same IndexSearcher (no reloads or warmings
are done during the test)
I have activated the debug in the JVM to see when a GC happens. It is
happening every 3 seconds and it takes 20ms approx in test1, test2, test3.
In test4 and test5 it happens every 3 seconds as well and takes 40ms. So it
looks like GC is not delaying the average 
response time of the requests.
The machine has 4 cores and it is really not stressed in terms of CPU,
neither IO (I am using an ssd disk).

Given this scenario, how is it possible that changing from 5 concurrent
threads to 6 the average response time almost doubles?
(or from 4 to 5 it is not double but still significantly more)
I think GC can't be the cause given the numbers I have mentioned.
As far as I have always understood, Lucene's IndexSearcher deals perfectly with
concurrency, but it seems that there's something there that blocks
when there are more than 2 requests at the same time.

Compound file optimization gives better response times, but could it in any way
be bad for performance?

I am so confused about this... can someone explain to me whether this is normal 
and why it happens? I mean, does Lucene or Solr have some blocking thing?

Thanks in advance

-- 
View this message in context: 
http://old.nabble.com/Strange-performance-behaviour-when-concurrent-requests-are-done-tp27659695p27659695.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Documents disappearing

2010-02-19 Thread Ankit Bhatnagar
Try inspecting your index with luke


Ankit


-Original Message-
From: Pascal Dimassimo [mailto:thesuper...@hotmail.com] 
Sent: Friday, February 19, 2010 2:22 PM
To: solr-user@lucene.apache.org
Subject: Documents disappearing


Hi,

I have encountered a situation that I can't explain. We are indexing documents
that are often duplicates, so we activated deduplication like this:

<processor
class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
  <bool name="enabled">true</bool>
  <bool name="overwriteDupes">true</bool>
  <str name="signatureField">signature</str>
  <str name="fields">title,text</str>
  <str
name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
</processor>

What I can't explain is that when I look at the documents count in the log,
I see documents disappearing.

11:24:23 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
14:04:24 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
14:17:07 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
14:25:42 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
14:47:12 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
15:17:22 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
15:47:31 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
16:17:42 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
16:38:17 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
16:39:10 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
16:47:40 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
16:51:24 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
17:02:13 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
17:17:41 INFO  - [myindex] webapp=null path=null
params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8

11:24 was the time at which Solr was started that day. Around 13:30, we
started the indexation.

At some point during the indexation, I noticed that a batch of documents was
resent (i.e., documents with the same id field were sent again to the index).
And according to the log, NO delete was sent to Solr.

I understand that if I send duplicates (either documents with the same id or
with the same signature), the count of documents should stay the same. But
how can we explain that it is lowering? What are the possible causes of this
behavior?

Thanks! 
-- 
View this message in context: 
http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multicore Example

2010-02-19 Thread Dave Searle
Are you on windows? Try netstat -a

Sent from my iPhone

On 19 Feb 2010, at 20:02, Lee Smith l...@weblee.co.uk wrote:

 How can I find out ??


 On 19 Feb 2010, at 19:26, Dave Searle wrote:

 Do you have something else using port 8983 or 8080?

 Sent from my iPhone

 On 19 Feb 2010, at 19:22, Lee Smith l...@weblee.co.uk wrote:

 Hey All

 Trying to dip my feet into multicore and hoping someone can advise
 why the example is not working.

 Basically I have been working with the example single core fine so I
 have stopped the server and restarted with the new command line for
 multicore

 ie, java -Dsolr.solr.home=multicore -jar start.jar

 When it launches I get this error:

 2010-02-19 11:13:39.740::WARN:  EXCEPTION
 java.net.BindException: Address already in use
  at java.net.PlainSocketImpl.socketBind(Native Method)
  at etc

 Any ideas what this can be because I have stopped the first one.

 Thank you if you can advise.





Re: Multicore Example

2010-02-19 Thread Shawn Heisey
Assuming you are on a unix variant with a working lsof, use this.  This 
probably won't work correctly on Solaris 10:


lsof -nPi | grep 8983
lsof -nPi | grep 8080

On Windows, you can do this in a command prompt.  It requires elevation 
on Vista or later.  The -b option was added in WinXP SP2 and Win2003 
SP1, without it you can't see the program name that's got the port open:


netstat -b > ports.txt
ports.txt

Shawn


On 2/19/2010 1:01 PM, Lee Smith wrote:

How can I find out ??


On 19 Feb 2010, at 19:26, Dave Searle wrote:

   

Do you have something else using port 8983 or 8080?
 




RE: Documents disappearing

2010-02-19 Thread Pascal Dimassimo

Using LukeRequestHandler, I see:

<int name="numDocs">7725</int>
<int name="maxDoc">28099</int>
<int name="numTerms">758826</int>
<long name="version">1266355690710</long>
<bool name="optimized">false</bool>
<bool name="current">true</bool>
<bool name="hasDeletions">true</bool>
<str name="directory">
org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/opt/solr/myindex/data/index
</str>

I will copy the index to my local machine so I can open it with luke. Should
I look for something specific?

Thanks!


ANKITBHATNAGAR wrote:
 
 Try inspecting your index with luke
 
 
 Ankit
 
 
 -Original Message-
 From: Pascal Dimassimo [mailto:thesuper...@hotmail.com] 
 Sent: Friday, February 19, 2010 2:22 PM
 To: solr-user@lucene.apache.org
 Subject: Documents disappearing
 
 
 Hi,
 
 I have encounter a situation that I can't explain. We are indexing
 documents
 that are often duplicates so we activated deduplication like this:
 
 <processor
 class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
   <bool name="enabled">true</bool>
   <bool name="overwriteDupes">true</bool>
   <str name="signatureField">signature</str>
   <str name="fields">title,text</str>
   <str
 name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
 </processor>
 
 What I can't explain is that when I look at the documents count in the
 log,
 I see documents disappearing.
 
 11:24:23 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
 14:04:24 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
 14:17:07 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
 14:25:42 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
 14:47:12 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
 15:17:22 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
 15:47:31 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
 16:17:42 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
 16:38:17 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
 16:39:10 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
 16:47:40 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
 16:51:24 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
 17:02:13 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
 17:17:41 INFO  - [myindex] webapp=null path=null
 params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8
 
 11:24 was the time at which Solr was started that day. Around 13:30, we
 started the indexation.
 
 At some point during the indexation, I notice that a batch a documents
 were
 resend (i.e, documents with the same id field were sent again to the
 index).
 And according to the log, NO delete was sent to Solr.
 
 I understand that if I send duplicates (either documents with the same id
 or
 with the same signature), the count of documents should stay the same. But
 how can we explain that it is lowering? What are the possible causes of
 this
 behavior?
 
 Thanks! 
 -- 
 View this message in context:
 http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

-- 
View this message in context: 
http://old.nabble.com/Documents-disappearing-tp27659047p27660077.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multicore Example

2010-02-19 Thread Lee Smith
Thanks Shawn

I am actually running it on mac

It does not like those unix commands ??

Any further advice ?

Lee

On 19 Feb 2010, at 20:32, Shawn Heisey wrote:

 Assuming you are on a unix variant with a working lsof, use this.  This 
 probably won't work correctly on Solaris 10:
 
 lsof -nPi | grep 8983
 lsof -nPi | grep 8080
 
 On Windows, you can do this in a command prompt.  It requires elevation on 
 Vista or later.  The -b option was added in WinXP SP2 and Win2003 SP1, 
 without it you can't see the program name that's got the port open:
 
 netstat -b > ports.txt
 ports.txt
 
 Shawn
 
 
 On 2/19/2010 1:01 PM, Lee Smith wrote:
 How can I find out ??
 
 
 On 19 Feb 2010, at 19:26, Dave Searle wrote:
 
   
 Do you have something else using port 8983 or 8080?
 
 



Re: Multicore Example

2010-02-19 Thread K Wong
The point that these guys are trying to make is that if another
program is using the port that Solr is trying to bind to then they
will both fight over the exclusive use of the port.

Both the netstat and lsof command work fine on my Mac (Leopard 10.5.8).

Trinity:~ kelvin$ which netstat
/usr/sbin/netstat
Trinity:~ kelvin$ which lsof
/usr/sbin/lsof
Trinity:~ kelvin$

If you use MacPorts, you can also find out port information using 'nmap'.

If something is already using the port Solr is trying to use then you
need to configure Solr to use a different port.

K



On Fri, Feb 19, 2010 at 12:51 PM, Lee Smith l...@weblee.co.uk wrote:
 Thanks Shawn

 I am actually running it on mac

 It does not like those unix commands ??

 Any further advice ?

 Lee

 On 19 Feb 2010, at 20:32, Shawn Heisey wrote:

 Assuming you are on a unix variant with a working lsof, use this.  This 
 probably won't work correctly on Solaris 10:

 lsof -nPi | grep 8983
 lsof -nPi | grep 8080

 On Windows, you can do this in a command prompt.  It requires elevation on 
 Vista or later.  The -b option was added in WinXP SP2 and Win2003 SP1, 
 without it you can't see the program name that's got the port open:

 netstat -b > ports.txt
 ports.txt

 Shawn


 On 2/19/2010 1:01 PM, Lee Smith wrote:
 How can I find out ??


 On 19 Feb 2010, at 19:26, Dave Searle wrote:


 Do you have something else using port 8983 or 8080?






Solr 1.5 in production

2010-02-19 Thread Asif Rahman
What is the prevailing opinion on using solr 1.5 in a production
environment?  I know that many people were using 1.4 in production for a
while before it became an official release.

Specifically I'm interested in using some of the new spatial features.

Thanks,

Asif

-- 
Asif Rahman
Lead Engineer - NewsCred
a...@newscred.com
http://platform.newscred.com


Re: Solr 1.5 in production

2010-02-19 Thread Grant Ingersoll

On Feb 19, 2010, at 4:54 PM, Asif Rahman wrote:

 What is the prevailing opinion on using solr 1.5 in a production
 environment?  I know that many people were using 1.4 in production for a
 while before it became an official release.
 
 Specifically I'm interested in using some of the new spatial features.

These aren't fully baked yet (still need some spatial filtering capabilities 
which I'm getting close to done with, or close enough to submit a patch 
anyway), but feedback would be welcome.  The main risk, I suppose, is that any 
new APIs could change.  Other than that, the usual advice applies:  Test it 
out in your environment and see if it meets your needs.  On the spatial stuff, 
we'd definitely appreciate feedback on performance, functionality, APIs, etc.

-Grant

Re: long warmup duration

2010-02-19 Thread Antonio Lobato
You can disable warming, and a new searcher will register (almost) 
instantly, no matter the size.  However, once you run your first search, 
you will be warming your searcher, and it will block for a long, long 
time, giving the end user a frozen page.


Warming is just another word for running a set of queries before the 
searcher is pushed to the front end.  Naturally if you disable warming, 
your searcher will register right away.  I wouldn't recommend it 
though.  If I disable warming on my documents, my new searchers would 
register instantly, but my first search on my web page would be stuck 
for 50 seconds or so.


As for the cache size, the caches hold cache entries, not 
documents.  That's what warming is for.


On 2/19/2010 12:17 PM, Stefan Neumann wrote:

Hey,

I am quite confused with your configuration. It seems to me, that your
caches are extremly small for 30 million documents (128) and during
warmup you only put up to 20 docs in it. Please correct me if I
misunderstand anything.

In my opinion your warm up duration is not that impressiv, since we
currently disabled warmup, the new searcher is registered only in a few
seconds.

Actually, I would not drop these cache numbers. With a cache of 30k
documents we had a hitraion of 60%, decreasing this size the hitratio
decreased as well. With a hitratio of currently 30% it seems to be
better to disable caching anyway. Of course we would love to use caching
;-).

with best regards,

Stefan


Antonio Lobato wrote:
   

Drop those cache numbers.  Way down.  I warm up 30 million documents in about 2 
minutes with the following configuration:

  <documentCache
    class="solr.FastLRUCache"
    size="128"
    initialSize="10"
    cleanupThread="true" />

  <queryResultCache
    class="solr.FastLRUCache"
    size="128"
    initialSize="10"
    autowarmCount="20"
    cleanupThread="true" />

  <fieldValueCache
    class="solr.FastLRUCache"
    size="128"
    initialSize="10"
    autowarmCount="20"
    cleanupThread="true" />

  <filterCache
    class="solr.FastLRUCache"
    size="128"
    initialSize="10"
    autowarmCount="20"
    cleanupThread="true" />

Mind you, I also use Solr 1.4.  Also, setup a decent warming query or two, as 
so:
<lst> <str name="q">date:[NOW-2DAYS TO NOW]</str> <str name="start">0</str> 
<str name="rows">100</str> <str name="sort">date desc</str> </lst>

Don't warm facets that have a large amount of terms or you will kill your warm 
up time.

Hope this helps!

On Feb 17, 2010, at 8:55 AM, Stefan Neumann wrote:

 

Hi all,

we are facing extremly increasing warmup times the last 15 days, which
we are not able to explain, since the number of documents and their size
is stable. Before the increase we can commit our changes in nearly 20
minutes, now it is about 2 hours.

We were able to identify the warmup of the caches (queryresultCache and
filterCache) as the reason. We tried to decrease the number of warmup
elements from 3 to 1 without any impact.

What influences the runtime during the warmup? Is there any possibility
to boost the warmup?

I attach some more information and statistics.

Thanks a lot for your help.

Stefan


Solr:   1.3
Documents:  4.000.000
-Xmx12G
index size/disc 4.7G

config:

<queryResultWindowSize>100</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

No queries configured for warming.

CACHES:
===

name:   queryResultCache
class:  org.apache.solr.search.LRUCache
version:1.0
description:LRU Cache(maxSize=20,
  initialSize=3,
  autowarmCount=1,
regenerator=org.apache.solr.search.solrindexsearche...@36eb7331)
stats:

lookups:15958
hits :  9589
hitratio:   0.60
inserts:16211
evictions:  0
size:   16169
warmupTime :1960239
cumulative_lookups: 436250
cumulative_hits:260678
cumulative_hitratio:0.59
cumulative_inserts: 174066
cumulative_evictions:   0


name:   filterCache
class:  org.apache.solr.search.LRUCache
version:1.0
description:LRU Cache(maxSize=20,
  initialSize=3,
  autowarmCount=3,  
regenerator=org.apache.solr.search.solrindexsearche...@9818f80)
stats:  
lookups:6313622
hits:   6304004
hitratio: 0.99
inserts: 42266
evictions: 0
size: 40827
warmupTime: 1268074
cumulative_lookups: 118887830
cumulative_hits: 118605224
cumulative_hitratio: 0.99
cumulative_inserts: 296134
cumulative_evictions: 0



   


 
   


filter result by catalog

2010-02-19 Thread Kevin Osborn
So, I am looking at better ways to filter a resultset by catalog. So, I have an 
index of products. And based on the user, I want to filter the search results 
to what they are allowed to see. I will probably have up to 200 or so different 
catalogs.



  

Re: highlighting fragments EMPTY

2010-02-19 Thread adeelmahmood

well ok I guess that makes sense and I tried changing my title field to text
type and then highlighting worked on it .. but
1) as far as not merging all fields into a catchall field and instead
configuring the dismax handler to search through them .. do you mean then
I'll have to specify the field I want to do the search in .. e.g.
q=something&hl.fl=title or q=somethingelse&hl.fl=status .. and another thing
is that I have about 20-some fields which I am merging into my catchall
field .. with that many fields do you still think it's better to use dismax
or a catchall field ???

2) secondly, for highlighting, q=title:searchterm also didn't work .. it only
works if I change the type of the title field to text instead of string .. even
if I give the full string in the q param .. it still doesn't highlight it unless,
like I said, I change the field type to text ...  so why is that .. and if
that's just how it is and I have to change some of my fields to text .. then
my question is that solr will analyze them first in their own field and then
copy them to the catchall field, doing the analysis one more time ..
since the catchall field is also text .. i guess this is just more of an
understanding question

thanks for all u guys help


Ahmet Arslan wrote:
 
 hi
 i am trying to get highlighting working and its turning out to be a pain.
 here is my schema
 
 <field name="id" type="string" indexed="true" stored="true" required="true" />
 <field name="title" type="string" indexed="true" stored="true" />
 <field name="pi" type="string" indexed="true" stored="true" />
 <field name="status" type="string" indexed="true" stored="true" />
 
 here is the catchall field (default field for search as well)
 <field name="content" type="text" indexed="true" stored="false" multiValued="true"/>
 
 here is how I have setup the solrconfig file
 <!-- example highlighter config, enable per-query with hl=true -->
 <str name="hl.fl">title pi status</str>
 <!-- for this field, we want no fragmenting, just highlighting -->
 <str name="f.name.hl.fragsize">0</str>
 <!-- instructs Solr to return the field itself if no query terms are found -->
 <str name="f.title.hl.alternateField">content</str>
 <str name="f.pi.hl.alternateField">content</str>
 <str name="f.status.hl.alternateField">content</str>
 
 <str name="f.title.hl.fragmenter">regex</str> <!-- defined below -->
 <str name="f.pi.hl.fragmenter">regex</str> <!-- defined below -->
 <str name="f.status.hl.fragmenter">regex</str> <!-- defined below -->
 
 after this when I search for lets say
 http://localhost:8983/solr/select?q=submit&hl=true
 I get these results in highlight section
 <lst name="highlighting">
   <lst name="FP1934" />
   <lst name="FP1934-PR02" />
   <lst name="FP1934-PR03" />
   <lst name="FP0526" />
   <lst name="FP0385" />
 </lst>
 with no reference to the actual string .. this number thats being returned
 is the id of the records .. and is also the unique identifier .. why am I
 not getting the string fragments with search terms highlighted
 
 You need to change the type of the fields (title, pi, status) from string to text
 (same as the content field). 
 
 There should be a match/hit on that field in order to create highlighted
 snippets.
 
 For example q=title:submit should return documents so that snippet of
 title can be generated.
 
 FYI: You can search title, pi, status at the same time using
 http://wiki.apache.org/solr/DisMaxRequestHandler without copying all of
 them into a catch all field.
 
 
 
 
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/highlighting-fragments-EMPTY-tp27654005p27661657.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spellcheck.build=true has no effect

2010-02-19 Thread darniz

Hello
Can someone please correct me or confirm whether this is the correct
behaviour.

Thanks,
darniz

darniz wrote:
 
 Hello All.
 After doing a lot of research i came to this conclusion please correct me
 if i am wrong.
 i noticed that if you have buildonCommit and buildOnOptimize as true in
 your spell check component, then the spell check builds whenever a commit
 or optimze happens. which is the desired behaviour and correct. 
 please read on.
 
 I am using Index based spell checker and i am copying make and model to my
 spellcheck field. i index some document and the make and model are being
 copied to spellcheck field when i commit.
 Now i stopped my solr server and 
 I added one more filed bodytype to be copied to my spellcheck field.
 i dont want to reindex data so i issued a http request to rebuild my
 spellchecker
 spellcheck=truespellcheck.build=truespellcheck.dictionary=default.
 Looks like the above command has no effect, the bodyType is not being
 copied to spellcheck field.
 
 The only time the spellcheck filed has bodyType value copied into it is
 when i have to do again reindex document and do a commmit.
 
 Is this the desired behaviour.
 Adding buildOncommit and buildOnOptimize will force the spellchecker to
 rebuild only if a commit or optimize happens
 Please let me know if there are some configurable parameters so that i can
 issue the http command rather than indexing data again and again.
 
 
 thanks
 darniz
 
 

-- 
View this message in context: 
http://old.nabble.com/spellcheck.build%3Dtrue-has-no-effect-tp27648346p27661847.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: filter result by catalog

2010-02-19 Thread Otis Gospodnetic
So, hello Kevin,

So what have you tried so far?  I see from 
http://www.search-lucene.com/m?id=839141.906...@web81107.mail.mud.yahoo.com||acl
 you've tried the acl field approach.
How about the bitset approach described there?
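
For reference, the acl-field variant usually boils down to a multiValued 
catalog field plus a filter query; a rough SolrJ sketch, where catalog_id is 
just an illustrative field name:

import org.apache.solr.client.solrj.SolrQuery;

// each product document carries a multiValued catalog_id field listing the
// catalogs it belongs to; at query time, filter on the catalogs the current
// user is allowed to see
SolrQuery q = new SolrQuery("someterm");
q.addFilterQuery("catalog_id:(23 OR 57 OR 104)");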


Otis 
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: Kevin Osborn osbo...@yahoo.com
 To: Solr solr-user@lucene.apache.org
 Sent: Fri, February 19, 2010 6:06:51 PM
 Subject: filter result by catalog
 
 So, I am looking at better ways to filter a resultset by catalog. So, I have 
 an 
 index of products. And based on the user, I want to filter the search results 
 to 
 what they are allowed to see. I will probably have up to 200 or so different 
 catalogs.



Re: Documents disappearing

2010-02-19 Thread Otis Gospodnetic
Pascal,

Look at that difference between numDocs and maxDocs.  That delta represents 
deleted docs.  Maybe there is something deleting your docs after all!

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: Pascal Dimassimo thesuper...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, February 19, 2010 3:50:26 PM
 Subject: RE: Documents disappearing
 
 
 Using LukeRequestHandler, I see:
 
  <int name="numDocs">7725</int>
  <int name="maxDoc">28099</int>
  <int name="numTerms">758826</int>
  <long name="version">1266355690710</long>
  <bool name="optimized">false</bool>
  <bool name="current">true</bool>
  <bool name="hasDeletions">true</bool>
  <str name="directory">
  org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/opt/solr/myindex/data/index
  </str>
 
 I will copy the index to my local machine so I can open it with luke. Should
 I look for something specific?
 
 Thanks!
 
 
 ANKITBHATNAGAR wrote:
  
  Try inspecting your index with luke
  
  
  Ankit
  
  
  -Original Message-
  From: Pascal Dimassimo [mailto:thesuper...@hotmail.com] 
  Sent: Friday, February 19, 2010 2:22 PM
  To: solr-user@lucene.apache.org
  Subject: Documents disappearing
  
  
  Hi,
  
  I have encounter a situation that I can't explain. We are indexing
  documents
  that are often duplicates so we activated deduplication like this:
  
  
  <processor
  class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
   <bool name="enabled">true</bool>
   <bool name="overwriteDupes">true</bool>
   <str name="signatureField">signature</str>
   <str name="fields">title,text</str>
   <str
  name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  
  What I can't explain is that when I look at the documents count in the
  log,
  I see documents disappearing.
  
  11:24:23 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=0 status=0 QTime=0
  14:04:24 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=4065 status=0 QTime=10
  14:17:07 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=6499 status=0 QTime=42
  14:25:42 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=7629 status=0 QTime=1
  14:47:12 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=10140 status=0 QTime=12
  15:17:22 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=10861 status=0 QTime=13
  15:47:31 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=9852 status=0 QTime=19
  16:17:42 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=8112 status=0 QTime=13
  16:38:17 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=10
  16:39:10 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=1
  16:47:40 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=46
  16:51:24 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=74
  17:02:13 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=102
  17:17:41 INFO  - [myindex] webapp=null path=null
  params={event=newSearcher&q=*:*&wt=dismax} hits=7725 status=0 QTime=8
  
  11:24 was the time at which Solr was started that day. Around 13:30, we
  started the indexation.
  
  At some point during the indexation, I notice that a batch a documents
  were
  resend (i.e, documents with the same id field were sent again to the
  index).
  And according to the log, NO delete was sent to Solr.
  
  I understand that if I send duplicates (either documents with the same id
  or
  with the same signature), the count of documents should stay the same. But
  how can we explain that it is lowering? What are the possible causes of
  this
  behavior?
  
  Thanks! 
  -- 
  View this message in context:
  http://old.nabble.com/Documents-disappearing-tp27659047p27659047.html
  Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
 
 -- 
 View this message in context: 
 http://old.nabble.com/Documents-disappearing-tp27659047p27660077.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-19 Thread Otis Gospodnetic
Glen may be referring to LuSql indexing with multiple threads?
Does/can DIH do that, too?


Otis 
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: Yonik Seeley yo...@lucidimagination.com
 To: solr-user@lucene.apache.org
 Sent: Fri, February 19, 2010 11:41:07 AM
 Subject: Re: What is largest reasonable setting for ramBufferSizeMB?
 
 On Fri, Feb 19, 2010 at 5:03 AM, Glen Newton wrote:
  You may consider using LuSql[1] to create the indexes, if your source
  content is in a JDBC accessible db. It is quite a bit faster than
  Solr, as it is a tool specifically created and tuned for Lucene
  indexing.
 
 Any idea why it's faster?
 AFAIK, the main purpose of DIH is indexing databases too.  If DIH is
 much slower, we should speed it up!
 
 -Yonik
 http://www.lucidimagination.com



Re: replications issue

2010-02-19 Thread Otis Gospodnetic
Hello,

You are replicating every 60 seconds?  I hope you don't have a large index with 
lots of continuous index updates on the master, as replicating every 60 
seconds, while doable, may be a bit too frequent (depending on index size, 
amount of changes, cache settings, etc.).

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: giskard gisk...@autistici.org
 To: solr-user@lucene.apache.org
 Sent: Fri, February 19, 2010 4:11:56 AM
 Subject: Re: replications issue
 
 Ciao,
 
 Uhm after some time a new index in data/index on the slave has been written
 with the ~size of the master index.
 
 the configure on both master slave is the same one on the solrReplication 
 wiki 
 page
 enable/disable master/slave in a node
 
 
   
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">commit</str>
     <str name="confFiles">schema.xml,stopwords.txt</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <str name="masterUrl">http://localhost:8983/solr/replication</str>
     <str name="pollInterval">00:00:60</str>
   </lst>
 </requestHandler>
 
 When the master is started, pass in -Denable.master=true and in the slave 
 pass 
 in -Denable.slave=true. Alternately , these values can be stored in a 
 solrcore.properties file as follows
 
 #solrcore.properties in master
 enable.master=true
 enable.slave=false
 
 On Feb 19, 2010, at 03:43, Otis Gospodnetic wrote:
 
  giskard,
  
  Is this on the master or on the slave(s)?
  Maybe you can paste your replication handler config for the master and your 
 replication handler config for the slave.
  
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Hadoop ecosystem search :: http://search-hadoop.com/
  
  
  
  
  
  From: giskard 
  To: solr-user@lucene.apache.org
  Sent: Thu, February 18, 2010 12:16:37 PM
  Subject: replications issue
  
  Hi all,
  
  I've setup solr replication as described in the wiki.
  
  when i start the replication a directory called index.$numebers is created 
 after a while
  it disappears and a new index.$othernumbers is created
  
  index/ remains untouched with an empty index.
  
  any clue?
  
  thank you in advance,
  Riccardo
  
  --
  ciao,
  giskard
 
 --
 ciao,
 giskard



Re: optimize is taking too much time

2010-02-19 Thread Otis Gospodnetic
Hello,

Solr will never optimize the whole index without somebody explicitly asking for 
it.
Lucene will merge index segments on the master as documents are indexed.  How 
often it does that depends on mergeFactor.

See:
http://search-lucene.com/?q=mergeFactor+segment+mergefc_project=Lucenefc_project=Solrfc_type=mail+_hash_+user


Otis 
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: mklprasad mklpra...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Fri, February 19, 2010 1:02:11 AM
 Subject: Re: optimize is taking too much time
 
 
 
 
 Jagdish Vasani-2 wrote:
  
  Hi,
  
  you should not optimize index after each insert of document.insted you
  should optimize it after inserting some good no of documents.
  because in optimize it will merge  all segments to one according to
  setting
  of lucene index.
  
  thanks,
  Jagdish
  On Fri, Feb 12, 2010 at 4:01 PM, mklprasad wrote:
  
 
  hi
  in my solr u have 1,42,45,223 records having some 50GB .
  Now when iam loading a new record and when its trying optimize the docs
  its
  taking 2 much memory and time
 
 
  can any body please tell do we have any property in solr to get rid of
  this.
 
  Thanks in advance
 
  --
  View this message in context:
  
 http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27561570.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
  
  
 
 Yes,
 Thanks for the reply. 
 i have removed the optimize() from the code. but i have a doubt ..
 1. Will mergeFactor internally do any optimization, or do we have to specify it?
 
 2. Even if solr initiates an optimize, if i have large data like 52GB will
 that take a huge time?
 
 Thanks,
 Prasad
 
 
 
 -- 
 View this message in context: 
 http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27650028.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: filter result by catalog

2010-02-19 Thread Kevin Osborn
Yes I thought about both methods. The ACL method is easier, but has some 
scalability issues. We use the bitset method in another product, but there are 
some complexity and resource problems.

This is a new project so I am revisiting the issue to see if anyone had any 
better ideas.

On Fri Feb 19th, 2010 6:18 PM PST Otis Gospodnetic wrote:

So, hello Kevin,

So what have you tried so far?  I see from 
http://www.search-lucene.com/m?id=839141.906...@web81107.mail.mud.yahoo.com||acl
 you've tried the acl field approach.
How about the bitset approach described there?


Otis 
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
 From: Kevin Osborn osbo...@yahoo.com
 To: Solr solr-user@lucene.apache.org
 Sent: Fri, February 19, 2010 6:06:51 PM
 Subject: filter result by catalog
 
 So, I am looking at better ways to filter a resultset by catalog. So, I have 
 an 
 index of products. And based on the user, I want to filter the search 
 results to 
 what they are allowed to see. I will probably have up to 200 or so different 
 catalogs.